class: center, middle, inverse, title-slide

.title[
# Anomaly Detection in Image Time Series Using Explainable AI (XAI)
]
.author[
### Priyanga Dilini Talagala
]
.institute[
### 43rd International Symposium on Forecasting
### 28-07-2023
]
.date[
### Slides available at retinalab.netlify.app
<i class="fas fa-globe faa-ring animated faa-fast " style="color:white;"></i>
Slides created via the R package xaringan
]

---
class: inverse, center

## Motivation: Forest Coverage of Rondônia, Brazil

<img src="fig/1_deforestation.gif" alt="Satellite image time series of deforestation in Rondônia, Brazil" style="width: 60%; height: 10%;">

Allows us to identify deforested areas and assess the rate of forest loss

---
class: inverse, middle, center

## Motivation: Undersea Volcano Eruption near Tonga

<img src="fig/2_volcano.gif" alt="Satellite image time series of the undersea volcano eruption near Tonga" style="width: 40%; height: 40%;">

Allows us to track the eruption plumes and identify areas at risk

<!--
class: inverse, middle, center
https://www.sciencealert.com/stunning-images-from-space-reveal-the-extent-of-australia-s-bushfire-crisis
## Motivation: Australia's Bushfire in 2020
<img src="fig/3_bushfire.gif" alt="Alt Text" style="width: 60%; height: 50%;">
Monitoring and assessing the spread and intensity of bushfires
-->

---
class: inverse, center

.pull-left[
### Forest Coverage of Rondônia, Brazil
<img src="fig/1_deforestation.gif" alt="Satellite image time series of deforestation in Rondônia, Brazil" height="300">
].pull-right[
### Undersea Volcano Eruption near Tonga
<img src="fig/2_volcano.gif" alt="Satellite image time series of the undersea volcano eruption near Tonga" style="width: 60%; height: 60%;">
]

--

Both series exhibit a distinct typical behaviour that remains relatively static over a long initial period, and then display unusual behaviour that deviates from those static patterns.

---
class: inverse, center, middle

## What is an anomaly?

- By definition, anomalies are rare in comparison to a system's typical behaviour.

--

- We define an anomaly as an observation that is very unlikely given the forecast distribution.

---
### Aim

- To develop a novel framework that detects and interprets anomalies in image streams.

--

### Main Assumptions

- Anomalies show a significant deviation from the typical behaviour of a given system.

--

- A representative dataset of the system's typical behaviour is available, from which a model of the typical behaviour of the image streams generated by the system can be defined.

---
### Proposed Algorithm

- **Off-line Phase**: Forecast a data-driven anomalous threshold for the system's typical behaviour

--

- **On-line Phase**: Test newly arriving data against the forecasted anomalous threshold

---
### Off-line Phase

<img src="fig/4_flow1.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase

<img src="fig/5_flow2.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase

<img src="fig/6_flow3.png" width="90%" style="display: block; margin: auto;" />

- Improves the efficiency and performance of the subsequent steps

---
### Off-line Phase

<img src="fig/7_flow4.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase

<img src="fig/8_flow5.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase

<img src="fig/9_flow6.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase - Anomalous threshold calculation

<img src="fig/10_flow7.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase - Anomalous threshold calculation

<img src="fig/11_flow8.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase - Anomalous threshold calculation

<img src="fig/12_flow9.png" width="90%" style="display: block; margin: auto;" />

---
.pull-left[
#### Anomalous threshold calculation

##### Spacing theorem (Weissman, 1978)

- Let `\(X_{1}, X_{2}, ..., X_{T}\)` be a sample from a distribution function `\(F\)`.
- Let `\(X_{1:T} \geq X_{2:T} \geq ... \geq X_{T:T}\)` be the order statistics.
- The available data are `\(X_{1:T}, X_{2:T}, ..., X_{k:T}\)` for some fixed `\(k\)`.
- Let `\(D_{i,T} = X_{i:T} - X_{i+1:T}\)`, `\((i = 1, 2, ..., k)\)`, be the spacings between successive order statistics.
].pull-right[
- If `\(F\)` is in the maximum domain of attraction of the Gumbel distribution, then the spacings `\(D_{i,T}\)` are asymptotically independent and exponentially distributed with mean proportional to `\(i^{-1}\)`.

<img src="fig/19_EVT.png" width="100%" style="display: block; margin: auto;" />
]

---
class: inverse, center, middle

<img src="fig/paper.png" width="100%" style="display: block; margin: auto;" />

---
### On-line Phase

<img src="fig/13_flow10.png" width="90%" style="display: block; margin: auto;" />

---
### On-line Phase

<img src="fig/14_flow11.png" width="100%" style="display: block; margin: auto;" />

---
### On-line Phase

<img src="fig/15_flow12.png" width="100%" style="display: block; margin: auto;" />

---
### On-line Phase

<img src="fig/16_flow13.png" width="100%" style="display: block; margin: auto;" />

---

<img src="fig/22_output.png" width="100%" style="display: block; margin: auto;" />

---

<img src="fig/23_output.png" width="100%" style="display: block; margin: auto;" />

---

<img src="fig/24_output.png" width="100%" style="display: block; margin: auto;" />

---
## Performance Evaluation

|             | Conventional ML Framework | Xception | VGG16  | DenseNet121 | ResNet50 | InceptionV3 |
|:-----------:|:-------------------------:|:--------:|:------:|:-----------:|:--------:|:-----------:|
| Accuracy    | 79.6%                     | 70.09%   | 79.57% | 79.57%      | 92.92%   | **99.60%**  |
| Sensitivity | 1.0                       | 0.958    | 1.0    | 1.0         | 1.0      | **1.0**     |
| Specificity | 0.388                     | 0.188    | 0.388  | 0.388       | 0.788    | **0.988**   |
| PPV         | 0.77                      | 0.702    | 0.765  | 0.765       | 0.904    | **0.994**   |
| NPV         | 1.0                       | 0.691    | 1.0    | 1.0         | 1.0      | **1.0**     |
| F1-Score    | 0.87                      | 0.810    | 0.867  | 0.867       | 0.950    | **0.997**   |
| G-Mean      | 0.62                      | 0.424    | 0.623  | 0.623       | 0.888    | **0.994**   |

---
#### Conventional Machine Learning Framework

<img src="fig/25_ML.png" width="95%" style="display: block; margin: auto;" />

---
## Performance Evaluation

|             | Conventional ML Framework | Xception | VGG16  | DenseNet121 | ResNet50 | InceptionV3 |
|:-----------:|:-------------------------:|:--------:|:------:|:-----------:|:--------:|:-----------:|
| Accuracy    | 79.6%                     | 70.09%   | 79.57% | 79.57%      | 92.92%   | **99.60%**  |
| Sensitivity | 1.0                       | 0.958    | 1.0    | 1.0         | 1.0      | **1.0**     |
| Specificity | 0.388                     | 0.188    | 0.388  | 0.388       | 0.788    | **0.988**   |
| PPV         | 0.77                      | 0.702    | 0.765  | 0.765       | 0.904    | **0.994**   |
| NPV         | 1.0                       | 0.691    | 1.0    | 1.0         | 1.0      | **1.0**     |
| F1-Score    | 0.87                      | 0.810    | 0.867  | 0.867       | 0.950    | **0.997**   |
| G-Mean      | 0.62                      | 0.424    | 0.623  | 0.623       | 0.888    | **0.994**   |

---
class: inverse, middle, center

# XAI role in the framework

---
### XAI role in the framework

<img src="fig/17_flow14.png" width="70%" style="display: block; margin: auto;" />

- Focus: produce local-scope interpretations as post-hoc explanations for the black-box feature extraction models (see the sketch on the next slide).
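
---
### XAI role in the framework - Illustrative LIME sketch

A minimal, self-contained sketch of how such a local post-hoc explanation could be produced with LIME. The random image and the dummy `predict_fn` below are placeholders standing in for a real satellite image and the framework's trained classifier; they are not part of the framework itself.

```python
# Hedged sketch: a local post-hoc explanation of a black-box image
# classifier with LIME. The image and predict_fn are dummy stand-ins.
import numpy as np
from lime import lime_image

rng = np.random.default_rng(1)
img = rng.random((128, 128, 3))   # placeholder for a real image

def predict_fn(images):
    # Placeholder black-box: returns fake two-class probabilities
    # of shape (n, 2), as LIME expects from a classifier.
    p = images.mean(axis=(1, 2, 3))[:, None]
    return np.hstack([p, 1 - p])

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    img,                 # the single instance to explain
    predict_fn,          # black-box prediction function
    top_labels=2,        # explain the two most likely classes
    num_samples=200,     # perturbed samples drawn around img
)

# Keep only the superpixels that support the top predicted class
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0],
    positive_only=True,
    num_features=5,      # the 5 most influential regions
)
```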
---
### XAI role in the framework

<img src="fig/18_flow15.png" width="70%" style="display: block; margin: auto;" />

- Freeze the feature layer
- Add a classification layer on top of the frozen feature layer and train the model on the new labelled data set

---
## Explainable AI Module - LIME

<img src="fig/26_Lime.png" width="90%" style="display: block; margin: auto;" />

---
## Explainable AI Module - SHAP

<img src="fig/27_Shap.png" width="90%" style="display: block; margin: auto;" />

---
.pull-left[
## Research Gap

- Binary classification problem (low generalizability)
- Ignores the inter-dependency between the images
- Manual anomalous thresholds and unrealistic assumptions
- Focus on the classification task; lack of explainability
- Class imbalance problem
].pull-right[
## Main Contribution

- One-class classification problem (high generalizability)
- Integrates computer vision and time series forecasting
- A data-driven anomalous threshold based on extreme value theory
- Novel framework that integrates computer vision, time series forecasting, and explainable AI
- One-class formulation avoids the class imbalance problem
]

---
## What Next?

- Focus: to interpret the influence of different features on the final output layer's predictions

--

- Current assumption: the impact of the newly added hidden layers is negligible for interpretability, as transfer learning-based feature extractors are already expected to capture the relevant information

--

- However, the internal structure of a model, including its hidden layers, plays a significant role in determining the model's behaviour and predictions

--

- Sensitivity analysis: to identify how different configurations of the hidden layers affect the model's predictions

--

- Move from perturbation-based techniques to **backpropagation-based XAI techniques** such as Layer-wise Relevance Propagation (LRP) to understand the internal representations and computations within the network

---
class: center, middle, inverse

# Thank you
priyangad@uom.lk
pridiltal
https://retinalab.netlify.app/ <br/> (Slides available)

This work was supported in part by the RETINA research lab, funded by OWSD, a programme unit of the United Nations Educational, Scientific and Cultural Organization (UNESCO).

---

<img src="fig/time.png" width="90%" style="display: block; margin: auto;" />

---
## Challenges of the Traditional Machine Learning Module

- Extraction of meaningful features is challenging.
- Curse of dimensionality and class imbalance.
- Compromised computational performance during feature extraction.
- Dynamic nature of the anomalous class.

---
#### Image Pre-processing

- Crop Images:
  - To clear out distractions, such as the image series description present at the bottom of each image in the dataset.
- Normalize Images:
  - To ensure that each pixel has a similar data distribution, so the network converges faster during training.
- Contrast Stretching:
  - To enhance contrast by stretching the range of pixel intensity values.
- Resize Images:
  - Different CNN-based feature extraction architectures are trained on pre-specified resolutions.
  - Images are resized to fit the CNN architecture's input requirement, achieving optimal performance and reducing heavy computation while increasing processing speed.

---
#### Traditional Machine Learning Module - Feature Extraction

- First Order Statistical Descriptors:
  - Extract texture features from the image; calculated by processing each pixel of the image sequentially.
  - Features: Mean, Average Contrast, Skewness, and Kurtosis.
- Gabor Wavelet Transformation:
  - Extracts Gabor wavelet features from the image series using bandpass filters known as Gabor filters.
- Edge Detection Method:
  - Detects the edges in the images.
  - Canny, Roberts, Sobel, Scharr, and Prewitt edge detectors.

---
#### Traditional Machine Learning Module - Feature Extraction

- GLCM (Gray Level Co-occurrence Matrix) Method:
  - Extraction of texture-based second-order statistical features from an image.
  - Considers the spatial relationship between pairs of pixels.
  - Features: Dissimilarity, Correlation, Homogeneity, Energy, Contrast, and Entropy.

---
## Dimension Reduction

- A Global Max Pooling 2D layer reduces the feature maps to a single feature vector for each image in the dataset.
- Averaging each image's feature vector then yields one value per image, forming a univariate feature series.

---
### Backpropagation-Based Techniques

- Backpropagation-based techniques, such as the Layer-wise Relevance Propagation (LRP) mentioned earlier, **focus on understanding the importance of individual neurons or features within the neural network**.
- They provide a more detailed and fine-grained understanding of the model's decision process, as they consider the internal representations and computations within the network.

---

- VGG19: Introduced in 2014 by the Visual Geometry Group (VGG) at the University of Oxford.
- ResNet-50: Introduced in 2015 by Microsoft Research as part of the ResNet series.
- InceptionV3: Introduced in 2015 as an upgrade to the original Inception architecture (GoogLeNet).
- Xception: Introduced in 2016 by Google Research as an extension of the Inception architecture.
- DenseNet-121: Introduced in 2016 by the authors of DenseNet at Cornell University.

---

- Xception: An extension of Inception with depthwise separable convolutions, improving efficiency and achieving competitive performance on various tasks.
- VGG19: A deep CNN architecture with 19 layers, utilizing 3x3 convolutional filters and max-pooling, known for its simplicity and effectiveness.
- DenseNet-121: A CNN architecture with dense connections between layers, enabling direct information flow, feature reuse, and strong performance with fewer parameters.
- ResNet-50: A deep CNN architecture with 50 layers, incorporating residual blocks to address the vanishing gradient problem, leading to improved training and performance.
- InceptionV3: A CNN architecture with multi-scale feature extraction, utilizing parallel convolutional layers of different sizes and 1x1 convolutions for efficiency.
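
---
### Transfer Learning Setup - Illustrative sketch

A minimal Keras sketch of the setup described earlier (freeze the feature layer, add a classification layer on top, train on the new labelled data). The choice of InceptionV3 as the backbone, the input size, and the single sigmoid output are assumptions for illustration, not the framework's exact configuration.

```python
# Hedged sketch: frozen pre-trained feature extractor + Global Max
# Pooling 2D + a new classification layer trained on labelled data.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    weights="imagenet",        # features learned on ImageNet
    include_top=False,         # drop the original classifier head
    input_shape=(299, 299, 3), # assumed input resolution
)
base.trainable = False         # freeze the feature layer

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalMaxPooling2D(),            # one feature vector per image
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new classification layer
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(...) would then train only the new classification layer,
# since the backbone's weights are frozen.
```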