class: center, middle, inverse, title-slide

.title[
# Anomaly Detection in Image Time Series Using Explainable AI (XAI)
]
.author[
### Priyanga Dilini Talagala
]
.institute[
### 43rd International Symposium on Forecasting
### 28-07-2023
]
.date[
### Slides available at retinalab.netlify.app
<i class="fas fa-globe faa-ring animated faa-fast " style="color:white;"></i>
Slides created via the R package xaringan
]

---
class: inverse, center

## Motivation: Forest Coverage of Rondônia, Brazil

<img src="fig/1_deforestation.gif" alt="Satellite image time series of deforestation in Rondônia, Brazil" style="width: 60%; height: 10%;">

Allows us to identify deforested areas and assess the rate of forest loss

---
class: inverse, middle, center

## Motivation: Undersea Volcano Eruption near Tonga

<img src="fig/2_volcano.gif" alt="Satellite image time series of the undersea volcano eruption near Tonga" style="width: 40%; height: 40%;">

Allows us to track the eruption plumes and identify areas at risk

<!--
class: inverse, middle, center
https://www.sciencealert.com/stunning-images-from-space-reveal-the-extent-of-australia-s-bushfire-crisis
## Motivation: Australia's Bushfire in 2020
<img src="fig/3_bushfire.gif" alt="Alt Text" style="width: 60%; height: 50%;">
Monitoring and assessing the spread and intensity of bushfires
-->

---
class: inverse, center

.pull-left[
### Forest Coverage of Rondônia, Brazil
<img src="fig/1_deforestation.gif" alt="Satellite image time series of deforestation in Rondônia, Brazil" height="300">
].pull-right[
### Undersea Volcano Eruption near Tonga
<img src="fig/2_volcano.gif" alt="Satellite image time series of the undersea volcano eruption near Tonga" style="width: 60%; height: 60%;">
]

--

Both series exhibit a distinct typical behaviour that remains relatively static over a long initial period, and then display unusual behaviour that deviates from those static patterns.

---
class: inverse, center, middle

## What is an anomaly?

- By definition, anomalies are rare in comparison to a system's typical behaviour.

--

- We define an anomaly as an observation that is very unlikely given the forecast distribution.

---
### Aim

- To develop a novel framework that detects and interprets anomalies in image streams.

--

### Main Assumptions

- Anomalies show a significant deviation from the typical behaviour of a given system.

--

- A representative dataset of the system's typical behaviour is available, from which a model of the typical behaviour of the image streams generated by the system can be defined.

---
### Proposed Algorithm

- **Off-line Phase**: Forecast a data-driven anomalous threshold for the system's typical behaviour

--

- **On-line Phase**: Test newly arriving data against the forecasted anomalous threshold

---
### Off-line Phase

<img src="fig/4_flow1.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase

<img src="fig/5_flow2.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase

<img src="fig/6_flow3.png" width="90%" style="display: block; margin: auto;" />

- Improves the efficiency and performance of the subsequent steps

---
### Off-line Phase

<img src="fig/7_flow4.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase

<img src="fig/8_flow5.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase

<img src="fig/9_flow6.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase - Anomalous threshold calculation

<img src="fig/10_flow7.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase - Anomalous threshold calculation

<img src="fig/11_flow8.png" width="90%" style="display: block; margin: auto;" />

---
### Off-line Phase - Anomalous threshold calculation

<img src="fig/12_flow9.png" width="90%" style="display: block; margin: auto;" />

---
.pull-left[
#### Anomalous threshold calculation

##### Spacing theorem (Weissman, 1978)

- Let `\(X_{1}, X_{2}, ..., X_{T}\)` be a sample from a distribution function `\(F\)`.
- Let `\(X_{1:T} \geq X_{2:T} \geq ... \geq X_{T:T}\)` be the order statistics.
- The available data are `\(X_{1:T}, X_{2:T}, ..., X_{k:T}\)` for some fixed `\(k\)`.
- Let `\(D_{i,T} = X_{i:T} - X_{i+1:T}\)`, `\((i = 1, 2, ..., k)\)`, be the spacings between successive order statistics.
].pull-right[
- If `\(F\)` is in the maximum domain of attraction of the Gumbel distribution, then the spacings `\(D_{i,T}\)` are asymptotically independent and exponentially distributed with mean proportional to `\(i^{-1}\)`.

<img src="fig/19_EVT.png" width="100%" style="display: block; margin: auto;" />
]

---
class: inverse, center, middle

<img src="fig/paper.png" width="100%" style="display: block; margin: auto;" />

---
### On-line Phase

<img src="fig/13_flow10.png" width="90%" style="display: block; margin: auto;" />

---
### On-line Phase

<img src="fig/14_flow11.png" width="100%" style="display: block; margin: auto;" />

---
### On-line Phase

<img src="fig/15_flow12.png" width="100%" style="display: block; margin: auto;" />

---
### On-line Phase

<img src="fig/16_flow13.png" width="100%" style="display: block; margin: auto;" />

---

<img src="fig/22_output.png" width="100%" style="display: block; margin: auto;" />

---

<img src="fig/23_output.png" width="100%" style="display: block; margin: auto;" />

---

<img src="fig/24_output.png" width="100%" style="display: block; margin: auto;" />

---
## Performance Evaluation

|             | Conventional ML Framework | Xception | VGG16  | DenseNet121 | ResNet50 | InceptionV3 |
|:-----------:|:-------------------------:|:--------:|:------:|:-----------:|:--------:|:-----------:|
| Accuracy    | 79.6%                     | 70.09%   | 79.57% | 79.57%      | 92.92%   | **99.60%**  |
| Sensitivity | 1.0                       | 0.958    | 1.0    | 1.0         | 1.0      | **1.0**     |
| Specificity | 0.388                     | 0.188    | 0.388  | 0.388       | 0.788    | **0.988**   |
| PPV         | 0.77                      | 0.702    | 0.765  | 0.765       | 0.904    | **0.994**   |
| NPV         | 1.0                       | 0.691    | 1.0    | 1.0         | 1.0      | **1.0**     |
| F1-Score    | 0.87                      | 0.810    | 0.867  | 0.867       | 0.950    | **0.997**   |
| G-Mean      | 0.62                      | 0.424    | 0.623  | 0.623       | 0.888    | **0.994**   |

---
#### Conventional Machine Learning Framework

<img src="fig/25_ML.png" width="95%" style="display: block; margin: auto;" />

---
## Performance Evaluation

|             | Conventional ML Framework | Xception | VGG16  | DenseNet121 | ResNet50 | InceptionV3 |
|:-----------:|:-------------------------:|:--------:|:------:|:-----------:|:--------:|:-----------:|
| Accuracy    | 79.6%                     | 70.09%   | 79.57% | 79.57%      | 92.92%   | **99.60%**  |
| Sensitivity | 1.0                       | 0.958    | 1.0    | 1.0         | 1.0      | **1.0**     |
| Specificity | 0.388                     | 0.188    | 0.388  | 0.388       | 0.788    | **0.988**   |
| PPV         | 0.77                      | 0.702    | 0.765  | 0.765       | 0.904    | **0.994**   |
| NPV         | 1.0                       | 0.691    | 1.0    | 1.0         | 1.0      | **1.0**     |
| F1-Score    | 0.87                      | 0.810    | 0.867  | 0.867       | 0.950    | **0.997**   |
| G-Mean      | 0.62                      | 0.424    | 0.623  | 0.623       | 0.888    | **0.994**   |

---
class: inverse, middle, center

# XAI role in the framework

---
### XAI role in the framework

<img src="fig/17_flow14.png" width="70%" style="display: block; margin: auto;" />

- Focus: produce local-scope interpretations as post-hoc explanations for the black-box feature extraction models (see the sketch on the next slide).
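
---
### XAI role in the framework - Illustrative LIME sketch

A minimal, self-contained sketch of how such a local post-hoc explanation could be produced with LIME. The random image and the dummy `predict_fn` below are placeholders standing in for a real satellite image and the framework's trained classifier; they are not part of the framework itself.

```python
# Hedged sketch: a local post-hoc explanation of a black-box image
# classifier with LIME. The image and predict_fn are dummy stand-ins.
import numpy as np
from lime import lime_image

rng = np.random.default_rng(1)
img = rng.random((128, 128, 3))   # placeholder for a real image

def predict_fn(images):
    # Placeholder black-box: returns fake two-class probabilities
    # of shape (n, 2), as LIME expects from a classifier.
    p = images.mean(axis=(1, 2, 3))[:, None]
    return np.hstack([p, 1 - p])

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    img,                 # the single instance to explain
    predict_fn,          # black-box prediction function
    top_labels=2,        # explain the two most likely classes
    num_samples=200,     # perturbed samples drawn around img
)

# Keep only the superpixels that support the top predicted class
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0],
    positive_only=True,
    num_features=5,      # the 5 most influential regions
)
```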
---
### XAI role in the framework

<img src="fig/18_flow15.png" width="70%" style="display: block; margin: auto;" />

- Freeze the feature layer
- Add a classification layer on top of the frozen feature layer and train the model on the new labelled data set

---
## Explainable AI Module - LIME

<img src="fig/26_Lime.png" width="90%" style="display: block; margin: auto;" />

---
## Explainable AI Module - SHAP

<img src="fig/27_Shap.png" width="90%" style="display: block; margin: auto;" />

---
.pull-left[
## Research Gap

- Binary classification problem (low generalizability)
- Ignores the inter-dependency between the images
- Manual anomalous thresholds and unrealistic assumptions
- Focus on the classification task; lack of explainability
- Class imbalance problem
].pull-right[
## Main Contribution

- One-class classification problem (high generalizability)
- Integrates computer vision and time series forecasting
- A data-driven anomalous threshold based on extreme value theory
- Novel framework that integrates computer vision, time series forecasting, and explainable AI
- One-class formulation avoids the class imbalance problem
]

---
## What Next?

- Focus: to interpret the influence of different features on the final output layer's predictions

--

- Current assumption: the impact of the newly added hidden layers is negligible for interpretability, as transfer learning-based feature extractors are already expected to capture the relevant information

--

- However, the internal structure of a model, including its hidden layers, plays a significant role in determining the model's behaviour and predictions

--

- Sensitivity analysis: to identify how different configurations of the hidden layers affect the model's predictions

--

- Move from perturbation-based techniques to **backpropagation-based XAI techniques** such as Layer-wise Relevance Propagation (LRP) to understand the internal representations and computations within the network

---
class: center, middle, inverse

# Thank you
priyangad@uom.lk
pridiltal
https://retinalab.netlify.app/ <br/> (Slides available)

This work was supported in part by the RETINA research lab, funded by OWSD, a programme unit of the United Nations Educational, Scientific and Cultural Organization (UNESCO).

---

<img src="fig/time.png" width="90%" style="display: block; margin: auto;" />

---
## Challenges of the Traditional Machine Learning Module

- Extraction of meaningful features is challenging.
- Curse of dimensionality and class imbalance.
- Compromised computational performance during feature extraction.
- Dynamic nature of the anomalous class.

---
#### Image Pre-processing

- Crop Images:
  - To clear out distractions, such as the image series description present at the bottom of each image in the dataset.
- Normalize Images:
  - To ensure that each pixel has a similar data distribution, so the network converges faster during training.
- Contrast Stretching:
  - To enhance contrast by stretching the range of pixel intensity values.
- Resize Images:
  - Different CNN-based feature extraction architectures are trained on pre-specified resolutions.
  - Images are resized to fit the CNN architecture's input requirement, achieving optimal performance and reducing heavy computation while increasing processing speed.

---
#### Traditional Machine Learning Module - Feature Extraction

- First Order Statistical Descriptors:
  - Extract texture features from the image; calculated by processing each pixel of the image sequentially.
  - Features: Mean, Average Contrast, Skewness, and Kurtosis.
- Gabor Wavelet Transformation:
  - Extracts Gabor wavelet features from the image series using bandpass filters known as Gabor filters.
- Edge Detection Method:
  - Detects the edges in the images.
  - Canny, Roberts, Sobel, Scharr, and Prewitt edge detectors.

---
#### Traditional Machine Learning Module - Feature Extraction

- GLCM (Gray Level Co-occurrence Matrix) Method:
  - Extraction of texture-based second-order statistical features from an image.
  - Considers the spatial relationship between pairs of pixels.
  - Features: Dissimilarity, Correlation, Homogeneity, Energy, Contrast, and Entropy.

---
## Dimension Reduction

- A Global Max Pooling 2D layer reduces the feature maps to a single feature vector for each image in the dataset.
- Averaging each image's feature vector then yields one value per image, forming a univariate feature series.

---
### Backpropagation-Based Techniques

- Backpropagation-based techniques, such as the Layer-wise Relevance Propagation (LRP) mentioned earlier, **focus on understanding the importance of individual neurons or features within the neural network**.
- They provide a more detailed and fine-grained understanding of the model's decision process, as they consider the internal representations and computations within the network.

---

- VGG19: Introduced in 2014 by the Visual Geometry Group (VGG) at the University of Oxford.
- ResNet-50: Introduced in 2015 by Microsoft Research as part of the ResNet series.
- InceptionV3: Introduced in 2015 as an upgrade to the original Inception architecture (GoogLeNet).
- Xception: Introduced in 2016 by Google Research as an extension of the Inception architecture.
- DenseNet-121: Introduced in 2016 by the authors of DenseNet at Cornell University.

---

- Xception: An extension of Inception with depthwise separable convolutions, improving efficiency and achieving competitive performance on various tasks.
- VGG19: A deep CNN architecture with 19 layers, utilizing 3x3 convolutional filters and max-pooling, known for its simplicity and effectiveness.
- DenseNet-121: A CNN architecture with dense connections between layers, enabling direct information flow, feature reuse, and strong performance with fewer parameters.
- ResNet-50: A deep CNN architecture with 50 layers, incorporating residual blocks to address the vanishing gradient problem, leading to improved training and performance.
- InceptionV3: A CNN architecture with multi-scale feature extraction, utilizing parallel convolutional layers of different sizes and 1x1 convolutions for efficiency.
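
---
### Transfer Learning Setup - Illustrative sketch

A minimal Keras sketch of the setup described earlier (freeze the feature layer, add a classification layer on top, train on the new labelled data). The choice of InceptionV3 as the backbone, the input size, and the single sigmoid output are assumptions for illustration, not the framework's exact configuration.

```python
# Hedged sketch: frozen pre-trained feature extractor + Global Max
# Pooling 2D + a new classification layer trained on labelled data.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    weights="imagenet",        # features learned on ImageNet
    include_top=False,         # drop the original classifier head
    input_shape=(299, 299, 3), # assumed input resolution
)
base.trainable = False         # freeze the feature layer

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalMaxPooling2D(),            # one feature vector per image
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new classification layer
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(...) would then train only the new classification layer,
# since the backbone's weights are frozen.
```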