A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks

인테져드림 2023. 3. 1. 12:31

Paper introduction

This paper is the first to propose a single method for detecting both kinds of samples that differ from the training samples (in-distribution). Some differ statistically (out-of-distribution) and some differ adversarially. According to the authors, no earlier method handled both. Instead of the commonly used posterior softmax classification, the method uses distance-based classification in the feature space (specifically, the "Mahalanobis distance").

URL : https://proceedings.neurips.cc/paper/2018/file/abdeb6f575ac5c6676b747bca8d09cc2-Paper.pdf (NeurIPS, 2018)

What I want to get out of this paper

There are many ways to detect out-of-distribution samples. I read this paper to understand concretely how one of them, the distance-based approach, works. So what I want to take away is "how the distance is computed and used to detect OOD samples", and this review is organised around that question.

Starting the review

1. Motivation

What matters for deploying deep neural networks (DNNs) in the real world is "well-calibrated predictive uncertainty". This means the DNN can correctly recognise when it is in an uncertain situation, whether the uncertainty comes from the data or from the model.

Here, however, only data uncertainty is considered, so predictive uncertainty can be treated as roughly equivalent to detecting abnormal samples. Abnormal samples include two kinds:
- OOD samples (def. semantically shifted samples that cannot be classified into any of the training classes).
- Adversarial samples (def. samples corrupted so that they are hard to classify, e.g. by adding noise).

However, according to the paper, at the time there was no universal detector that handled OOD and adversarial samples together.

2. Method

2.1. Overview

1. To begin with, the abstract idea behind the universal abnormal-sample detection method proposed here is as follows.

"Estimate the probability density of test samples in the feature space well, and then solve the problem with a generative, distance-based classifier."

The paper also puts forward a hypothesis.

(1) Hypothesis: the pre-trained features fit a class-conditional Gaussian distribution well.

(2) Rationale: computing a softmax classifier on those features is equivalent to computing the posterior distribution under GDA*.

* GDA: Gaussian Discriminant Analysis, which fits the parameters {mu, sigma, phi} (class means, covariance and class prior).

+Later, we will look at how this rationale is made concrete to support the hypothesis (2.2 Proof of the hypothesis).

2. Assuming the hypothesis holds, let's see how it is used to detect abnormal samples.

0. By the hypothesis, the pre-trained features f(x) can be modelled with GDA. That is, we fit a separate Gaussian distribution for each class and compare them to tell the classes apart.

1. Estimating the GDA parameters {mu, sigma} by MLE, f(x) follows the multivariate Gaussian distribution below.

f(x) | y=i ~ N(mu_i, sigma)*

*(a) mu_i: the mean of the training samples with class label i

*(b) sigma: has no subscript because the covariance matrix is assumed to be the same for all classes

2. For each class, compute the Mahalanobis distance using the parameters {mu, sigma} of the multivariate Gaussian that f(x) follows. The class with the smallest distance is the same as the class with the largest posterior p(y | x)*.

*For why the two coincide, see the "actual derivation of the Mahalanobis distance" part here.

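3. The minimum Mahalanobis distance, with a minus sign, serves as the confidence score. It expresses how confident the model is about the class it considers most likely for input x. If this confidence falls below a threshold, x is judged to be an abnormal sample.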
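The equivalence claimed in step 2 can be sketched by taking the log of the GDA posterior. Because sigma is shared by all classes, the Gaussian normalising constant and the evidence log P(f(x)) do not depend on c. The remaining log P(y=c) term drops out only if all class priors are equal. That equal-prior condition is an extra assumption stated here, not something taken from the text above.

```latex
\begin{aligned}
\log P(y=c \mid x) &= \log P\left(f(x) \mid y=c\right) + \log P(y=c) - \log P\left(f(x)\right) \\
&= -\tfrac{1}{2}\left(f(x)-\mu_c\right)^\top \Sigma^{-1}\left(f(x)-\mu_c\right) + \log P(y=c) + \text{const}, \\
\arg\max_c P(y=c \mid x) &= \arg\min_c \left(f(x)-\mu_c\right)^\top \Sigma^{-1}\left(f(x)-\mu_c\right) \quad \text{when } P(y=c) \text{ is equal for all } c.
\end{aligned}
```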

3. Now, let's see how the abstract idea is put into practice.

(1) "Estimate the probability density in the feature space well" → apply GDA to the pre-trained features and estimate, for each class, which multivariate Gaussian distribution the features follow.

(2) "Solve the problem with a generative, distance-based classifier" → use each class's Gaussian parameters {mu, sigma} to compute the Mahalanobis distance, then use the negated minimum as the confidence score to decide whether the input is an abnormal sample.

2.2. Proof of the hypothesis

1. Softmax classification

- f(x): pre-trained features

- {w_c, b_c}: softmax classifier's parameters

The posterior probability of class label c is obtained by softmax classification on the pre-trained features f(x).
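Written out with the symbols defined above, this is the usual softmax over the linear scores w_c^T f(x) + b_c:

```latex
P(y=c \mid x) = \frac{\exp\left(w_c^\top f(x) + b_c\right)}{\sum_{c'} \exp\left(w_{c'}^\top f(x) + b_{c'}\right)}
```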

2. GDA

- Assumption

a. h(x) in R^n

b. P(h(x) | y) is a multivariate Gaussian distribution (i.e. h(x) | y=c ~ N(mu_c, sigma))

P(h(x) | y=c)·P(y=c) / P(h(x)) = P(y=c | h(x)) (1)

P(y=c | h(x)) = P(y=c | x) (2)

(1) By Bayes' rule, we can obtain the posterior distribution P(y=c | h(x)).

(2) Since each input x is represented by its own h(x), it can also be written as P(y=c | x). This has the same form as the softmax classification on f(x) above.
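The step that makes the two "the same" is worth spelling out. When the Gaussian density in (1) is expanded with the shared covariance sigma, the quadratic term h(x)^T sigma^{-1} h(x) appears for every class and cancels in the ratio. What remains is a softmax over linear functions of h(x):

```latex
P(y=c \mid x) = \frac{\exp\left(\mu_c^\top \Sigma^{-1} h(x) - \tfrac{1}{2}\mu_c^\top \Sigma^{-1}\mu_c + \log P(y=c)\right)}{\sum_{c'} \exp\left(\mu_{c'}^\top \Sigma^{-1} h(x) - \tfrac{1}{2}\mu_{c'}^\top \Sigma^{-1}\mu_{c'} + \log P(y=c')\right)}
```

Matching this term by term with the softmax in 1 gives w_c = sigma^{-1} mu_c and b_c = -1/2 mu_c^T sigma^{-1} mu_c + log P(y=c).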

*The rationale for the hypothesis stated earlier is exactly this: computation 1 (softmax classification) and computation 2 (the GDA posterior) are the same.

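Finally, softmax classification and GDA express the same P(y=c | x). It is therefore reasonable to carry the GDA assumption "h(x) | y=c ~ N(mu_c, sigma)" over to the pre-trained features f(x). This is what motivates the hypothesis that f(x) fits a class-conditional Gaussian distribution well. The matching form makes the hypothesis plausible rather than strictly proving it, which is why the experimental support below matters.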

3. Experimental supports

This is a t-SNE visualisation of the feature embeddings of CIFAR-10 test samples, with each class in a different colour. The classes are well separated, which suggests that a class-conditional Gaussian distribution fits the pre-trained features well.

2.3. Mahalanobis distance-based confidence score

Estimating the class-wise mean (mu) and covariance (sigma) from the training samples

Since the pre-trained features f(x) follow a multivariate Gaussian distribution for each class, we can estimate each class's mean and the shared covariance from the training samples.
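In the paper's notation, N_c is the number of training samples of class c and N is the total number of training samples. The estimates are the class-wise sample means and one covariance pooled over all classes:

```latex
\hat{\mu}_c = \frac{1}{N_c} \sum_{i:\, y_i = c} f(x_i), \qquad
\hat{\Sigma} = \frac{1}{N} \sum_{c} \sum_{i:\, y_i = c} \left(f(x_i) - \hat{\mu}_c\right)\left(f(x_i) - \hat{\mu}_c\right)^\top
```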

The paper's classification rule returns the class c with the smallest Mahalanobis distance. That is, it returns the class closest to the test input x, and that class is the predicted label for x.
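Written as a formula, the rule uses the estimated class means mu_hat_c and the shared covariance sigma_hat:

```latex
\hat{y}(x) = \arg\min_c \left(f(x) - \hat{\mu}_c\right)^\top \hat{\Sigma}^{-1} \left(f(x) - \hat{\mu}_c\right)
```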

The confidence score is computed from the Mahalanobis distance. It equals the minimum Mahalanobis distance with a minus sign, and it is compared with a pre-set threshold to decide whether the test input x is an abnormal sample.
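Below is a minimal NumPy sketch of this scoring rule, M(x) = max_c -(f(x) - mu_c)^T sigma^{-1} (f(x) - mu_c). It assumes the features f(x) have already been extracted into arrays. The function names, the toy 2-D data and the threshold value are all hypothetical and not from the paper. Like the paper's formula, it uses the quadratic form (squared distance) without a square root.

```python
import numpy as np


def fit_class_gaussians(features, labels, num_classes):
    """MLE of the class means and of one covariance shared by all classes."""
    dim = features.shape[1]
    means = np.zeros((num_classes, dim))
    scatter = np.zeros((dim, dim))
    for c in range(num_classes):
        class_feats = features[labels == c]
        means[c] = class_feats.mean(axis=0)
        centered = class_feats - means[c]
        scatter += centered.T @ centered  # within-class scatter of class c
    cov = scatter / len(features)  # divide by the total number of samples N
    return means, cov


def mahalanobis_confidence(feat, means, cov):
    """Return (predicted class, M(x)), where M(x) = -min_c squared Mahalanobis distance."""
    precision = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    diffs = feat - means  # (C, D): f(x) - mu_c for every class
    dists = np.einsum("cd,de,ce->c", diffs, precision, diffs)
    c_hat = int(np.argmin(dists))  # closest class = predicted label
    return c_hat, -float(dists[c_hat])


# Toy usage with 2-D "features" for two classes (synthetic data, not from the paper)
rng = np.random.default_rng(0)
train_feats = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(5.0, 1.0, (100, 2))])
train_labels = np.array([0] * 100 + [1] * 100)
means, cov = fit_class_gaussians(train_feats, train_labels, num_classes=2)

threshold = -10.0  # hypothetical; in practice chosen on validation data
for x_feat in (np.array([5.0, 5.0]), np.array([20.0, -20.0])):
    c_hat, score = mahalanobis_confidence(x_feat, means, cov)
    print(c_hat, round(score, 2), "abnormal" if score < threshold else "normal")
```

The first test point sits on the class-1 mean and is accepted. The second is far from both means and gets a very negative score, so it is flagged as abnormal.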

*Previously, the posterior softmax distribution in the output space was commonly used to characterise abnormal samples, whereas this paper uses class-wise probability densities in the representation space. The advantage is that it avoids a problem of the earlier approach: the softmax can over-fit to particular labels and therefore become over-confident about a wrong label.

*Approaches that detect abnormal samples in the representation space also existed before, but they all used the Euclidean distance.

Performance comparison of different OOD detectors: Softmax vs. Euclidean vs. Mahalanobis

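Panel (c) of the figure above compares the OOD detectors using ROC curves. Mahalanobis (blue) has a visibly larger area under the curve than the other detectors. This means its confidence score separates in-distribution samples from abnormal samples more reliably, tending to give in-distribution samples higher confidence than abnormal ones.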
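The area under the ROC curve (AUROC) has a direct reading. It is the probability that a randomly chosen in-distribution sample receives a higher confidence score than a randomly chosen abnormal sample. The small sketch below computes it by comparing every ID/OOD pair. The scores are made up for illustration, and the function name is hypothetical.

```python
import numpy as np


def auroc(id_scores, ood_scores):
    """AUROC = P(score of a random ID sample > score of a random OOD sample); ties count 1/2."""
    id_s = np.asarray(id_scores, dtype=float)[:, None]    # column: one row per ID sample
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]  # row: one column per OOD sample
    wins = (id_s > ood_s).sum() + 0.5 * (id_s == ood_s).sum()
    return float(wins) / (id_s.size * ood_s.size)


# Made-up confidence scores (e.g. negated Mahalanobis distances), not results from the paper
print(auroc([-0.5, -1.0, -2.0], [-3.0, -1.5, -8.0]))  # 8 of 9 pairs ranked correctly
```

A perfect detector scores 1.0 and a random one about 0.5. The pairwise version is O(n*m) and only meant to show the meaning of the metric.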

2.4. Input pre-processing for calibration

Calibration usually means making the softmax scores over classes "more uniform". This mitigates label over-fitting and reduces over-confidence in wrong predictions. The calibration here is different: the intent is to increase the softmax score of the predicted class only (= minimise the Mahalanobis distance). It seems to follow the motto "make the model more confident in its prediction". At the same time, this makes me wonder whether boosting confidence could be risky when the prediction is wrong.

1. How is the softmax score increased? (= How is the Mahalanobis distance minimised?)

Perturb the input x so as to minimise the Mahalanobis distance -M(x).

The input x is pre-processed by adding a small perturbation. As in the figure above, this works like gradient descent: the input x is moved in the direction that decreases the Mahalanobis distance -M(x).

Adding this small perturbation to input x can be written as a single update on x.
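In the paper's notation, c_hat is the class closest to x and epsilon is the perturbation magnitude. The update is:

```latex
\hat{x} = x + \varepsilon \,\mathrm{sign}\left(\nabla_x M(x)\right)
        = x - \varepsilon \,\mathrm{sign}\left(\nabla_x \left(f(x) - \hat{\mu}_{\hat{c}}\right)^\top \hat{\Sigma}^{-1} \left(f(x) - \hat{\mu}_{\hat{c}}\right)\right)
```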

Here, the {mu, sigma} in -M(x) are treated as fixed constants and the input x as the variable. We then compute the partial derivative of -M(x) with respect to x, because the goal is to move x so that -M(x) becomes smaller.

- sign function: 1 (if x > 0), 0 (if x = 0), -1 (if x < 0)
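Below is a minimal sketch of one pre-processing step. It assumes a hypothetical linear feature extractor f(x) = W @ x so that the gradient has a closed form, 2 W^T sigma^{-1} (W x - mu_c_hat). A real network would need automatic differentiation instead. The function name and the toy values are illustrative only.

```python
import numpy as np


def preprocess_input(x, W, means, precision, eps):
    """One sign-gradient step x_hat = x - eps * sign(grad_x d(x)), where d is the
    squared Mahalanobis distance to the closest class.

    Assumes a hypothetical *linear* feature extractor f(x) = W @ x.
    """
    feat = W @ x
    diffs = feat - means                                       # (C, D)
    dists = np.einsum("cd,de,ce->c", diffs, precision, diffs)
    c_hat = int(np.argmin(dists))                              # closest class before perturbing
    # d/dx (Wx - mu)^T P (Wx - mu) = 2 W^T P (Wx - mu) for a symmetric P
    grad = 2.0 * W.T @ precision @ diffs[c_hat]
    # step against the gradient: the distance (= -M(x)) goes down, M(x) goes up
    return x - eps * np.sign(grad)


W = np.eye(2)              # hypothetical feature map
means = np.zeros((1, 2))   # a single class centred at the origin
precision = np.eye(2)      # identity covariance, so d is a plain squared distance
x = np.array([1.0, 2.0])
x_hat = preprocess_input(x, W, means, precision, eps=0.01)
print(x_hat)  # each coordinate moved 0.01 towards the class mean
```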

2. Could boosting confidence in a prediction be risky?

The argument here is that input pre-processing can be applied regardless of whether the prediction about abnormality is right or wrong. It is supported by the observation that the gradient norm differs depending on whether x is ID or OOD data, so the effect of the pre-processing differs too. As a result, the gap between the confidence scores of ID data and of OOD data widens, which makes abnormal-sample detection more effective.

Visualisation of how the effect of input pre-processing differs between OOD data (green) and ID data (red).

Content source:

- https://arxiv.org/pdf/1706.02690.pdf

2.5. Class-incremental learning

Whenever data from a new class appears, we can compute its mean and update the existing shared covariance, so class-incremental learning is possible.
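Below is a minimal sketch of such an update. It assumes the shared covariance is the sample-count-weighted MLE from 2.3; the paper's own update rule may weight the classes differently. Only the old means, the old covariance and the number of samples it was fitted on are needed, so the old training features are not revisited. The function name add_new_class is hypothetical.

```python
import numpy as np


def add_new_class(means, cov, n_seen, new_feats):
    """Add class C+1 without revisiting old data.

    means: (C, D) class means; cov: (D, D) shared MLE covariance fitted on n_seen samples.
    new_feats: (N_new, D) features of the new class.
    """
    mu_new = new_feats.mean(axis=0)
    centered = new_feats - mu_new
    n_new = len(new_feats)
    # cov * n_seen recovers the old within-class scatter; add the new class's scatter and renormalise
    cov_new = (cov * n_seen + centered.T @ centered) / (n_seen + n_new)
    return np.vstack([means, mu_new]), cov_new, n_seen + n_new


# Toy check with synthetic 3-D features
rng = np.random.default_rng(1)
old_a, old_b = rng.normal(0.0, 1.0, (50, 3)), rng.normal(3.0, 1.0, (40, 3))
means = np.vstack([old_a.mean(axis=0), old_b.mean(axis=0)])
scatter = sum((z - z.mean(axis=0)).T @ (z - z.mean(axis=0)) for z in (old_a, old_b))
cov = scatter / 90
new_c = rng.normal(-3.0, 2.0, (30, 3))
means, cov, n_seen = add_new_class(means, cov, 90, new_c)
print(means.shape, n_seen)
```

Because cov * n_seen is exactly the old within-class scatter, this update gives the same covariance as refitting on all classes at once.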

3. Experimental results

1. OOD detection

ID and OOD data were taken from different datasets, and OOD detection performance was compared across confidence-score methods (Baseline vs. ODIN vs. Mahalanobis).

- Baseline: the confidence score is the maximum value of the posterior softmax distribution

- ODIN: Baseline + input pre-processing by adding small perturbations + output processing by temperature scaling
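The two softmax-based scores can be contrasted in a few lines. The sketch below computes the maximum softmax probability, where temperature=1 gives the Baseline and a temperature T > 1 gives ODIN's output processing. ODIN's input perturbation step is deliberately omitted. The logits and the T value are made up for illustration.

```python
import numpy as np


def max_softmax_score(logits, temperature=1.0):
    """Max softmax probability; temperature=1 gives the Baseline score,
    a temperature T > 1 gives ODIN's temperature-scaled score (input perturbation omitted)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # shift for numerical stability; the softmax is unchanged
    probs = np.exp(z) / np.exp(z).sum()
    return float(probs.max())


logits = [4.0, 1.0, 0.5]  # made-up logits for a 3-class model
print(max_softmax_score(logits))                    # Baseline
print(max_softmax_score(logits, temperature=1000))  # ODIN-style scaling (T chosen for illustration)
```

A large T flattens every softmax towards uniform, so the raw scores shrink. What matters for detection is that ID and OOD inputs are ranked apart, not the absolute value.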

OOD detection is also compared under extreme settings, showing how robust the Mahalanobis distance-based confidence score is.

2. Adversarial attack detection

These results compare adversarial attack detection performance across detectors (KD+PU vs. LID vs. Mahalanobis). "Detection of known attack" evaluates detection when all normal/adversarial pairs are given. "Detection of unknown attack" trains the detector on a simple attack only and checks whether it generalises to complex attacks.

- KD+PU: Logistic regression detector based on the combination of kernel density(KD) and predictive uncertainty(PU)

- LID: Local intrinsic dimensionality

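Personal impressions: I already knew about OOD detection with the Mahalanobis distance, so the paper's contribution did not impress me much, and I even felt a little deflated, asking myself "is this all?". On the other hand, considering when the paper was published, the idea could well be meaningful if nobody had thought of it before. I also had an earlier concern. The Mahalanobis distance measures the distance from the mean in essentially the same way as the Euclidean distance: it is a Euclidean distance computed after whitening the features with the covariance. So I think that when the features are high-dimensional, the Mahalanobis distance will run into the same problems as the Euclidean distance. A new distance measure suited to high-dimensional features may therefore be needed.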

Image source:

- https://proceedings.neurips.cc/paper/2018/file/abdeb6f575ac5c6676b747bca8d09cc2-Paper.pdf

Last update: ~~2023. 03. 01.~~

2023. 03. 25. 01:40pm(1)

Written by Taejun Lim

(1) In [2.2 Proof of the hypothesis], the function following a class-conditional multivariate Gaussian distribution was originally denoted f'(x). Since this could be confused with the derivative of the pre-trained features f(x), I changed the notation to h(x).