Computing mAP to Evaluate Object Detection Models

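This post explains how to compute mean average precision (mAP) to evaluate the accuracy of an object detection model. mAP scores a model by how well its predicted boxes match the ground-truth boxes; the higher the score, the more accurate the detections.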

Earlier posts covered the confusion matrix, accuracy, precision, and recall, and showed how to compute them with the scikit-learn library. This post builds on those concepts and walks through how precision and recall are used to compute mAP.

1. Extracting class labels from prediction scores

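The model returns a real-valued prediction score for each object. To turn these scores into class labels (for example, "object" and "background"), you need to set a threshold. A score at or above the threshold is assigned to the positive class; anything below it is assigned to the negative class.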

import numpy as np

y_true = ["object", "background", "background", "object", "object", "object", "background", "object", "background", "object"]
pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.4, 0.2, 0.4, 0.3]

threshold = 0.5
y_pred = ["object" if score >= threshold else "background" for score in pred_scores]
print(y_pred)

Output:

['object', 'background', 'object', 'object', 'object', 'object', 'background', 'background', 'background', 'background']

With both y_true and y_pred available, you can now compute the confusion matrix, precision, and recall.
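To make that step concrete, here is a minimal sketch that counts the confusion-matrix cells by hand for the labels above. The helper name `confusion_counts` is an assumption introduced for illustration; scikit-learn's metrics can be used instead, as in the earlier posts.

```python
# Labels and predictions from the thresholding example (threshold = 0.5).
y_true = ["object", "background", "background", "object", "object",
          "object", "background", "object", "background", "object"]
y_pred = ["object", "background", "object", "object", "object",
          "object", "background", "background", "background", "background"]


def confusion_counts(y_true, y_pred, pos_label="object"):
    """Hypothetical helper: return (tp, fp, fn, tn) for the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == pos_label and truth == pos_label:
            tp += 1
        elif pred == pos_label:
            fp += 1
        elif truth == pos_label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # share of predicted objects that are real objects
recall = tp / (tp + fn)     # share of real objects that were found
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} recall={recall:.3f}")
```

For this data the counts are TP=4, FP=1, FN=2, TN=3, which gives a precision of 0.8 and a recall of about 0.667.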

2. Precision-Recall Curve (PR Curve)

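Precision is the fraction of the model's predicted objects that are actually correct, and recall is the fraction of all real objects that the model finds. The two metrics complement each other and usually trade off: as one goes up, the other tends to go down.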

To visualize this trade-off, we plot a precision-recall curve by computing precision and recall repeatedly over a range of thresholds.

import sklearn.metrics  # needed for precision_score and recall_score below

def precision_recall_curve(y_true, pred_scores, thresholds):
    precisions = []
    recalls = []
    
    for th in thresholds:
        y_pred = ["object" if score >= th else "background" for score in pred_scores]
        
        precision = sklearn.metrics.precision_score(y_true=y_true, y_pred=y_pred, pos_label="object")
        recall = sklearn.metrics.recall_score(y_true=y_true, y_pred=y_pred, pos_label="object")
        
        precisions.append(precision)
        recalls.append(recall)
    
    return precisions, recalls

# Example data
y_true = ["object", "background", "background", "object", "object", "object", "background", "object", "background", "object"] * 2
pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.4, 0.2, 0.4, 0.3] * 2
thresholds = np.arange(0.2, 0.7, 0.05)

precisions, recalls = precision_recall_curve(y_true, pred_scores, thresholds)

# Plot the curve
import matplotlib.pyplot as plt
plt.plot(recalls, precisions, linewidth=2, color='red')
plt.xlabel("Recall", fontsize=12, fontweight='bold')
plt.ylabel("Precision", fontsize=12, fontweight='bold')
plt.title("Precision-Recall Curve", fontsize=14, fontweight='bold')
plt.show()

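The curve can be used to pick the best threshold, and the usual criterion is the F1 score. F1 is the harmonic mean of precision and recall, so the threshold with the highest F1 gives the best balance between the two: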

f1_scores = 2 * (np.array(precisions) * np.array(recalls)) / (np.array(precisions) + np.array(recalls))
best_f1_idx = np.argmax(f1_scores)
best_threshold = thresholds[best_f1_idx]
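As a self-contained check of the F1-based selection, the sketch below recomputes precision, recall, and F1 in plain Python for the section 1 data. The helper names `precision_recall` and `f1_score` are assumptions made for this example. The thresholds are built from integers so that comparisons such as `0.3 >= 0.3` are not affected by floating-point drift, and a zero denominator is mapped to 0.0 rather than raising an error.

```python
y_true = ["object", "background", "background", "object", "object",
          "object", "background", "object", "background", "object"]
pred_scores = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.4, 0.2, 0.4, 0.3]

# 0.20, 0.25, ..., 0.65, built from integers to avoid float drift.
thresholds = [x / 100 for x in range(20, 70, 5)]


def precision_recall(y_true, scores, threshold, pos_label="object"):
    """Hypothetical helper: precision and recall at a single threshold."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, scores):
        predicted_pos = score >= threshold
        actual_pos = truth == pos_label
        if predicted_pos and actual_pos:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif actual_pos:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


f1_scores = [f1_score(*precision_recall(y_true, pred_scores, th))
             for th in thresholds]
best_idx = max(range(len(thresholds)), key=lambda i: f1_scores[i])
best_threshold = thresholds[best_idx]
best_f1 = f1_scores[best_idx]
print(f"best threshold={best_threshold}, F1={best_f1:.2f}")
```

On this data the highest F1 is 0.8, reached at a threshold of 0.55, where precision is 1.0 and recall is 4/6.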

3. Computing Average Precision (AP)

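AP summarizes the precision-recall curve as the area under it, computed as a weighted mean of the precision at each threshold. The weight for each precision is the change in recall between that threshold and the next one. The code first appends a closing point (recall 0, precision 1) so that the last threshold also has a "next" recall to compare against.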

precisions.append(1.0)
recalls.append(0.0)

precisions = np.array(precisions)
recalls = np.array(recalls)

AP = np.sum((recalls[:-1] - recalls[1:]) * precisions[:-1])
print(f"Average Precision (AP): {AP:.4f}")
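The weighted sum is easier to follow on a tiny hand-checkable curve. The sketch below uses three made-up (precision, recall) points, chosen purely for illustration and not taken from any model, and applies the same formula as the code above. `average_precision` is a hypothetical helper name.

```python
def average_precision(precisions, recalls):
    """AP as a recall-weighted sum of precisions.

    Inputs are ordered by increasing threshold, so recall is non-increasing.
    A closing point (precision 1.0, recall 0.0) is appended first.
    """
    p = list(precisions) + [1.0]
    r = list(recalls) + [0.0]
    return sum((r[k] - r[k + 1]) * p[k] for k in range(len(p) - 1))


# Illustrative values only.
demo_precisions = [0.5, 0.8, 1.0]
demo_recalls = [1.0, 0.6, 0.2]

ap = average_precision(demo_precisions, demo_recalls)
# (1.0 - 0.6) * 0.5 + (0.6 - 0.2) * 0.8 + (0.2 - 0.0) * 1.0
# = 0.20 + 0.32 + 0.20 = 0.72
print(f"AP = {ap:.2f}")
```

Each term is a rectangle whose width is the drop in recall and whose height is the precision at that threshold, which is why the sum approximates the area under the curve.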

4. Intersection over Union (IoU): the core of detection accuracy

In object detection, IoU (Intersection over Union) quantifies how well a predicted box matches the ground-truth box. It is the ratio of the area where the two boxes overlap to the area of their union.

def intersection_over_union(gt_box, pred_box):
    # Top-left and bottom-right corners of the overlap; boxes are [x, y, width, height]
    x1 = max(gt_box[0], pred_box[0])
    y1 = max(gt_box[1], pred_box[1])
    x2 = min(gt_box[0] + gt_box[2], pred_box[0] + pred_box[2])
    y2 = min(gt_box[1] + gt_box[3], pred_box[1] + pred_box[3])
    
    # Area of the overlap (zero if the boxes do not intersect)
    inter_width = max(0, x2 - x1)
    inter_height = max(0, y2 - y1)
    intersection = inter_width * inter_height
    
    # Area of the union: both box areas minus the overlap counted twice
    union = gt_box[2] * gt_box[3] + pred_box[2] * pred_box[3] - intersection
    
    iou = intersection / union
    return iou, intersection, union

# Example: ground-truth and predicted boxes for a cat image
gt_box = [320, 220, 680, 900]
pred_box = [500, 320, 550, 700]

iou, intersect, union = intersection_over_union(gt_box, pred_box)
print(f"IoU: {iou:.2f}, Intersection: {intersect}, Union: {union}")

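A prediction is commonly treated as a meaningful detection when its IoU is at least 0.5. Using that threshold, each detection is classified as positive (IoU at or above 0.5) or negative (below 0.5).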
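The thresholding rule can be sketched as follows. This is a self-contained illustration, assuming the same [x, y, width, height] box layout as above; `iou_xywh`, `IOU_THRESHOLD`, and the two extra predicted boxes are names and values made up for this example.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    # Guard against degenerate zero-area boxes.
    return intersection / union if union > 0 else 0.0


IOU_THRESHOLD = 0.5  # assumed threshold, matching the text above

gt_box = [320, 220, 680, 900]
pred_boxes = {
    "good": [500, 320, 550, 700],     # the predicted box from the example above
    "corner": [900, 1000, 300, 300],  # hypothetical: overlaps only at a corner
    "miss": [0, 0, 100, 100],         # hypothetical: no overlap at all
}

labels = {}
for name, box in pred_boxes.items():
    iou = iou_xywh(gt_box, box)
    labels[name] = "positive" if iou >= IOU_THRESHOLD else "negative"
    print(f"{name}: IoU={iou:.3f} -> {labels[name]}")
```

Only the first box clears the threshold (IoU of about 0.54); the other two are labeled negative. These positive/negative labels play the role of y_true in the precision and recall calculations from the earlier sections.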

5. Computing mAP: multi-class evaluation

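Real datasets contain several classes. Compute the AP for each class separately, then take the mean of those values to get the mAP, which describes the performance of the model as a whole.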

# Class 1
y_true_1 = ["object", "background", "object", "background", "object", "object", "object", "background", "object", "background"]
pred_scores_1 = [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.75, 0.2, 0.8, 0.3]

# Class 2
y_true_2 = ["background", "object", "object", "background", "background", "object", "object", "object", "background", "object"]
pred_scores_2 = [0.32, 0.9, 0.5, 0.1, 0.25, 0.9, 0.55, 0.3, 0.35, 0.85]

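# compute_ap is not defined in the original post; this helper is an assumed
# implementation that wraps the steps from sections 2 and 3.
def compute_ap(y_true, pred_scores, thresholds):
    precisions, recalls = precision_recall_curve(y_true, pred_scores, thresholds)
    precisions = np.array(precisions + [1.0])  # closing point: precision 1
    recalls = np.array(recalls + [0.0])        # closing point: recall 0
    return np.sum((recalls[:-1] - recalls[1:]) * precisions[:-1])

# Compute the AP for each class
ap1 = compute_ap(y_true_1, pred_scores_1, thresholds)
ap2 = compute_ap(y_true_2, pred_scores_2, thresholds)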

# mAP is the mean of the per-class APs
mAP = (ap1 + ap2) / 2
print(f"Mean Average Precision (mAP): {mAP:.4f}")
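For more than two classes, the same calculation can be driven from a dictionary. The sketch below is a plain-Python version that does not depend on scikit-learn or NumPy. The function names and the `classes` dictionary layout are assumptions for illustration, and the thresholds are built from integers to avoid floating-point drift in comparisons such as `0.3 >= 0.3`.

```python
def precision_recall(y_true, scores, threshold, pos_label="object"):
    """Precision and recall at one threshold (0.0 when undefined)."""
    tp = fp = fn = 0
    for truth, score in zip(y_true, scores):
        predicted_pos = score >= threshold
        actual_pos = truth == pos_label
        if predicted_pos and actual_pos:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif actual_pos:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true, scores, thresholds):
    """AP over increasing thresholds, closed with (precision 1, recall 0)."""
    points = [precision_recall(y_true, scores, th) for th in thresholds]
    precisions = [p for p, _ in points] + [1.0]
    recalls = [r for _, r in points] + [0.0]
    return sum((recalls[k] - recalls[k + 1]) * precisions[k]
               for k in range(len(points)))


thresholds = [x / 100 for x in range(20, 70, 5)]  # 0.20, 0.25, ..., 0.65

# Hypothetical layout: class name -> (ground-truth labels, prediction scores).
classes = {
    "class_1": (["object", "background", "object", "background", "object",
                 "object", "object", "background", "object", "background"],
                [0.7, 0.3, 0.5, 0.6, 0.55, 0.9, 0.75, 0.2, 0.8, 0.3]),
    "class_2": (["background", "object", "object", "background", "background",
                 "object", "object", "object", "background", "object"],
                [0.32, 0.9, 0.5, 0.1, 0.25, 0.9, 0.55, 0.3, 0.35, 0.85]),
}

aps = {name: average_precision(labels, scores, thresholds)
       for name, (labels, scores) in classes.items()}
mean_ap = sum(aps.values()) / len(aps)

for name, ap in aps.items():
    print(f"{name}: AP = {ap:.4f}")
print(f"mAP = {mean_ap:.4f}")
```

With exact decimal thresholds, this gives an AP of about 0.9484 for class 1 and 0.9583 for class 2, so the mAP is about 0.9534. Results from `np.arange` thresholds may differ slightly when a score sits exactly on a threshold.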

In this way, averaging the per-class AP values, each derived from that class's precision-recall curve, gives a single objective measure of overall model performance.

Tags: mAP, IoU, object detection, precision-recall curve, AP

Posted on June 17 at 20:47

괴물 클럽