A Complete Guide to Implementing the Perceptron Algorithm in Python

1. Theoretical Background

1.1. Basic Concept of the Perceptron

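The perceptron is an algorithm modeled on how a biological neuron works. A neuron becomes active and passes a signal on once the stimulation it receives from neighboring neurons exceeds a certain threshold, and the perceptron behaves in a similar way.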

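Given an input vector, the perceptron multiplies each input by its corresponding weight, adds up the products, and passes the sum through an activation function. For example, with a 4-dimensional input (X1, X2, X3, X4), each component is multiplied by its weight (W1, W2, W3, W4) and the products are summed into a single value.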

The core equation of the algorithm is:

y = activation(w1*x1 + w2*x2 + w3*x3 + w4*x4 + bias)

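Here the activation function is a step function: it outputs 1 if the sum is greater than 0 and -1 otherwise. The bias term lets the decision threshold move away from zero. In essence, the perceptron is a linear algorithm for binary classification.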
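The forward pass described above fits in a few lines of NumPy. This is a minimal illustration rather than code from the implementation below: the function names `step` and `perceptron_output` and the input, weight, and bias values are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

def step(z):
    # Step activation: 1 if the weighted sum is positive, otherwise -1
    return 1 if z > 0 else -1

def perceptron_output(x, w, bias):
    # Weighted sum w1*x1 + w2*x2 + w3*x3 + w4*x4 + bias, then the step function
    z = np.dot(w, x) + bias
    return step(z)

# Hypothetical 4-dimensional input and weights (illustration only)
x = np.array([1.0, 0.5, -1.0, 2.0])
w = np.array([0.5, -0.25, 0.25, 0.5])

# Weighted sum without bias: 0.5 - 0.125 - 0.25 + 1.0 = 1.125
print(perceptron_output(x, w, bias=-0.5))  # z = 0.625, so the output is 1
print(perceptron_output(x, w, bias=-1.5))  # z = -0.375, so the output is -1
```

The same input produces opposite outputs under the two biases, which is exactly the threshold-shifting role of the bias.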

1.2. Loss Function and Learning

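The perceptron loss is a sum taken over the misclassified samples only. Here θ is the weight vector with the bias folded in as its first component, and each input x_i has a constant 1 prepended. A sample is misclassified when θ·x_i and its label y_i have opposite signs, so every term y_i (θ·x_i) in the sum is negative.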

The loss function is:

L(θ) = -Σ_{i ∈ M} y_i (θ·x_i),  where M is the set of misclassified samples

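The loss is minimized with stochastic gradient descent, one sample at a time. For a misclassified sample, the gradient of -y_i (θ·x_i) with respect to θ is -y_i x_i, so each update is θ ← θ + η y_i x_i, where η is the learning rate. Because every term in the sum is non-negative, the loss is 0 exactly when no sample is misclassified. If the dataset is linearly separable, the perceptron convergence theorem guarantees that these updates find a separating boundary after a finite number of steps.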
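The loss and a single gradient-descent step can be sketched as follows. This is an illustration under a few assumptions: the names `perceptron_loss` and `sgd_step` and the two-sample dataset are hypothetical, and the sketch counts a margin of exactly 0 as a mistake, which is a common convention.

```python
import numpy as np

def perceptron_loss(theta, X_b, y):
    # Margins y_i * (theta . x_i); misclassified samples have a negative margin
    margins = y * (X_b @ theta)
    # max(0, -margin) is -margin for misclassified samples and 0 otherwise,
    # so the sum equals -sum over misclassified samples of y_i * (theta . x_i)
    return np.maximum(0.0, -margins).sum()

def sgd_step(theta, x_i, y_i, learning_rate):
    # For a misclassified sample the gradient of -y_i * (theta . x_i) is -y_i * x_i,
    # so one gradient-descent step adds learning_rate * y_i * x_i
    if y_i * (x_i @ theta) <= 0:
        theta = theta + learning_rate * y_i * x_i
    return theta

# Hypothetical toy data; the first column is the constant 1 for the bias
X_b = np.array([[1.0, 2.0, 1.0],
                [1.0, -1.0, -2.0]])
y = np.array([1, -1])
theta = np.array([0.0, -1.0, 0.0])

print(perceptron_loss(theta, X_b, y))  # both samples misclassified: 2 + 1 = 3.0
theta = sgd_step(theta, X_b[0], y[0], learning_rate=0.5)
print(theta, perceptron_loss(theta, X_b, y))
```

After the single update on the first sample, θ becomes [0.5, 0, 0.5], both margins turn positive, and the loss drops to 0.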

2. Implementation

2.1. Environment Setup

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler

plt.style.use('ggplot')
plt.rcParams.update({
    'font.size': 14,
    'axes.labelsize': 12,
    'axes.titlesize': 12,
    'figure.figsize': (10, 8),
    'axes.grid': True
})

2.2. Activation Function

def step_activation(z):
    # Step function: 1.0 for a positive weighted sum, -1.0 otherwise (including exactly 0)
    return 1.0 if z > 0 else -1.0
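`step_activation` expects a single number. To score many weighted sums at once, for example a whole data matrix, an element-wise variant avoids a Python loop. The name `step_activation_vec` is hypothetical; it keeps the same convention that exactly 0 maps to -1.

```python
import numpy as np

def step_activation_vec(z):
    # Element-wise step function: 1.0 where z > 0, otherwise -1.0 (including z == 0)
    return np.where(np.asarray(z) > 0, 1.0, -1.0)

# Apply the activation to several weighted sums in one call
z = np.array([-2.0, 0.0, 0.3])
print(step_activation_vec(z))
```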

2.3. Linearly Separable Data

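X, y = datasets.make_blobs(
    n_samples=150, n_features=2,
    centers=2, cluster_std=3.20
)
# Note: the two cluster centers are drawn at random, so with cluster_std=3.20 the
# clusters can overlap; lower cluster_std if the perceptron never reaches zero errors
y = np.where(y == 0, -1, 1)  # map label 0 to -1 so the labels are {-1, 1}

# Min-max scaling: rescale each feature to the range [0, 1]
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)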

# Plot the two classes
plt.figure(figsize=(10, 8))
plt.scatter(X_scaled[y == -1, 0], X_scaled[y == -1, 1], 
           marker='^', color='red', label='Class -1')
plt.scatter(X_scaled[y == 1, 0], X_scaled[y == 1, 1], 
           marker='s', color='blue', label='Class 1')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Linearly Separable Data')
plt.legend()
plt.show()
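Whether a particular draw from `make_blobs` is actually linearly separable can be checked directly instead of judged from the scatter plot. One way, sketched here assuming SciPy is available (scikit-learn already depends on it), is to ask a linear-programming solver whether some w and b satisfy y_i (w·x_i + b) ≥ 1 for every sample. Such a point exists exactly when the two classes can be strictly separated by a line. The function name `is_linearly_separable` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def is_linearly_separable(X, y):
    """Return True if some hyperplane w.x + b separates labels y in {-1, 1}.

    Solves the feasibility problem y_i * (w . x_i + b) >= 1 for all i as a
    linear program with a zero objective.
    """
    m, n = X.shape
    # Variables are [w_1, ..., w_n, b]; each constraint is rewritten as
    # -y_i * (w . x_i + b) <= -1 to match linprog's A_ub @ v <= b_ub form
    A_ub = -y[:, None] * np.hstack([X, np.ones((m, 1))])
    b_ub = -np.ones(m)
    c = np.zeros(n + 1)  # nothing to optimize, only feasibility matters
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1), method="highs")
    # status 0 means a feasible point was found; status 2 means infeasible
    return res.status == 0

# OR labels are separable, XOR labels are not
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(is_linearly_separable(X_toy, np.array([-1, 1, 1, 1])))
print(is_linearly_separable(X_toy, np.array([-1, 1, 1, -1])))
```

For the data generated above, the call would be `is_linearly_separable(X_scaled, y)`; a result of False means the perceptron will keep making updates in every epoch.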

2.4. Perceptron Training Function

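def perceptron_train(X, y, learning_rate=0.01, epochs=100):
    """
    Train a perceptron with the per-sample update rule.

    Parameters:
    X: input data, shape (m, n)
    y: target labels (-1 or 1), shape (m,)
    learning_rate: learning rate
    epochs: number of passes over the data

    Returns:
    weights: learned weights, shape (n + 1,); weights[0] is the bias
    misclassification_history: number of misclassified samples (updates) per epoch
    loss_history: perceptron loss from section 1.2 after each epoch
    """
    m, n = X.shape

    # Prepend a column of ones so that weights[0] acts as the bias
    X_b = np.hstack([np.ones((m, 1)), X])

    # Initialize the weights (including the bias) to zero
    weights = np.zeros(n + 1)

    # Per-epoch records of misclassifications and loss
    misclassification_history = []
    loss_history = []

    for epoch in range(epochs):
        misclassified = 0

        for x_i, y_i in zip(X_b, y):
            # Prediction: step function of the weighted sum
            prediction = step_activation(np.dot(x_i, weights))

            # Update the weights only when the sample is misclassified.
            # (y_i - prediction) would equal 2 * y_i here; using y_i matches
            # the gradient step from section 1.2
            if prediction != y_i:
                weights += learning_rate * y_i * x_i
                misclassified += 1

        # Perceptron loss: -sum of y_i * (theta . x_i) over misclassified samples
        margins = y * (X_b @ weights)
        loss_history.append(-margins[margins < 0].sum())

        misclassification_history.append(misclassified)

    return weights, misclassification_history, loss_history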
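The training function returns weights but no way to apply them. A small prediction helper, sketched below, applies the same rule to a whole matrix at once. `perceptron_predict` and `accuracy` are hypothetical names, and `np.ravel` lets them accept the weights either as a flat vector or as an (n + 1, 1) column.

```python
import numpy as np

def perceptron_predict(X, weights):
    # weights[0] is the bias; weights[1:] multiply the features
    w = np.ravel(weights)
    z = X @ w[1:] + w[0]
    # Same step rule as step_activation: positive -> 1, otherwise -1
    return np.where(z > 0, 1, -1)

def accuracy(X, y, weights):
    # Fraction of samples whose predicted label matches y
    return np.mean(perceptron_predict(X, weights) == y)

# OR gate with labels in {-1, 1}; hand-picked weights [bias, w1, w2]
X_or = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_or = np.array([-1, 1, 1, 1])
w_or = np.array([-0.5, 1.0, 1.0])
print(perceptron_predict(X_or, w_or), accuracy(X_or, y_or, w_or))
```

With the trained model from section 2.6 this would be called as `accuracy(X_scaled, y, weights)`.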

2.5. Visualizing the Decision Boundary

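def plot_decision_boundary(X, y, weights):
    """
    Plot the data together with the learned decision boundary.

    Parameters:
    X: input data with two features
    y: target labels (-1 or 1)
    weights: learned weights [w0 (bias), w1, w2]
    """
    w = np.ravel(weights)  # accept shape (3,) or (3, 1)
    x1_min, x1_max = X[:, 0].min(), X[:, 0].max()

    plt.figure(figsize=(10, 8))
    plt.scatter(X[y == -1, 0], X[y == -1, 1],
                marker='^', color='red', label='Class -1')
    plt.scatter(X[y == 1, 0], X[y == 1, 1],
                marker='s', color='blue', label='Class 1')

    # Decision boundary: w0 + w1*x1 + w2*x2 = 0
    if w[2] != 0:
        # Solve for x2: x2 = -(w1 / w2) * x1 - w0 / w2
        slope = -w[1] / w[2]
        intercept = -w[0] / w[2]
        x1_line = np.array([x1_min, x1_max])
        x2_line = slope * x1_line + intercept
        plt.plot(x1_line, x2_line, 'y-', linewidth=2, label='Decision Boundary')
    elif w[1] != 0:
        # w2 == 0: the boundary is the vertical line x1 = -w0 / w1
        plt.axvline(-w[0] / w[1], color='y', linewidth=2, label='Decision Boundary')

    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Perceptron Decision Boundary')
    plt.legend()
    plt.show()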
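An alternative to drawing the line from a slope and intercept is to evaluate the perceptron on a grid and shade the two predicted regions; this never divides by w2. The sketch below only computes the grid, under the assumption of exactly two features; `decision_regions`, `resolution`, and `margin` are hypothetical names.

```python
import numpy as np

def decision_regions(X, weights, resolution=200, margin=0.1):
    """Predict the class on a regular grid covering X (two features only).

    Returns grid coordinates xx, yy and predicted labels zz, suitable for
    plt.contourf(xx, yy, zz, alpha=0.2).
    """
    w = np.ravel(weights)
    x1 = np.linspace(X[:, 0].min() - margin, X[:, 0].max() + margin, resolution)
    x2 = np.linspace(X[:, 1].min() - margin, X[:, 1].max() + margin, resolution)
    xx, yy = np.meshgrid(x1, x2)  # xx varies along columns, yy along rows
    # Same rule as the perceptron: w0 + w1*x1 + w2*x2 > 0 -> class 1
    zz = np.where(w[0] + w[1] * xx + w[2] * yy > 0, 1, -1)
    return xx, yy, zz

# Works even when w2 == 0 (vertical boundary)
Xg = np.array([[0.0, 0.0], [1.0, 1.0]])
xx, yy, zz = decision_regions(Xg, np.array([-0.5, 1.0, 0.0]))
print(zz[0, 0], zz[0, -1])
```

With the trained weights, the regions could be drawn behind the scatter plot with `plt.contourf(xx, yy, zz, alpha=0.2, cmap='bwr')` before calling `plt.show()`.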

2.6. Training Results on Linearly Separable Data

learning_rate = 0.005
epochs = 200

weights, misclassification_history, loss_history = perceptron_train(
    X_scaled, y, learning_rate, epochs
)

plot_decision_boundary(X_scaled, y, weights)

# Plot how the number of misclassified samples changes during training
plt.figure(figsize=(12, 6))
plt.plot(range(len(misclassification_history)), misclassification_history)
plt.xlabel('Epoch')
plt.ylabel('Misclassified samples')
plt.title('Misclassifications per Epoch')
plt.show()
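On separable data the misclassification count eventually hits 0, and every later epoch changes nothing. A variant that stops after the first error-free epoch, and shuffles the sample order each epoch, is sketched below. It is an assumption-laden illustration, not the original function: `perceptron_train_early_stop`, `max_epochs`, and `seed` are hypothetical, and a margin of exactly 0 is treated as a mistake.

```python
import numpy as np

def perceptron_train_early_stop(X, y, learning_rate=0.01, max_epochs=1000, seed=0):
    """Perceptron training that shuffles each epoch and stops after an
    epoch with no mistakes. Returns (weights, epochs_used)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    X_b = np.hstack([np.ones((m, 1)), X])  # column of ones for the bias
    weights = np.zeros(n + 1)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for i in rng.permutation(m):  # visit the samples in a new random order
            if y[i] * (X_b[i] @ weights) <= 0:  # misclassified or on the boundary
                weights += learning_rate * y[i] * X_b[i]
                mistakes += 1
        if mistakes == 0:
            # A full pass without updates: every sample is classified correctly
            return weights, epoch
    # Not separable (or not yet converged): return the last weights
    return weights, max_epochs

# OR gate, which is linearly separable
X_or = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_or = np.array([-1, 1, 1, 1])
w, epochs_used = perceptron_train_early_stop(X_or, y_or, learning_rate=1.0)
print(w, epochs_used)
```

On data that is not separable the loop always runs to `max_epochs`, which is a quick practical signal that the boundary will not settle.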

2.7. Handling Data That Is Not Linearly Separable

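# Generate overlapping (not linearly separable) data
X_nonlinear, y_nonlinear = datasets.make_blobs(
    n_samples=150, n_features=2,
    centers=2, cluster_std=5.0
)
y_nonlinear = np.where(y_nonlinear == 0, -1, 1)

from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV, train_test_split

# Hold out a test set so the final score is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X_nonlinear, y_nonlinear, test_size=0.3, random_state=0, stratify=y_nonlinear
)

# Fit the scaler on the training data only, then apply it to both sets
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Grid search over the learning rate and the maximum number of epochs
param_grid = {
    'eta0': np.linspace(0.0001, 1, 10),
    'max_iter': [10, 50, 100, 500]
}

perceptron = Perceptron(random_state=0)
grid_search = GridSearchCV(perceptron, param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train_scaled, y_train)

print(f"Best hyperparameters: {grid_search.best_params_}")
print(f"Best cross-validation accuracy: {grid_search.best_score_:.3f}")

# Evaluate the refitted best model on the held-out test set
best_model = grid_search.best_estimator_
print(f"Test accuracy: {best_model.score(X_test_scaled, y_test):.3f}")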
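The grid search tunes `eta0`, but for a perceptron that starts from zero weights the learning rate only rescales the weights: every update adds a multiple of η, so after any number of steps the weights are η times those obtained with η = 1, and the sign of θ·x (hence every prediction) is unchanged. The sketch below checks this on a from-scratch loop; `train_from_zero` and the random data are hypothetical. It covers only this simple loop. scikit-learn's `Perceptron` also has settings such as a stopping tolerance, so its results may still vary somewhat across `eta0` values.

```python
import numpy as np

def train_from_zero(X, y, learning_rate, epochs=20):
    # Same kind of loop as section 2.4: update on every mistake, zero initial weights
    X_b = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(X_b.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X_b, y):
            prediction = 1 if x_i @ w > 0 else -1
            if prediction != y_i:
                w += learning_rate * y_i * x_i
    return w

# Hypothetical noisy, overlapping data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=100) > 0, 1, -1)

# Powers of two keep the floating-point scaling exact
w_small = train_from_zero(X, y, learning_rate=2.0 ** -10)
w_large = train_from_zero(X, y, learning_rate=1.0)

# Weights are proportional, and the predictions are identical
print(np.allclose(w_large, 1024 * w_small))
print(np.array_equal(np.sign(X @ w_small[1:] + w_small[0]),
                     np.sign(X @ w_large[1:] + w_large[0])))
```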

3. Key Takeaways

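Speed: each update needs only one dot product and a comparison (O(n) for n features), so the perceptron trains very quickly.
Limitation: if the data is not linearly separable, some sample is always misclassified, so the weights keep changing and the algorithm never converges. In that case, apply a feature transformation or consider a different algorithm.
Hyperparameter tuning: the number of epochs decides where training stops, which matters most when the data is not separable. In the from-scratch implementation above, which starts from zero weights, the learning rate only rescales the weight vector and does not change the predictions.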
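The feature-transformation remedy can be illustrated with XOR, the classic dataset no single line can separate. Adding the product x1·x2 as a third feature makes it separable, so the same perceptron converges. This is a minimal sketch: the feature map `add_product_feature` and the helpers `train_perceptron` and `predict` are hypothetical names, and a margin of exactly 0 is treated as a mistake.

```python
import numpy as np

# XOR: not linearly separable in the original two features
X_xor = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_xor = np.array([-1, 1, 1, -1])

def add_product_feature(X):
    # Hypothetical feature map (x1, x2) -> (x1, x2, x1 * x2)
    return np.column_stack([X, X[:, 0] * X[:, 1]])

def train_perceptron(X, y, learning_rate=1.0, max_epochs=100):
    X_b = np.hstack([np.ones((len(X), 1)), X])  # bias column
    w = np.zeros(X_b.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X_b, y):
            if y_i * (x_i @ w) <= 0:  # misclassified or on the boundary
                w += learning_rate * y_i * x_i
                mistakes += 1
        if mistakes == 0:  # converged: stop early
            break
    return w

def predict(X, w):
    return np.where(X @ w[1:] + w[0] > 0, 1, -1)

w_raw = train_perceptron(X_xor, y_xor)                        # never converges
w_feat = train_perceptron(add_product_feature(X_xor), y_xor)  # converges
print(predict(X_xor, w_raw), predict(add_product_feature(X_xor), w_feat))
```

On the raw features at least one XOR point is always misclassified, while the transformed features are classified perfectly.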

Tags: perceptron, machine learning, Python, linear classification, gradient descent

Posted on June 17 at 22:26

괴물 클럽