GPUStack Cluster Setup Guide

Environment Requirements

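Docker must be installed in advance on both the server and worker nodes, and the deployment is based on docker-compose. The server node runs only the GPUStack service; the Huawei Ascend driver, runtime, and firmware are installed on the worker nodes.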

GPUStack installation documentation: https://docs.gpustack.ai/latest/installation/requirements/

Server specifications:

  • gpustack-server: ctyunos 23.01 x86_64
  • gpustack-worker: ctyunos 23.01 aarch64

Software                      Version   Download link
GPUStack                      v2.0      docker pull docker.1ms.run/gpustack/gpustack:v2.0
Docker                        26.1.3    https://download.docker.com/linux/static/stable/
Ascend-docker-runtime         7.3.0     https://gitcode.com/Ascend/mind-cluster/releases
Ascend-hdk-910b-npu-driver    25.5.0    https://www.hiascend.com/hardware/firmware-drivers/community
Ascend-hdk-910b-npu-firmware  7.8.0.5   https://www.hiascend.com/hardware/firmware-drivers/community

Local YUM Repository Setup

Steps common to the server and worker nodes:

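# Mount the ISO (use the image that matches the node's architecture:
# x86_64 on the server, aarch64 on the workers)
mount -o loop /path/to/ctyunos-23.01-aarch64-dvd.iso /mnt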

# Configure the repository
cat > /etc/yum.repos.d/local.repo <<EOF
[local-repo]
name=Local Repository
baseurl=file:///mnt
enabled=1
gpgcheck=0
EOF

# Mount automatically at boot (same ISO path as in the mount command above)
echo "/path/to/ctyunos-23.01-aarch64-dvd.iso /mnt iso9660 loop,defaults 0 0" >> /etc/fstab
yum clean all
yum makecache
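
Because the server is x86_64 and the workers are aarch64, each node needs the ISO built for its own architecture. The following is a minimal sketch of choosing the file name from the output of uname -m. The helper iso_for_arch is hypothetical, and the x86_64 file name is an assumption patterned on the aarch64 name used above; check it against the actual image you downloaded.

```shell
#!/bin/bash
# Sketch: choose the ISO file name that matches a node's CPU architecture.
# The file-name pattern is an assumption based on the aarch64 ISO named above.
iso_for_arch() {
    case "$1" in
        x86_64|aarch64) echo "ctyunos-23.01-$1-dvd.iso" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}
```

On a node this could be called as iso_for_arch "$(uname -m)", with the result appended to the directory that holds the ISO files.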

NFS Server Setup

Server node:

yum install -y nfs-utils
systemctl enable --now nfs-server
mkdir -p /shared/nfs
echo "/shared/nfs 192.168.0.0/16(rw,sync,no_root_squash)" > /etc/exports
exportfs -ra

Worker nodes:

yum install -y nfs-utils
mkdir -p /shared/nfs
echo "server_ip:/shared/nfs /shared/nfs nfs defaults 0 0" >> /etc/fstab
mount -a
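
Appending to /etc/fstab with >> adds a duplicate entry every time the commands are run again. One way to avoid that is sketched below, assuming a small hypothetical helper named append_once that writes a line only when that exact line is not already in the file.

```shell
#!/bin/bash
# Sketch: append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper, not part of any tool used in this guide.
append_once() {
    local line="$1" file="$2"
    # -x matches whole lines, -F treats the line as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Example (run as root on a worker; server_ip is a placeholder):
# append_once "server_ip:/shared/nfs /shared/nfs nfs defaults 0 0" /etc/fstab
# mount -a && mountpoint -q /shared/nfs && echo "NFS share mounted"
```

The same helper also fits the ISO entry in the local repository step, since that line is appended to /etc/fstab in the same way.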

Installing GPUStack on the Server Node

docker-compose.yaml:

services:
  gpustack:
    image: registry_ip:8090/gpustack/gpustack:v2.0-amd64
    container_name: gpustack-master
    restart: unless-stopped
    ports:
      - "9090:80"
      - "10161:10161"
    volumes:
      - /shared/nfs/models:/usr/local/models
      - /gpustack/data:/var/lib/gpustack
    environment:
      - GPUSTACK_LOG_LEVEL=info
      - TZ=Asia/Seoul
    command: >
      --system-default-container-registry registry_ip:8090

Start: docker-compose up -d
Web access: http://server_ip:9090 (the initial password is stored in /gpustack/data/initial_admin_password on the host)
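
The container may need some time after docker-compose up -d before the web UI responds. A minimal sketch of waiting for it is shown below. The retry helper is hypothetical, and the commented example assumes that curl is installed and that the UI answers the root URL with a success status.

```shell
#!/bin/bash
# Sketch: run a command until it succeeds, up to a maximum number of attempts.
# retry is a hypothetical helper; usage: retry <attempts> <delay_seconds> <command...>
retry() {
    local attempts="$1" delay="$2" i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0
        # Sleep between attempts, but not after the last one
        if ((i < attempts)); then sleep "$delay"; fi
    done
    return 1
}

# Example (server_ip is a placeholder; 9090 is the host port mapped above):
# retry 30 2 curl -fsS -o /dev/null http://server_ip:9090 \
#     && cat /gpustack/data/initial_admin_password
```

If the loop gives up, docker logs gpustack-master is a reasonable next place to look.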

Worker Node Setup

Create the user and install dependencies:

groupadd -r ascgrp
useradd -g ascgrp -d /home/ascuser -m ascuser -s /bin/bash
yum install -y kernel-devel-$(uname -r) gcc make

Install the Ascend runtime:

./Ascend-docker-runtime_7.3.0_linux-aarch64.run --install

Install the driver and firmware:

./Ascend-hdk-910b-npu-driver_25.5.0_linux-aarch64.run --full --install-for-all
./Ascend-hdk-910b-npu-firmware_7.8.0.5.216.run --full
reboot

Check the cards: npu-smi info

containerd service unit:

[Unit]
Description=Container Runtime
After=network.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Restart=always

[Install]
WantedBy=multi-user.target

Start: systemctl daemon-reload && systemctl enable --now containerd

docker-compose.yaml (worker node):

services:
  gpustack-node:
    image: registry_ip:8090/gpustack/gpustack:v2.0-arm64
    container_name: gpustack-node
    restart: unless-stopped
    privileged: true
    network_mode: host
    runtime: ascend
    environment:
      - ASCEND_VISIBLE_DEVICES=${NPU_IDS}
      - GPUSTACK_TOKEN=worker_token
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /shared/nfs/models:/usr/local/models
    command: >
      --server-url http://server_ip:9090
      --worker-ip worker_ip

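Auto-start script:

#!/bin/bash
# Collect the numeric IDs of the /dev/davinciN devices as a comma-separated list
NPU_IDS=$(ls /dev/davinci[0-9]* 2>/dev/null | sed 's|.*/davinci||' | sort -n | paste -sd, -)
# Fall back to device 0 when no device is found
[ -z "$NPU_IDS" ] && NPU_IDS="0"
# Export so that docker-compose can substitute ${NPU_IDS} in docker-compose.yaml
export NPU_IDS
docker-compose up -d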
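
The device-ID extraction can be checked without Ascend hardware by feeding it sample paths. The sketch below assumes a hypothetical helper named npu_ids_from_paths and assumes that non-numbered entries such as /dev/davinci_manager may appear next to the numbered devices, so it keeps only paths that end in a number.

```shell
#!/bin/bash
# Sketch: turn device paths read from stdin into a comma-separated list of numeric IDs.
# Entries without a trailing number (for example /dev/davinci_manager) are skipped.
npu_ids_from_paths() {
    sed -n 's|^.*/davinci\([0-9][0-9]*\)$|\1|p' | sort -n | paste -sd, -
}

# Example on a worker node:
# ls /dev/davinci* 2>/dev/null | npu_ids_from_paths
```

The resulting value has to be exported under the name NPU_IDS before docker-compose runs, because the worker docker-compose.yaml reads ${NPU_IDS} from the environment.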

Tags: GPUStack Huawei_Ascend ctyunos Docker_Compose NFS_Configuration

Posted on June 1 at 09:03