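Environment requirements
Docker must be installed in advance on both the server and worker nodes. Deployment is based on docker-compose. The server node runs only the GPUStack service; the Huawei Ascend driver, runtime, and firmware are installed on the worker nodes.
GPUStack installation documentation: https://docs.gpustack.ai/latest/installation/requirements/
Server specifications: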
- gpustack-server: ctyunos 23.01 x86_64
- gpustack-worker: ctyunos 23.01 aarch64
| Software version | Download link |
|---|---|
| GPUStack-v2.0 | docker pull docker.1ms.run/gpustack/gpustack:v2.0 |
| Docker-26.1.3 | https://download.docker.com/linux/static/stable/ |
| Ascend-docker-runtime-7.3.0 | https://gitcode.com/Ascend/mind-cluster/releases |
| Ascend-hdk-910b-npu-driver-25.5.0 | https://www.hiascend.com/hardware/firmware-drivers/community |
| Ascend-hdk-910b-npu-firmware-7.8.0.5 | https://www.hiascend.com/hardware/firmware-drivers/community |
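Local YUM repository setup
Common steps for the server and worker nodes:
# Mount the ISO (on the x86_64 server node, use the x86_64 ISO instead)
mount -o loop /path/to/ctyunos-23.01-aarch64-dvd.iso /mnt
# Configure the repository
cat > /etc/yum.repos.d/local.repo <<EOF
[local-repo]
name=Local Repository
baseurl=file:///mnt
enabled=1
gpgcheck=0
EOF
# Mount automatically at boot (same ISO path as in the mount command)
echo "/path/to/ctyunos-23.01-aarch64-dvd.iso /mnt iso9660 loop,defaults 0 0" >> /etc/fstab
yum clean all
yum makecache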
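Appending to /etc/fstab with `echo ... >>` adds a duplicate entry each time the step is rerun. A minimal sketch of an idempotent alternative follows; `append_once` is a hypothetical helper name, not part of any tool used in this guide, and the ISO path is the placeholder from the mount command.

```shell
# append_once LINE FILE: append LINE to FILE only if no identical line exists yet.
# Hypothetical helper for illustration.
append_once() {
    line="$1"
    file="$2"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example usage (run as root):
# append_once "/path/to/ctyunos-23.01-aarch64-dvd.iso /mnt iso9660 loop,defaults 0 0" /etc/fstab
```

The same helper could be used for the NFS entry that is added to /etc/fstab on the worker nodes.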
NFS server setup
Server node:
yum install -y nfs-utils
systemctl enable --now nfs-server
mkdir -p /shared/nfs
echo "/shared/nfs 192.168.0.0/16(rw,sync,no_root_squash)" > /etc/exports
exportfs -ra
Worker node:
yum install -y nfs-utils
mkdir -p /shared/nfs
echo "server_ip:/shared/nfs /shared/nfs nfs defaults 0 0" >> /etc/fstab
mount -a
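After `mount -a`, it can be worth confirming that /shared/nfs is really backed by NFS and not just an empty local directory. The sketch below assumes the standard Linux /proc/mounts layout (device, mount point, filesystem type, ...); `is_nfs_mounted` is a hypothetical helper name.

```shell
# is_nfs_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed in MOUNTS_FILE with a filesystem type
# starting with "nfs" (nfs, nfs4). MOUNTS_FILE defaults to /proc/mounts.
is_nfs_mounted() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v mp="$mount_point" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$mounts_file"
}

# Example usage on a worker node:
# is_nfs_mounted /shared/nfs && echo "NFS share is mounted"
```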
GPUStack installation on the server node
docker-compose.yaml:
services:
  gpustack:
    image: registry_ip:8090/gpustack/gpustack:v2.0-amd64
    container_name: gpustack-master
    restart: unless-stopped
    ports:
      - "9090:80"
      - "10161:10161"
    volumes:
      - /shared/nfs/models:/usr/local/models
      - /gpustack/data:/var/lib/gpustack
    environment:
      - GPUSTACK_LOG_LEVEL=info
      - TZ=Asia/Seoul
    command: >
      --system-default-container-registry registry_ip:8090
Start: docker-compose up -d
Web UI: http://server_ip:9090 (the initial password is stored on the host in /gpustack/data/initial_admin_password)
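The password file may not exist right after `docker-compose up -d` returns, because the server needs some time to initialize. A minimal sketch that polls for the file is shown below; `wait_for_file` is a hypothetical helper, and the 120-second timeout in the example is an arbitrary assumption.

```shell
# wait_for_file PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists and is non-empty.
# Returns 1 if the timeout (default 60 s) elapses first.
wait_for_file() {
    target="$1"
    remaining="${2:-60}"
    while [ ! -s "$target" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example usage on the server node:
# wait_for_file /gpustack/data/initial_admin_password 120 \
#     && cat /gpustack/data/initial_admin_password
```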
Worker node setup
Create the user and install dependencies:
groupadd -r ascgrp
useradd -g ascgrp -d /home/ascuser -m ascuser -s /bin/bash
yum install -y kernel-devel-$(uname -r) gcc make
Install the Ascend runtime:
./Ascend-docker-runtime_7.3.0_linux-aarch64.run --install
Install the driver and firmware:
./Ascend-hdk-910b-npu-driver_25.5.0_linux-aarch64.run --full --install-for-all
./Ascend-hdk-910b-npu-firmware_7.8.0.5.216.run --full
reboot
Verify the cards: npu-smi info
containerd service configuration:
[Unit]
Description=Container Runtime
After=network.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Restart=always

[Install]
WantedBy=multi-user.target
Start: systemctl enable --now containerd
docker-compose.yaml (worker node):
services:
  gpustack-node:
    image: registry_ip:8090/gpustack/gpustack:v2.0-arm64
    container_name: gpustack-node
    restart: unless-stopped
    privileged: true
    network_mode: host
    runtime: ascend
    environment:
      - ASCEND_VISIBLE_DEVICES=${NPU_IDS}
      - GPUSTACK_TOKEN=worker_token
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /shared/nfs/models:/usr/local/models
    command: >
      --server-url http://server_ip:9090
      --worker-ip worker_ip
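The compose files in this guide contain the placeholders registry_ip, server_ip, worker_ip, and worker_token, which must be replaced with real values before starting the containers. The sketch below is one possible pre-flight check; `find_placeholders` is a hypothetical helper, and the placeholder list is taken only from this guide.

```shell
# find_placeholders FILE
# Prints every line of FILE (prefixed with its line number) that still
# contains one of this guide's placeholders. Returns 1 if any are found.
find_placeholders() {
    if grep -nE 'registry_ip|server_ip|worker_ip|worker_token' "$1"; then
        return 1
    fi
    return 0
}

# Example usage:
# find_placeholders docker-compose.yaml && docker-compose up -d
```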
Auto-start script:
#!/bin/bash
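# Collect the numeric suffixes of /dev/davinci<N> (skips /dev/davinci_manager)
NPU_IDS=$(ls /dev/davinci[0-9]* 2>/dev/null | grep -oE '[0-9]+$' | tr '\n' ',' | sed 's/,$//')
[ -z "$NPU_IDS" ] && NPU_IDS="0"
# docker-compose.yaml interpolates ${NPU_IDS}, so the variable must be exported
export NPU_IDS
export ASCEND_VISIBLE_DEVICES=$NPU_IDS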
docker-compose up -d
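The device enumeration in the script can also be written as a function that takes the device directory as a parameter, which makes it possible to try the logic on a machine without NPUs. This is a sketch under the assumption that NPU device nodes are named davinci<N>; `list_npu_ids` is a hypothetical name.

```shell
# list_npu_ids [DEV_DIR]
# Prints the comma-separated numeric suffixes of DEV_DIR/davinci<N> entries.
# Entries such as davinci_manager are skipped. DEV_DIR defaults to /dev.
list_npu_ids() {
    dev_dir="${1:-/dev}"
    ids=""
    for path in "$dev_dir"/davinci*; do
        name="${path##*/}"
        suffix="${name#davinci}"
        case "$suffix" in
            ''|*[!0-9]*) continue ;;  # not purely numeric, or the glob did not match
        esac
        ids="${ids:+$ids,}$suffix"
    done
    printf '%s\n' "$ids"
}

# Example usage:
# NPU_IDS=$(list_npu_ids)
# export NPU_IDS
```

IDs are emitted in shell glob order, so davinci10 sorts before davinci2; this should not matter for a visibility list, but it is worth knowing when reading the output.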