<?xml version="1.0" encoding="UTF-8" ?> <?xml-stylesheet type="text/xsl" href="rss.xsl"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"> <channel> <title>SON BLOG</title><description>AI Engineer 손성준 · LLM Serving, RAG, Rust, K8s</description><link>https://infoedu.co.kr/</link><atom:link href="https://infoedu.co.kr/feed_rss_updated.xml" rel="self" type="application/rss+xml" /> <managingEditor>손성준</managingEditor><docs>https://github.com/SonAIengine</docs><language>ko</language> <pubDate>Sat, 13 Jun 2026 15:29:11 -0000</pubDate> <lastBuildDate>Sat, 13 Jun 2026 15:29:11 -0000</lastBuildDate> <ttl>1440</ttl> <generator>MkDocs RSS plugin - v1.19.0</generator> <item> <title>Trial Zone Provisioner: Automatically Issuing an Isolated Multi-Tenant Stack from a Single Request</title> <category>Docker Compose</category> <category>FastAPI</category> <category>TLS</category> <category>XGEN</category> <category>nginx</category> <category>Multi-Tenant</category> <category>Infrastructure</category> <category>Troubleshooting</category> <category>Provisioning</category> <description>How I designed and operated a self-service Trial Zone provisioner that automatically issues an isolated instance of the full XGEN stack from a single email request.
It covers per-tenant isolation with Docker Compose, TLS termination at the AWS edge, dynamic subdomain routing in nginx, asynchronous provisioning, and TTL-based reclamation, with a focus on real-world troubleshooting.</description> <link>https://infoedu.co.kr/devops/infra/trial-zone-provisioner-self-service-multitenant-compose-edge-tls/</link> <pubDate>Sat, 13 Jun 2026 22:28:06 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/devops/infra/trial-zone-provisioner-self-service-multitenant-compose-edge-tls/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/devops/infra/trial-zone-provisioner-self-service-multitenant-compose-edge-tls.png" type="image/png" length="57031" /> </item> <item> <title>AI/ML &amp; LLM</title> <description>From GPU model serving and RAG pipelines to embedding optimization and model fine-tuning: technical notes gathered while wiring LLMs into production services</description> <link>https://infoedu.co.kr/ai/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/index.png" type="image/png" length="28561" /> </item> <item> <title>Running LLMs on AMD GPUs: Vulkan vs ROCm</title> <category>AMD GPU</category> <category>GGUF</category> <category>LLM</category> <category>ROCm</category> <category>Vulkan</category> <category>XGEN</category> <category>llama.cpp</category> <category>Model Serving</category> <description>A hands-on comparison for serving LLMs on AMD GPUs in the XGEN model server: criteria for choosing between the Vulkan and ROCm backends, mlock settings, and implementing a GPU-detection fallback chain</description> <link>https://infoedu.co.kr/ai/XGEN/amd-gpu-llm-run-vulkan-vs-rocm-comparison/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/amd-gpu-llm-run-vulkan-vs-rocm-comparison/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/amd-gpu-llm-run-vulkan-vs-rocm-comparison.png" type="image/png"
length="49104" /> </item> <item> <title>Document Embedding Pipeline: Chunking Options and Preprocessing Strategy</title> <category>OCR</category> <category>RAG</category> <category>XGEN</category> <category>Document Processing</category> <category>Embedding</category> <category>Chunking</category> <description>How xgen-retrieval runs PDF/DOCX/PPT documents through the embedding pipeline: force_chunking, advanced chunking, OCR, text cleanup, and metadata extraction</description> <link>https://infoedu.co.kr/ai/XGEN/document-embedding-pipeline-chunking-option-preprocessing-strategy/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/document-embedding-pipeline-chunking-option-preprocessing-strategy/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/document-embedding-pipeline-chunking-option-preprocessing-strategy.png" type="image/png" length="48354" /> </item> <item> <title>Serving Embedding Models: Handling Long Documents by Tuning Batch Size</title> <category>Embedding</category> <category>LLM</category> <category>XGEN</category> <category>batch size</category> <category>llama.cpp</category> <category>Model Serving</category> <description>A hands-on record of supporting long-document embeddings in the XGEN model server by raising the batch size from 512 to 2048, covering the difference between n_ubatch and n_batch and the decision to run embeddings on CPU only</description> <link>https://infoedu.co.kr/ai/XGEN/embedding-model-serving-batch-size-optimization/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/embedding-model-serving-batch-size-optimization/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/embedding-model-serving-batch-size-optimization.png" type="image/png" length="51711" /> </item> <item> <title>GPU Status Monitoring and Automatic Model Deployment System</title> <category>AMD</category> <category>GPU</category> <category>LLM</category> <category>NVIDIA</category> <category>Python</category> <category>Monitoring</category>
<category>Model Serving</category> <category>Automation</category> <description>A system that detects GPUs through a fallback chain in the order amdsmi → pynvml → torch.hip → torch.cuda, and lets xgen-model automatically choose vLLM or llama-server for deployment based on UI settings</description> <link>https://infoedu.co.kr/ai/XGEN/gpu-status-monitoring-auto-model-deploy-system/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/gpu-status-monitoring-auto-model-deploy-system/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/gpu-status-monitoring-auto-model-deploy-system.png" type="image/png" length="44189" /> </item> <item> <title>Automating HuggingFace Model Search and Download</title> <category>FastAPI</category> <category>HuggingFace</category> <category>Tauri</category> <category>XGEN</category> <category>Model Management</category> <category>Model Download</category> <description>Implementing a DownloadService in the XGEN model server that searches for models via the HuggingFace Hub API, downloads them in the background while tracking progress, and integrates with xgen-app (Tauri)</description> <link>https://infoedu.co.kr/ai/XGEN/huggingface-model-search-download-automation/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/huggingface-model-search-download-automation/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/huggingface-model-search-download-automation.png" type="image/png" length="45401" /> </item> <item> <title>Iterative RAG: Answering Complex Questions with Repeated Retrieval</title> <category>Iterative RAG</category> <category>LLM</category> <category>RAG</category> <category>XGEN</category> <category>Search</category> <category>Search Engine</category> <category>Vector Search</category> <description>A four-stage pipeline in xgen-workflow that goes beyond the limits of simple RAG by retrieving iteratively to fill gaps in the context: Query Expansion, Large-Scale Search, Iterative Filtering, and Compression</description>
<link>https://infoedu.co.kr/ai/XGEN/iterative-rag-search-engine-impl/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/iterative-rag-search-engine-impl/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/iterative-rag-search-engine-impl.png" type="image/png" length="50452" /> </item> <item> <title>Late Chunking and Sparse Embedding: A Next-Generation Search Pipeline</title> <category>Late Chunking</category> <category>RAG</category> <category>Sparse Embedding</category> <category>XGEN</category> <category>Embedding</category> <description>Design and implementation of a next-generation RAG search pipeline in xgen-workflow that combines Late Chunking, which preserves document context across chunks, with Sparse Embedding</description> <link>https://infoedu.co.kr/ai/XGEN/late-chunking-sparse-embedding-next-gen-search-pipeline/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/late-chunking-sparse-embedding-next-gen-search-pipeline/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/late-chunking-sparse-embedding-next-gen-search-pipeline.png" type="image/png" length="56424" /> </item> <item> <title>Running a llama.cpp Server: Struggles and Fixes on ROCm GPUs</title> <category>AMD</category> <category>GPU</category> <category>LLM</category> <category>ROCm</category> <category>XGEN</category> <category>llama.cpp</category> <category>Model Serving</category> <category>Troubleshooting</category> <description>A hands-on troubleshooting log from running a llama.cpp server on AMD GPUs: ROCm GPU page faults, memory crashes, and the eventual switch to Vulkan</description> <link>https://infoedu.co.kr/ai/XGEN/llama-cpp-server-ops-story-rocm-gpu-troubleshoot-fix/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid
isPermaLink="true">https://infoedu.co.kr/ai/XGEN/llama-cpp-server-ops-story-rocm-gpu-troubleshoot-fix/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/llama-cpp-server-ops-story-rocm-gpu-troubleshoot-fix.png" type="image/png" length="51861" /> </item> <item> <title>Local LLM Management System: The Load/Unload/Activate Lifecycle</title> <category>FastAPI</category> <category>LLM</category> <category>XGEN</category> <category>Lifecycle</category> <category>Model Management</category> <description>Designing the ProcessManager lifecycle that manages the load, unload, and activation states of LLMs in the XGEN model server: ModelState, auto activate, and the loading_status API</description> <link>https://infoedu.co.kr/ai/XGEN/local-llm-model-management-system-load-unload-activation-lifecycle/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/local-llm-model-management-system-load-unload-activation-lifecycle/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/local-llm-model-management-system-load-unload-activation-lifecycle.png" type="image/png" length="46485" /> </item> <item> <title>Multi-GPU LLM Deployment: GPU Selection and Layer Offloading Strategy</title> <category>GPU</category> <category>LLM</category> <category>XGEN</category> <category>llama.cpp</category> <category>vLLM</category> <category>Layer Offloading</category> <category>Multi-GPU</category> <category>Model Serving</category> <description>How the XGEN model server supports multi-GPU setups: a layer-offloading design built on main_gpu, split_mode, tensor_split, and n_gpu_layers, plus ProcessManager's automatic backend selection</description> <link>https://infoedu.co.kr/ai/XGEN/multi-gpu-llm-deploy-gpu-selection-layer-offloading-strategy/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/multi-gpu-llm-deploy-gpu-selection-layer-offloading-strategy/</guid> <enclosure
url="https://infoedu.co.kr/assets/images/social/ai/XGEN/multi-gpu-llm-deploy-gpu-selection-layer-offloading-strategy.png" type="image/png" length="48285" /> </item> <item> <title>Building an OpenAI-Compatible API Server from Scratch</title> <category>API Server</category> <category>FastAPI</category> <category>LLM</category> <category>OpenAI API</category> <category>XGEN</category> <category>llama.cpp</category> <category>vLLM</category> <category>Proxy</category> <description>Notes on designing OpenAI-compatible endpoints such as /v1/chat/completions and /v1/embeddings in FastAPI for the XGEN model server, and on proxying them to llama-server and vLLM backends</description> <link>https://infoedu.co.kr/ai/XGEN/openai-compatible-api-server-direct-build/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/openai-compatible-api-server-direct-build/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/openai-compatible-api-server-direct-build.png" type="image/png" length="41606" /> </item> <item> <title>Qdrant Hybrid Search: Combining Sparse and Dense Vectors</title> <category>BM25</category> <category>Qdrant</category> <category>RAG</category> <category>Sparse Vector</category> <category>XGEN</category> <category>Vector Search</category> <category>Hybrid Search</category> <description>Implementing hybrid search in xgen-retrieval that combines BM25 sparse vectors and dense embeddings through Qdrant's Prefetch + Fusion API, along with adding a full-text index and configuring the collection</description> <link>https://infoedu.co.kr/ai/XGEN/qdrant-hybrid-search-sparse-dense-vector-integration/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/qdrant-hybrid-search-sparse-dense-vector-integration/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/qdrant-hybrid-search-sparse-dense-vector-integration.png" type="image/png" length="50451" /> </item>
<item> <title>Token Management and Context Window Optimization for a RAG Service</title> <category>LLM</category> <category>RAG</category> <category>XGEN</category> <category>Context Window</category> <category>Token Management</category> <description>Implementing a TokenBudgetManager that keeps xgen-workflow's Iterative RAG within vLLM's 32K context limit: token estimation for Korean and English text, token caps for batch filtering, and token budgeting for the compression stage</description> <link>https://infoedu.co.kr/ai/XGEN/rag-service-token-management-context-window-optimization/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/rag-service-token-management-context-window-optimization/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/rag-service-token-management-context-window-optimization.png" type="image/png" length="48561" /> </item> <item> <title>Implementing Hybrid Search with Sparse Vectors and a Full-Text Index</title> <category>BM25</category> <category>Python</category> <category>Qdrant</category> <category>RAG</category> <category>Sparse Vector</category> <category>Hybrid Search</category> <description>Configuring sparse vectors (BM25/SPLADE) alongside a full-text index in Qdrant and implementing dense + sparse hybrid search with RRF fusion</description> <link>https://infoedu.co.kr/ai/XGEN/sparse-vector-full-text-index-hybrid-search-impl/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/sparse-vector-full-text-index-hybrid-search-impl/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/sparse-vector-full-text-index-hybrid-search-impl.png" type="image/png" length="50847" /> </item>
<item> <title>Delivering Large-Scale Batch Workflow Results via SSE Streaming</title> <category>FastAPI</category> <category>SSE</category> <category>XGEN</category> <category>Batch Processing</category> <category>Streaming</category> <description>An architecture in xgen-workflow that batch-processes 100+ test cases while streaming progress in real time over SSE: the move from batch_results to a progress-only approach, cancellation, and Redis session management</description> <link>https://infoedu.co.kr/ai/XGEN/sse-streaming-large-scale-batch-workflow-result-deliver/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/sse-streaming-large-scale-batch-workflow-result-deliver/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/sse-streaming-large-scale-batch-workflow-result-deliver.png" type="image/png" length="52194" /> </item> <item> <title>vLLM Model Deployment: A Sampling Parameter Tuning Guide</title> <category>GPU</category> <category>LLM</category> <category>llama.cpp</category> <category>vLLM</category> <category>Model Serving</category> <category>Performance Tuning</category> <description>A summary of the key parameters for xgen-model's two backends, vLLM and llama-server, and, drawing on hands-on experience, how GPU memory utilization, context length, and batch settings affect performance</description> <link>https://infoedu.co.kr/ai/XGEN/vllm-model-deploy-sampling-parameter-tuning-guide/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/vllm-model-deploy-sampling-parameter-tuning-guide/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/vllm-model-deploy-sampling-parameter-tuning-guide.png" type="image/png" length="43286" /> </item> <item> <title>vLLM vs llama.cpp: Designing a Backend-Switching Architecture</title> <category>FastAPI</category> <category>LLM</category> <category>XGEN</category> <category>llama.cpp</category> <category>vLLM</category> <category>Model Serving</category> <category>Backend Switching</category> <category>Architecture</category> <description>Designing the UnifiedBackendManager that switches the XGEN model server between vLLM and llama-server at runtime, along with the switch-backend API, the branching strategy based on model_type, and the refactoring process</description> <link>https://infoedu.co.kr/ai/XGEN/vllm-vs-llama-cpp-backend-switching-architecture-design/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source
url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/vllm-vs-llama-cpp-backend-switching-architecture-design/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/vllm-vs-llama-cpp-backend-switching-architecture-design.png" type="image/png" length="48599" /> </item> <item> <title>Fixing Memory Leaks in Batch Execution with a Python Singleton Pool Pattern</title> <category>Python</category> <category>XGEN</category> <category>Memory Optimization</category> <category>Singleton</category> <category>Workflow</category> <description>How a singleton pool pattern fixed the memory leak that appeared when a RAG workflow was run 100+ times in a batch. Covers the object-reuse design for the LLM client, search cache, and RAG service, and the counterintuitive decision to disable caching.</description> <link>https://infoedu.co.kr/ai/XGEN/workflow-execution-optimization-searchcache-singleton-pool-pattern/</link> <pubDate>Sat, 13 Jun 2026 21:42:21 +0000</pubDate> <source url="https://infoedu.co.kr/feed_rss_updated.xml">SON BLOG</source><guid isPermaLink="true">https://infoedu.co.kr/ai/XGEN/workflow-execution-optimization-searchcache-singleton-pool-pattern/</guid> <enclosure url="https://infoedu.co.kr/assets/images/social/ai/XGEN/workflow-execution-optimization-searchcache-singleton-pool-pattern.png" type="image/png" length="54360" /> </item> </channel> </rss>