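Naver Cloud Unveils Omnimodal HyperCLOVA X, Accelerating Real-World AI Agents: Break News

▲ The HyperCLOVA X SEED 8B Omni model, which understands the context of text and images together to produce its output © Break News

Break News reporter Jung Min-woo = Naver Cloud has unveiled HyperCLOVA X with a native omnimodal architecture, entering the race for next-generation AI foundation models in earnest. Building on an omnimodal strategy designed from the fundamentals up, the company also presented an expansion roadmap that runs from data differentiation through phased scale-up to the production of specialized models.

Naver Cloud (CEO Kim Yu-won) on the 29th unveiled the first results of its 'Omni Foundation Model' development task, which it is pursuing as the lead organization in the Ministry of Science and ICT's 'Independent AI Foundation Model' project. The two models released are 'HyperCLOVA X SEED 8B Omni', the first in Korea to adopt a native omnimodal architecture, and 'HyperCLOVA X SEED 32B Think', which adds vision, speech, and tool-use capabilities to an existing reasoning AI. Both models were released as open source.

With this release, Naver Cloud plans to begin building, in earnest, real-world AI agents that anyone can use in everyday life and in industrial settings.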

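Strengthening 'Real-World Understanding' with a Native Omnimodal Architecture

HyperCLOVA X SEED 8B Omni is distinguished by its full adoption of a native omnimodal architecture, in which different forms of data such as text, images, and audio are trained together from the start within a single model. Unlike conventional multimodal approaches that attach image or speech models to a text-based model, it is designed to understand diverse information in an integrated way within a single semantic space.

Omnimodal AI is drawing attention as a next-generation foundation technology because it can naturally understand real-world environments in which speech, text, and visual and audio information are intertwined. Global big tech companies are likewise making omnimodality a core technology pillar of their next-generation AI models.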

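Data Differentiation and a Phased Scale-Up Strategy

Naver Cloud emphasized 'data differentiation' as the key factor determining the performance of omnimodal AI. In the company's view, what matters is not simply increasing model size but securing data that captures the diverse contexts of the real world.

Seong Nak-ho, Head of Hyperscale AI Technology at Naver Cloud, said, "No matter how far a model is scaled up, if data diversity is limited, the AI's problem-solving ability will inevitably remain confined to specific domains," adding, "Securing and refining real-world data, such as everyday-life context data that has not been digitized or spatial data reflecting the geographic characteristics of a region, has to come first."

Having validated its native omnimodal development methodology through this release, Naver Cloud plans to begin training on differentiated data in earnest and to scale up in phases. The company explained that because a single-model omnimodal AI is relatively easy to scale, models of various sizes specialized for industry and everyday life can be expanded efficiently.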

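Multimodal Generation Implemented… Broadening the Usability of AI Agents

HyperCLOVA X SEED 8B Omni also offers omnimodal generation, creating and editing images from text instructions. It understands the context of text and images together to produce output that reflects their meaning, so that a single model handles text understanding and image generation and editing seamlessly.

This is a capability that global frontier AI models have offered, and Naver Cloud is seen as having demonstrated, through this release, that it has secured multimodal generation capability at that level.

Going forward, Naver Cloud plans to extend AI agents with strengthened real-world understanding, built on the omnimodal HyperCLOVA X, into a range of industries and service areas. Through this, it aims to raise the competitiveness of Korea's AI ecosystem as a whole by a step.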

*The following is the full text of the above article as translated into English by Google Translate, provided to aid understanding. The translation may contain errors.

Naver Cloud Unveils Omnimodal HyperCLOVA X, Accelerating the Implementation of Real-World AI Agents

Naver Cloud has entered the race for next-generation AI foundation models in earnest with the unveiling of HyperCLOVA X, which features a native omnimodal architecture. Building on an omnimodal strategy designed from the fundamentals up, the company also presented an expansion roadmap that runs from data differentiation through phased scale-up to the production of specialized models.

On the 29th, Naver Cloud (CEO Kim Yu-won) unveiled the first results of its "Omni Foundation Model" development task, which it is pursuing as the lead organization in the Ministry of Science and ICT's "Independent AI Foundation Model" project. The newly unveiled models are "HyperCLOVA X SEED 8B Omni," the first in Korea to adopt a native omnimodal architecture, and "HyperCLOVA X SEED 32B Think," which adds vision, speech, and tool-use capabilities to an existing reasoning AI. Both models have been released as open source.

With this release, Naver Cloud plans to begin building, in earnest, real-world AI agents that anyone can use in everyday life and in industrial settings.

Enhanced Real-World Understanding with a Native Omnimodal Architecture

HyperCLOVA X SEED 8B Omni fully adopts a native omnimodal architecture, in which different data types such as text, images, and audio are trained together from the start within a single model. Unlike existing multimodal approaches that attach image or speech models to a text-based model, this approach is designed to understand diverse information in an integrated way within a single semantic space.

Omnimodal AI is attracting attention as a next-generation foundation technology, as it can naturally understand real-world environments where speech, text, visual, and audio information intersect. Global big tech companies are also embracing omnimodal as a core technology for their next-generation AI models.

Data Differentiation and a Phased Scale-Up Strategy

Naver Cloud emphasized "data differentiation" as the key factor determining the performance of omnimodal AI. In the company's view, what matters is not simply increasing model size but securing data that captures diverse real-world contexts.

Seong Nak-ho, Head of Hyperscale AI Technology at Naver Cloud, stated, "No matter how far a model is scaled up, if data diversity is limited, AI's problem-solving capabilities will inevitably remain confined to specific areas. We must first secure and refine real-world data, such as everyday-life context data that has not been digitized or spatial data reflecting regional geographic characteristics."

Having validated its native omnimodal development methodology through this model release, Naver Cloud plans to begin training on differentiated data in earnest and to scale up in phases. The company explained that because omnimodal AI with a single-model architecture is relatively easy to scale, models of various sizes tailored to industries and everyday life can be expanded efficiently.

Implementing Multimodal Generation… Expanding the Usability of AI Agents

HyperCLOVA X SEED 8B Omni also offers omnimodal generation, creating and editing images based on text instructions. It understands the context of text and images together to produce results that reflect their meaning, so that a single model handles text understanding and image generation and editing seamlessly.

This is a capability that global frontier AI models have offered, and Naver Cloud is seen as having demonstrated, through this release, that it has secured multimodal generation capability at that level.

Naver Cloud plans to extend AI agents with strengthened real-world understanding, built on the omnimodal HyperCLOVA X, into various industries and service areas. Through this, the company aims to raise the competitiveness of Korea's AI ecosystem as a whole to the next level.