AI Storage: NAS vs SAN vs Object for Training and Inference
Artificial intelligence depends heavily on data. For enterprises working with large language models and generative AI, managing vast amounts of data is essential—both for training models and storing AI outputs. However, this data usually comes from multiple sources, including structured databases and unstructured files, spread across on-premises systems and the cloud. Addressing AI’s data needs means examining storage options like storage area networks (SAN), network attached storage (NAS), and object storage.
AI’s Data Mountain
Modern AI projects don’t rely on a single data source. Generative AI models pull from diverse unstructured data such as documents, images, audio, video, and even code. These models work on relationships between data points, drawing on both the original unstructured files and their vectorised representations, which are typically stored as blocks.
Training large language models benefits from multiple data sources. Enterprises also link these models directly to their own data, often through retrieval-augmented generation (RAG), which grounds responses in enterprise content and improves result relevance. This data can come from documents or from enterprise applications backed by relational databases. Vectorisation is increasingly supported by major database vendors, allowing AI to interact with structured data efficiently.
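The retrieval step of the RAG pattern described above can be sketched in a few lines. This is a toy illustration only: real systems use dense embeddings from a trained model and a vector database, whereas here a simple bag-of-words vector and cosine similarity stand in for both.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector.
    Real RAG pipelines use dense vectors from an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank stored documents by similarity to the query and return
    the top-k, which would then be passed to the model as context."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "SAN delivers high IOPS for relational databases",
    "NAS scales cheaply for unstructured documents",
    "Object storage uses a flat global namespace",
]
context = retrieve("which storage suits unstructured documents", docs)
```

The storage implication is the point: the source documents live on file or object storage, while the index of vectors they produce is a separate, database-like workload.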
Choosing Between NAS and SAN
Where to store AI data depends on multiple factors. Leaving data in its original location isn’t always an option—processing needs, isolation requirements, or performance demands often require new storage solutions. Vectorisation can increase data volume by up to ten times, pushing storage systems to their limits.
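The expansion effect is easy to see with back-of-the-envelope arithmetic. The chunk size, overlap, and embedding dimension below are illustrative assumptions, not figures from any specific system:

```python
# Rough estimate of how vectorisation inflates storage for a text corpus.
SOURCE_BYTES = 1_000_000     # a 1 MB source corpus
CHUNK_CHARS = 500            # characters per chunk fed to the embedder
OVERLAP_CHARS = 100          # overlap between consecutive chunks
EMBED_DIM = 768              # dimensions per embedding vector (assumed)
BYTES_PER_FLOAT = 4          # float32 per dimension

stride = CHUNK_CHARS - OVERLAP_CHARS
num_chunks = SOURCE_BYTES // stride
vector_bytes = num_chunks * EMBED_DIM * BYTES_PER_FLOAT
expansion = vector_bytes / SOURCE_BYTES

print(f"{num_chunks} chunks -> {vector_bytes / 1e6:.1f} MB of vectors "
      f"({expansion:.1f}x the source)")
```

With these assumptions a 1 MB corpus yields roughly 7.7 MB of vectors before indexes and replicas are counted, so a tenfold increase in total footprint is entirely plausible.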
Unstructured data tends to be stored on file-based NAS, favored for its lower cost and easier scalability compared to direct-attached storage (DAS) or block access SAN. Structured data often resides on block storage, typically SAN, since high input/output operations per second (IOPS) and throughput are critical for production systems like ERP and CRM.
AI applications usually access data via file protocols, even when stored on SAN. The best choice depends on data type and AI workload. For document or image-heavy AI, NAS can be sufficient. For high-speed, data-intensive applications like autonomous vehicles or surveillance, SAN or fast local storage may be necessary. Data architects must also weigh the cost of moving data between storage types, especially during training versus inference.
Object Storage’s Role
Object storage offers a way to unify diverse data sources. It’s gaining traction both in the cloud and on-premises, thanks to its flat structure, global namespace, ease of management, scalability, and cost-effectiveness. However, traditional object storage has lagged in performance, making it less suited for latency-sensitive AI workloads.
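The flat structure mentioned above works roughly like a key-value map: every object lives under a key in one namespace, and "directories" are just key prefixes. A minimal in-memory sketch (the class and method names are illustrative, loosely modelled on S3-style put/get/list semantics, not any vendor's API):

```python
class ObjectStore:
    """Minimal flat-namespace object store: no directory tree,
    just keys mapped to bytes in a single global namespace."""

    def __init__(self):
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str = "") -> list[str]:
        # "Folders" are an illusion: listing is just a prefix scan.
        return sorted(k for k in self._objects if k.startswith(prefix))

store = ObjectStore()
store.put("training/images/cat-001.jpg", b"\xff\xd8")
store.put("training/images/cat-002.jpg", b"\xff\xd8")
store.put("models/llm-v1/weights.bin", b"\x00\x01")
image_keys = store.list("training/images/")
```

Because there is no hierarchy to walk or rebalance, this model scales out easily, which is exactly why it is attractive for large AI datasets despite its historically higher latency.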
Storage vendors are narrowing this gap. Solutions like Pure Storage’s FlashBlade and NetApp’s ONTAP combine file, block, and object storage in one platform, enabling flexibility without hardware silos. Other technologies, like Hammerspace’s Hyperscale NAS, optimize file system performance to keep pace with GPU demands.
Balancing Storage for AI Projects
Until high-performance object storage becomes more widespread, AI systems will continue using a mix of NAS, SAN, object, and DAS storage. The balance among these will shift over an AI project’s lifetime and as AI tools develop.
Unstructured data often requires new hardware investments, while block and vector database needs are generally met with existing systems. Effective AI storage strategies consider keeping source unstructured data on file or object storage while managing vectorised data on block storage, maximizing both performance and flexibility.
For those interested in learning more about AI storage solutions and training, explore Complete AI Training's latest AI courses to deepen your knowledge of AI infrastructure and development.