Exclusive: How Complexity is Slowing AI Progress

Enterprises have moved past the novelty phase of AI. Large language models (LLMs) are no longer confined to innovation teams or labs. Now, they’re embedded in workflows, powering search, summarisation, productivity tools and, increasingly, decision support. Retrieval-Augmented Generation (RAG) has emerged as the go-to method for grounding these models in enterprise data. But as deployments scale, something’s breaking.
by Haider Aziz, General Manager – Middle East, Turkey and Africa – VAST Data
What started as a clever workaround, retrieving up-to-date knowledge from internal sources rather than retraining massive models, has evolved into a fragile, bloated architecture. Data is copied across systems. Embeddings are generated on schedules. Permissions live elsewhere. Vector databases operate in silos. And each of these steps needs to work perfectly, in sequence, at scale, and in real time. For now at least, RAG is doing its job. The pipeline around it isn’t.
Complexity is creeping in, and confidence is creeping out
Globally, enterprise IT leaders are waking up to the fact that while their AI stack may look impressive, their retrieval architecture is often a fragile chain of loosely connected tools, each one dependent on the next, and none built to work as a whole.
● Files sit in traditional storage
● Embeddings are created by GPU clusters on a delayed cadence
● Those embeddings are stored in a vector database, sometimes in the cloud
● Permissions are managed in a separate access control system
● A query orchestration layer attempts to hold it all together
● Structured data? That’s an entirely different path through a data warehouse or lakehouse, often with its own batch delay
Every step introduces risk, and latency.
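To make the compounding effect concrete, here is a rough back-of-the-envelope sketch. The step names mirror the list above; the per-step reliability and latency figures are purely hypothetical, chosen only to show how even highly reliable individual components multiply into a noticeably less reliable whole:

```python
# Hypothetical illustration of how risk and latency compound across a
# loosely coupled retrieval pipeline. All figures are invented for the
# sake of the arithmetic, not measurements of any real system.

PIPELINE = [
    ("file storage read",        0.999, 15),   # (step, success rate, latency ms)
    ("embedding lookup",         0.995, 40),
    ("vector database query",    0.995, 60),
    ("permission system check",  0.990, 30),
    ("query orchestration",      0.990, 25),
    ("structured-data join",     0.985, 120),
]

success = 1.0
latency_ms = 0
for step, reliability, ms in PIPELINE:
    success *= reliability   # failure probabilities compound multiplicatively
    latency_ms += ms         # latencies simply add up, step by step

print(f"end-to-end success rate: {success:.1%}")   # ~95.5%
print(f"end-to-end latency:      {latency_ms} ms") # 290 ms
```

Six steps that are each 98.5–99.9% reliable still leave roughly one query in twenty touching a failure somewhere in the chain, and every hop adds latency before the model even begins to answer.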
One enterprise AI architect I spoke with recently likened it to “building a Formula 1 car where the tyres, engine and chassis were all bought from different vendors who’ve never met.”
When users ask a question and get an outdated or incomplete answer, they stop trusting the system. When governance teams realise the retrieval layer can return content they can’t audit or enforce, things grind to a halt. And when IT teams need to troubleshoot a failure in a multi-step pipeline, the case for scale collapses under its own weight.
The illusion of real-time
The challenge here isn’t just speed. It’s freshness, trust, and operational clarity. A generative model might respond in milliseconds. But if it’s retrieving from an index that was last updated 12 hours ago, or referencing content a user no longer has access to, then the answer, however fast, is flawed.
This is particularly problematic in environments like finance, defence, healthcare, and utilities, where the cost of a stale or overexposed answer is reputational, regulatory, or worse. As CloudFactory has noted, RAG systems that aren’t continuously evaluated and curated tend to degrade over time. Without live indexing, dynamic access enforcement, and reranking built in, these systems become fragile under load and drift away from accuracy.
Real-time AI isn’t just about inference speed. It’s about having the right data in the right moment, and knowing that data is policy-aligned.
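One way to picture the freshness-and-policy problem is as a guardrail applied to every retrieved chunk at query time. The sketch below is illustrative only: the record fields, the `usable` helper, and the five-minute staleness threshold are all assumptions, not a real product API. It shows why a chunk indexed twelve hours ago, or one the user has lost access to, should never reach the model, however fast inference is:

```python
import time

# Hypothetical query-time guardrail: a retrieved chunk is only usable
# if its index entry is fresh AND the caller still has access right now.
# Field names and the threshold are illustrative assumptions.

MAX_STALENESS_S = 300  # treat anything indexed more than 5 minutes ago as suspect

def usable(chunk, user, acls, now=None):
    now = now or time.time()
    fresh = (now - chunk["indexed_at"]) <= MAX_STALENESS_S
    allowed = user in acls.get(chunk["doc_id"], set())  # checked per query, not synced
    return fresh and allowed

acls = {"doc-1": {"alice"}, "doc-2": {"alice", "bob"}}
chunks = [
    {"doc_id": "doc-1", "indexed_at": time.time() - 60},     # fresh, alice allowed
    {"doc_id": "doc-2", "indexed_at": time.time() - 43200},  # indexed 12 hours ago
]

answers = [c for c in chunks if usable(c, "alice", acls)]
print(len(answers))  # only the fresh, policy-aligned chunk survives
```

The point is not the threshold itself but where the check lives: if freshness and access are evaluated inside the retrieval path, a fast-but-stale answer is simply never produced.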
What a modern retrieval layer looks like
Some in the industry have started calling this next phase the shift from a “retrieval pipeline” to an “insight engine.” It reflects a subtle but important transition: away from cobbling together components, and toward a unified system where data, vectors, permissions and compute operate in one continuous flow.
In the most forward-leaning architectures:
● Data is embedded at the point of arrival, not in post-processed batches
● Structured and unstructured sources are indexed and queried from a single namespace
● Permissions are enforced dynamically, at query time, without external policy syncs
● Inference happens close to the data, not across layers of latency
● And semantic search, summarisation, and retrieval are part of the storage fabric itself, not an afterthought
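The pattern described above can be sketched in miniature: one store holding text, vectors, and permissions together, embedding at the point of arrival, and enforcing access inline at query time. Everything here is illustrative — the `InsightStore` class is an invented name, and the character-frequency `embed` function is a toy stand-in for a real embedding model:

```python
# Minimal sketch of the "insight engine" pattern: embed at ingest,
# one namespace for data + vectors + ACLs, permissions enforced at
# query time. Names and the toy embedder are illustrative assumptions.

import math

def embed(text):
    # toy embedding: normalised character-frequency vector (stand-in
    # for a real embedding model)
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class InsightStore:
    def __init__(self):
        self.records = []  # one namespace: text, vector, and ACL live together

    def ingest(self, text, allowed_users):
        # embedded at the point of arrival, not on a delayed batch schedule
        self.records.append({"text": text, "vec": embed(text),
                             "acl": set(allowed_users)})

    def query(self, question, user, k=1):
        qv = embed(question)
        visible = [r for r in self.records if user in r["acl"]]  # query-time ACL
        scored = sorted(visible,
                        key=lambda r: -sum(a * b for a, b in zip(qv, r["vec"])))
        return [r["text"] for r in scored[:k]]

store = InsightStore()
store.ingest("Q3 revenue grew 12 percent", {"alice"})
store.ingest("Cafeteria menu for Monday", {"alice", "bob"})
print(store.query("revenue growth", "bob"))  # bob never sees the finance doc
```

Because the access filter runs before similarity scoring, there is no separate policy sync to drift out of date, and nothing the user cannot see ever enters the candidate set.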
This isn’t theoretical. Some of the world’s most advanced organisations in AI R&D, sovereign infrastructure, and enterprise transformation are actively building toward this model now. And in the GCC, the opportunity may be even greater.
Across the UAE and Saudi Arabia, many enterprises are building AI infrastructure without the burden of legacy systems found elsewhere. With national AI strategies advancing quickly and sovereign compute investment accelerating, the region can leapfrog the “patchwork” phase entirely.
For IT leaders in the Gulf, simplifying the retrieval layer isn’t just smart, it’s strategic.
Why simplification is the path to scale
The lesson for CIOs and infrastructure leaders is simple: RAG can’t scale inside a pipeline that wasn’t designed to scale. You don’t make AI more trustworthy by adding more moving parts. You do it by making the parts simpler, more aligned, and more integrated.
That’s the real unlock for enterprise AI in 2025 and beyond: not just faster models, but retrieval systems that don’t need to be re-engineered every time the business changes.
The sooner organisations start collapsing the distance between their data, their policies, and their inference engines, the sooner they’ll stop debugging broken pipelines, and start scaling intelligence that works.
For CIOs in the GCC, where transformation cycles are measured in months, not years, now is the time to build AI infrastructure that moves as fast as the ambition behind it.