Technical Case Study

Cohere trains and deploys its generative AI models on OCI

May 20, 2025 | 6 minute read


By Erin Dawson, Senior Developer Marketing Manager at Oracle

Figure 1: Cohere is a leader in AI innovation.

If you’ve spent any time in the world of cloud computing, you’ve heard the phrase “at scale.” But when speaking of “scale” in the AI space, with its massive workloads to train and deploy large language models (LLMs), it’s helpful to recalibrate what scale means. This is the difference between a sandbox and a football field; between Pluto and Saturn; between an anthill and Denali.

Sheer growth in demand for AI workloads meant that Cohere Inc., a security-first enterprise AI company, needed additional resources and the flexibility to scale its AI workloads: enough graphics processing units (GPUs) to train LLMs and enough resources to distribute them at scale, all while ensuring security and performance reliability.

Cohere used Oracle Cloud Infrastructure Kubernetes Engine (OKE) and GPUs hosted on Oracle Cloud Infrastructure (OCI) to deliver its world-class AI solution at the enterprise level, across the world.

The challenge of bringing AI to the enterprise

Cohere helps enterprise-scale organizations revolutionize operations and boost productivity with secure AI, maximizing the potential of cutting-edge LLMs for real-world business applications—that’s no mean feat.

To help meet increasing customer demand, Cohere needed more GPUs. With ever-larger AI workloads to run, it also needed to train its state-of-the-art foundation models quickly.

“We needed to expand our capacity, and we certainly were excited to engage on the training front with GPUs with Oracle. From a business perspective, it was critical,” says Autumn Moulder, VP of Infrastructure & Security at Cohere. “We needed [additional GPUs] to meet growing customer demand on our platform, so there was an urgent need there.”

The role of GPUs in facilitating the massive workloads demanded by AI is difficult to overstate, but so is the role of Kubernetes in orchestrating it all. “Kubernetes powers everything we do,” Moulder says. “It’s that foundational layer that allows us to translate how our workloads get deployed and how we actually use compute across many, many providers.”

This meant that Cohere needed additional GPUs and a scalable Kubernetes environment to function both alongside and on top of its already established multi-cloud environment to meet increasing customer demand.

Designing a solution

Before Cohere began using OCI Compute powered by NVIDIA GPUs and deploying Kubernetes clusters efficiently, the enterprise AI platform needed to integrate OCI into its environment. Cohere had already been using Kubernetes to deploy its workloads, and with OKE, it could continue deploying as before, but with access to the GPUs it needed.
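The article doesn’t share Cohere’s actual manifests, but as a minimal sketch, deploying a GPU-backed workload to an OKE cluster with the official Kubernetes Python client can look like the following. The namespace, image, and GPU count are illustrative, and the cluster is assumed to have NVIDIA GPU nodes with the device plugin installed.

```python
# Minimal sketch: deploying a GPU-backed workload to an OKE cluster with the
# official Kubernetes Python client. All names (image, labels, GPU count)
# are illustrative, not Cohere's actual configuration.
from kubernetes import client, config

def create_gpu_deployment():
    # Uses the kubeconfig generated for the OKE cluster (e.g., via the OCI CLI).
    config.load_kube_config()

    container = client.V1Container(
        name="llm-server",
        image="example.ocir.io/demo/llm-server:latest",  # hypothetical image
        resources=client.V1ResourceRequirements(
            # The NVIDIA device plugin exposes GPUs as a schedulable resource.
            limits={"nvidia.com/gpu": "1"},
        ),
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "llm-server"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="llm-server"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={"app": "llm-server"}),
            template=template,
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(
        namespace="default", body=deployment
    )

if __name__ == "__main__":
    create_gpu_deployment()
```

Because OKE is conformant Kubernetes, existing manifests and client code carry over unchanged; only the kubeconfig points at the new cluster.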

From a design standpoint, because every organization’s requirements are different, Cohere leaned on OCI’s expertise to design its cloud infrastructure in a way that fit its needs.

“Anytime that you’re looking to do training at any scale, certainly you need to find a partner who you can work with and say, ‘Hey, how’s the network set up? Can we make sure that the servers are racked close together? What does the interconnect look like?’” says Moulder. The Cohere team didn’t need to worry about all the technical details—Cohere knew what it needed, and Oracle made it happen.

Another key differentiator for Cohere was Oracle’s emphasis on cloud native technology.

Moulder highlights that Cohere benefited from “…being able to say, ‘This provider has all of the kind of similar things that we’d expect and we need it co-located with the cluster’—those aren’t things that every provider can just offer you. So it certainly was important, especially in the early days.”

Figure 2: The architecture for Cohere’s deployment.

The flexibility of additional GPUs

With the infrastructure in place to leverage additional GPUs, Cohere was able to train foundation models to meet growing customer demand, and to do so flexibly. Cohere’s ability to scale its AI workloads dynamically, allocating the right amount of compute power based on demand, meant it could train models faster.
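The article doesn’t specify the mechanism behind that demand-based allocation. For serving workloads, one common Kubernetes pattern is a HorizontalPodAutoscaler; the sketch below, again using the Kubernetes Python client, is purely illustrative (the “llm-server” deployment and the CPU threshold are assumptions, not Cohere’s configuration).

```python
# Minimal sketch: demand-based scaling with a HorizontalPodAutoscaler created
# through the Kubernetes Python client. The target deployment ("llm-server"),
# replica bounds, and 70% CPU threshold are illustrative.
from kubernetes import client, config

def create_autoscaler():
    config.load_kube_config()

    hpa = client.V2HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name="llm-server-hpa"),
        spec=client.V2HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V2CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name="llm-server"
            ),
            min_replicas=1,
            max_replicas=16,
            metrics=[
                client.V2MetricSpec(
                    type="Resource",
                    resource=client.V2ResourceMetricSource(
                        name="cpu",
                        # Add or remove replicas to hold average CPU near 70%.
                        target=client.V2MetricTarget(
                            type="Utilization", average_utilization=70
                        ),
                    ),
                )
            ],
        ),
    )
    client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
        namespace="default", body=hpa
    )

if __name__ == "__main__":
    create_autoscaler()
```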

Flexibility is a huge boon when deploying these AI workloads, but given their size and importance, efficiency was also a priority.

One way Cohere was able to use the additional GPUs efficiently was through remote direct memory access (RDMA) networking. OCI’s highly optimized RDMA networks make it possible to scale large AI models efficiently, and they significantly improved the training and inference performance of Cohere’s AI models by enabling faster data transfer between compute nodes without overloading the CPU.
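On the training side, frameworks hand this communication off to a backend library. As a hedged illustration (not Cohere’s actual training stack), here is a minimal multi-node PyTorch DistributedDataParallel loop; the NCCL backend it initializes can use RDMA-capable interconnects, such as OCI’s cluster network, for GPU-to-GPU transfers that bypass the CPU. The model and hyperparameters are stand-ins.

```python
# Minimal sketch: multi-node training with PyTorch DistributedDataParallel.
# NCCL handles the cross-node gradient all-reduce and can run over
# RDMA-capable fabrics. The model and hyperparameters are illustrative.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE on every node.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # stand-in model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(32, 4096, device=f"cuda:{local_rank}")
        loss = model(x).square().mean()  # dummy objective
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across nodes via NCCL
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nnodes=N --nproc_per_node=8 train.py
```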

Collaboration and support

When first deploying workloads onto a new solution, as Cohere did, a failure could mean substantial downtime. This is not a process you want to get wrong. Thankfully, OCI’s Support team worked with Cohere to help ensure that everything ran smoothly.

“At every stage, the Oracle Support team was there,” Moulder says. “It was great to have a support team that said, ‘We’ll meet you where you are—you don’t just have to throw a ticket over the wall and hope that somebody’s going to meet you.’”

Making security a priority

Collaboration between Oracle and Cohere wasn’t limited to ensuring that Cohere could deploy its workloads; both parties also worked to ensure that those deployments met the highest security standards.

Part of that collaboration was a Security Maturity Profile Assessment, performed by Oracle, to help ensure that Cohere mitigated security vulnerabilities.

Some of the general security areas addressed in the Security Maturity Profile Assessment included logging, monitoring, and alerting (SIEM integration); security posture management (Oracle Cloud Guard enablement); identity and access management (federation integration); network management and security (FastConnect/VPN); cloud governance; and database security.
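As a small, hypothetical illustration of what posture management with Cloud Guard can look like day to day (the article doesn’t describe Cohere’s tooling), the OCI Python SDK can list the problems Cloud Guard has detected in a compartment:

```python
# Minimal sketch: listing Cloud Guard problems with the OCI Python SDK as one
# example of security posture monitoring. The compartment OCID is a
# placeholder; this is illustrative, not Cohere's actual tooling.
import oci

config = oci.config.from_file()  # reads ~/.oci/config by default
cloud_guard = oci.cloud_guard.CloudGuardClient(config)

response = cloud_guard.list_problems(
    compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
)
for problem in response.data.items:
    print(problem.risk_level, problem.id)
```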

“It can be a challenge when you onboard another cloud provider, simply because the way that you secure that environment may look a little different than what you were used to,” Moulder says. “It was very valuable for OCI to come to us and say, ‘Hey, we offer this capability to do this evaluation, we can work with you, we can help you make sure that the right controls are in place.’ They worked hand in hand with our Security team to make sure that we knew how to map the controls we have in other environments very directly to the way things work in the OCI framework. That made it a lot faster for us to gain confidence in the security profile on the OCI side.”

Results

Cohere chose Oracle as a true partner to help design its infrastructure, help ensure it was secure, and optimize the technology to maximize performance and scalability for Cohere’s growing, ever-in-demand AI workloads. Cohere also benefited from the scalability of OKE and OCI’s high-performance GPUs for AI workloads.

The combination of performance, cost efficiency, and security (including hand-in-hand support) meant that Cohere could train and deploy models quickly and without worrying about running out of GPU resources.

When it comes to advice for organizations in a similar scenario to Cohere, Moulder says: “Make sure that you’re working with a partner, like Oracle, who has that understanding of how the hardware and its unique requirements impact your workloads, because there's such a tight coupling between AI and hardware. Finding a cloud partner who does a good job of understanding what works well, and then bringing those components to you in an automated fashion—it's really critical.”

Learn more about OCI Kubernetes Engine or Oracle’s approach to AI.
