Author
Kelvin He
Published
1 Apr 2026
Form Number
LP2414
PDF size
6 pages, 186 KB
Abstract
This article summarizes MLPerf 6.0 results based on the Lenovo ThinkSystem SR650 V4 with two Intel Xeon 6 processors. Lenovo submitted results on three important workloads: Llama 3.1 8B (generative AI), Whisper (speech AI), and rGAT (graph AI). The results demonstrate that a mainstream 2U, CPU-based enterprise server can deliver balanced, competitive inference throughput and latency across diverse production use cases without requiring accelerator-centric deployments.
Introduction
The latest MLPerf 6.0 results show that the Lenovo ThinkSystem SR650 V4 continues to be a strong platform for enterprise AI inference. Using a 2-socket configuration with Intel Xeon 6787P processors, Lenovo submitted results on three important workloads:
- Llama 3.1 8B for generative AI
- Whisper for speech AI
- rGAT for graph AI
Together, these benchmarks highlight balanced performance across some of the most relevant AI use cases in business today.
The SR650 V4 is a 2U, 2-socket platform based on Intel Xeon 6700- and 6500-series processors with DDR5 memory support, making it well suited for customers who want to scale AI on familiar enterprise infrastructure.
MLPerf 6.0 highlights on Lenovo ThinkSystem SR650 V4
In MLPerf 6.0, Lenovo focused on three workloads that align closely with real enterprise AI demand: generative AI, speech AI, and graph AI. Running on the Lenovo ThinkSystem SR650 V4 with 2x Intel Xeon 6787P processors, the results show that the platform can deliver competitive performance across these very different inference scenarios.
For Llama 3.1 8B, the ThinkSystem SR650 V4 delivered strong results in both interactive and batch-style testing. In the Server scenario, the system achieved 281.78 tokens per second, placing Lenovo 2nd among the compared results. In the Offline scenario, it delivered 775.98 tokens per second, placing 3rd. The submission also recorded a p99.9 end-to-end latency of about 14 seconds, a time to first token (TTFT) of about 2.5 seconds, and a time per output token (TPOT) of about 103 milliseconds. Together, these results show that the platform can support both responsive, user-facing generative AI and higher-throughput batch inference environments.
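As a quick consistency check (a back-of-the-envelope sketch, not an official MLPerf derivation), the end-to-end latency of an autoregressive model can be approximated as TTFT plus TPOT for each subsequent output token. The response length below (113 tokens) is a hypothetical value chosen to illustrate the relationship:

```python
# Back-of-the-envelope check: e2e latency ~= TTFT + (n_tokens - 1) * TPOT.
# TTFT and TPOT are the approximate figures reported above; the output
# length is an assumed value used only to illustrate the relation.
ttft_s = 2.5           # time to first token, seconds (reported)
tpot_s = 0.103         # time per output token, seconds (reported)
n_output_tokens = 113  # hypothetical response length

e2e_latency_s = ttft_s + (n_output_tokens - 1) * tpot_s
print(f"Estimated end-to-end latency: {e2e_latency_s:.1f} s")  # ~14.0 s
```

A response of roughly 110 tokens lands near the reported ~14-second p99.9 end-to-end latency, which suggests the three reported latency metrics are mutually consistent.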
For Whisper, the ThinkSystem SR650 V4 achieved 18.57 samples per second in Lenovo’s benchmark summary. In the competitive comparison, the system reached 1,411.42 samples per second in the Offline test, placing 3rd. This highlights the platform’s ability to handle speech AI workloads such as transcription, meeting summarization, and call analytics efficiently.
For rGAT, Lenovo measured throughput of approximately 13.6K samples per second. In the competitive table, the ThinkSystem SR650 V4 posted 13,570.30 samples per second in the Offline category, also placing 3rd. This demonstrates strong capability for graph-based AI workloads such as fraud detection, relationship analysis, and recommendation use cases.
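To put that throughput in operational terms, simple arithmetic (assuming the Offline-scenario rate were sustained continuously, which real deployments would not do; this is illustrative only) gives the daily volume the platform could process:

```python
# Rough scale illustration for the rGAT Offline result.
# Assumes the benchmark rate holds 24 hours a day, purely for intuition.
samples_per_second = 13_570.30  # reported Offline throughput
seconds_per_day = 86_400

daily_samples = samples_per_second * seconds_per_day
print(f"{daily_samples / 1e9:.2f} billion samples/day")  # ~1.17 billion
```

At that rate, workloads such as transaction-level fraud scoring operate at a scale of over a billion graph samples per day on a single 2U server.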
Taken together, these MLPerf 6.0 results show that the ThinkSystem SR650 V4 is not just tuned for a single benchmark. It delivers a balanced profile across language, speech, and graph inference, which is exactly what many enterprises need as they expand AI into production.
Why these MLPerf 6.0 results matter
Enterprise AI is moving from pilots into production. As organizations deploy more AI services, they need systems that can deliver not only raw throughput, but also predictable response times, operational simplicity, and the flexibility to support multiple workload types on a common platform. That is why MLPerf matters: it provides a standardized way to evaluate real AI workloads under comparable conditions. MLCommons describes MLPerf Inference as an industry-standard benchmark suite for measuring machine learning system performance on representative workloads.
The ThinkSystem SR650 V4 fits well into this shift. It gives enterprises a mainstream data center server that can support modern AI inferencing without requiring every deployment to be accelerator centric. For organizations that want to run language, speech, and graph inference within existing operational models, that is a meaningful business advantage.
Built on a proven enterprise platform
Our earlier MLPerf 5.1 paper positioned the ThinkSystem SR650 V4 as a data center-grade server for CPU-only inferencing across diverse AI workloads, including Llama 3.1 8B, Whisper, DLRMv2, RetinaNet, and rGAT. That paper also described the benchmarked configuration as using Intel Xeon 6787P processors, 1024 GB of DDR5-6400 memory, and Ubuntu 24.04.2 LTS.
Reusing that hardware foundation in the MLPerf 6.0 discussion helps show continuity: the same server family continues to deliver practical AI performance as benchmark focus shifts toward newer production workloads.
Conclusion
The MLPerf 6.0 results strengthen the case for the Lenovo ThinkSystem SR650 V4 as a versatile platform for enterprise AI inference. Across Llama 3.1 8B, Whisper, and rGAT, the server demonstrated competitive performance on workloads that map directly to high-value business use cases in generative AI, speech AI, and graph AI.
For organizations looking to deploy AI on proven infrastructure, the ThinkSystem SR650 V4 offers a practical balance of scalability, flexibility, and performance. Rather than being optimized for just one narrow workload, it shows the kind of breadth that enterprises need as AI becomes a standard part of production IT.
System Configuration and Software Environment
The following table lists the benchmarked server configuration.

| Component | Configuration |
|-----------|---------------|
| Server | Lenovo ThinkSystem SR650 V4 (2U, 2-socket) |
| Processors | 2x Intel Xeon 6787P |
| Memory | 1024 GB DDR5-6400 |
| Operating system | Ubuntu 24.04.2 LTS |
Author
Kelvin He is an AI Data Scientist at Lenovo. He is a seasoned AI and data science professional specializing in building machine learning frameworks and AI-driven solutions. Kelvin is experienced in leading end-to-end model development, with a focus on turning business challenges into data-driven strategies. He is passionate about AI benchmarks, optimization techniques, and LLM applications, enabling businesses to make informed technology decisions.
Trademarks
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.
The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
ThinkSystem®
The following terms are trademarks of other companies:
Intel®, the Intel logo and Xeon® are trademarks of Intel Corporation or its subsidiaries.
Other company, product, or service names may be trademarks or service marks of others.
