Etched
Founded: 2023
Application Field: Inference
Etched was founded in June 2023 by two Harvard dropouts, Gavin Uberti and Chris Zhu, to develop Sohu, an AI inference accelerator chip that the company claims delivers 10 times the inference performance of NVIDIA's H100 GPUs. Shortly after its founding, the company was valued at $34 million.
More Details
Unparalleled Performance
- Sohu delivers 10 times the inference performance of NVIDIA's H100 GPUs.
- In simulations, runs large AI models 140 times faster than traditional GPUs.
Revolutionary Architecture
- Utilizes a groundbreaking design that directly embeds the transformer architecture into the chip core, optimizing performance specifically for transformer-based models.
- Achieves its efficiency gains by forgoing the general-purpose programmability found in GPUs (see the sketch after this list).
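To make the point concrete, the computation a transformer-specialized chip hard-wires is small and fixed: self-attention followed by a feed-forward network, both built almost entirely from dense matrix multiplies. The sketch below is a generic, minimal illustration in NumPy; it is not Etched's design, and all shapes and layer choices are assumptions.

```python
# Illustrative sketch only: the fixed computation pattern of a transformer
# block that a transformer-specialized accelerator can hard-wire in silicon.
# This is NOT Etched's implementation; shapes and layer choices are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    """One simplified transformer block (layer norms omitted): attention + MLP."""
    # Self-attention: every step is a dense matrix multiply, which is why
    # a transformer-only chip can devote almost all its area to matmul units.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    x = x + scores @ v @ Wo               # attention output + residual
    # Feed-forward MLP: two more dense matmuls with a nonlinearity between.
    x = x + np.maximum(x @ W1, 0) @ W2    # MLP output + residual
    return x

# Tiny example: 8 tokens, model width 16, MLP width 64.
rng = np.random.default_rng(0)
d, h = 16, 64
x = rng.standard_normal((8, d))
params = [rng.standard_normal(s) * 0.1 for s in
          [(d, d), (d, d), (d, d), (d, d), (d, h), (h, d)]]
print(transformer_block(x, *params).shape)  # (8, 16)
```

Because this pattern never changes across transformer models, a specialized chip can trade flexible control logic for more matrix-multiply hardware.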
Advanced Inference Capabilities
- Supports tree search encoding, enabling parallel comparisons of hundreds of responses for better accuracy and efficiency.
- Implements multicast speculative decoding, generating new content in real time (a generic sketch of speculative decoding follows below).
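Speculative decoding in general works by letting a cheap draft model propose several tokens that the large model then verifies in one parallel pass. The sketch below illustrates only that generic idea; it is not Etched's multicast variant, and the toy models and exact-match acceptance rule are simplifications.

```python
# Generic illustration of speculative decoding (not Etched-specific):
# a cheap draft model proposes several tokens, and the large model verifies
# them together, keeping the longest agreeing prefix.
from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft_next: Callable[[List[int]], int],
                     target_next: Callable[[List[int]], int],
                     k: int = 4) -> List[int]:
    # 1. Draft k tokens sequentially with the cheap model.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. Verify the k positions with the large model (conceptually a single
    #    batched forward pass on specialized hardware).
    accepted, ctx = [], list(prefix)
    for t in draft:
        t_target = target_next(ctx)        # what the large model would emit
        if t_target == t:                  # simplified exact-match acceptance
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(t_target)      # fall back to the target's token
            break
    return prefix + accepted

# Toy stand-in models: the "draft" occasionally disagrees with the "target".
target = lambda ctx: (len(ctx) * 7) % 5
draft = lambda ctx: (len(ctx) * 7) % 5 if len(ctx) % 3 else 0
print(speculative_step([1, 2, 3], draft, target, k=4))
```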
Scalability for Large Models
- Designed to operate trillion-parameter AI models efficiently.
- The system can scale to support models with up to 100 trillion parameters, keeping the platform future-proof (see the back-of-envelope calculation below).
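A rough back-of-envelope calculation shows why models at this scale force multi-chip deployments. The figures below (8-bit weights, a hypothetical 144 GB of memory per accelerator) are assumptions for illustration, not published specifications.

```python
# Back-of-envelope arithmetic (assumptions, not vendor specs): how much
# weight memory trillion-parameter models need, and why multi-chip
# scaling is unavoidable at that size.
def weights_gb(params: float, bytes_per_param: float = 1.0) -> float:
    """Weight memory in GB, assuming 8-bit (1-byte) parameters."""
    return params * bytes_per_param / 1e9

for n_params in (1e12, 100e12):              # 1T and 100T parameters
    gb = weights_gb(n_params)
    chips = gb / 144                          # hypothetical 144 GB per accelerator
    print(f"{n_params/1e12:>5.0f}T params -> {gb/1e3:6.1f} TB of weights "
          f"-> ~{chips:,.0f} accelerators just to hold the weights")
```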
Open-Source Software Stack
- Features a fully open-source software stack, allowing for flexibility, customization, and community collaboration.
- Simplifies development and integration for AI applications.
Cost-Effective and Energy-Efficient
- Highly optimized for transformer-based workloads, reducing unnecessary overhead and improving cost-efficiency.
- Uses a streamlined design that minimizes power consumption compared to traditional GPU-based systems.
Real-Time Capabilities
- Enables real-time inference for large models, unlocking potential in applications like content generation, chatbot interactions, and more.
Future-Oriented Design
- Specifically built for transformer-based models, which dominate current state-of-the-art AI applications.
- Poised to handle next-generation AI workloads as models become larger and more complex.
Simplified Hardware and Software Integration
- Features a single-core architecture optimized for efficiency, reducing complexity in deployment.
- The open-source nature of the system makes it easier for developers to adapt and integrate into existing workflows.
Breakthrough Efficiency
- Sohu’s architecture prioritizes high utilization of computational resources, with over 90% FLOPS utilization compared to ~30% in traditional GPUs (made concrete in the calculation below).
- Designed to solve the compute bottlenecks in large-scale AI inference.
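The utilization figures translate directly into sustained throughput. The quick calculation below uses a placeholder peak-FLOPS number; only the 90% vs ~30% utilization ratio comes from the claim above.

```python
# Simple utilization arithmetic to make the 90% vs ~30% claim concrete
# (the peak-FLOPS figure below is a hypothetical placeholder, not a spec).
def effective_tflops(peak_tflops: float, utilization: float) -> float:
    """Sustained throughput = peak compute x fraction of cycles doing useful math."""
    return peak_tflops * utilization

peak = 1000.0  # hypothetical peak TFLOPS for one accelerator
for name, util in [("specialized chip (~90%)", 0.90),
                   ("general-purpose GPU (~30%)", 0.30)]:
    print(f"{name}: {effective_tflops(peak, util):.0f} effective TFLOPS")
```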