The following article was prepared by the RISC-V Association based on an article from Esperanto Technologies. More detailed information is available by participating in RISC-V Day.
Click here to register for RISC-V Day
Esperanto Technologies has developed a scalable “Generative AI Appliance” based on RISC-V technology. In Esperanto, the literal translation might be “Generativa AI Aparato.” This development is a significant advance in AI and computing, targeting the needs and applications of generative AI. Meta’s OPT large language model can be evaluated on Esperanto’s AI/HPC platform, which is built around the RISC-V AI chip ET-SoC-1. The appliance is well suited to industries such as healthcare, legal, and financial services that need to maintain data privacy. Esperanto’s processor design leverages the RISC-V instruction set to provide energy efficiency across a wide range of AI and HPC workloads. The product design covers mechanical, thermal, electrical, and firmware aspects; this is a complete system, not just an announcement.
Notably, as of April 25, 2023, hardware and software are available that allow customer companies to evaluate generative AI on the ET-SoC-1 AI chip. Esperanto has ported Meta’s Open Pre-trained Transformer (OPT) model to data center systems built on its own chips. Being able to measure the performance and power consumption of generative AI workloads on an evaluation server using a 7-nanometer chip is a significant advantage: rather than theoretical performance metrics, it provides a means to analyze a chip’s capabilities in a real application environment. The practicality of an AI chip is determined not only by its theoretical processing power and efficiency, but also by its performance in specific use cases. Esperanto provides benchmarks relevant to real-world applications, allowing customers to evaluate the product and understand its benefits.
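As a rough illustration of the kind of evaluation described above, throughput and energy efficiency can be derived from a measured inference run. This is a generic sketch, not Esperanto’s evaluation software; the function name and the sample numbers are hypothetical.

```python
# Hypothetical helper: derive throughput and energy-efficiency metrics
# from a measured LLM inference run. The sample figures below are
# illustrative only, not Esperanto's published results.

def inference_metrics(tokens_generated, elapsed_s, avg_power_w):
    """Return tokens/second and tokens/joule for one inference run."""
    tokens_per_s = tokens_generated / elapsed_s
    energy_j = avg_power_w * elapsed_s          # joules = watts * seconds
    tokens_per_j = tokens_generated / energy_j
    return tokens_per_s, tokens_per_j

# Example: 2,048 tokens generated in 16 s at an average draw of 120 W.
tps, tpj = inference_metrics(2048, 16.0, 120.0)
print(f"{tps:.1f} tokens/s, {tpj:.3f} tokens/J")
```

Tokens per joule is a useful comparison metric here because it folds performance and power consumption into a single number, which is exactly the trade-off the article emphasizes.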
This “Generative AI Appliance” has the following features:
1. Advanced RISC-V hardware: Utilizes 7-nanometer RISC-V technology to provide high-performance and energy-efficient computing solutions. This makes it suitable for generative AI, other AI, and large-scale high-performance computing (HPC) workloads.
2. Ease of deployment: Designed to be user-friendly, developers can quickly create and deploy vertical applications. Preloaded and self-contained, there is no need to download, port, or tune the latest models to your hardware platform.
3. Data privacy and low total cost of ownership (TCO): Integrated software/hardware solution that can be installed in private data centers or at the enterprise edge. This ensures a high level of data privacy and reduces TCO.
4. Running the latest LLM and image generation models: Currently supports the latest large language models (LLMs) and image generation models, including LLaMA 2, Vicuna, StarCoder, OpenJourney, and Stable Diffusion. Esperanto plans to continually update the system with the latest versions of these models as they are released.
5. Designed for a variety of industries: This appliance is ideal for industries such as healthcare, law, and finance that require fast, accurate data processing while maintaining data privacy.
6. Energy-efficient and cost-effective: Positioned as a more energy-efficient and cost-effective alternative compared to traditional GPU-based systems. There is a growing trend towards small and efficient LLM and diffusion models that offer low training and inference costs with high accuracy.
7. Product availability: Currently available, and includes the ET-SoC-1 AI accelerator chip, which can run up to four LLMs simultaneously. Supplied in a standard 2U rack-mount chassis.
ChatGPT, a real-time conversational AI system for natural language, reached 100 million active users in January 2023, faster than any app in history. OpenAI predicts that ChatGPT will generate $200 million in revenue in 2023. The total AI market is estimated at $207.9 billion in 2023 and is expected to grow to $1.84 trillion by 2030. Furthermore, the US AI market is expected to grow to $420.465 billion by 2025.
NVIDIA dominates the AI accelerator market, with its H100 AI GPU deployed in data centers around the world. An estimated 3.5 million H100 GPUs will be shipped in 2024, consuming approximately 13,000 GWh of electricity per year, comparable to the annual electricity demand of countries such as Guatemala or Lithuania. NVIDIA is planning the next-generation H200 AI GPU and the B100 AI GPU based on the Blackwell architecture, which are expected to deliver better power efficiency and improved AI performance.
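The fleet-level figures above can be sanity-checked with a back-of-the-envelope calculation, taking the 3.5 million units and 13,000 GWh/year estimates as given:

```python
# Back-of-the-envelope check of the H100 fleet power figures quoted above.
gpus = 3_500_000          # estimated H100 units shipped in 2024
annual_gwh = 13_000       # estimated fleet consumption, GWh per year
hours_per_year = 8_760

annual_kwh_per_gpu = annual_gwh * 1e6 / gpus            # 1 GWh = 1e6 kWh
avg_watts_per_gpu = annual_kwh_per_gpu * 1000 / hours_per_year

print(f"{annual_kwh_per_gpu:.0f} kWh/GPU/year, ~{avg_watts_per_gpu:.0f} W average")
```

This works out to roughly 3,700 kWh per GPU per year, an average continuous draw on the order of 420 W per GPU, which is plausible for hardware with a 700 W TDP running below full load around the clock.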
Given this situation, Esperanto is aiming for a more sustainable, power-efficient computing solution with RISC-V. Efforts to provide energy-efficient AI acceleration alternatives have important societal implications for reducing power consumption across industries. The company is developing competitive AI accelerators that prioritize low power consumption while providing the computing power needed for advanced AI applications, and is already shipping such a system as an AI/HPC RISC-V platform.
Esperanto has also developed tools to help program its complex AI chips. These tools transform trained models into optimized applications that span multiple chips, and support major machine learning frameworks such as PyTorch and TensorFlow. This is particularly relevant to the large language models (LLMs) common in generative AI. Looking ahead, Esperanto is already evolving its architecture to accelerate next-generation generative AI language models and high-performance computing (HPC) workloads. Upcoming chip designs aim to address training as well as inference, using improved Minion and Maxion cores and RVV-compatible vector/tensor units to handle a wide range of workloads, from 64-bit floating-point HPC down to 8-bit floating-point AI inference.
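To give a feel for why 8-bit inference saves so much energy and bandwidth relative to 64-bit HPC arithmetic, here is a minimal quantization sketch. Note the assumptions: the article describes 8-bit floating-point support, while this sketch uses symmetric int8 quantization purely to illustrate the general precision trade-off; it is not Esperanto’s actual scheme, and the weight values are made up.

```python
# Minimal sketch of 8-bit quantization for low-precision inference.
# Illustrative only: uses symmetric int8 with a single scale factor,
# not the 8-bit floating-point formats the hardware reportedly targets.

def quantize_int8(weights):
    """Map float weights to int8 values with one symmetric scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]     # each value fits in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [x * scale for x in q]

weights = [0.81, -1.27, 0.02, 0.5]              # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"max reconstruction error = {max_err:.4f}")
```

Each weight now occupies one byte instead of eight, an 8x reduction in memory traffic, which is where much of the energy advantage of low-precision inference comes from; the cost is the small reconstruction error bounded by half the scale factor.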