Model Optimization
Our neural network model optimizers apply a range of mathematically rigorous optimization techniques: structured network pruning, number system selection (floating-point vs. fixed-point vs. Posit), mixed-precision weight and activation quantization across network layers (from 16-bit down to 1-bit), and fine-tuning approaches that improve model efficiency and performance.
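To make the mixed-precision idea concrete, here is a minimal sketch in plain NumPy. The layer names, per-layer bit-width assignments, and the per-tensor symmetric quantization scheme are illustrative assumptions, not our production pipeline:

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Uniform symmetric quantization of a weight tensor to `bits` bits (bits >= 2)."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax      # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Hypothetical bit-width assignment: sensitive layers keep more bits.
rng = np.random.default_rng(0)
layers = {"conv1": rng.standard_normal((64, 3, 3, 3)),
          "fc": rng.standard_normal((10, 512))}
bit_widths = {"conv1": 8, "fc": 4}        # mixed precision across layers

for name, w in layers.items():
    q, s = quantize_symmetric(w, bit_widths[name])
    err = np.mean((w - dequantize(q, s)) ** 2)
    print(f"{name}: {bit_widths[name]}-bit, MSE {err:.2e}")
```

In a real flow, the bit-width per layer would be chosen by a sensitivity search and recovered with fine-tuning rather than fixed by hand as above.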
Low-Cost Deployment
Our optimization tools streamline the deployment of highly optimized neural network models on diverse hardware platforms, delivering significant cost savings. These optimized models combine low latency, high batch-processing throughput, and exceptional output accuracy, keeping the cost per inference job on our backend platforms highly competitive.
Low Memory Footprint
Our compilation and mapping tools are tailored to the low on-chip memory and limited I/O bandwidth inherent to many embedded computing platforms. This is accomplished through a suite of advanced neural network optimizations that leverage techniques such as singular value decomposition, low-rank adaptation, and sampling.
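As an illustration of the singular value decomposition technique, below is a minimal sketch of low-rank weight factorization in NumPy. The matrix shape and rank are hypothetical; note that the random matrix here has a flat spectrum, whereas trained weight matrices typically have fast-decaying spectra that make the approximation far tighter:

```python
import numpy as np

def low_rank_factorize(w, rank):
    """Approximate a dense weight matrix W (m x n) as the product of two
    smaller factors A (m x rank) and B (rank x n) via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]            # absorb singular values into A
    b = vt[:rank, :]
    return a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512))       # hypothetical fully connected layer
a, b = low_rank_factorize(w, rank=64)

# Storage drops from m*n to rank*(m+n) parameters.
err = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
print(f"params: {w.size} -> {a.size + b.size}, relative error {err:.3f}")
```

Replacing one large layer with two thin ones in this way shrinks both the memory footprint and the I/O traffic needed to stream weights on-chip.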
Retargetable Compilation
Our versatile toolkit empowers us to efficiently optimize and implement a broad spectrum of neural network models, from convolutional networks to transformers, in both compact and large-scale configurations, across a multitude of hardware platforms. These include Xilinx/AMD and Intel FPGA devices with varying compute resources, memory capacities, and I/O bandwidth, as well as RISC-V based embedded systems equipped with AI engines for executing essential tasks like convolution and matrix multiplication.
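To suggest how retargeting might work in practice, here is a hedged sketch of sizing matrix-multiplication tiles against a target's on-chip memory budget. The target descriptions, field names, memory figures, and fp16 tiling policy are all illustrative assumptions, not our actual toolchain interface:

```python
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    on_chip_memory_kib: int   # hypothetical on-chip buffer budget

TARGETS = [
    Target("Xilinx/AMD FPGA", 4096),
    Target("RISC-V + AI engine", 512),
]

def choose_tile_size(target: Target, matrix_dim: int) -> int:
    """Largest power-of-two square tile such that three fp16 tiles
    (two operands plus an accumulator) fit in on-chip memory."""
    budget = target.on_chip_memory_kib * 1024        # bytes
    tile = 1
    while tile * 2 <= matrix_dim and 3 * (tile * 2) ** 2 * 2 <= budget:
        tile *= 2
    return tile

for t in TARGETS:
    tile = choose_tile_size(t, 4096)
    print(f"{t.name}: {tile}x{tile} matmul tiles")
```

The point of such a target abstraction is that the same model lowering can be re-run against a new hardware description rather than re-engineered per platform.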
Sustainable Inference
Our innovative solution reduces the computational resource requirements for running inference with custom-architected or client-supplied neural network models by replacing high-energy operations with lower-cost alternatives. This results in significant energy savings and a reduced carbon footprint, promoting the sustainable advancement and deployment of neural network engines for a wide range of applications.
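One well-known instance of this kind of substitution, offered here purely as an illustrative assumption, is rounding weights to powers of two so that multiplications become bit shifts, which cost far less energy in hardware than full multipliers. The sketch below emulates the shift with np.ldexp for clarity:

```python
import numpy as np

def to_power_of_two(w):
    """Round each weight to the nearest signed power of two."""
    sign = np.sign(w)
    exp = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    return sign, exp

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=8)               # hypothetical float weights
x = rng.integers(-128, 128, size=8)          # int8 activations

sign, exp = to_power_of_two(w)
# y = w*x approximated as sign * (x << exp); exp is negative for |w| < 1,
# i.e. a right shift. np.ldexp(x, e) computes x * 2**e.
approx = sign * np.ldexp(x.astype(float), exp)
print(np.round(w * x, 2))                    # exact products
print(np.round(approx, 2))                   # shift-based approximation
```

The accuracy lost to rounding is typically recovered with a short fine-tuning pass, while every multiply in the affected layers is eliminated.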