Solution

Processors for EDGE computing suffer from:

Having limited processing speed
Consuming too much power
Poor utilization of silicon
Running high clock cycles

Feeding the 3-dimensional neural network data using a traditional instruction set makes the compiler intangible in terms of achieving utilization and there by low power and size.

Roviero has developed a natively graph computing processor for edge inference. CortiCore architecture provides the solution via its unique instruction set that dramatically reduces the compiler complexity.

The approach allows us to create a compiler that achieves >80% utilization with 16X reduced memory* on all neural networks – demonstrated on our FPGA platforms.

*compared to currently available solutions

You no longer have to settle for low power vs performance!

Watch five cameras running simultaneously
on our demo system.

Key Features

Internal Memory

Low internal memory requirement (min 256KB)
flexible tradeoff on performance and memory

External memory

Sleeps > 99% of the time
Low power: access one time per input frame

High Utilization

> 80% utilization for all types of model structures
Efficiently handle weight-stationary & Datastationar

Power Consumption

Achieves micro-Watt power when incumbents struggle with milli-Watts

Speed

Scalable from 0.1TOPS to 100TOPS
Runs at low clock-cycle- 10-30x better
Compiler designed to bring up networks efficiently
Support large input frame without down scaling

Confiquration

Flexibility to reconfiqure/extend to support current and future application models

The magic in Roviero’s NPE (Neural Processing Engine) happens in the software domain by what we call CortiSoft. Our compilers and software tools allow for the porting of any neural network to run on CortiOne hardware accelerator enabling a highly efficent compute of large data.

Scalability

Scalable RTL via parameters for performance and power
Number of ALU
Number of clusters
Activation memory size per cluster
DDR or No DDR – external memory
Internal system memory
External shared memory
Hardware confguration input to compiler

Additional Key Features

Any frameworks, any NN, any backbone
AI optimized instruction set – makes compiler possible
AI Data movement and compute-oriented instructions
>80% compute utilization
Highly parallel design – high performance at low frequency of operation

Implements sparse NN efficiently, reducing model size and compute requirement by >3x
All digital logic – implement in any process node
Very low host code support to run the AI processing job