DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training

03/31/2023
by   Angelo Garofalo, et al.
0

On-chip DNN inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present DARKSIDE, a System-on-Chip with a heterogeneous cluster of 8 RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost performance and efficiency on key compute-intensive Deep Neural Network (DNN) kernels, the cluster is enriched with three digital accelerators: a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); a minimal overhead datamover to marshal 1-b to 32-b data on-the-fly; a 16-b floating point Tensor Product Engine (TPE) for tiled matrix-multiplication acceleration. DARKSIDE is implemented in 65nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency - enough to enable on-chip floating-point training at competitive speed coupled with ultra-low power quantized inference.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro