

#### **3D ADVANCED TECHNOLOGIES FOR NEUROMORPHIC ARCHITECTURES – HUGHES METRAS - CEA/LETI**

Alexandre Valentian, Pascal Vivet, Severine Cheramy, Amandine Jouve, Perrine Batude

### LETI'S TECHNOLOGICAL: SILICON PLATFORM



World-class facilities for your future business needs







Neuro-Inspired Computational Elements Workshop (NICE), March 26-29, 2019



#### WHY GO 3D?

#### RETINAL NEURONS AND NEUROMORPHIC VISION CHIPS



#### **BIOLOGICAL RETINA**

The cells in the retina, which are interconnected, extract information from the visual field by engaging in a complex web of excitatory (one-way arrows), inhibitory (circles on a stick), and conductive or bidirectional (two-way arrows) signaling. This circuitry generates the selective responses of the four types of ganglion cells (at bottom) that make up 90 percent of the optic nerve's fibers, which convey visual information to the brain. On (green) and Off (red) ganglion cells elevate their firing (spike) rates when the local light intensity is brighter or darker than the surrounding region. Inc (blue) and Dec (yellow) ganglion cells spike when the intensity is increasing or decreasing, respectively.

#### SILICON RETINA

Neuromorphic circuits emulate the complex interactions that occur among the various retinal cell types by replacing each cell's axons and dendrites (signal pathways) with metal wires and each synapse with a transistor. Permutations of this arrangement produce excitatory and inhibitory interactions that mimic similar communications among neurons. The transistors and the wires that connect them are laid out on silicon chips. Various regions of the chip surface perform the functions of the different cell layers. The large green squares are phototransistors, which transduce light into electricity.



SILICON CHIP DETAIL

Source: Scientific American

#### **BIOLOGICALLY INSPIRED NEURONS USING OXRAM**





- Classification of handwritten numbers
- Small resolution image
  - 12\*12 pixels

- Fully-connected network
  - 10 neurons: 1 neuron / class
  - 144 synapses per neuron

- Si real estate: 1.8 mm<sup>2</sup>
- Clock frequency: 50 MHz
- 10 neurons
- 10\*144 synapses = 11.5 kOxRAMs
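The counts on this slide follow from the network shape, and a short calculation makes the relationship explicit. The sketch below is illustrative only: the figure of 8 OxRAM devices per synapse is an assumption inferred from the ratio of 11.5 k devices to 1440 synapses, and is not stated on the slide.

```python
# Sizing of the fully connected OxRAM classifier described on the slide.
IMAGE_SIDE = 12          # 12*12 pixel input image (from the slide)
NUM_CLASSES = 10         # one neuron per digit class (from the slide)
OXRAM_PER_SYNAPSE = 8    # ASSUMPTION: inferred from 11.5 k devices / 1440 synapses


def network_size(image_side, num_classes, devices_per_synapse):
    """Return (inputs per neuron, total synapses, total OxRAM devices)."""
    inputs = image_side * image_side          # one synapse per pixel per neuron
    synapses = inputs * num_classes           # fully connected: every pixel to every neuron
    devices = synapses * devices_per_synapse  # memory cells needed to store all weights
    return inputs, synapses, devices


inputs, synapses, devices = network_size(IMAGE_SIDE, NUM_CLASSES, OXRAM_PER_SYNAPSE)
print(f"{inputs} synapses per neuron, {synapses} synapses, {devices / 1000:.1f} k OxRAMs")
```

Under that assumption the memory devices outnumber the neurons by more than a thousand to one, which is the imbalance that motivates denser integration.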

# → NEED FOR BETTER INTEGRATION / 3D → Relative cost of OxRAMs vs neurons



#### **VISION PROCESSING**

#### Neural network for vision processing

- A spiking neural network (SNN) layer is often a 2D structure
- Image recognition applications need at least two layers
- This becomes inherently a 3D structure

#### It lends itself well to a 3D implementation

Logical layers are mapped to physical tiers

#### Two distinct building blocks

- A silicon retina
  - CMOS image sensor tier, with a 256x192 resolution
  - Pre-processing tier, which generates spiking events corresponding to changes in pixel intensity
- A neural processor
  - 'Layer 1' extracts *features*: horizontal, vertical, diagonal segments ...
  - 'Layer 2' combines those *features* to extract complex shapes, which leads to object recognition
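The pre-processing tier's role, turning changes in pixel intensity into spiking events, can be sketched in software as thresholded frame differencing. This is a minimal illustration and not the circuit's actual method: the function name `spike_events`, the threshold value, and the (row, column, polarity) event format are all assumptions made for the example.

```python
import numpy as np

HEIGHT, WIDTH = 192, 256   # 256x192 sensor resolution (from the slide)
THRESHOLD = 16             # ASSUMPTION: hypothetical intensity-change threshold


def spike_events(prev_frame, next_frame, threshold=THRESHOLD):
    """Return an (N, 3) array of (row, col, polarity) events.

    An event is emitted for each pixel whose intensity changed by at least
    `threshold`; polarity is +1 for brightening and -1 for darkening.
    """
    # Use a signed type so that uint8 subtraction cannot wrap around.
    diff = next_frame.astype(np.int32) - prev_frame.astype(np.int32)
    rows, cols = np.nonzero(np.abs(diff) >= threshold)
    polarity = np.sign(diff[rows, cols])
    return np.stack([rows, cols, polarity], axis=1)


# Usage: a static grey scene with one brightening and one darkening pixel.
prev = np.full((HEIGHT, WIDTH), 100, dtype=np.uint8)
nxt = prev.copy()
nxt[10, 20] = 200   # brightens: event with polarity +1
nxt[5, 7] = 20      # darkens: event with polarity -1
nxt[0, 0] = 105     # change below threshold: no event
events = spike_events(prev, nxt)
print(events)
```

Only the pixels that change produce output, so a mostly static scene yields very few events for the neural processor to handle.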





#### **RETINE : A 3D CIRCUIT SMART-RETINA**

#### **Computer Vision applications**



#### 3D-stack : Image Sensor & Processor

- SIMD fully programmable accelerator
- Heterogeneous technology, high sensitivity
- Technology: ALTIS 130 nm
- Cu-Cu hybrid bonding stacking



Cu-Cu, pitch 7 µm
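To give a feel for what a 7 µm bonding pitch means, the sketch below converts a pitch into an interconnect density. It assumes a regular square grid of bonding pads, which is a simplification for illustration and not a statement about the actual RETINE pad layout.

```python
def pads_per_mm2(pitch_um):
    """Interconnects per mm^2 for a square grid at the given pitch (micrometres)."""
    pads_per_mm = 1000.0 / pitch_um   # pads along one millimetre
    return pads_per_mm ** 2           # square grid: same count in both directions


density = pads_per_mm2(7.0)
print(f"{density:.0f} interconnects per mm^2 at 7 um pitch")
```

Because density scales with the inverse square of the pitch, halving the pitch quadruples the number of vertical connections available per unit area.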





- ALTIS 130 nm
- Die size : 17 mm x 11 mm
- Layer 1 image sensor :
  - fill factor > 70 %
  - Pixel: 12 μm
  - Pixel dynamic: 1 to 8 bit
  - More than 1000 FPS @ 192x256
  - More than 60 FPS @ 768x1024
- Layer 2 parallel processor :
  - SIMD matrix of 3072 ALUs (16 x 12 x 16 ALUs)
  - Computing power : 161 GOPS
  - Target: 100 MHz, 175 MHz, 210 MHz
  - 72 kB distributed memory + 96 kB shared RAM memory
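A quick calculation shows how the figures in this list relate to each other. The sketch below only recombines numbers given on the slide; since the frame rates are quoted as lower bounds, the pixel rates it prints are lower bounds as well.

```python
def pixel_rate(fps, rows, cols):
    """Pixels read out per second at a given frame rate and resolution."""
    return fps * rows * cols


ALU_GRID = (16, 12, 16)   # SIMD matrix dimensions (from the slide)
num_alus = ALU_GRID[0] * ALU_GRID[1] * ALU_GRID[2]

fast_mode = pixel_rate(1000, 192, 256)   # low resolution, high frame rate
full_mode = pixel_rate(60, 768, 1024)    # full resolution, lower frame rate
total_memory_kb = 72 + 96                # distributed + shared memory

print(f"ALUs: {num_alus}")
print(f"Fast mode: {fast_mode / 1e6:.1f} Mpixel/s")
print(f"Full mode: {full_mode / 1e6:.1f} Mpixel/s")
print(f"On-chip memory: {total_memory_kb} kB")
```

The two operating points come out at nearly the same pixel throughput, which suggests a trade of resolution against frame rate at a roughly constant readout bandwidth.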

[S.Chevobbe, L. Millet, C. Andriamisaina, M. Duranton, D43D'15]



Figure: RETINE die views. Layer 1 (bottom die): image sensor. Layer 2 (top die): parallel processor. Labelled blocks: IO TSVs, Layer 1 controller, Layer 2 controller, Switcher 1.

#### **3D NEURAL NETWORK CIRCUIT**

#### **Neural Networks**

- Classically divided in two layers of computation
- Difficult to implement in 2D, due to high congestion
- Very well adapted to 3D: one neuron layer per die!



Figure labels: 1, 2, 3, 4, 5 ... n; Layer 2; Memory.

Compared to 2D, 3D offers:

- 2x better total area
- 25% better in power





|                         | 2D circuit | 3D circuit | Gain |
|-------------------------|------------|------------|------|
| Critical path (ns)      | 9          | 6.6        | -26% |
| Power (mW)              | 430        | 350        | -17% |
| Area (mm<sup>2</sup>)   | 7.9        | 3.6        | -54% |
| Wires (m)               | 19.9       | 15.6       | -21% |
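The gain column can be recomputed from the 2D and 3D values as a relative change, (3D - 2D) / 2D. The sketch below does this for each row; it is only a cross-check on the table's arithmetic, and the dictionary keys are labels chosen for the example.

```python
# (2D value, 3D value) pairs taken from the table above.
metrics = {
    "Critical path (ns)": (9.0, 6.6),
    "Power (mW)": (430.0, 350.0),
    "Area (mm^2)": (7.9, 3.6),
    "Wires (m)": (19.9, 15.6),
}


def relative_change(value_2d, value_3d):
    """Relative change from 2D to 3D in percent (negative means a reduction)."""
    return (value_3d - value_2d) / value_2d * 100.0


changes = {name: relative_change(v2d, v3d) for name, (v2d, v3d) in metrics.items()}
area_ratio = metrics["Area (mm^2)"][0] / metrics["Area (mm^2)"][1]

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
print(f"Area ratio 2D/3D: {area_ratio:.2f}x")
```

The area ratio of about 2.2x agrees with the "2x better total area" summary. The recomputed power reduction, about 18.6%, is close to but not identical to the table's -17%; the slide does not say how its percentages were derived or rounded.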

[B. Belhadj, R. Heliot, A. Valentian, P. Vivet, CASES'2014]

More layers? Tighter integration of neuron, memory, and NVM?



## **3D** integration

Towards High Density Interconnects



## **3D PARTITIONING LEVELS**



#### **NEW ARCHITECTURES:**

- Partitioning
- Alternative to scaling
- Shorter interconnects

#### **DRIVERS**:

- Form factor
- Yield / Cost
- Perf / Interconnect / Density
- Heterogeneity

### **3D INTEGRATION:** A TOOL BOX FOR NOVEL COMPUTING ARCHITECTURES

**Chiplet concept: memory proximity, high bandwidth**

Figure labels: Chiplets, Intact, Interposer, Package.

#### **Interposer for specialization:**

- System-in-Package, silicon (passive or active), photonic
- Heterogeneous integration enablement
- Application specific

#### **Integration for high performance:**

- Scale-out
- Many-core architecture

#### **Chiplet for low cost:**

- Small to medium size chips (1 cm<sup>2</sup> max)
- Advanced technology node
- Generic
- High volume

#### **TSV TECHNOLOGIES**

Smaller diameters







# 3D INTEGRATION: HIGH DENSITY HYBRID BONDING





#### MAIN CHALLENGES

- Wafer to wafer
  - Towards smaller pitch & multi layers
- Die to wafer
  - Single die handling
  - Mid term: self assembly
  - Increasing throughput

NICE 2019 / Albany / Hughes Metras – Cea Leti





#### 3D SEQUENTIAL INTEGRATION: COOLCUBE...



### CHALLENGES

- Process Flow Validation
  - Low Temperature Epitaxy
  - Low K Spacer

- Cost Analysis
- Design Flow and Tools

#### ... A LARGE RANGE OF APPLICATIONS







#### PERSPECTIVES



### **EXPLORING THE VALUE OF**

- STACKING TWO DOUBLE LAYERS OF LOGIC + RRAM STACKS...





#### CONCLUSION

#### DEEP NEURAL NETWORKS, AI ALGORITHMS

- Are already supported by 3D technologies
- Digital architectures using interposer and HBM memories

#### NEED TO GO FURTHER

To pursue integration and reduce power consumption for embedded applications:

- Start with biologically inspired systems
- Analog circuitry
- Non-volatile memories
- High density 3D → hybrid bonding fine pitch and/or monolithic integration

## Interaction: architecture, design, technology





67 GFLOPS/W (FP16)




