Research Projects
[Jul. 2023] AutoDCIM, Automated Digital CIM Macro Compiler.
- I worked with ACCESS to develop AutoDCIM, the first automated digital CIM (DCIM) macro compiler. AutoDCIM takes user specifications as input and generates a DCIM macro architecture with an optimized layout. With growing interest in DCIM, AutoDCIM can play an important role in agile DCIM implementation and in building an ecosystem for DCIM-based AI computing.
- AutoDCIM: An Automated Digital CIM Compiler (DAC’23)
[Feb. 2023] Scaling-out Reconfigurable Digital CIM.
- I designed two 28nm chips that scale out AI capability based on Reconfigurable Digital Computing-In-Memory (CIM). TensorCIM is the first CIM processor for tensor computing in a Multi-Chip-Module (MCM) system. MulTCIM is the first CIM accelerator for emerging multimodal Transformer models; it leverages attention-token-bit hybrid sparsity to improve energy efficiency.
- TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-based Beyond-NN Acceleration (ISSCC’23)
- MulTCIM: A 28nm 2.24$\mu$J/Token Attention-Token-Bit Hybrid Sparse Digital CIM-based Accelerator for Multimodal Transformers (ISSCC’23, extended to JSSC’24)
[Feb. 2022] Reconfigurable Digital Computing-In-Memory AI Chip.
- I designed a new AI chip architecture, Reconfigurable Digital Computing-In-Memory, which fuses the philosophy of reconfigurable computing with digital computing-in-memory to balance efficiency, accuracy, and flexibility for emerging AI chips. I designed two 28nm chips based on this architecture: Reconfigurable Digital CIM (ReDCIM) and Transformer CIM (TranCIM). ReDCIM (pronounced “red-CIM”) is the first CIM chip for cloud AI with flexible FP/INT support and was covered by Synced. TranCIM is the first CIM chip for Transformer models; it tackles the memory and computation challenges raised by Transformer’s attention mechanism.
- ReDCIM: A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise in-Memory Booth Multiplication for Cloud Deep Learning Acceleration (ISSCC’22, extended to JSSC’23)
- ReDCIM was awarded the 2023 Top-10 Research Advances in China Semiconductors, which was featured by ACCESS and HKUST SENG News.
- TranCIM: A 28nm 15.59$\mu$J/Token Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes (ISSCC’22, extended to JSSC’23)
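ReDCIM's bitwise in-memory Booth multiplication maps Booth-encoded partial products onto memory bitlines. The underlying arithmetic can be sketched in plain Python; this is a functional illustration of standard radix-4 Booth encoding only, not ReDCIM's circuit-level mapping:

```python
def booth_radix4_digits(b: int, bits: int):
    """Decode a two's-complement multiplier into radix-4 Booth digits in {-2,-1,0,1,2}."""
    assert bits % 2 == 0, "radix-4 Booth scans the multiplier two bits at a time"
    prev = 0  # implicit bit to the right of bit 0
    digits = []
    for i in range(0, bits, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # overlapping 3-bit window
        prev = b1
    return digits

def booth_multiply(a: int, b: int, bits: int = 8) -> int:
    """Multiply signed a by a signed `bits`-wide b by summing Booth partial products."""
    acc = 0
    for i, d in enumerate(booth_radix4_digits(b & ((1 << bits) - 1), bits)):
        acc += (d * a) << (2 * i)  # each digit carries weight 4**i
    return acc
```

Each digit selects a simple partial product (0, ±a, ±2a), which is what makes the scheme amenable to bitwise selection inside a memory array.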
[Aug. 2020] Evolver, Evolvable AI Chip.
- Evolver: A Deep Learning Processor with On-Device Quantization-Voltage-Frequency Tuning (JSSC’21)
- I designed a 28nm evolvable AI chip (Evolver) with DNN training and reinforcement learning capabilities, enabling the chip’s intelligence to evolve over its long lifetime. This work demonstrates a lifelong-learning example: on-device quantization-voltage-frequency (QVF) tuning. Unlike conventional QVF tuning, which determines policies offline, Evolver customizes its policy on-device for varying local user scenarios.
- Evolver won the Nomination Award for 2021 Top-10 Research Advances in China Semiconductors.
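The QVF tuning problem Evolver addresses can be pictured as choosing a (quantization, voltage, frequency) design point per deployment scenario. The sketch below uses a tiny hypothetical design-point table and exhaustive search in place of Evolver's on-device reinforcement learning; all numbers are illustrative placeholders, not silicon measurements:

```python
# Hypothetical QVF design points -- illustrative numbers, not Evolver's data.
DESIGN_POINTS = [
    {"bits": 8, "volt": 0.9, "freq": 200, "energy_mj": 1.0, "accuracy": 0.92},
    {"bits": 8, "volt": 0.7, "freq": 120, "energy_mj": 0.6, "accuracy": 0.92},
    {"bits": 4, "volt": 0.7, "freq": 200, "energy_mj": 0.4, "accuracy": 0.88},
    {"bits": 4, "volt": 0.6, "freq": 120, "energy_mj": 0.25, "accuracy": 0.86},
]

def tune_qvf(min_accuracy: float, min_freq: int):
    """Pick the lowest-energy QVF point that meets the scenario's accuracy/speed needs."""
    feasible = [p for p in DESIGN_POINTS
                if p["accuracy"] >= min_accuracy and p["freq"] >= min_freq]
    return min(feasible, key=lambda p: p["energy_mj"]) if feasible else None
```

A demanding scenario lands on a high-precision point, while a relaxed one drops precision and voltage to save energy; on-device tuning lets this choice track the local workload rather than a one-size-fits-all offline policy.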
[Jun. 2018] RANA, Software-Hardware Co-design for AI Chip Memory Optimization.
- RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM (ISCA’18)
- I designed a retention-aware neural acceleration (RANA) framework, which strengthens DNN accelerators with refresh-optimized eDRAM to save total system energy.
- RANA was the only ISCA’18 paper first-authored by a Chinese research team, and was covered by Tsinghua University News and AI Tech Talk.
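The retention-aware idea behind RANA can be sketched in a few lines: eDRAM refresh is only needed when buffered data outlives the cell retention time, so short-lived layer buffers contribute no refresh energy at all. The parameters below are hypothetical placeholders, not RANA's measured values:

```python
import math

# Hypothetical eDRAM parameters -- illustrative, not RANA's silicon data.
RETENTION_US = 45.0       # worst-case eDRAM retention time
REFRESH_ENERGY_UJ = 0.8   # energy of one full-buffer refresh

def refresh_count(data_lifetime_us: float, retention_us: float = RETENTION_US) -> int:
    """If buffered data is consumed before it could decay (lifetime <= retention),
    its refresh can be skipped entirely -- the key retention-aware observation."""
    if data_lifetime_us <= retention_us:
        return 0
    return math.ceil(data_lifetime_us / retention_us) - 1

def total_refresh_energy_uj(layer_lifetimes_us):
    """Sum refresh energy across layers; short-lived buffers contribute nothing."""
    return sum(refresh_count(t) * REFRESH_ENERGY_UJ for t in layer_lifetimes_us)
```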
[Jan. 2016] Neural Networks on Silicon.
- I maintain a GitHub project, Neural Networks on Silicon, that collects works on neural network accelerators and related topics. It has attracted researchers from around the world.
[Apr. 2017] Thinker and DNA, Reconfigurable AI Chip.
- DNA: Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns (TVLSI; ranked No. 5/2/6/8/8/9 among the most-downloaded TVLSI manuscripts in 2017–2022, and 6 times the monthly No. 1 popular article)
- I designed a deep convolutional neural network accelerator (DNA) targeting flexible and efficient CNN acceleration. It is the first work to assign Input/Output/Weight Reuse to different layers of a CNN, optimizing system-level energy consumption according to each layer’s CONV parameters.
- Thinker: A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications (JSSC’18)
- A reconfigurable multi-modal neural network processor (Thinker) was fabricated based on the DNA architecture, supporting CNN, RNN, and FCN.
- The 65nm Thinker chip was exhibited at the 2016 National Mass Innovation and Entrepreneurship Week as a representative work from Tsinghua University. It was highly praised by Chinese Premier Li Keqiang, and featured by Yang Lan One on One, AI Tech Talk, and MIT Technology Review. It won the ISLPED’17 Design Contest Award, the first time a team from Mainland China won the award.
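DNA's per-layer reuse-pattern selection can be pictured with a toy cost model: hold one operand (input, output, or weight) stationary so it is fetched from DRAM once, charge refetch traffic for the other two, and pick the cheapest pattern per layer. The `passes` factor and layer sizes below are illustrative stand-ins, not DNA's actual tiling model:

```python
def dram_traffic(layer: dict, reused: str, passes: int = 4) -> int:
    """Toy off-chip traffic model (bytes): the reused operand is fetched once;
    the other two operands are refetched `passes` times (a stand-in for tiling)."""
    return sum(size if name == reused else size * passes
               for name, size in layer.items())

def best_reuse(layer: dict) -> str:
    """Pick the reuse pattern (Input/Output/Weight Reuse) with the least modeled traffic."""
    return min(layer, key=lambda name: dram_traffic(layer, name))
```

Under this model an early CONV layer with large feature maps favors Input Reuse, while a late layer dominated by weights favors Weight Reuse, which is why a single fixed pattern is suboptimal across a whole CNN.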
[Oct. 2014] RNA, Reconfigurable Architecture for Neural Approximation.
- RNA: Reconfigurable Architecture for Neural Approximation in Multimedia Computing (TCSVT’19)
- I designed a reconfigurable neural accelerator (RNA) that processes multi-layer perceptrons (MLPs) for neural approximation. By approximating a program’s core kernels with MLPs, RNA achieves higher performance and efficiency with negligible accuracy loss.
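The compute unit accelerated in neural approximation is a small MLP standing in for an approximated kernel. A minimal forward pass, with placeholder weights chosen for illustration rather than trained values, looks like:

```python
def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP with ReLU: y = W2 @ relu(W1 @ x + b1) + b2."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]
```

In the approximation flow, a costly kernel y = f(x) is replaced by such a trained MLP, trading exactness for a fixed, highly regular compute pattern that a reconfigurable accelerator handles efficiently.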