Books
Survey Articles
- [JOS’24] B. Yang, J. Chen, F. Tu, “Towards Efficient Generative AI and Beyond-AI Computing: New Trends on ISSCC 2024 Machine Learning Accelerators,” Journal of Semiconductors (JOS), 2024. (Invited Paper)
- [TCAS-I’23] S. Wei, X. Lin, F. Tu, Y. Wang, L. Liu, S. Yin, “Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips,” IEEE Transactions on Circuits and Systems I (TCAS-I), 2023.
Journal Papers
- [JSSC’24] F. Tu, Z. Wu, Y. Wang, W. Wu, L. Liu, Y. Hu, S. Wei, S. Yin, “MulTCIM: Digital Computing-In-Memory-based Multimodal Transformer Accelerator with Attention-Token-Bit Hybrid Sparsity,” IEEE Journal of Solid-State Circuits (JSSC), 2024. (Invited Paper, ISSCC’23 Extension)
- [JSSC’24] Y. Wang, Z. Wu, W. Wu, L. Liu, Y. Hu, S. Wei, F. Tu, S. Yin, “TensorCIM: Digital Computing-In-Memory Tensor Processor with Multi-Chip-Module-based Architecture for Beyond-NN Acceleration,” IEEE Journal of Solid-State Circuits (JSSC), 2024. (Co-Corresponding Author, ISSCC’23 Extension)
- [JSSC’23] F. Tu, Y. Wang, Z. Wu, L. Liang, Y. Ding, B. Kim, L. Liu, S. Wei, Y. Xie, S. Yin, “ReDCIM: Reconfigurable Digital Computing-In-Memory Processor with Unified FP/INT Pipeline for Cloud AI Acceleration,” IEEE Journal of Solid-State Circuits (JSSC), 2023. (Invited Paper, ISSCC’22 Extension, 2023 Top-10 Research Advances in China Semiconductors)
- [JSSC’23] F. Tu, Z. Wu, Y. Wang, L. Liang, L. Liu, Y. Ding, L. Liu, S. Wei, Y. Xie, S. Yin, “TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes,” IEEE Journal of Solid-State Circuits (JSSC), 2023. (ISSCC’22 Extension)
- [TCAS-I’23] Y. Wang, F. Tu, L. Liu, S. Wei, Y. Xie, S. Yin, “SPCIM: Sparsity-Balanced Practical CIM Accelerator with Optimized Spatial-Temporal Multi-Macro Utilization,” IEEE Transactions on Circuits and Systems I (TCAS-I), 2023. (Co-First Author)
- [TCAD’23] F. Tu, Y. Wang, L. Liang, Y. Ding, L. Liu, S. Wei, S. Yin, Y. Xie, “SDP: Co-Designing Algorithm, Dataflow, and Architecture for in-SRAM Sparse NN Acceleration,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2023. (Co-First Author)
- [JSSC’21] F. Tu, W. Wu, Y. Wang, H. Chen, F. Xiong, M. Shi, N. Li, J. Deng, T. Chen, L. Liu, S. Wei, Y. Xie, S. Yin, “Evolver: A Deep Learning Processor with On-Device Quantization-Voltage-Frequency Tuning,” IEEE Journal of Solid-State Circuits (JSSC), 2021. (Nomination Award for 2021 Top-10 Research Advances in China Semiconductors)
- [TCSVT’19] F. Tu, S. Yin, P. Ouyang, L. Liu, S. Wei, “Reconfigurable Architecture for Neural Approximation in Multimedia Computing,” IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2019.
- [JSSC’18] S. Yin, P. Ouyang, S. Tang, F. Tu, X. Li, S. Zheng, T. Lu, J. Gu, L. Liu, S. Wei, “A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications,” IEEE Journal of Solid-State Circuits (JSSC), 2018. (VLSI’17 Extension)
- [TVLSI’17] F. Tu, S. Yin, P. Ouyang, S. Tang, L. Liu, S. Wei, “Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns,” IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2017. (Ranked No. 5/2/6/8/8/9 among the most-downloaded TVLSI manuscripts in 2017-2022, respectively; Monthly No. 1 Popular Article six times)
Conference Papers
- [VLSI’24] Y. Wang, Z. He, C. Zhao, Z. Wu, M. Gao, H. Han, S. Wei, Y. Hu, F. Tu, S. Yin, “ETCIM: An Error-Tolerant Digital-CIM Processor with Redundancy-Free Repair and Run-Time MAC and Cell Error Correction,” Symposium on VLSI Technology and Circuits (VLSI), Hawaii, USA, 2024. (Co-Corresponding Author)
- [ISSCC’24] R. Guo, L. Wang, X. Chen, H. Sun, Z. Yue, Y. Qin, H. Han, Y. Wang, F. Tu, S. Wei, Y. Hu, S. Yin, “A 28nm 74.34TFLOPS/W BF16 Heterogeneous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models,” International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2024.
- [ISSCC’24] Z. Yue, X. Xiang, F. Tu, Y. Wang, Y. Wang, S. Wei, Y. Hu, S. Yin, “A 0.795fJ/b Physically-Unclonable-Function-Protected TCAM for Software-Defined Networking Switch,” International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2024.
- [ISCA’23] S. Li, F. Tu, L. Liu, J. Lin, Z. Wang, Y. Kang, Y. Ding, Y. Xie, “ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification,” International Symposium on Computer Architecture (ISCA), Orlando, USA, 2023. (Acceptance Rate: 21.2% = 79/372)
- [DAC’23] J. Chen, F. Tu, K. Shao, F. Tian, X. Huo, C.-Y. Tsui, K.-T. Cheng, “AutoDCIM: An Automated Digital CIM Compiler,” Design Automation Conference (DAC), San Francisco, USA, 2023. (Corresponding Author, Acceptance Rate: 23%)
- [ISSCC’23] F. Tu, Y. Wang, Z. Wu, W. Wu, L. Liu, Y. Hu, S. Wei, S. Yin, “TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-based Beyond-NN Acceleration,” International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2023.
- [ISSCC’23] F. Tu, Z. Wu, Y. Wang, W. Wu, L. Liu, Y. Hu, S. Wei, S. Yin, “MulTCIM: A 28nm 2.24μJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-based Accelerator for Multimodal Transformers,” International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2023. (Highlight Paper)
- [ISSCC’22] F. Tu, Y. Wang, Z. Wu, L. Liang, Y. Ding, B. Kim, L. Liu, S. Wei, Y. Xie, S. Yin, “A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise in-Memory Booth Multiplication for Cloud Deep Learning Acceleration,” International Solid-State Circuits Conference (ISSCC), 2022. (Highlight Paper)
- [ISSCC’22] F. Tu, Z. Wu, Y. Wang, L. Liang, L. Liu, Y. Ding, L. Liu, S. Wei, Y. Xie, S. Yin, “A 28nm 15.59μJ/Token Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes,” International Solid-State Circuits Conference (ISSCC), 2022.
- [HotChips’20] F. Tu, W. Wu, Y. Wang, H. Chen, F. Xiong, M. Shi, N. Li, J. Deng, T. Chen, L. Liu, S. Wei, S. Yin, “ELearn: Edge Learning Processor with Bidirectional Speculation and Sparsity & Mixed-Precision aware Dataflow Parallelism Reconfiguration,” Hot Chips, 2020.
- [ISCA’18] F. Tu, W. Wu, S. Yin, L. Liu, S. Wei, “RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM,” International Symposium on Computer Architecture (ISCA), Los Angeles, USA, 2018. (Acceptance Rate: 16.9% = 64/378)
- [VLSI’17] S. Yin, P. Ouyang, S. Tang, F. Tu, L. Liu, S. Wei, “A 1.06-to-5.09 TOPS/W Reconfigurable Hybrid-Neural-Network Processor for Deep Learning Applications,” Symposia on VLSI Technology and Circuits (VLSI Symposia), Kyoto, Japan, 2017.
- [DATE’15] F. Tu, S. Yin, P. Ouyang, L. Liu, S. Wei, “RNA: A Reconfigurable Architecture for Hardware Neural Acceleration,” Design, Automation and Test in Europe Conference (DATE), Grenoble, France, 2015. (Acceptance Rate: 22%)
- [ISCAS’15] F. Tu, S. Yin, P. Ouyang, L. Liu, S. Wei, “Neural Approximating Architecture Targeting Multiple Application Domains,” IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal, 2015.
Featured Collaborative Papers
- [MICRO’24] M. Wang, I. McInerney, B. Stellato, F. Tu, S. Boyd, K.-H. So, K.-T. Cheng, “Multi-Issue Butterfly Architecture for Sparse Convex Quadratic Programming,” IEEE/ACM International Symposium on Microarchitecture (MICRO), Austin, USA, 2024.
- [JSSC’24] Z. Yue, Y. Wang, H. Wang, R. Guo, F. Tu, J. Yang, S. Wei, Y. Hu, S. Yin, “CV-CIM: A Hybrid Domain XOR-derived Similarity-aware Computation-in-memory Supporting Cost Volume Construction,” IEEE Journal of Solid-State Circuits (JSSC), 2024.
- [ISCA’24] Z. Yue, H. Wang, J. Fang, J. Deng, G. Lu, F. Tu, R. Guo, Y. Li, Y. Qin, Y. Wang, C. Li, H. Han, S. Wei, Y. Hu, S. Yin, “Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture,” International Symposium on Computer Architecture (ISCA), Buenos Aires, Argentina, 2024. (Acceptance Rate: 12.8% = 54/423, Covered by SemiInsights)
- [VLSI’24] R. Guo, X. Chen, L. Wang, F. Tu, S. Wei, Y. Hu, S. Yin, “A 28nm 4170-TFLOPS/W/b and 195-TFLOPS/mm²/b Multiply-Free Fully-Digital Floating-Point Compute-In-Memory Macro with Mitchell’s Approximation,” Symposium on VLSI Technology and Circuits (VLSI), Hawaii, USA, 2024.
- [SCIS’24] W. Wu, F. Tu, X. Li, S. Wei, S. Yin, “SWG: An Architecture for Sparse Weight Gradient Computation,” Science China Information Sciences (SCIS), 2024.
- [TCAS-I’24] X. Zhao, L. Chang, D. Fan, Z. Hu, T. Yue, F. Tu, J. Zhou, “HDSuper: High-Quality and High Computational Utilization Edge Super-Resolution Accelerator With Hardware-Algorithm Co-Design Techniques,” IEEE Transactions on Circuits and Systems I (TCAS-I), 2024.
- [TCAD’24] J. Zhou, J. Wu, Y. Gao, Y. Ding, C. Tao, B. Li, F. Tu, K.-T. Cheng, K.-H. So, N. Wong, “DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2024.
- [MICRO’23] J. Deng, X. Tang, J. Zhang, Y. Li, L. Zhang, B. Han, H. He, F. Tu, L. Liu, S. Wei, Y. Hu, S. Yin, “Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane,” IEEE/ACM International Symposium on Microarchitecture (MICRO), Toronto, Canada, 2023. (Acceptance Rate: 23.8% = 101/424)
- [DAC’23] Y. Zhu, Z. Zhu, G. Dai, F. Tu, H. Sun, K.-T. Cheng, H. Yang, Y. Wang, “PIM-HLS: An Automatic Hardware Generation and Scheduling for Processing-in-Memory-based Neural Network Accelerators,” Design Automation Conference (DAC), San Francisco, USA, 2023. (Acceptance Rate: 23%)
- [TCAS-I’23] W. Wu, F. Tu, M. Niu, Z. Yue, L. Liu, S. Wei, X. Li, Y. Hu, S. Yin, “STAR: An STGCN ARchitecture for Skeleton-based Human Action Recognition,” IEEE Transactions on Circuits and Systems I (TCAS-I), 2023.
- [TC’22] L. Liu, Z. Qu, Z. Chen, F. Tu, Y. Ding, Y. Xie, “Dynamic Sparse Attention for Scalable Transformer Acceleration,” IEEE Transactions on Computers (TC), 2022.
- [TCAS-I’22] J. Yang, F. Tu, Y. Li, Y. Wang, L. Liu, S. Wei, S. Yin, “GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating,” IEEE Transactions on Circuits and Systems I (TCAS-I), 2022.
- [ISCA’22] J. Lin, L. Liang, Z. Qu, I. Ahmad, L. Liu, F. Tu, T. Gupta, Y. Ding, Y. Xie, “INSPIRE: In-Storage Private Information Retrieval via Protocol and Architecture Co-design,” International Symposium on Computer Architecture (ISCA), New York City, USA, 2022. (Acceptance Rate: 16.8% = 67/400)
- [DAC’22] H. Lin, M. Yan, D. Wang, M. Zou, F. Tu, X. Ye, D. Fan, Y. Xie, “Alleviating Datapath Conflicts and Design Centralization in Graph Analytics Acceleration,” Design Automation Conference (DAC), San Francisco, USA, 2022. (Acceptance Rate: 23%)
- [TCAD’22] L. Liang, Z. Qu, Z. Chen, F. Tu, Y. Wu, L. Deng, G. Li, P. Li, Y. Xie, “H2Learn: High-Efficiency Learning Accelerator for High-Accuracy Spiking Neural Networks,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2022.
- [ASPLOS’22] Z. Qu, L. Liu, F. Tu, Z. Chen, Y. Ding, Y. Xie, “DOTA: Detect and Omit Weak Attentions for Scalable Transformer Acceleration,” ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Lausanne, Switzerland, 2022. (Acceptance Rate: 20.2% = 80/397)
- [ICCAD’21] H. Amrouch, J. Chen, K. Roy, Y. Xie, I. Chakraborty, W. Huangfu, L. Liang, F. Tu, C. Wang, M. Yayla, “Brain-Inspired Computing: Adventure from Beyond CMOS Technologies to Beyond von Neumann Architectures,” International Conference on Computer-Aided Design (ICCAD), 2021. (Invited Paper)
- [DAC’21] X. Lin, L. Sun, F. Tu, L. Liu, X. Li, S. Wei, S. Yin, “ADROIT: An Adaptive Dynamic Refresh Optimization Framework for DRAM Energy Saving in DNN Training,” Design Automation Conference (DAC), 2021. (Acceptance Rate: 23%)
- [MICRO’20] L. Liu, Z. Qu, L. Deng, F. Tu, S. Li, X. Hu, Z. Gu, Y. Ding, Y. Xie, “DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture,” IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020. (Acceptance Rate: 15.6% = 66/422)
- [DAC’20] F. Xiong, F. Tu, M. Shi, Y. Wang, L. Liu, S. Wei, S. Yin, “STC: Significance-Aware Transform-Based Codec Framework for External Memory Access Reduction,” Design Automation Conference (DAC), San Francisco, USA, 2020. (Acceptance Rate: 23%)
- [ISVLSI’19] F. Xiong, F. Tu, S. Yin, S. Wei, “Towards Efficient Compact Network Training on Edge-Devices,” IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Miami, USA, 2019. (Invited Paper)
- [TPDS’18] S. Yin, S. Tang, X. Lin, P. Ouyang, F. Tu, L. Liu, J. Zhao, C. Xu, S. Li, Y. Xie, S. Wei, “Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory,” IEEE Transactions on Parallel and Distributed Systems (TPDS), 2018.
- [TCAD’18] J. Yan, S. Yin, F. Tu, L. Liu, S. Wei, “GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2018.
- [TCAD’18] S. Yin, S. Tang, X. Lin, P. Ouyang, F. Tu, L. Liu, S. Wei, “A High Throughput Acceleration for Hybrid Neural Networks with Efficient Resource Management on FPGA,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2018.
- [DAC’18] X. Lin, S. Yin, F. Tu, L. Liu, X. Li, S. Wei, “LCP: A Layer Clusters Paralleling Mapping Method for Accelerating Inception and Residual Networks on FPGA,” Design Automation Conference (DAC), San Francisco, USA, 2018. (Acceptance Rate: 24.3%)