Arka Chakraborty

and 3 more

Probabilistic/stochastic computations form the backbone of autonomous systems and classifiers. Recently, biomedical applications of probabilistic computing, such as hyperdimensional computing for DNA sequencing and Bayesian networks for disease diagnosis, have attracted significant attention owing to their high energy efficiency. Bayesian inference is widely used for decision making based on independent (often conflicting) sources of information/evidence. A cascaded chain or tree structure of asynchronous circuit elements known as Muller C-Elements can effectively implement Bayesian inference. Such circuits utilize stochastic bit streams to encode input probabilities, which enhances their robustness and fault tolerance. However, CMOS implementations of the Muller C-Element are bulky and energy-hungry, which restricts their widespread application in resource-constrained IoT and mobile devices. To enable Bayesian-inference-based decision making in IoT devices such as UAVs, robots, space rovers, etc., for the first time, we propose a highly compact and energy-efficient implementation of the Muller C-Element utilizing a single ferroelectric FET (FeFET). The proposed implementation exploits the unique drain-erase, program-inhibit, and drain-inhibit characteristics of FeFETs to encode the output as the polarization state of the ferroelectric layer. Our extensive investigation utilizing an in-house developed, experimentally calibrated compact model of the FeFET reveals that the proposed C-Element consumes an ultra-low power of 1.07 fW. We also propose a novel read circuitry for realizing a Bayesian inference engine by cascading a network of the proposed FeFET-based C-Elements for practical applications. Furthermore, for the first time, we analyze the impact of cross-correlation between the stochastic input bit streams on the accuracy of C-Element-based Bayesian inference implementations.
For a proof-of-concept demonstration, we employ the proposed FeFET-based Muller C-Element to perform breast cancer diagnosis on the Wisconsin dataset.
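The Bayesian fusion performed by a single C-Element on stochastic bit streams can be illustrated with a short behavioral sketch (assuming ideal, uncorrelated Bernoulli input streams; the probabilities and stream length below are illustrative only, not the paper's circuit-level model):

```python
import random

def c_element(a, b, prev):
    """Muller C-Element: the output follows the inputs when they agree
    and holds its previous value when they disagree."""
    return a if a == b else prev

def bayesian_fusion(p_a, p_b, n=200_000, seed=0):
    """Fuse two independent evidence probabilities, encoded as stochastic
    bit streams, through one C-Element. The mean of the output stream
    approximates p_a*p_b / (p_a*p_b + (1-p_a)*(1-p_b))."""
    rng = random.Random(seed)
    out, ones = 0, 0
    for _ in range(n):
        a = rng.random() < p_a   # Bernoulli sample encoding evidence A
        b = rng.random() < p_b   # Bernoulli sample encoding evidence B
        out = c_element(a, b, out)
        ones += out
    return ones / n

est = bayesian_fusion(0.8, 0.7)
exact = (0.8 * 0.7) / (0.8 * 0.7 + 0.2 * 0.3)   # normalized Bayesian fusion
```

The output stream is a two-state Markov chain (0→1 with probability p_A·p_B, 1→0 with probability (1−p_A)·(1−p_B)), so its stationary high-probability is exactly the normalized Bayesian fusion of the two evidence probabilities.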

Musaib Rafiq

and 2 more

The conventional computing platforms based on the von Neumann architecture are highly space- and energy-intensive while handling emerging applications such as AI, ML, and big data. To overcome the von Neumann bottleneck, compact and lightweight logic-in-memory implementations of Boolean logic gates based on emerging non-volatile memories (e-NVMs) such as RRAMs, PCM, STT-MRAMs, etc., were proposed recently. However, these e-NVMs not only exhibit significant temporal and spatial variability, but their large-scale integration with the CMOS process is also a technological challenge. To overcome these issues with the emerging non-volatile memories, ferroelectric FETs (FeFETs) based on CMOS-compatible doped hafnium oxide, with the capability of large-scale CMOS integration at advanced logic nodes, were proposed. Considering the high scalability and CMOS compatibility of FeFETs, in this work, for the first time, we propose a logic-in-memory implementation utilizing a single ferroelectric fully-depleted silicon-on-insulator (Fe-FDSOI) FET exploiting the unique drain-erase phenomenon. In our proposed logic-in-memory implementation, inputs are applied at the gate and drain terminals using a novel input-to-voltage mapping scheme, and the output is obtained as the current flowing through the Fe-FDSOI FET. For proof-of-concept demonstration, we utilize an experimentally calibrated compact model of the ferroelectric capacitor connected to the baseline industry-standard BSIM-IMG compact model of the FDSOI transistor. We also perform a comprehensive analysis of the performance metrics of the proposed logic-in-memory implementation. Our results indicate that at least 10 Boolean logic gates can be realized with high energy- and area-efficiency utilizing the proposed scheme.
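As a rough illustration of the general logic-in-memory idea, the toy model below shows how a stored polarization state that shifts the threshold voltage can gate a Boolean function of gate- and drain-applied inputs. Every device parameter and the input-to-voltage mapping here are hypothetical placeholders; the paper's results rely on the calibrated BSIM-IMG-based compact model, not this sketch:

```python
def fefet_current(vg, vd, p, vt0=0.75, dvt=0.45):
    """Illustrative-only FeFET behavior: polarization p in {+1, -1}
    shifts the threshold voltage, and a 'high' drain current flows
    only when both the gate overdrive and the drain bias are positive.
    All numeric values are assumptions, not calibrated parameters."""
    vt = vt0 - p * dvt            # p = +1 lowers Vt, p = -1 raises it
    return int(vg > vt and vd > 0)

# one hypothetical input-to-voltage mapping: logic 0 -> 0 V, logic 1 -> 1 V
V = {0: 0.0, 1: 1.0}

def logic_in_memory(a, b, p):
    """Inputs a, b applied at the gate/drain; output read as drain current."""
    return fefet_current(V[a], V[b], p)
```

With the assumed values, polarization +1 yields an AND of the two applied inputs, while polarization −1 keeps the device off regardless of the inputs, i.e., the stored state itself participates in the logic.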

Musaib Rafiq

and 2 more

The developments in the nascent field of the artificial intelligence of things (AIoT) rely heavily on the availability of high-quality multi-dimensional data. A huge amount of data is being collected in this era of big data, predominantly for AI/ML algorithms and emerging applications. Considering such voluminous quantities, the collected data may contain a substantial number of outliers, which must be detected before the data are utilized for data mining or computation. Therefore, outlier detection techniques such as Mahalanobis distance computation have gained significant popularity recently. The Mahalanobis distance, the multivariate equivalent of the Euclidean distance, is used to accurately detect outliers in correlated data and finds widespread application in fault identification, data clustering, single-class classification, information security, data mining, etc. However, traditional CMOS-based approaches to computing the Mahalanobis distance are bulky and consume a huge amount of energy. Therefore, there is an urgent need for a compact and energy-efficient implementation of an outlier detection technique that may be deployed on AIoT primitives, including wireless sensor nodes, for in-situ outlier detection and generation of high-quality data. To this end, in this paper, for the first time, we propose an efficient ferroelectric FinFET-based implementation for detecting outliers in correlated multivariate data using the Mahalanobis distance. The proposed implementation utilizes two crossbar arrays of ferroelectric FinFETs to calculate the Mahalanobis distance and detect outliers in the popular Wisconsin breast cancer dataset using a novel inverter-based threshold circuit. Our implementation exhibits an accuracy of 94.1%, which is comparable to software implementations, while consuming a significantly low energy (13.56 pJ).
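The computation that the two crossbar arrays realize in hardware can be written compactly in NumPy. The threshold value and the toy data below are illustrative assumptions, not the paper's calibrated circuit or the Wisconsin dataset:

```python
import numpy as np

def mahalanobis_outliers(X, threshold):
    """Flag rows of X whose Mahalanobis distance from the sample mean
    exceeds `threshold`. Uses the sample covariance pseudo-inverse so
    the sketch also works for near-singular covariance matrices."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d = X - mu
    # quadratic form d_i^T * cov_inv * d_i for every row i
    dist = np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))
    return dist, dist > threshold

# toy correlated 2-D data with one injected outlier
rng = np.random.default_rng(1)
base = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=100)
X = np.vstack([base, [[4.0, -4.0]]])   # point that breaks the correlation
dist, flags = mahalanobis_outliers(X, threshold=3.5)
```

The injected point (4, −4) has a modest Euclidean norm but violates the positive correlation of the inliers, which is exactly the kind of outlier the Mahalanobis distance catches and the Euclidean distance misses.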

Tanveer Kaur

and 4 more

General-purpose multiply-accumulate (MAC) accelerators have become indispensable in IoT edge devices for performing computationally intensive tasks such as deep learning, signal processing, combinatorial optimization, etc. The throughput and energy-efficiency of conventional digital processors and MAC accelerators are limited by the separation of memory and processing inherent in the von Neumann architecture. Although mixed-signal time-mode MAC accelerators utilizing emerging non-volatile memories appear promising owing to their ability to perform in-memory MAC operations via physical laws, their application is limited by their incompatibility and complex integration with the CMOS process, high sensitivity to process variations, large operating voltages/cell currents, etc. To mitigate these issues, in this work, we propose a time-mode MAC accelerator based on ferroelectric FinFETs with CMOS-compatible doped HfO2 in the gate stack. Our rigorous analysis reveals a trade-off between performance metrics such as the computational precision, area-efficiency, and energy-efficiency of the proposed MAC accelerator. Therefore, we provide the necessary design guidelines to further optimize the performance. Extensive design space exploration and simulations exploiting an experimentally calibrated compact model for the doped-HfO2 ferroelectric capacitor along with the 7 nm-technology PDK from ARM (ASAP) indicate that the proposed MAC accelerator exhibits a record energy-efficiency of 2.612 peta-operations/joule, a considerably high area-efficiency of 88.5 bits/µm² (including I/O peripheral circuitry), and a throughput of 4.6 TeraOps/s while supporting a 4-bit MAC operation for a square weight matrix of size 200×200, which is sufficient for realistic inference tasks.
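A minimal behavioral sketch of a time-mode MAC follows: operands are quantized, each product contributes a proportional time delay, and the accumulated delay encodes the result. The 4-bit width and 200-wide vector mirror the configuration quoted above, but the quantization scheme and operand ranges are assumptions for illustration only:

```python
import numpy as np

def time_mode_mac(x, w, bits=4):
    """Toy model of a time-mode MAC: inputs and weights in [0, 1] are
    quantized to 2**bits - 1 levels (modeling the finite resolution of
    delay cells and the time-to-digital readout); each product w_i * x_i
    contributes a proportional delay, and the total delay is the MAC."""
    levels = 2**bits - 1
    x_q = np.round(x * levels) / levels
    w_q = np.round(w * levels) / levels
    return float(np.dot(w_q, x_q))   # physical accumulation of delays

rng = np.random.default_rng(0)
x = rng.random(200)                  # one input vector for a 200-wide row
w = rng.random(200)
approx = time_mode_mac(x, w, bits=4)
exact = float(np.dot(w, x))
err = abs(approx - exact) / exact    # precision cost of 4-bit quantization
```

Sweeping `bits` in such a model exposes the precision-versus-hardware-cost trade-off the abstract refers to: fewer bits mean fewer delay levels (smaller, lower-energy cells) but larger MAC error.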

Shubham Sahay

and 2 more

Generative algorithms such as generative adversarial networks (GANs) are at the cusp of the next revolution in the field of unsupervised learning and large-scale artificial data generation. However, the adversarial (competitive) co-training of the discriminative and generative networks in a GAN makes them computationally intensive and hinders their deployment on resource-constrained IoT edge devices. Moreover, the frequent data transfer between the discriminative and generative networks during training significantly degrades the efficacy of von Neumann GAN accelerators such as those based on GPUs and FPGAs. Therefore, there is an urgent need for the development of ultra-compact and energy-efficient hardware accelerators for GANs. To this end, in this work, we propose to exploit passive RRAM crossbar arrays for performing the key operations of a fully-connected GAN: (a) true random noise generation for the generator network, (b) vector-by-matrix multiplication with unprecedented energy-efficiency during the forward pass and backward propagation, and (c) in-situ adversarial training using the hardware-friendly Manhattan rule. Our extensive analysis utilizing an experimentally calibrated phenomenological model for the passive RRAM crossbar array reveals an unforeseen trade-off between the accuracy and the energy dissipated while training the GAN with different noise inputs to the generator. Furthermore, our results indicate that the spatial and temporal variations and true random noise, which are otherwise undesirable for memory applications, boost the energy-efficiency of the GAN implementation on passive RRAM crossbar arrays without degrading its accuracy.
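The Manhattan rule updates each weight by a fixed step against the sign of its gradient, which is hardware-friendly because the crossbar then needs only identical fixed-amplitude programming pulses of the appropriate polarity. A minimal sketch on a toy scalar objective (not the paper's GAN training loop; the step size and objective are illustrative):

```python
import numpy as np

def manhattan_update(w, grad, step=0.01):
    """Manhattan-rule update: move each weight by a fixed step opposite
    to the sign of its gradient. Only the update polarity must reach the
    crossbar, so every programming pulse has the same amplitude."""
    return w - step * np.sign(grad)

# toy demonstration: drive a scalar weight toward the minimum of (w - 3)^2
w = 0.0
for _ in range(400):
    grad = 2.0 * (w - 3.0)
    w = manhattan_update(w, grad)
```

Because the step is fixed, the weight converges to within one step of the optimum and then oscillates there, trading final precision for a drastically simpler programming scheme than analog-magnitude updates.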