Arka Chakraborty

and 3 more

Probabilistic/stochastic computations form the backbone of autonomous systems and classifiers. Recently, biomedical applications of probabilistic computing, such as hyperdimensional computing for DNA sequencing and Bayesian networks for disease diagnosis, have attracted significant attention owing to their high energy efficiency. Bayesian inference is widely used for decision making based on independent (often conflicting) sources of information/evidence. A cascaded chain or tree structure of asynchronous circuit elements known as Muller C-Elements can effectively implement Bayesian inference. Such circuits utilize stochastic bit streams to encode input probabilities, which enhances their robustness and fault tolerance. However, CMOS implementations of the Muller C-Element are bulky and energy-hungry, which restricts their widespread application in resource-constrained IoT and mobile devices. To enable Bayesian-inference-based decision making in IoT devices such as UAVs, robots, space rovers, etc., for the first time, we propose a highly compact and energy-efficient implementation of the Muller C-Element utilizing a single ferroelectric FET (FeFET). The proposed implementation exploits the unique drain-erase, program-inhibit, and drain-inhibit characteristics of FeFETs to encode the output as the polarization state of the ferroelectric layer. Our extensive investigation utilizing an in-house developed, experimentally calibrated compact model of the FeFET reveals that the proposed C-Element consumes an ultra-low power of 1.07 fW. We also propose a novel read circuitry for realizing a Bayesian inference engine by cascading a network of the proposed FeFET-based C-Elements for practical applications. Furthermore, for the first time, we analyze the impact of cross-correlation between the stochastic input bit streams on the accuracy of C-Element-based Bayesian inference implementations.
For a proof-of-concept demonstration, we employ the proposed FeFET-based Muller C-Element to perform breast cancer diagnosis on the Wisconsin dataset.
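The Bayesian fusion performed by a single C-Element on stochastic bit streams can be illustrated with a short behavioral sketch (assuming ideal, uncorrelated Bernoulli input streams; the probabilities and stream length below are illustrative only, not the paper's circuit-level model):

```python
import random

def c_element(a, b, prev):
    """Muller C-Element: the output follows the inputs when they agree
    and holds its previous value when they disagree."""
    return a if a == b else prev

def bayesian_fusion(p_a, p_b, n=200_000, seed=0):
    """Fuse two independent evidence probabilities, encoded as stochastic
    bit streams, through one C-Element. The mean of the output stream
    approximates p_a*p_b / (p_a*p_b + (1-p_a)*(1-p_b))."""
    rng = random.Random(seed)
    out, ones = 0, 0
    for _ in range(n):
        a = rng.random() < p_a   # Bernoulli sample encoding evidence A
        b = rng.random() < p_b   # Bernoulli sample encoding evidence B
        out = c_element(a, b, out)
        ones += out
    return ones / n

est = bayesian_fusion(0.8, 0.7)
exact = (0.8 * 0.7) / (0.8 * 0.7 + 0.2 * 0.3)   # normalized Bayesian fusion
```

The output stream is a two-state Markov chain (0→1 with probability p_A·p_B, 1→0 with probability (1−p_A)·(1−p_B)), so its stationary high-probability is exactly the normalized Bayesian fusion of the two evidence probabilities.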

Musaib Rafiq

and 2 more

The conventional computing platforms based on the von Neumann architecture are highly space- and energy-intensive while handling emerging applications such as AI, ML, and big data. To overcome the von Neumann bottleneck, compact and lightweight logic-in-memory implementations of Boolean logic gates based on emerging non-volatile memories (e-NVMs) such as RRAMs, PCM, STT-MRAMs, etc., were proposed recently. However, these e-NVMs not only exhibit significant temporal and spatial variability, but their large-scale integration with the CMOS process is also a technological challenge. To overcome these issues with the emerging non-volatile memories, ferroelectric FETs (FeFETs) based on CMOS-compatible doped hafnium oxide, with the capability of large-scale CMOS integration at advanced logic nodes, were proposed. Considering the high scalability and CMOS compatibility of FeFETs, in this work, for the first time, we propose a logic-in-memory implementation utilizing a single ferroelectric fully-depleted silicon-on-insulator (Fe-FDSOI) FET exploiting the unique drain-erase phenomenon. In our proposed logic-in-memory implementation, inputs are applied at the gate and drain terminals using a novel input-to-voltage mapping scheme, and the output is obtained as the current flowing through the Fe-FDSOI FET. For proof-of-concept demonstration, we utilize an experimentally calibrated compact model of the ferroelectric capacitor connected to the baseline industry-standard BSIM-IMG compact model of the FDSOI transistor. We also perform a comprehensive analysis of the performance metrics of the proposed logic-in-memory implementation. Our results indicate that at least 10 Boolean logic gates can be realized with high energy- and area-efficiency utilizing the proposed scheme.
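As a rough illustration of the general logic-in-memory idea, the toy model below shows how a stored polarization state that shifts the threshold voltage can gate a Boolean function of gate- and drain-applied inputs. Every device parameter and the input-to-voltage mapping here are hypothetical placeholders; the paper's results rely on the calibrated BSIM-IMG-based compact model, not this sketch:

```python
def fefet_current(vg, vd, p, vt0=0.75, dvt=0.45):
    """Illustrative-only FeFET behavior: polarization p in {+1, -1}
    shifts the threshold voltage, and a 'high' drain current flows
    only when both the gate overdrive and the drain bias are positive.
    All numeric values are assumptions, not calibrated parameters."""
    vt = vt0 - p * dvt            # p = +1 lowers Vt, p = -1 raises it
    return int(vg > vt and vd > 0)

# one hypothetical input-to-voltage mapping: logic 0 -> 0 V, logic 1 -> 1 V
V = {0: 0.0, 1: 1.0}

def logic_in_memory(a, b, p):
    """Inputs a, b applied at the gate/drain; output read as drain current."""
    return fefet_current(V[a], V[b], p)
```

With the assumed values, polarization +1 yields an AND of the two applied inputs, while polarization −1 keeps the device off regardless of the inputs, i.e., the stored state itself participates in the logic.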

Musaib Rafiq

and 2 more

The developments in the nascent field of the artificial intelligence of things (AIoT) rely heavily on the availability of high-quality multi-dimensional data. A huge amount of data is being collected in this era of big data, predominantly for AI/ML algorithms and emerging applications. Considering such voluminous quantities, the collected data may contain a substantial number of outliers, which must be detected before the data are utilized for data mining or computation. Therefore, outlier detection techniques such as Mahalanobis distance computation have gained significant popularity recently. The Mahalanobis distance, the multivariate equivalent of the Euclidean distance, is used to accurately detect outliers in correlated data and finds widespread application in fault identification, data clustering, single-class classification, information security, data mining, etc. However, traditional CMOS-based approaches to computing the Mahalanobis distance are bulky and consume a huge amount of energy. Therefore, there is an urgent need for a compact and energy-efficient implementation of an outlier detection technique that may be deployed on AIoT primitives, including wireless sensor nodes, for in-situ outlier detection and generation of high-quality data. To this end, in this paper, for the first time, we propose an efficient ferroelectric FinFET-based implementation for detecting outliers in correlated multivariate data using the Mahalanobis distance. The proposed implementation utilizes two crossbar arrays of ferroelectric FinFETs to calculate the Mahalanobis distance and detect outliers in the popular Wisconsin breast cancer dataset using a novel inverter-based threshold circuit. Our implementation exhibits an accuracy of 94.1%, which is comparable to software implementations, while consuming a significantly low energy (13.56 pJ).
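The computation that the two crossbar arrays realize in hardware can be written compactly in NumPy. The threshold value and the toy data below are illustrative assumptions, not the paper's calibrated circuit or the Wisconsin dataset:

```python
import numpy as np

def mahalanobis_outliers(X, threshold):
    """Flag rows of X whose Mahalanobis distance from the sample mean
    exceeds `threshold`. Uses the sample covariance pseudo-inverse so
    the sketch also works for near-singular covariance matrices."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d = X - mu
    # quadratic form d_i^T * cov_inv * d_i for every row i
    dist = np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))
    return dist, dist > threshold

# toy correlated 2-D data with one injected outlier
rng = np.random.default_rng(1)
base = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=100)
X = np.vstack([base, [[4.0, -4.0]]])   # point that breaks the correlation
dist, flags = mahalanobis_outliers(X, threshold=3.5)
```

The injected point (4, −4) has a modest Euclidean norm but violates the positive correlation of the inliers, which is exactly the kind of outlier the Mahalanobis distance catches and the Euclidean distance misses.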

Tanveer Kaur

and 4 more

General-purpose multiply-accumulate (MAC) accelerators have become indispensable in IoT edge devices for performing computationally intensive tasks such as deep learning, signal processing, combinatorial optimization, etc. The throughput and energy-efficiency of conventional digital processors and MAC accelerators are limited by the separation of memory and processing inherent in the von Neumann architecture. Although mixed-signal time-mode MAC accelerators utilizing emerging non-volatile memories appear promising owing to their ability to perform in-memory MAC operations via physical laws, their application is limited by their incompatibility and complex integration with the CMOS process, high sensitivity to process variations, large operating voltages/cell currents, etc. To mitigate these issues, in this work, we propose a time-mode MAC accelerator based on ferroelectric FinFETs with CMOS-compatible doped HfO2 in the gate stack. Our rigorous analysis reveals a trade-off between performance metrics such as the computational precision, area-efficiency, and energy-efficiency of the proposed MAC accelerator. Therefore, we provide the necessary design guidelines to further optimize the performance. Extensive design space exploration and simulations exploiting an experimentally calibrated compact model for the doped-HfO2 ferroelectric capacitor along with the 7 nm-technology PDK from ARM (ASAP) indicate that the proposed MAC accelerator exhibits a record energy-efficiency of 2.612 peta-operations/joule, a considerably high area-efficiency of 88.5 bits/µm² (including I/O peripheral circuitry), and a throughput of 4.6 TeraOps/s while supporting a 4-bit MAC operation for a square weight matrix of size 200×200, which is sufficient for realistic inference tasks.
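A minimal behavioral sketch of a time-mode MAC follows: operands are quantized, each product contributes a proportional time delay, and the accumulated delay encodes the result. The 4-bit width and 200-wide vector mirror the configuration quoted above, but the quantization scheme and operand ranges are assumptions for illustration only:

```python
import numpy as np

def time_mode_mac(x, w, bits=4):
    """Toy model of a time-mode MAC: inputs and weights in [0, 1] are
    quantized to 2**bits - 1 levels (modeling the finite resolution of
    delay cells and the time-to-digital readout); each product w_i * x_i
    contributes a proportional delay, and the total delay is the MAC."""
    levels = 2**bits - 1
    x_q = np.round(x * levels) / levels
    w_q = np.round(w * levels) / levels
    return float(np.dot(w_q, x_q))   # physical accumulation of delays

rng = np.random.default_rng(0)
x = rng.random(200)                  # one input vector for a 200-wide row
w = rng.random(200)
approx = time_mode_mac(x, w, bits=4)
exact = float(np.dot(w, x))
err = abs(approx - exact) / exact    # precision cost of 4-bit quantization
```

Sweeping `bits` in such a model exposes the precision-versus-hardware-cost trade-off the abstract refers to: fewer bits mean fewer delay levels (smaller, lower-energy cells) but larger MAC error.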

Shubham Sahay

and 2 more

Generative algorithms such as generative adversarial networks (GANs) are at the cusp of the next revolution in the field of unsupervised learning and large-scale artificial data generation. However, the adversarial (competitive) co-training of the discriminative and generative networks in a GAN makes them computationally intensive and hinders their deployment on resource-constrained IoT edge devices. Moreover, the frequent data transfer between the discriminative and generative networks during training significantly degrades the efficacy of von Neumann GAN accelerators such as those based on GPUs and FPGAs. Therefore, there is an urgent need for the development of ultra-compact and energy-efficient hardware accelerators for GANs. To this end, in this work, we propose to exploit passive RRAM crossbar arrays for performing the key operations of a fully-connected GAN: (a) true random noise generation for the generator network, (b) vector-by-matrix multiplication with unprecedented energy-efficiency during the forward pass and backward propagation, and (c) in-situ adversarial training using the hardware-friendly Manhattan rule. Our extensive analysis utilizing an experimentally calibrated phenomenological model for the passive RRAM crossbar array reveals an unforeseen trade-off between the accuracy and the energy dissipated while training the GAN with different noise inputs to the generator. Furthermore, our results indicate that the spatial and temporal variations and true random noise, which are otherwise undesirable for memory applications, boost the energy-efficiency of the GAN implementation on passive RRAM crossbar arrays without degrading its accuracy.
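The Manhattan rule updates each weight by a fixed step against the sign of its gradient, which is hardware-friendly because the crossbar then needs only identical fixed-amplitude programming pulses of the appropriate polarity. A minimal sketch on a toy scalar objective (not the paper's GAN training loop; the step size and objective are illustrative):

```python
import numpy as np

def manhattan_update(w, grad, step=0.01):
    """Manhattan-rule update: move each weight by a fixed step opposite
    to the sign of its gradient. Only the update polarity must reach the
    crossbar, so every programming pulse has the same amplitude."""
    return w - step * np.sign(grad)

# toy demonstration: drive a scalar weight toward the minimum of (w - 3)^2
w = 0.0
for _ in range(400):
    grad = 2.0 * (w - 3.0)
    w = manhattan_update(w, grad)
```

Because the step is fixed, the weight converges to within one step of the optimum and then oscillates there, trading final precision for a drastically simpler programming scheme than analog-magnitude updates.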