Saifullah Saifullah - Authorea

Saifullah Saifullah

Public Documents 3

The Reality of High Performing Deep Learning Models: A Case Study on Document Image C...

Saifullah Saifullah

and 3 more

October 11, 2023

Deep neural networks have achieved exceptional performance in document image classification, yet little research in the field has examined the explainability of these models. In this paper, we present a comprehensive study in which we analyze 9 different explainability methods across 10 different state-of-the-art document classification models and 2 popular benchmark datasets, making three major contributions. First, through an exhaustive qualitative and quantitative analysis of various explainability approaches, we demonstrate that the majority of them perform poorly in generating useful explanations for document images, with only two techniques, namely Occlusion and DeepSHAP, providing relatively adequate, human-interpretable and faithful explanations. Second, to identify the features most relevant to the models' predictions, we present an approach to generate counterfactual explanations. An analysis of these explanations reveals that many document classification models can be highly susceptible to minor perturbations in the input. Moreover, they may easily fall victim to biases in the document data and end up relying on seemingly irrelevant features to make their decisions, with 25-50% of predictions overall, and up to 60% for some classes, depending strongly on these features. Lastly, our analysis reveals that the popular document benchmark datasets, RVL-CDIP and Tobacco3482, are inherently biased, with document identification (ID) numbers of specific styles consistently appearing in certain document regions. If unaddressed, this bias allows the models to predict document classes solely by looking at the ID numbers and prevents them from learning more complex document features.
Overall, by unveiling the strengths and weaknesses of various explainability methods, document datasets and deep learning models, our work presents a major step towards creating more transparent and robust document image classification systems.
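
As a rough illustration of the Occlusion idea mentioned in the abstract, the sketch below masks one image patch at a time and records how much a model's score drops. It is not the authors' implementation: the function name `occlusion_map`, the patch size, and the toy `score_fn` (which deliberately looks only at a top-right "ID number" region) are all assumptions made for this example.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=8, stride=8, fill=0.0):
    """Heatmap of score drops when each patch of `image` is masked.

    `score_fn` is a hypothetical stand-in for a classifier's score
    for the predicted class; any callable image -> float works.
    """
    h, w = image.shape[:2]
    base = score_fn(image)
    heat = np.zeros((h, w), dtype=float)
    counts = np.zeros((h, w), dtype=float)
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            drop = base - score_fn(occluded)  # large drop = important patch
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    # Average over overlapping patches (no overlap when stride == patch).
    return heat / np.maximum(counts, 1)

# Toy "model" (assumption): its score depends only on the top-right
# 8x8 corner, mimicking a classifier that keys on an ID-number region.
def score_fn(img):
    return float(img[0:8, 24:32].mean())

image = np.ones((32, 32))
heat = occlusion_map(image, score_fn)
hot_row, hot_col = np.unravel_index(heat.argmax(), heat.shape)
```

In this toy setting the heatmap is non-zero only over the corner the scoring function reads, which is the kind of pattern that would flag a reliance on a narrow document region.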

DocXClassifier: Towards a Robust and Interpretable Deep Neural Network for Document I...

Saifullah Saifullah

and 3 more

October 11, 2023

This paper presents an inherently explainable deep network for document image classification.

Towards Privacy Preserved Document Image Classification - A Comprehensive Benchmark

Saifullah Saifullah

and 4 more

April 08, 2022

This paper presents a comprehensive benchmark of privacy-preserving techniques for document image classification.