MATERIALS & METHODS
In this paper, the system is presented in four parts. First, the
datasets are identified. Second, a novel ensembling technique is
identified. Third, the model is trained with three different ensemble
classifiers. Fourth, the models are evaluated to choose the best
classifier for fake news prediction. The experiments are implemented in
Python with the scikit-learn library, which provides ready-made
implementations of most common ML algorithms and so allows them to be
evaluated quickly and easily. The dataset is collected from the Kaggle
open-source repository: LIAR, a benchmark dataset for fake news
detection (Giglietto et al. 2019).
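
The train-and-compare step described above can be sketched as follows.
This is a minimal illustration, not the authors' pipeline: the text
does not name the three ensemble classifiers or the feature extraction,
so RandomForestClassifier, AdaBoostClassifier,
GradientBoostingClassifier, and TF-IDF features are assumptions, and the
statements are made-up placeholders for the real dataset.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Made-up toy statements (1 = fake, 0 = real); placeholders only.
texts = [
    "aliens secretly control the national bank",
    "miracle pill cures every known disease overnight",
    "celebrity clone spotted replacing the president",
    "moon landing footage was filmed under the sea",
    "drinking paint makes you immune to all viruses",
    "secret law bans the colour blue next week",
    "city council approves budget for road repairs",
    "local school opens new library wing on monday",
    "weather service forecasts rain for the weekend",
    "parliament votes on the annual health budget",
    "university publishes study on sleep and memory",
    "train operator announces revised winter timetable",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Stratified split keeps both classes present in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Assumed candidates; the paper does not specify which ensembles it uses.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=20, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(
        n_estimators=20, random_state=0
    ),
}


def evaluate(models):
    """Fit each model on TF-IDF features and return its test accuracy."""
    results = {}
    for name, clf in models.items():
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        pipeline.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, pipeline.predict(X_test))
    return results


scores = evaluate(candidates)
best = max(scores, key=scores.get)  # highest-accuracy candidate
print(scores)
print("selected:", best)
```

On a toy set this small the accuracies carry no meaning; the sketch only
shows the shape of the comparison. Accuracy is also an assumed metric,
and precision, recall, or F1 can be substituted in `evaluate`.
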
Configuration: Google Colab with a free GPU. The GPU allocation was
exhausted after two iterations, but we created a checkpoint to save the
model. Using Google's cloud GPU infrastructure, models can be trained
and deployed more quickly. Our coding environment is Keras or
TensorFlow. Depending on requirements, we can work with the
corresponding TF version and its utilities, such as the core and
functional APIs. The training set of fake news also contains 800 rows.
The number of real and fake news items is the same, so this is not an
imbalanced classification problem. At present, 5,621 subtexts have been
extracted from 1,280 texts. A further 1,305 subtexts have been extracted
from 320 texts for validation, and 1,568 subtexts from 400 texts for
testing.

Twitter users discussed the
eruption of Taal Volcano in Batangas, Philippines, the coronavirus, the
bushfires in Australia, and the downing of flight PS752 in Iran. Some
text in this dataset may be considered profane, vulgar, or offensive.
This project was undertaken to complement the existing data on this
topic with newly collected and manually classified tweets. The original
source was "Disasters on social media", which was used in "Real or
Not?", a Kaggle competition on natural language processing with
disaster tweets. Figure 1 shows the class distribution of the collected
dataset: the two classes are equally represented, so the dataset is
balanced.
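
A balance claim like the one in Figure 1 can be checked by counting
labels. The sketch below uses only the standard library; the helper
names `class_distribution` and `is_balanced`, the 5% tolerance, and the
toy labels are all hypothetical and do not reproduce the counts of the
actual dataset.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, share)} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


def is_balanced(labels, tolerance=0.05):
    """True if every class share is within `tolerance` of the uniform share."""
    dist = class_distribution(labels)
    if not dist:
        raise ValueError("labels must not be empty")
    uniform = 1 / len(dist)
    return all(abs(share - uniform) <= tolerance for _, share in dist.values())


# Made-up labels for illustration only; not the paper's data.
toy_labels = ["fake"] * 6 + ["real"] * 6
print(class_distribution(toy_labels))
print("balanced:", is_balanced(toy_labels))
```

Running the same check on each of the training, validation, and test
splits would show whether the balance holds after splitting, not only on
the full dataset.
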