Literature Survey
With the advent of phishing webpages, researchers have investigated
supervised and unsupervised learning models for detecting them. For
instance, Moghimi and Varjani [14] used a support vector machine (SVM)
algorithm to classify webpages. Their experiments indicate that the
proposed model can detect phishing pages in internet banking with
99.14% true positives and only 0.86% false negatives. Afroz and
Greenstadt [15] developed the PhishZoo technique, which constructs a
website profile using a fuzzy hashing approach in which the website is
represented by several criteria that differentiate one website from
another, including images, HTML source code, URL, and SSL certificate. A.
Desai, J. Jatakia, R. Naik, and N. Raul [16] created a Google Chrome
extension that detects phishing website content with the help of
machine learning algorithms. S. Parekh, D. Parikh, S. Kotak, and P. S.
Sankhe [17] proposed a model for recognizing phishing sites through URL
detection using the Random Forest algorithm. X. Zhang, Y. Zeng, X. Jin,
Z. Yan, and G. Geng [18] proposed a phishing detection model that
improves detection performance by mining the semantic features of word
embeddings, along with semantic and multi-scale statistical features,
in Chinese web pages. As in Ma et al. [19], Zhang et al. wrote Python
scripts to automatically download the URLs of confirmed phishing
websites from PhishTank.
PhishTank is a collaborative clearing house for data and information
about phishing on the Internet. Jeeva and Rajsingh [20] extracted
features related to transport layer security together with URL-based
features such as length, number of slashes, number and positions of dots
in the URL, and subdomain names. Rule mining with the Apriori algorithm
was then applied to the extracted features to establish detection rules.
Experimental results showed that 93% of phishing URLs were detected.
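
The URL-based features listed above can be computed with a few lines of
code. The following is a minimal sketch in Python, not the feature
extractor used in [20]. The function name `url_features`, the dictionary
keys, and the rule that every host label left of the last two counts as a
subdomain are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Compute simple lexical features of a URL (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    return {
        "length": len(url),
        "num_slashes": url.count("/"),
        "num_dots": url.count("."),
        "dot_positions": [i for i, ch in enumerate(url) if ch == "."],
        # Assumption: labels left of the last two are subdomains. This
        # ignores multi-part suffixes such as co.uk and raw IP addresses.
        "subdomains": labels[:-2],
        "uses_https": parts.scheme == "https",
    }


if __name__ == "__main__":
    print(url_features("https://login.secure.example.com/account/verify.php"))
```

Feature dictionaries of this kind could then be discretized and passed to
a rule mining step; that step is not shown here.
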
Jain and Gupta [21] present an anti-phishing approach that uses
machine learning, extracting 19 client-side features to distinguish
phishing websites from legitimate ones. In Peng, Harris, and Sawa [22],
natural language processing (NLP) is applied to detect phishing emails
by performing a semantic analysis of the email content (as plain text)
to detect malicious intent. Prakash, Kumar, Kompella, and Gupta (2010)
[23] describe blacklist-based systems that use an approximate matching
algorithm to check whether a suspicious URL exists in the blacklist.
S. Aonzo, A. Merlo, G.
Tavella, and Y. Fratantonio [24] described the multifactor
authentication technique, which uses two or more authentication factors
to log in to accounts or systems. One factor is a password and the
other is a code generated by an app and delivered through SMS, phone
call, or email. With this method, only an authenticated person can log
in to their accounts. Tech5 (Machine Learning Approach, 60%) was
identified as one of the most effective anti-phishing techniques. One
of the earliest whitelists was proposed by Chen and Guo [25] and was
based on the trusted websites that users browse. The whitelist monitors
the user's login attempts and, if a repeated login is executed
successfully, prompts the user to add that website to the whitelist.
One clear limitation of Chen and Guo's method is that it assumes that
users are dealing with trustworthy websites, which unfortunately is not
always the case. Zhang H, Liu G, Chow TWS, Liu W [26] presented a new
framework for content-based phishing detection using a Bayesian approach.
Lee and Kim [27] proposed a suspicious URL detection system for
Twitter called WARNINGBIRD. Li et al. [28] proposed a combination of
linear and nonlinear domain conversion methods to represent the core
problem more clearly and to improve the performance of classifiers in
identifying malicious URLs. Yang L, Zhang J, Wang X, Li Z, Li Z [29]
presented a new approach to phishing detection based on a non-inverse
matrix online sequential extreme learning machine that takes into
account three types of features to characterize a website. They used
the Sherman-Morrison-Woodbury equation to reduce matrix inversion and
introduced the online sequential extreme learning machine to update the
training model.
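
For reference, the Sherman-Morrison-Woodbury identity and the recursive
update commonly used in online sequential extreme learning machines are
given below. This is the textbook form, shown as a sketch of why the
identity reduces matrix inversion; it is not necessarily the exact
formulation in [29]. The symbols are assumptions following common
notation: $H_{k+1}$ is the hidden-layer output matrix of the newly
arrived chunk, $T_{k+1}$ its targets, $\beta_k$ the output weights, and
$P_k$ the inverse of the accumulated matrix $\sum_{i \le k} H_i^{\top} H_i$.

```latex
% Sherman-Morrison-Woodbury identity (A and C invertible)
\[
(A + U C V)^{-1}
  = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
\]

% Applied with A = P_k^{-1}, U = H_{k+1}^{\top}, C = I, V = H_{k+1}
\[
P_{k+1} = P_k - P_k H_{k+1}^{\top}
  \left( I + H_{k+1} P_k H_{k+1}^{\top} \right)^{-1} H_{k+1} P_k
\]
\[
\beta_{k+1} = \beta_k + P_{k+1} H_{k+1}^{\top}
  \left( T_{k+1} - H_{k+1} \beta_k \right)
\]
```

The only matrix inverted per update has the size of the new chunk, rather
than the size of the hidden layer, which is what makes the online update
cheap when chunks are small.
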
De La Torre Parra et al. [30] proposed a cloud-based distributed
deep learning framework for phishing attack detection. Wu et al. [36]
empirically investigated three simulated anti-phishing toolbars to
determine how effective they were at preventing participants from
visiting fraudulent websites. BaitAlarm [37] is comparatively more
efficient, as VSBPD compares the text and its style in two websites.
Visual Similarity Based Phishing Detection (VSBPD) [38] warns the
user whenever they try to give their credentials to an untrusted
website. The Google Safe Browsing API [39] allows client-side
applications to check whether a URL appears on a blacklist that is
continuously updated by Google. Juan Chen and Chuanxiong Guo designed
and developed the LinkGuard algorithm [40] to detect spoofed hyperlinks
in phishing emails.
Methodology:
Numerous methods have been used in the past to duplicate various
websites, such as Facebook, Instagram, and GitHub, but all of these
methods were effective only on websites that lacked form validation and
had weak security. In this project, however, we implemented a new
method that can replicate all types of websites that have form
validation and anti-clickjacking protection.