2 RELATED WORK
Researchers on both sides of the Atlantic have classified the most
popular methods of object recognition into three broad groups: those
that utilise motion data, those that rely on feature extraction, and
those that employ template matching. Most of the first research relied
on unsupervised techniques and a wide variety of characteristics. In
detection from panchromatic pictures was constructed using the
scale-invariant feature transformed (SIFT) key points and the graph
theorem. The unsupervised approaches were successful for a restricted
set of objects, but they yielded efficient results for basic structure
types in general. More recent research has centred on supervised
learning techniques for accurately identifying objects of varying
structures in challenging settings. Supervised learning achieves better
results because, during training, it is applied to data that has already
been annotated by hand.
Various supervised learning approaches utilising specially generated
features were used before the mainstream adoption of convolutional
neural network (CNN) architectures. Detecting objects is a two-stage
procedure that uses motion and a convolutional neural network trained on
patches (CNN). As a preliminary step, a lightweight motion detecting
operator is used to approximate where the targets are.[7]
The second phase employs this data alongside a convolutional neural
network to improve the detection accuracy.While Qinhan et al [15]
use of many windows with high item probabilities and subsequent SVM and
HOG algorithms for proposal generation has its advantages; the use of
fixed-size windows is a significant drawback. In order to identify
objects in UAV photographs, Lee et al. [8] used RCNN. A solution for
vehicle detection based on the YOLO deep learning framework is proposed
by Junyan Lu et al. [11]. With the use of deep learning, B.Cui etal
[12] suggested argeted improvements based on the powerful YOLO v5 to
improve the detection performance of small objects finding objects in
satellite photos using this technique.
They applied the Faster RCNN-based RetinaNet framework on the COCW
dataset. Surface-to-air missile (SAM) locations can be pinpointed with
the help of a sliding window methodology for satellite imagery, as
suggested by Marcum et al. [14].The advent of deep learning and GPU
technology has allowed for rapid and efficient progress in the field of
computer vision, particularly when it comes to tackling challenges in
pattern recognition and picture processing. With its ability to
automatically extract characteristics from a picture, deep learning
techniques play a crucial role in the field of object recognition.When
it comes to detecting objects, deep learning excels.First, we need to
gather the massive dataset and begin training on this dataset if we want
to detect category .After training, we feed in a picture for prediction,
and the model spits out category-specific score vectors.
It was proposed by Qian et al. (2020) to maximise the training of tiny
objects without overlapping bounding boxes using a variant of
Faster-RCNN with a new architecture, new metric, and loss.The findings
of the studies demonstrate that the genetic algorithm was effective in
determining the optimal values for the hyperparameters.When it came to
the detection of aeroplanes, vehicles, and ships, the accuracy attained
by improved models was significantly greater than that of the original
models. The findings also indicate that the training timeframes for the
models have been shortened thanks to the application of appropriate
hyperparameters; however this has resulted in a minor loss of precision
when it comes to the detection of ships.