2 RELATED WORK

Researchers on both sides of the Atlantic have classified the most popular object recognition methods into three broad groups: those that use motion data, those that rely on feature extraction, and those that employ template matching. Most early research relied on unsupervised techniques and a wide variety of features. One early detector for panchromatic images was built on scale-invariant feature transform (SIFT) keypoints and graph theory. The unsupervised approaches worked only for a restricted set of objects, although they gave efficient results for basic structure types in general. More recent research has centred on supervised learning techniques for accurately identifying objects of varying structure in challenging settings. Supervised learning achieves better results because the models are trained on data that has already been annotated by hand.
Before the mainstream adoption of convolutional neural network (CNN) architectures, various supervised learning approaches relied on hand-crafted features. In one approach, object detection is a two-stage procedure that combines motion cues with a CNN trained on image patches. As a preliminary step, a lightweight motion detection operator approximates where the targets are [7].
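The preliminary motion step described above can be illustrated with a short sketch. The operator used in [7] is not specified here, so this example assumes plain frame differencing between two consecutive grayscale frames; the function names, the threshold, and the margin are hypothetical choices made for illustration only.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale image as rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def motion_mask(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[bool]]:
    """Mark pixels whose intensity changed by more than `threshold` (assumed value)."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def candidate_box(mask: List[List[bool]], margin: int = 1) -> Optional[Box]:
    """Return one padded box around all moving pixels, or None if nothing moved."""
    moving = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not moving:
        return None
    rows = [r for r, _ in moving]
    cols = [c for _, c in moving]
    height, width = len(mask), len(mask[0])
    # Pad the box by `margin` pixels and clip it to the image borders.
    return (
        max(min(rows) - margin, 0),
        max(min(cols) - margin, 0),
        min(max(rows) + margin, height - 1),
        min(max(cols) + margin, width - 1),
    )


# Two 6x6 frames: a bright 2x2 target appears in the second frame.
prev_frame = [[10] * 6 for _ in range(6)]
curr_frame = [row[:] for row in prev_frame]
for r in (2, 3):
    for c in (3, 4):
        curr_frame[r][c] = 200

region = candidate_box(motion_mask(prev_frame, curr_frame))
print(region)  # approximate target location handed to the later classifier
```

The box is deliberately coarse: its only job is to narrow down where a more expensive classifier needs to look.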
The second phase combines this information with a convolutional neural network to improve detection accuracy. Qinhan et al. [15] generate proposals from many windows with high object probabilities, followed by SVM and HOG algorithms; this has its advantages, but the reliance on fixed-size windows is a significant drawback. To identify objects in UAV photographs, Lee et al. [8] used RCNN. Junyan Lu et al. [11] proposed a vehicle detection solution based on the YOLO deep learning framework. Using deep learning, B. Cui et al. [12] suggested targeted improvements to the powerful YOLO v5 to improve the detection of small objects in satellite photographs.
They applied the Faster RCNN-based RetinaNet framework to the COWC dataset. Marcum et al. [14] suggested a sliding-window methodology for satellite imagery that can pinpoint surface-to-air missile (SAM) sites.
The advent of deep learning and GPU technology has enabled rapid and efficient progress in computer vision, particularly on problems in pattern recognition and image processing. Because they extract features from an image automatically, deep learning techniques play a crucial role in object recognition and excel at object detection. To detect a given category, we first gather a large dataset and train a model on it. After training, we feed in an image for prediction, and the model outputs category-specific score vectors.
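To make the sliding-window idea and the notion of a category-specific score vector concrete, the following minimal sketch scans an image with fixed-size windows and scores each patch. It does not reproduce the method of [14] or any cited detector: `toy_model` is a hypothetical stand-in for a trained network that scores patches by mean brightness, and the window size, stride, and score cut-off are assumed values.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple

Image = List[List[float]]
Window = Tuple[int, int, Image]  # (top, left, patch)


def sliding_windows(image: Image, size: int, stride: int) -> Iterator[Window]:
    """Yield every size x size patch, stepping `stride` pixels at a time."""
    height, width = len(image), len(image[0])
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patch = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, patch


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(patch: Image) -> List[float]:
    """Hypothetical stand-in for a trained network: scores by mean brightness."""
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    # Score vector over ("background", "target"); brighter patches favour "target".
    return softmax([1.0 - mean, mean])


def detect(
    image: Image,
    size: int,
    stride: int,
    model: Callable[[Image], List[float]],
    min_score: float = 0.6,
) -> List[Tuple[int, int, float]]:
    """Return (top, left, score) for windows whose "target" score passes the cut."""
    hits = []
    for top, left, patch in sliding_windows(image, size, stride):
        scores = model(patch)
        if scores[1] >= min_score:
            hits.append((top, left, scores[1]))
    return hits


image = [[0.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1.0  # a bright 2x2 "target"

hits = detect(image, size=2, stride=2, model=toy_model)
print(hits)
```

Because every window has the same size, a target much larger or smaller than the window is easily missed, which is the drawback of fixed-size windows noted earlier in this section.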
Qian et al. (2020) proposed a variant of Faster-RCNN with a new architecture, metric, and loss to improve training on tiny objects without overlapping bounding boxes. The findings of the studies demonstrate that the genetic algorithm was effective in determining the optimal values for the hyperparameters. For the detection of aeroplanes, vehicles, and ships, the accuracy attained by the improved models was significantly greater than that of the original models. The findings also indicate that appropriate hyperparameters shortened the models' training times; however, this came with a minor loss of precision in the detection of ships.
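The details of the genetic algorithm are not given in this section, so the following is only a minimal sketch of how such a hyperparameter search could work. It assumes elitist selection, uniform crossover, and Gaussian mutation; the two hyperparameters, their ranges, and `toy_fitness` (a stand-in for the validation accuracy of a trained detector) are all hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]

# Hypothetical search ranges; real ranges depend on the detector being tuned.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "learning_rate": (1e-4, 1e-1),
    "momentum": (0.5, 0.99),
}


def random_individual(rng: random.Random) -> Params:
    """Sample each hyperparameter uniformly from its range."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    """Uniform crossover: take each gene from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind: Params, rng: random.Random, rate: float = 0.3) -> Params:
    """With probability `rate`, nudge a gene by Gaussian noise and clip to bounds."""
    child = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            child[k] = min(hi, max(lo, child[k] + rng.gauss(0.0, 0.1 * (hi - lo))))
    return child


def evolve(
    fitness: Callable[[Params], float],
    generations: int = 30,
    pop_size: int = 20,
    elite: int = 4,
    seed: int = 0,
) -> Params:
    """Return the fittest individual found after `generations` rounds."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]  # elitism: the best survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(p: Params) -> float:
    """Stand-in for validation accuracy; peaks at lr=0.01, momentum=0.9."""
    return -((p["learning_rate"] - 0.01) ** 2) - (p["momentum"] - 0.9) ** 2


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

In a real setting each fitness evaluation means training and validating a detector, so the population size and number of generations are usually limited by the compute budget.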