3 PRELIMINARIES
3.1 Notations & Parameters
The parameters defined for training the model are listed below; a configuration sketch follows the list.
- Img: sets the size of the input image.
- Batch: sets the batch size, i.e. the number of images processed per training iteration.
- Epochs: sets the number of training epochs.
- Data: provides the location of the dataset YAML file.
- Configuration file (cfg): defines how our model will operate.
- Weights: specifies a non-standard location for the initial weights.
- Learning rate (lrf): the optimisation algorithm's tuning parameter that controls the size of the step taken at each iteration on the way to the loss function's minimum.
- Momentum: an extension of the gradient descent optimisation technique that allows the search to build velocity in a consistent direction, gliding over flat regions of the search space and smoothing out erratic gradients.
- Warmup epochs: by deferring the more varied instances until later in training, the "warm-up" mitigates the "primacy effect" of the initial examples. Without it, the model must first un-train those early superstitions, which can add a few epochs of run time before the necessary convergence is reached.
- Weight decay: a regularisation strategy that penalises large weights, simplifying models and avoiding overfitting.
- Warmup bias: when an algorithm is biased, the outcome is skewed in favour of or against a particular notion. An inaccurate assumption made during the machine learning process can introduce bias, which is a type of systematic error; warming up the bias learning rate tempers this instability early in training.
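For concreteness, the sketch below collects these hyperparameters into a YOLOv5-style hyp YAML file. The key names mirror the convention used in YOLOv5's data/hyps/*.yaml files; the values are illustrative placeholders rather than our tuned settings, and a full hyp file contains additional loss and augmentation keys beyond this subset.

import yaml  # PyYAML

# Illustrative values only; key names follow YOLOv5's hyp.*.yaml convention.
hyp = {
    "lr0": 0.01,             # initial learning rate
    "lrf": 0.1,              # final learning-rate factor (final lr = lr0 * lrf)
    "momentum": 0.937,       # SGD momentum
    "weight_decay": 0.0005,  # L2 regularisation strength
    "warmup_epochs": 3.0,    # number of warm-up epochs
    "warmup_momentum": 0.8,  # momentum used during warm-up
    "warmup_bias_lr": 0.1,   # warm-up learning rate for bias parameters
}

with open("data/hyps/hyp.custom.yaml", "w") as f:  # hypothetical file name
    yaml.safe_dump(hyp, f)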
3.2 xView Dataset
With over 1 million objects across 60 classes spanning more than 1,400 km² of imagery, xView is one of the largest and most diverse object detection datasets available. The xView dataset provides high-resolution imagery collected from the WorldView-3 satellite at a ground sample distance of 0.3 m. Among its 60 classes are 'Fixed-wing Aircraft', 'Small Aircraft', 'Cargo Plane', 'Helicopter', 'Passenger Vehicle', 'Small Car', 'Pickup Truck', 'Utility Truck', and 'Truck'.
3.3 Single Stage Object Detectors
Single-stage object detection models require only a single pass through the detection pipeline, as opposed to the two passes required by two-stage models, and typically achieve lower inference times. A single convolutional network predicts both the bounding boxes and the class probabilities within them.
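As a minimal sketch of this idea (illustrative PyTorch, not the YOLO v5 architecture itself), a single convolutional network can map an image to a dense grid in which every cell carries box offsets, an objectness score, and class scores:

import torch
import torch.nn as nn

class TinySingleStageDetector(nn.Module):
    def __init__(self, num_classes=60, num_anchors=3):
        super().__init__()
        self.features = nn.Sequential(  # toy backbone, two stride-2 convs
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # per anchor: 4 box offsets + 1 objectness score + num_classes scores
        self.head = nn.Conv2d(64, num_anchors * (5 + num_classes), 1)

    def forward(self, x):
        return self.head(self.features(x))  # one pass, dense predictions

preds = TinySingleStageDetector()(torch.zeros(1, 3, 512, 512))
print(preds.shape)  # (1, 3*(5+60), 128, 128): one prediction vector per cell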
3.4 YOLO v5
YOLO v5 has the same three essential components as any other single-stage object detector: the model backbone, the model neck, and the model head. The backbone is mainly used to extract important features from the input image [21][22][23]. In YOLO v5, the backbone is built from Cross Stage Partial (CSP) networks, which extract highly informative features from the input picture.
The neck is chiefly used to build feature pyramids. Feature pyramids help models generalise well across object scales, allowing the same object to be found at different sizes, and they help models perform well on data they have not seen before. Models such as FPN, BiFPN, and PANet implement different feature pyramid techniques; YOLO v5 uses PANet as its neck to obtain the feature pyramids. The head is employed in the final stage of detection: it applies anchor boxes to the features and produces the final output vector of class probabilities, objectness scores, and bounding boxes.
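As a brief usage sketch, the pretrained YOLO v5 pipeline (CSP backbone, PANet neck, detection head) can be exercised end to end through PyTorch Hub, which the ultralytics/yolov5 repository exposes; the image path below is a hypothetical tile, not a file from our dataset:

import torch

# Load the medium YOLO v5 model from the ultralytics/yolov5 repository.
model = torch.hub.load("ultralytics/yolov5", "yolov5m")
results = model("tile_0001.jpg")  # hypothetical 512x512 tile
results.print()  # bounding boxes, objectness and class confidences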
4 PROPOSED TECHNIQUE
4.1 Labelling of Data
Where an annotation's bounding box was divided across tiles, the tiling script was built to label the tiled images so that the annotation could still be recovered as a whole. The images are split 90:10 into our training and validation sets respectively. The images from both sets were further tiled to 512x512, in keeping with the memory allocation of both our models. The annotations are split along the same image boundaries and rescaled to remain compatible with the YOLO framework. Each training iteration / experiment consisted of 30 epochs to maintain consistency.
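A simplified sketch of this tiling and rescaling step is shown below; our actual script differs in detail, and the helper names are ours. Boxes that span tiles would additionally need clipping to each tile, as described above.

from PIL import Image

TILE = 512  # tile size used for both models

def tile_image(path):
    """Yield (x_offset, y_offset, crop) tiles covering the image."""
    img = Image.open(path)
    w, h = img.size
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            yield x, y, img.crop((x, y, x + TILE, y + TILE))

def to_yolo(cls, box, x_off, y_off):
    """Convert a full-image pixel box (xmin, ymin, xmax, ymax) into a YOLO
    label for the tile at (x_off, y_off): class id, then centre coordinates
    and size, all normalised to the tile."""
    xmin, ymin, xmax, ymax = box
    xc = ((xmin + xmax) / 2 - x_off) / TILE
    yc = ((ymin + ymax) / 2 - y_off) / TILE
    return cls, xc, yc, (xmax - xmin) / TILE, (ymax - ymin) / TILE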
4.2 Parameter Training & Tuning
Exp 1: python train.py --img 512 --batch 12 --noval --epochs 30 --data data/custom.yaml --weights yolov5m.pt --cfg models/yolov5m_custom.yaml --hyp data/hyps/hyp.scratch-med.yaml --device 0 --name ./experiment_x
4.3 Annotation of Image
Figure 1 shows a sample annotated image for the model. The steps involved in the annotation of the image are as under:
Step 1: Download the xView dataset from https://challenge.xviewdataset.org.
Step 2: Split all the images 90:10 into our training and validation sets respectively.
Step 3: Images from both sets were further tiled as per the memory allocation of the models, i.e. 512x512 for YOLO v5.
Step 4: The annotations were converted from GeoJSON Polygon objects to YOLO format using a script. We also split the annotation dimensions according to the image splits.
Step 5: The tiled dataset was then used to experiment with and train our YOLO models. Each training iteration / experiment consisted of 30 epochs to maintain consistency between all models.
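A condensed sketch of the conversion in Step 4 is given below. It assumes the field layout of the public xView GeoJSON release, in which each feature stores a pixel-space box in properties["bounds_imcoords"] and a class id in properties["type_id"]; the function name is ours.

import json

def geojson_to_yolo(geojson_path, img_w, img_h):
    """Yield YOLO-format label lines (class x_center y_center width height,
    normalised to the image) from an xView-style GeoJSON file."""
    with open(geojson_path) as f:
        features = json.load(f)["features"]
    for feat in features:
        props = feat["properties"]
        xmin, ymin, xmax, ymax = map(float, props["bounds_imcoords"].split(","))
        cls = int(props["type_id"])
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        yield (f"{cls} {xc:.6f} {yc:.6f} "
               f"{(xmax - xmin) / img_w:.6f} {(ymax - ymin) / img_h:.6f}")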