Figure 3: The input embedding is a combination of three embeddings (token, segment, and position embeddings)
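To make the figure concrete, here is a minimal sketch of how the three embeddings could be summed into a single input embedding, assuming PyTorch and BERT-base sizes (hidden size 768, vocabulary size 30522, maximum of 512 positions); the token ids are purely illustrative.

```python
import torch
import torch.nn as nn

# BERT-base dimensions; the exact numbers are only needed for illustration.
vocab_size, max_positions, type_vocab, hidden = 30522, 512, 2, 768

token_emb = nn.Embedding(vocab_size, hidden)        # one vector per WordPiece token
position_emb = nn.Embedding(max_positions, hidden)  # one vector per position in the sequence
segment_emb = nn.Embedding(type_vocab, hidden)      # sentence A vs. sentence B

input_ids = torch.tensor([[101, 7592, 2088, 102]])           # illustrative ids ([CLS] ... [SEP])
segment_ids = torch.zeros_like(input_ids)                     # all tokens belong to sentence A
position_ids = torch.arange(input_ids.size(1)).unsqueeze(0)   # positions 0, 1, 2, 3

# The input embedding is the element-wise sum of the three embeddings.
input_embedding = token_emb(input_ids) + position_emb(position_ids) + segment_emb(segment_ids)
print(input_embedding.shape)  # torch.Size([1, 4, 768])
```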
The BERT developers defined a specific set of rules for representing the input text before it is fed into the model.
Tokenization: BERT uses WordPiece tokenization. The vocabulary is initialized with all the individual characters in the language, and then the most frequent/likely combinations of existing tokens in the vocabulary are iteratively added. An example of this in practice is shown below.
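As a quick sketch of what WordPiece tokenization looks like in practice, the snippet below uses the Hugging Face `transformers` library (an assumption; the library is not named above) to load the pretrained vocabulary shipped with BERT-base and split a sentence into sub-word pieces.

```python
from transformers import BertTokenizer

# Load the pretrained WordPiece vocabulary used by BERT-base.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Words not in the vocabulary are split into known sub-word pieces;
# continuation pieces are marked with the "##" prefix.
print(tokenizer.tokenize("Tokenization is straightforward"))
# e.g. ['token', '##ization', 'is', 'straightforward']
```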