where \(\alpha\) and \(\beta\) are hyperparameters that weight the terms of the total loss. Both
were set to 0.5 in our experiments.
To leverage the benefits of multiple sequences, we use a weighted-average
ensemble method. The outputs of the three separate models are combined to
produce the final ensemble prediction \({r_{ens}}\) as follows: