2.6 Preprocessing, alignments, and analysis of novel genes and
transcripts
NanoFilt (version: 2.8.0) (De Coster & Rademakers, 2023) was used to
filter the raw FASTQ reads, retaining those with a quality score > 7
and a length > 50 bp for subsequent analysis. Read statistics were
computed using SeqKit (version: 0.12.0) (Shen et al., 2016b).
Alignment results were then summarized and quantified using
samtools (version: 1.11; parameters: flagstat) (Li et al., 2009). Flair
(version: 1.5.0; parameters: -t 20) (Tang et al., 2020) was employed to
obtain consensus sequences from the alignment results, which were then
re-aligned to the reference genome. Gffcompare software (version: 0.12.1;
parameters: -R -C -K -M) (Pertea & Pertea, 2020) was used to compare the
transcripts with the known transcripts of the genome and to identify new
transcripts and new genes (FA download link:
ftp://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/oryza_sativa/dna/;
GTF download link:
ftp://ftp.ensemblgenomes.org/pub/plants/release-52/gff3/oryza_sativa/).
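
The read-filtering criteria described above (quality score > 7 and length > 50 bp) can be illustrated with a minimal Python sketch. This is not NanoFilt's code. It assumes that the per-read quality score is the mean Phred score obtained by averaging per-base error probabilities, that qualities use the Phred+33 encoding, and that both thresholds are strict. The function names are hypothetical and serve only for illustration.

```python
import math

MIN_QUALITY = 7   # threshold stated in the text
MIN_LENGTH = 50   # threshold stated in the text (bp)


def mean_quality(quality_string, phred_offset=33):
    """Mean Phred score of a read (assumption: averaged over error
    probabilities rather than over the raw scores)."""
    if not quality_string:
        return 0.0
    error_probs = [10 ** (-(ord(ch) - phred_offset) / 10)
                   for ch in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def passes_filter(sequence, quality_string):
    """True if the read is longer than MIN_LENGTH and its mean quality
    exceeds MIN_QUALITY (strict comparisons are an assumption)."""
    return (len(sequence) > MIN_LENGTH
            and mean_quality(quality_string) > MIN_QUALITY)


def filter_fastq_records(lines):
    """Yield (header, sequence, quality) for each 4-line FASTQ record
    that passes the filter; an incomplete trailing record is ignored."""
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        seq, qual = seq.strip(), qual.strip()
        if passes_filter(seq, qual):
            yield header.strip(), seq, qual
```

For example, passing the lines of an open FASTQ file to filter_fastq_records yields only the reads that would be retained under these assumptions; the actual analysis used NanoFilt as cited.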