Real-time analysis of the REAL audio neural network model

During inference, the model takes on average 0.08 seconds to enhance a 3-second audio clip on an NVIDIA RTX 2080S GPU, and 0.78 seconds on an Intel Core i7-10700K CPU. These timings correspond to real-time factors (processing time divided by audio duration) of roughly 0.027 and 0.26, respectively, so on both devices the model processes audio considerably faster than it arrives. The model can therefore be deployed in real scenarios with affordable computing cost and low latency. It can run streaming inference with a sliding-window strategy that continuously processes newly arriving frames of REAL audio and estimates their enhanced versions.
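To make the real-time argument concrete, here is a minimal sketch that computes the real-time factor from the reported timings and wraps a model in a sliding-window streaming loop. It is not the authors' implementation. `enhance` is a hypothetical identity placeholder standing in for the trained network, and `SlidingWindowEnhancer`, the 16 kHz sample rate, the 3-second window, and the 1-second hop are illustrative assumptions not given in the text.

```python
import numpy as np


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1 means faster than real time."""
    return processing_seconds / audio_seconds


def enhance(window: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for the trained enhancement model (identity here)."""
    return window.copy()


class SlidingWindowEnhancer:
    """Hypothetical streaming wrapper: keeps the most recent `window` samples and,
    for each new hop of input, re-runs the model on the whole window and emits
    only the newest hop."""

    def __init__(self, model, window: int, hop: int):
        if not 0 < hop <= window:
            raise ValueError("hop must be positive and no longer than the window")
        self.model = model
        self.hop = hop
        # Pre-fill with silence so output starts with the very first chunk.
        self.buffer = np.zeros(window, dtype=np.float32)

    def push(self, chunk: np.ndarray) -> np.ndarray:
        if len(chunk) != self.hop:
            raise ValueError("each chunk must contain exactly `hop` samples")
        # Slide the window: drop the oldest hop, append the newest samples.
        self.buffer = np.concatenate([self.buffer[self.hop:], chunk])
        enhanced = self.model(self.buffer)
        # Earlier parts of the window were already emitted on previous calls.
        return enhanced[-self.hop:]


if __name__ == "__main__":
    # Timings reported in the text for a 3-second clip.
    print(f"GPU RTF: {real_time_factor(0.08, 3.0):.3f}")  # GPU RTF: 0.027
    print(f"CPU RTF: {real_time_factor(0.78, 3.0):.3f}")  # CPU RTF: 0.260

    sr = 16_000  # assumed sample rate
    streamer = SlidingWindowEnhancer(enhance, window=3 * sr, hop=sr)
    signal = np.random.default_rng(0).standard_normal(4 * sr).astype(np.float32)
    chunks = [signal[i:i + streamer.hop] for i in range(0, len(signal), streamer.hop)]
    output = np.concatenate([streamer.push(c) for c in chunks])
    print(np.allclose(output, signal))  # True with the identity placeholder
```

Under this scheme each step must finish before the next hop of audio arrives. Assuming the model reprocesses a full 3-second window per step, the hop therefore has to be longer than the per-window processing time: more than about 0.08 s on the GPU, but more than 0.78 s on the CPU, which is why the example uses a 1-second hop. The end-to-end delay is then roughly one hop plus the processing time.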