Results

Conclusion

Our results demonstrate that REAL can recover speech signals by exploiting the back-scattered intensity from vibrating surfaces. With strong resistance to acoustic noise and the ability to collect specific audio signals over long distances, REAL offers a feasible solution to the cocktail party problem in the optical channel. We have shown that REAL can directly ‘hear’ voices from masks and throats in a noisy environment, with the noise characteristics accounted for in the hardware and neural networks assisting in signal recovery. Future work could incorporate additional sensing modalities, such as audio-visual cues and microphone arrays, to improve the overall detection accuracy. Given its high signal quality, simple construction, affordability, and readiness for miniaturization, we anticipate that the REAL system will foster a new approach to human-robot interaction, benefiting applications in speaker identification and speech understanding and accelerating the development of voice-guided home and field robots.

Acknowledgements

This work was supported by SUSTech startup Fund Y01966105 and DJI-joint Lab Fund K2096Z028. X. Guo thanks DJI for the DJI scholarship and Jiawei Wang for assistance and discussions. X. Guo and S. Ding contributed equally to this work.

Conflict of interest

The authors declare no conflicts of interest.
