
Hayath TM et al.

Abstract

An Augmented Reality (AR), voice-interactive, assistive indoor navigation application for the visually impaired, built with ARCore on Android phones, is described here. Human–machine interaction is voice activated, with continuous turn-by-turn guidance. The app uses ARCore-supported Android smartphones to achieve robust computer-vision-based localization. Paths are created from local anchors, which are then uploaded to the cloud. The ARCore Depth API is used for real-time obstacle detection, and the app issues voice warnings to the user. A prototype named 3rDi 4 All implements the intended functionality and is published on the Google Play Store.

Design Software Used

The app is built on the Unity Engine. Unity's AR Foundation wraps the functionality of the ARCore SDK and the ARKit SDK in one unified workflow; on Android, the app therefore uses the ARCore SDK. Cloud Storage for Firebase provides the app's cloud storage functionality.

ARCore is Google's platform for Augmented Reality experiences on both Android and iOS devices. The ARCore SDK handles user position tracking, plane detection and raycasting (used for placing virtual objects in the real world). It also provides camera image access and multithreaded rendering for AR. Depth detection is provided by the ARCore Depth API. Local anchors track the position and orientation of virtual objects in the real world. Cloud anchors (provided by the ARCore Extensions for AR Foundation) are used to upload local anchors to Google Cloud, where any user with the unique ID can access them; a sketch of the hosting step appears at the end of this section.

Google's Cloud Storage for Firebase SDK for Unity is used to upload metadata about all paths in a building to the cloud. These metadata files describe the paths: their names, cloud anchor IDs and times of creation. They are downloaded when the app starts (a sketch of the upload also follows this section).

We also use the Google Maps Platform Geolocation and Geocoding APIs to obtain the user's location and convert the resulting coordinates into an address. The address is then hashed, and the hash is used as the name of the metadata file uploaded to Firebase Cloud Storage. Because hashing is not reversible, the user's location remains private. When searching for a metadata file, the app simply compares the hash of the user's current location with the file's name (a sketch of the hashing step follows as well).

For the production version of the application, the Firebase Authentication SDK authenticates the corporate/facility user. This is essential, as not everyone should be able to create or overwrite paths.

Jimmy To's Speech and Text asset for Unity provides Android and iOS Speech-To-Text (STT) and Text-To-Speech (TTS) integration. The asset lets Unity call the native STT and TTS services on Android (android.speech for STT, android.speech.tts for TTS) and iOS (the Speech framework for STT and the Speech Synthesis API for TTS).
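The anchor-hosting step can be sketched roughly as follows. This is a minimal sketch in Unity C#, assuming the classic polling API of Google's ARCore Extensions for AR Foundation (HostCloudAnchor plus cloudAnchorState); newer extension versions expose asynchronous variants instead, and the class and field names here are illustrative, not taken from the app's source.

```csharp
// Sketch: host a local anchor as a cloud anchor (ARCore Extensions, polling API).
using Google.XR.ARCoreExtensions;
using UnityEngine;
using UnityEngine.XR.ARFoundation;

public class PathAnchorHoster : MonoBehaviour
{
    [SerializeField] private ARAnchorManager anchorManager;
    private ARCloudAnchor pendingCloudAnchor;

    // Called with a local ARAnchor placed along the path being created.
    public void Host(ARAnchor localAnchor)
    {
        // Uploads the anchor's visual feature data to Google Cloud.
        pendingCloudAnchor = anchorManager.HostCloudAnchor(localAnchor);
    }

    private void Update()
    {
        if (pendingCloudAnchor == null) return;

        // Poll until hosting either succeeds or fails.
        CloudAnchorState state = pendingCloudAnchor.cloudAnchorState;
        if (state == CloudAnchorState.Success)
        {
            // This ID is what gets stored in the path metadata file, so
            // any user can later resolve the same anchor in the building.
            Debug.Log($"Hosted, cloud anchor ID: {pendingCloudAnchor.cloudAnchorId}");
            pendingCloudAnchor = null;
        }
        else if (state != CloudAnchorState.TaskInProgress)
        {
            Debug.LogWarning($"Hosting failed: {state}");
            pendingCloudAnchor = null;
        }
    }
}
```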
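The metadata upload can be sketched as below, again as an assumption-laden outline rather than the app's actual code: the paper only says the files contain the path name, cloud anchor IDs and time of creation, so the PathMetadata field names and the "paths/" storage folder are hypothetical.

```csharp
// Sketch: serialize path metadata and upload it to Cloud Storage for Firebase.
using System;
using Firebase.Storage;
using UnityEngine;

[Serializable]
public class PathMetadata
{
    public string pathName;
    public string[] cloudAnchorIds;   // IDs returned when anchors were hosted
    public string createdAtUtc;
}

public static class PathMetadataUploader
{
    // hashedAddress is the hash of the building's geocoded address, used as
    // the file name so the raw location never leaves the device.
    public static void Upload(string hashedAddress, PathMetadata metadata)
    {
        string json = JsonUtility.ToJson(metadata);
        byte[] bytes = System.Text.Encoding.UTF8.GetBytes(json);

        StorageReference fileRef = FirebaseStorage.DefaultInstance
            .GetReference($"paths/{hashedAddress}.json");

        fileRef.PutBytesAsync(bytes).ContinueWith(task =>
        {
            if (task.IsFaulted) Debug.LogWarning("Metadata upload failed");
            else Debug.Log("Metadata uploaded");
        });
    }
}
```

On start-up the app would do the reverse: download the file whose name matches the hash of the user's current geocoded address and deserialize it back into a PathMetadata object.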
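The privacy-preserving naming step amounts to a one-way hash of the geocoded address. The paper does not name the hash function, so SHA-256 below is an assumption; the point is only that the stored file name cannot be reversed into a location.

```csharp
// Sketch: derive the metadata file name from a one-way hash of the address.
using System.Security.Cryptography;
using System.Text;

public static class AddressHasher
{
    public static string Hash(string geocodedAddress)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(geocodedAddress));
            var sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest) sb.Append(b.ToString("x2"));
            return sb.ToString();   // used as "paths/<hash>.json"
        }
    }
}
```

A lookup then simply recomputes the hash for the user's current geocoded address and requests the matching file from Firebase Cloud Storage.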
Hardware Used

The application can be used on any Android smartphone that supports the ARCore Depth API. The Depth API is computationally demanding and hence needs a mid- to high-end smartphone. It provides a 3D view of the world by creating depth images with a depth-from-motion algorithm, so each pixel in a depth image carries a measurement of the distance from the camera to the scene. The algorithm takes multiple images from different angles and compares them to estimate the distance to every pixel as the user moves the phone. Even when the user moves very little, machine learning is applied selectively to improve depth processing. Additional hardware on the device, such as a dedicated time-of-flight (ToF) depth camera, further improves accuracy by automatically merging the data from all available sources. This enhances the existing depth image and enables depth even when the camera is not moving. Such hardware also yields more accurate depth estimation on plain surfaces, such as white walls, and in dynamic scenes with moving people or objects. For application development we used a Google Pixel 3a, an ARCore Depth API-supported device.

User Interface (UI)

The application is used in two modes. Scanner Mode, with its path creation functionality, is used by the facility user to scan the space and create paths. Navigation Mode is used by the visually impaired person. Both functions live in the same application, in keeping with a minimalist design philosophy. The application starts in Navigation Mode. On a double tap of the screen, a voice input either switches to Scanner Mode or, in the case of a blind user, gives the destination as a voice command (Figure 1).

The Scanner Mode UI consists of a debug console, a button to switch between the Path Creation and Path Editing sub-modes, and a button to switch to Navigation Mode. The Path Creation sub-mode UI, shown by default, contains a list of all created paths, a text input area for typing the name of a new path, and two buttons: one to host all paths to Firebase Cloud Storage and one to create a new path. Each element in the path list has two buttons, to select and to delete the path, plus text showing whether the path is selected. The Path Editing sub-mode UI contains buttons to create, delete, host (to Google Cloud) and resolve anchors, and a reticle showing where an anchor would be created (Figure 2). The corporate/facility user can attach metadata to each created path, which lets the app inform the user about special properties of the path being navigated. For example, if a path ends at an elevator, the corporate/facility user can add metadata warning the user about the elevator at the end.

In Navigation Mode the UI is very simple. It consists of a debug console, text showing whether an obstacle has been detected, text showing the distance to the nearest object in centimetres, and an image showing the depth map. For STT, a pop-up used by Android's STT service takes in the user's reply, and the UI also shows that reply as text. All other text in the Navigation Mode UI is for developer debugging.

Human Machine Interaction

In Scanner Mode, the user interacts with the application through the UI, and the application responds with log messages in the UI's debug console. In Navigation Mode, the user interacts mainly by tapping the screen: when the user taps the screen twice, the application asks them which path to take. The application communicates with the user exclusively through TTS audio, giving turn-by-turn navigation. If there is an obstacle in front of the user, it warns them about the obstacle and guides them around it to their destination (Figure 3). Sketches of the reticle raycast, the depth-based obstacle check and the double-tap gesture follow.
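The Path Editing reticle can be approximated as follows: a raycast from the screen centre against detected planes positions the reticle, and a button press creates a local anchor at that pose. This is a minimal sketch using AR Foundation's ARRaycastManager; ARAnchorManager.AddAnchor is the AR Foundation 4.x call (later versions deprecate it in favour of adding an ARAnchor component), and the class name is illustrative.

```csharp
// Sketch: position a reticle on detected planes and place a local anchor there.
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;

public class ReticleAnchorPlacer : MonoBehaviour
{
    [SerializeField] private ARRaycastManager raycastManager;
    [SerializeField] private ARAnchorManager anchorManager;
    [SerializeField] private Transform reticle;   // visual marker in the scene

    private static readonly List<ARRaycastHit> hits = new List<ARRaycastHit>();

    private void Update()
    {
        // Cast against detected planes from the centre of the screen.
        var center = new Vector2(Screen.width / 2f, Screen.height / 2f);
        if (raycastManager.Raycast(center, hits, TrackableType.PlaneWithinPolygon))
        {
            Pose pose = hits[0].pose;
            reticle.SetPositionAndRotation(pose.position, pose.rotation);
        }
    }

    // Wired to the "create anchor" button in the Path Editing sub-mode.
    public void CreateAnchorAtReticle()
    {
        var pose = new Pose(reticle.position, reticle.rotation);
        ARAnchor anchor = anchorManager.AddAnchor(pose);
        if (anchor != null) Debug.Log("Local anchor created");
    }
}
```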
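The obstacle check in Navigation Mode can be sketched as below: one CPU depth image is read from AR Foundation's AROcclusionManager each frame and the nearest distance in the central image region is reported in centimetres. This assumes the 16-bit, millimetre-valued environment-depth format (some configurations deliver 32-bit float metres instead), and the central-region scan and warning threshold are illustrative choices, not the app's published logic.

```csharp
// Sketch: find the nearest obstacle distance (cm) in a central depth-image region.
using Unity.Collections;
using UnityEngine;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;

public class ObstacleDetector : MonoBehaviour
{
    [SerializeField] private AROcclusionManager occlusionManager;
    [SerializeField] private float warnDistanceCm = 150f;   // illustrative threshold

    private void Update()
    {
        if (!occlusionManager.TryAcquireEnvironmentDepthCpuImage(out XRCpuImage image))
            return;

        using (image)   // CPU images must be disposed after use
        {
            var plane = image.GetPlane(0);
            // Reinterpret the raw byte buffer as 16-bit depth in millimetres.
            NativeArray<ushort> depthMm = plane.data.Reinterpret<ushort>(1);

            // Scan the central third of the image, where an obstacle
            // directly ahead of the user would appear.
            int minMm = int.MaxValue;
            for (int y = image.height / 3; y < 2 * image.height / 3; y++)
            {
                int row = y * plane.rowStride / 2;   // rowStride is in bytes
                for (int x = image.width / 3; x < 2 * image.width / 3; x++)
                {
                    int mm = depthMm[row + x];
                    if (mm > 0 && mm < minMm) minMm = mm;
                }
            }

            float cm = minMm / 10f;
            if (cm < warnDistanceCm)
                Debug.Log($"Obstacle at {cm:F0} cm");   // the app would speak this via TTS
        }
    }
}
```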
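Finally, the double-tap gesture that opens the voice prompt might look like the following. SpeechPrompt is a hypothetical stand-in for the STT asset's prompt call (the real asset's API is not documented in this paper), and the tap-gap window is an assumed value.

```csharp
// Sketch: detect a double tap in Navigation Mode and ask for a destination.
using UnityEngine;

// Hypothetical stand-in for the STT/TTS asset used by the app.
public static class SpeechPrompt
{
    public static void Ask(string question) { Debug.Log(question); }
}

public class DoubleTapListener : MonoBehaviour
{
    private const float MaxTapGapSeconds = 0.3f;   // assumed double-tap window
    private float lastTapTime = -1f;

    private void Update()
    {
        if (Input.touchCount == 1 && Input.GetTouch(0).phase == TouchPhase.Began)
        {
            if (Time.time - lastTapTime < MaxTapGapSeconds)
            {
                // Second tap within the window: ask which path to take.
                SpeechPrompt.Ask("Which path would you like to take?");
                lastTapTime = -1f;
            }
            else
            {
                lastTapTime = Time.time;
            }
        }
    }
}
```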
We have a prototype of the application, named 3rDi 4 All, published on the Google Play Store.

Limitations

This paper attempts an implementation of obstacle detection integrated with indoor navigation through the use of Augmented Reality. A prototype of the application has been developed that can integrate new features based on user feedback. At present, the application is limited to smartphones that support the Google ARCore Depth API. Another limitation is that sensor effectiveness varies with device performance, scene content and lighting conditions. With more robust computer vision backed by dedicated depth-sensing hardware, such as a ToF sensor that provides a depth map instantly without requiring camera motion, we hope to increase the sensitivity of obstacle detection and enable more robust navigation in indoor spaces.

Conclusions

A smartphone application for indoor mobility assistance of the visually impaired has been successfully developed, giving users an easily portable, multipurpose navigation device with obstacle detection in a single package. It is controlled through a voice-command-enabled user interface. Obstacle detection uses the ARCore Depth API to detect obstructions in real time and warn users. Navigation paths can carry metadata about the nature of the surroundings. The application can be further improved with technologies such as ToF sensors or LiDAR, which would lead to better accuracy. The solution has been evaluated with blindfolded volunteers; the results indicate that the application is suitable for turn-by-turn navigation of visually challenged people through indoor spaces and also gives accurate real-time obstacle detection and warnings. A prototype named 3rDi 4 All implements the intended functionality and is published on the Google Play Store.