WHAT: this scenario concerns the surveillance of critical infrastructure, where Deep Learning (DL) is used to build an intelligent video-based detection system for security in and around the infrastructure.
WHY: an open issue in this scenario is the long delay between installing a system at a specific site and the point at which it becomes reliable. This delay conflicts with mobile and temporary use of such a system, which is the essence of this use-case. Typical applications are perimeter protection of critical infrastructure (power generators) and large events (major political meetings, temporary border controls, peacekeeping operations). At present, an evaluation period of a few weeks up to several months is the norm, during which algorithms are parameterized and tuned. When sensor fusion is applied over time and across multiple sensors, especially in combination with machine learning methods, the problem becomes harder and more important. An (offline) learning-based method is most effective when data from the actual sensor instance has been used in the training phase, which conflicts with the goal of mobility.
HOW: in this use-case, DL will be used in three different phases:
a) Algorithm design: transfer learning from one scenario to another, performed off-line;
b) Algorithm adaptation/tuning: involving scene modeling, parsing and understanding (using CNNs);
c) Actual event detection, relying on a scene-aware “tracklet” model (using RNN-LSTM) to identify relevant “tracklets”.
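To make the notion of a "tracklet" in phase c) concrete, the following is a minimal sketch of how short track fragments could be assembled from per-frame detections before being passed to a sequence model. It is an illustration only: the greedy overlap-based linking, the names `Tracklet` and `build_tracklets`, and the `min_iou` threshold are assumptions and are not taken from the ALOHA project.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tracklet:
    boxes: List[Box] = field(default_factory=list)
    last_frame: int = -1  # index of the frame that last extended this tracklet


def build_tracklets(frames: List[List[Box]], min_iou: float = 0.3) -> List[Tracklet]:
    """Greedily link each detection to the open tracklet it overlaps most.

    Assumption: a tracklet is closed as soon as one frame passes without a match.
    """
    tracklets: List[Tracklet] = []
    for t, detections in enumerate(frames):
        # Only tracklets extended in the previous frame can be continued.
        open_ids = [i for i, tr in enumerate(tracklets) if tr.last_frame == t - 1]
        for box in detections:
            best_id, best_iou = None, min_iou
            for i in open_ids:
                score = iou(tracklets[i].boxes[-1], box)
                if score >= best_iou:
                    best_id, best_iou = i, score
            if best_id is None:
                # No sufficiently overlapping tracklet: start a new one.
                tracklets.append(Tracklet([box], t))
            else:
                tracklets[best_id].boxes.append(box)
                tracklets[best_id].last_frame = t
                open_ids.remove(best_id)  # one detection per tracklet per frame
    return tracklets


frames = [
    [(0, 0, 10, 10), (100, 100, 110, 110)],
    [(1, 1, 11, 11)],
    [(2, 2, 12, 12), (200, 200, 210, 210)],
]
tracklets = build_tracklets(frames)
```

In a setup like this, each resulting box sequence would be the input that a recurrent model scores as relevant or not; the scoring step itself is outside the scope of the sketch.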
ALOHA intends to provide a fast workflow for transferring results from one sensor to another. The ALOHA tool flow addresses this through DL-based transfer of training data in the specific context and through tools that facilitate the setup of video systems using DL methods. A specific challenge is to use deep learning methods so that a technician, after a short training, can use the advanced capabilities of modern surveillance techniques.
Security surveillance application with object tracking (left image) and with classification and prediction of behavior (center and right images)
WHAT: this scenario refers to Smart Industry, where DL is used for speech recognition. The objective of this use case is to develop an embedded speech recognition system that activates or deactivates PLC-controlled tooling machinery or collaborative robots in an industrial environment, without relying on a cloud backend.
WHY: both the speech and the voice recognition markets are expected to grow quickly through 2022, and voice-activated or voice-controlled devices have boomed in recent years thanks to advances in voice and speech recognition technologies. As computing devices of all shapes and sizes increasingly surround us, we will come to rely more on natural interfaces such as voice, touch, and gesture. Although visual and audio perception are both of paramount importance, speech has historically had a high priority in human communication, having developed long before the introduction of writing. It therefore has the potential to be an important mode of interaction with computers and machinery. Moreover, speech can be used at a distance, which makes it extremely useful in situations where the operator has to keep hands and eyes elsewhere. In the past, developing an intelligent voice interface was a complex task, feasible only if a dedicated development team inside a major corporation like Apple, Google or Microsoft could be made available to system designers. Today, thanks to the emergence of a small but growing number of cloud-based APIs, developers can build an intelligent voice interface for any app or website without an advanced degree in natural language processing. Those APIs, however, require an available, high-speed, compute-intensive cloud backend to process all the voice and speech features, including noise extraction, noise suppression, voice pattern recognition, user identification, emotion detection, word recognition and semantic pattern elaboration. The resulting latencies and loss in precision could impact critical applications such as those related to machinery interaction.
HOW: in the ALOHA project we intend to address the needs of this scenario by leveraging DL algorithms. DL seems to have finally made speech recognition accurate enough for environments that are not strictly controlled. Thanks to DL, speech recognition may become the primary mode of interaction between humans and computers, or machinery as in the given scenario. Deep learning enabled Baidu to improve speech recognition accuracy from 89% to 99%. Speech technologies will go far beyond smartphones and may quite soon appear in home appliances and wearable devices. The use of neural networks in talker-independent continuous speech recognition is not new and, over the years, DL has become an effective solution to this problem.
Using the ALOHA tool flow, the consortium intends to make designers' lives easier by allowing them to characterize and deploy an embedded voice recognition system capable of natural language processing in critical scenarios, such as an industrial application where real-time processing is mandatory to enable fast and safe human-machinery interaction. Starting from existing DL algorithms suitable for voice recognition and speech pattern recognition, energy-aware and secure software code will be automatically deployed on hardware modules optimized for the given functionality to provide hard real-time execution. The APIs created by means of the ALOHA framework will serve as a baseline to jumpstart development activities and to create a benchmark for DL applied to voice recognition in industrial environments.
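As an illustration of why on-device latency and recognition confidence both matter for machinery interaction, the sketch below shows a simple gate between a recognizer result and a PLC action. Everything in it is hypothetical: the `COMMANDS` vocabulary, the `Recognition` fields, the confidence threshold, the deadline value and the rule that a stop request is always honored are assumptions made for the example, not part of the ALOHA design.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary: spoken keyword -> PLC action name.
COMMANDS = {"start": "ACTIVATE", "stop": "DEACTIVATE"}


@dataclass
class Recognition:
    keyword: str        # best keyword hypothesis from the recognizer
    confidence: float   # recognizer score in [0, 1]
    latency_ms: float   # time from end of utterance to result


def decide(rec: Recognition,
           min_confidence: float = 0.9,
           deadline_ms: float = 200.0) -> Optional[str]:
    """Return a PLC action, or None if the result must be ignored."""
    action = COMMANDS.get(rec.keyword)
    if action is None:
        return None          # out-of-vocabulary word
    if action == "DEACTIVATE":
        return action        # assumption: a stop request is always honored
    if rec.latency_ms > deadline_ms:
        return None          # stale result: the real-time deadline was missed
    if rec.confidence < min_confidence:
        return None          # too uncertain to start a machine
    return action
```

With these assumed defaults, `decide(Recognition("start", 0.95, 50.0))` yields an activation, while the same keyword arriving after 500 ms is dropped. This is the kind of behavior that a cloud round trip with variable latency would make hard to guarantee.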
Examples of industrial machinery (spot welding machinery and picking robot). Pedal commands unbalance the operator and reduce welding precision.
WHAT: this use-case refers to a DL-based smart assistant that supports emergency room situations by identifying acute intracranial bleeds in non-contrast CT images.
WHY: pilot studies of automated decision support for medical decisions show that initial implementations reach an accuracy level that exceeds that of emergency room clinicians. Internal performance results, however, leave room for improvement in response time. Processing happens on a 3D volume, which poses significant requirements in terms of computing resources. The desirable response time depends on the characteristics of each specific clinical case; however, it can be estimated that lowering it to around ten seconds would open a wide new landscape of potential clinical use and, consequently, commercial exploitation channels.
Moreover, the implementation of (at least part of) the DL inference on cost-effective embedded computing platforms will also open new options. For example, it will pave the way to portable devices that can easily be moved from one room to another, connected to portable imagers, or used in a wide range of medical operations where network connectivity is unavailable or unreliable.
On the R&D side, the deployment of new solutions is highly demanding in terms of computational resources and effort. This is very limiting when attempting to maximize the output of scarce research staff, especially when exploring new clinical areas.
HOW: the consortium wants to assess the benefits provided by the ALOHA tool flow within the development of an embedded medical decision assistant, considering the requirements posed by the specific application in terms of accuracy (few false positives, and even fewer false negatives) and performance (obtaining results in a very short time on cost-effective hardware).
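The asymmetric accuracy requirement (false negatives are less acceptable than false positives) can be illustrated by how a decision threshold on a detector score might be chosen. The sketch below is an assumption-laden example: the function names, the cost weights and the toy scores and labels are invented for illustration and are not project data.

```python
from typing import Sequence, Tuple


def confusion(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[int, int]:
    """Count (false_positives, false_negatives) at a decision threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   fn_cost: float = 10.0, fp_cost: float = 1.0) -> float:
    """Choose the candidate threshold with the lowest weighted error cost."""
    def cost(t: float) -> float:
        fp, fn = confusion(scores, labels, t)
        return fp_cost * fp + fn_cost * fn

    # Candidates are the observed scores; ties go to the lowest threshold.
    return min(sorted(set(scores)), key=cost)


# Toy data (invented): detector scores and ground truth (1 = bleed present).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1]

asymmetric = pick_threshold(scores, labels)               # misses cost 10x
symmetric = pick_threshold(scores, labels, fn_cost=1.0)   # errors cost the same
```

On this toy data the asymmetric weighting selects a lower threshold than the symmetric one. It accepts two false alarms in order to miss no positive case, which is the trade-off the requirement describes.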
ALOHA will test the capability of the tool flow to create runtime-adaptive systems. The clinician should be able to set the decision support assistant to work in different modes for different kinds of intervention, each mode corresponding to a different battery lifetime or detection precision. Moreover, we will test the ability of the flow to enable fast growth into new clinical areas, thanks to the productivity improvement it provides.
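A runtime-adaptive assistant of this kind could be pictured as a small table of operating modes from which one is selected according to the available battery budget. The sketch below assumes such a table. The mode names, power figures and sensitivity values are placeholders chosen for the example, not measurements, and `select_mode` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mode:
    name: str
    power_w: float       # average power draw while running, in watts
    sensitivity: float   # fraction of true positives detected, in [0, 1]


# Hypothetical operating points; the numbers are placeholders, not measurements.
MODES = [
    Mode("eco", power_w=2.0, sensitivity=0.90),
    Mode("balanced", power_w=5.0, sensitivity=0.95),
    Mode("precise", power_w=12.0, sensitivity=0.98),
]


def select_mode(modes: List[Mode], battery_wh: float,
                required_hours: float) -> Mode:
    """Pick the most sensitive mode whose energy need fits the battery."""
    feasible = [m for m in modes if m.power_w * required_hours <= battery_wh]
    if not feasible:
        raise ValueError("no mode fits the requested battery lifetime")
    return max(feasible, key=lambda m: m.sensitivity)


short_procedure = select_mode(MODES, battery_wh=40.0, required_hours=2.0)
long_procedure = select_mode(MODES, battery_wh=40.0, required_hours=4.0)
```

Under these assumed numbers, a two-hour intervention can run in the most precise mode, while a four-hour one falls back to the balanced mode to stay within the same battery.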
Object detection and classification in medical images