Effective modeling of the variability exhibited by human speech at the phoneme, word, and sentence levels is still an open research problem for SLU systems. Furthermore, extracting relevant keywords, themes, or concept mentions from a sentence or an entire spoken document remains a difficult task even for the most advanced end-to-end systems. Moreover, for speech signals, the recording conditions and the paucity of domain-specific data make it difficult to extract relevant information in different contexts without the use of external knowledge, such as domain-specific ontologies.
Another crucial problem with available solutions is the interpretability and robustness of neural-based SLU systems. AISSPER proposes to make explicit the selection of relevant contexts, the evidence supporting them, and the associated uncertainty, in order to enhance interpretability and improve robustness.