Author: Mehul Jain
Date of Publication: 22nd August 2017
Abstract: Current voice-based digital assistants, despite their claims of being intelligent, lack abilities that a true personal assistant must possess, such as an extensible skill set, dynamic adaptation, and high context awareness. In this paper, we highlight design and implementation requirements that must be met to develop next-generation digital personal assistants, and we propose a general architectural backbone that can be used to make headway toward such personalized, speech-operated assistive technology. In particular, we discuss extensibility of the skill set used by the digital assistant, hypothesis generation and evaluation, extensive user adaptation, and the handling of redundant representations in the design. Further, we briefly discuss the research and development directions being pursued to tackle the challenges posed by such a system. We then consider a scenario and illustrate the data flow in our architecture.
References:
- Google Now, http://www.google.com/landing/now/.
- Microsoft Cortana, http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana.
- Apple Siri, http://www.apple.com/in/ios/siri/
- Ramanathan Guha, Vineet Gupta, Vivek Raghunathan and Ramakrishnan Srikant. (Feb 2015). User Modeling for a Personal Assistant. WSDM’15, Proceedings of the Eighth ACM International Conference on Web Search and Data Mining.
- David Garlan, Bradley Schmerl. (July 2006). Architecture for Personal Cognitive Assistance. Proceedings of the 2006 Conference on Software Engineering and Knowledge Engineering.
- Friedland, N. S., P. G. Allen, G. Matthews, M. Witbrock, D. Baxter, J. Curtis, B. Shepard, P. Miraglia, J. Angele and S. Staab. (2004). "Project Halo: Towards a Digital Aristotle." AI Magazine 25(4): 29.
- Tur, G., A. Stolcke, L. Voss, J. Dowding, B. Favre, R. Fernandez, M. Frampton, M. Frandsen, C. Frederickson and M. Graciarena. (2008). The CALO meeting speech recognition and understanding system. Spoken Language Technology Workshop, 2008. SLT 2008. IEEE.
- Gruber, T. (2009). Siri, A Virtual Personal Assistant— Bringing Intelligence to the Interface, Jun.
- Panton, K., C. Matuszek, D. Lenat, D. Schneider, M. Witbrock, N. Siegel and B. Shepard. (2006). Common sense reasoning–from Cyc to intelligent assistant. Ambient Intelligence in Everyday Life, Springer: 1-31.
- Lenat, D., M. Witbrock, D. Baxter, E. Blackstone, C. Deaton, D. Schneider, J. Scott and B. Shepard. (2010). "Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries." AI Magazine 31(3): 13-32.
- Mehra, P. (2012). "Context-aware computing: beyond search and location-based services." Internet Computing, IEEE 16(2): 12-16.
- L. Li, H. Deng, A. Dong, Y. Chang, and H. Zha. Identifying and labeling search tasks via query-based Hawkes processes. In SIGKDD, 2014.
- C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei. Identifying task-based sessions in search engine query logs. In WSDM, 2011.
- Huang, X., Acero, A., & Hon, H.-W. (2001). Spoken language processing: A guide to theory, algorithm and system development. Prentice Hall.
- Griol, D., Callejas, Z., López-Cózar, R., & Riccardi, G. (2014). A domain-independent statistical methodology for dialog management in spoken dialog systems. Computer Speech and Language, 28(3), 743–768. http://dx.doi.org/10.1016/j.csl.2013.09.002
- E. Levin, R. Pieraccini, and W. Eckert. (1998). "Using Markov decision process for learning dialogue strategies," Proceedings of ICASSP.
- J. Henderson and O. Lemon. (2008). "Mixture model POMDPs for efficient handling of uncertainty in dialogue management," Proceedings ACL-HLT, pp. 73–76, 2008.
- S. Larsson and D. Traum. (2000). "Information state and dialogue management in the TRINDI dialogue move engine toolkit," Natural Language Engineering.
- D. Bohus and A. I. Rudnicky. (2003). "RavenClaw: Dialog Management Using Hierarchical Task Decomposition and an Expectation Agenda," in Proceedings of EUROSPEECH, 2003.
- C. Rich and C. L. Sidner, "Collaborative discourse, engagement and always-on relational agents," in Proceedings of AAAI, 2010.
- Baptist, L., & Seneff, S. (2000). GENESIS-II: A versatile system for language generation in conversational system applications. Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP ’00), 3, 271–274.
- Dethlefs, N., Hastie, H., Cuayáhuitl, H., & Lemon, O. (2013). Conditional random fields for responsive surface realization using global features. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), 1254–1263.
- Rieser, V., Lemon, O., & Keizer, S. (2014). Natural language generation as incremental planning under uncertainty: Adaptive information presentation for statistical dialogue systems. IEEE/ACM Transactions on Audio, Speech and Language Processing, 22(5), 979–994. http://dx.doi.org/10.1109/TASL.2014.2315271
- Dutoit, T. (1996). An introduction to Text-to-Speech synthesis. Dordrecht: Kluwer Academic.
- Callejas, Z., Griol, D., & López-Cózar, R. (2011). Predicting user mental states in spoken dialogue systems. EURASIP Journal on Advances in Signal Processing, 2011, 6. http://dx.doi.org/10.1186/1687-6180-2011-6
- Acosta, J. C., & Ward, N. G. (2011). Achieving rapport with turn-by-turn, user-responsive emotional coloring. Speech Communication, 53(9–10), 1137–1148. http://dx.doi.org/10.1016/j.specom.2010.11.006
- Callejas, Z., López-Cózar, D., Ábalos, N., & Griol, D. (2011). Affective conversational agents: The role of personality and emotion in spoken interactions. In D. Pérez-Marín & I. Pascual-Nieto (Eds.), Conversational agents and natural language interaction: Techniques and effective practices (pp. 203–222). IGI Global. http://dx.doi.org/10.4018/978-1-60960-617-6.ch009
- Nass, C., & Yen, C. (2012). The man who lied to his laptop: What we can learn about ourselves from our machines. Current Trade.
- Zhu, C., & Sheng, W. (2011). Motion- and location-based online human daily activity recognition. Pervasive and Mobile Computing, 7(2), 256–269. http://dx.doi.org/10.1016/j.pmcj.2010.11.004