The quest to create virtual assistants that can understand and anticipate human behaviour and needs is one of the current lodestars of artificial intelligence research, but is challenged by the diversity and limitations of available datasets, as well as the expense and complexity involved in generating new proprietary ones.
Researchers at Stanford University approached the problem by mining descriptions of everyday human activity from online fiction: 600,000 stories by 500,000 writers on the online writing community Wattpad, totalling 1.8 billion words of input. The result is a new knowledge base called Augur, designed to power predictive models that anticipate what an individual user might be about to do, or want to do next.
As the researchers’ new paper notes, ‘While we tend to think about stories in terms of the dramatic and unusual events that shape their plots, stories are also filled with prosaic information about how we navigate and react to our everyday surroundings. Over many millions of words, these mundane patterns are far more common than their dramatic counterparts. Characters in modern fiction turn on the lights after entering rooms; they react to compliments by blushing; they do not answer their phones when they are in meetings.’
Fiction’s vast repository of observed human behaviour is a far richer source of mundane knowledge about how we conduct our lives than the dramatic events most likely to stick in our minds might suggest; for every mad captain running his ship against a whale in revenge, there are hundreds of meals, cups of coffee, moments of boredom, everyday purchases, and straightforward domestic tasks such as sleeping, waking, washing and cooking.
Nonetheless, using dramatic stories to teach AIs about human lives can introduce often-comical errors into a machine-based prediction system. The researchers found that when an Augur-based system identifies a cat, its most likely prediction for the animal’s next action is that it will hiss. The paper suggests that crowdsourcing or similar user-feedback systems would likely be needed to correct some of the more dramatic associations that certain objects or situations inspire. As the authors note, ‘If fiction were truly representative of our lives, we might be constantly drawing swords and kissing in the rain.’
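The ‘cat will hiss’ bias falls naturally out of frequency statistics mined from text. A minimal sketch of the idea, using an invented toy corpus rather than Augur’s actual 1.8-billion-word pipeline, shows how predicting the most frequently co-occurring activity favours whatever fiction mentions most:

```python
from collections import Counter, defaultdict

# Invented (subject, activity) pairs standing in for sentences mined from
# fiction; dramatic actions like hissing are over-represented in stories.
sentences = [
    ("cat", "hiss"), ("cat", "hiss"), ("cat", "sleep"),
    ("phone", "ring"), ("phone", "ring"), ("phone", "buzz"),
]

# Count how often each activity is observed for each subject.
counts = defaultdict(Counter)
for subject, activity in sentences:
    counts[subject][activity] += 1

def predict_next(subject):
    """Return the activity most frequently observed for this subject."""
    if subject not in counts:
        return None
    return counts[subject].most_common(1)[0][0]

print(predict_next("cat"))  # the over-represented dramatic action wins
```

Because the prediction simply follows the counts, correcting a skewed association (via crowdsourced feedback, say) amounts to re-weighting those counts.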
The system’s current success rate stands at 71% for unsupervised prediction of what a user will do next, and 96% for recall, or identification of human events. Augur was field-tested in a proof-of-concept Google Glass application called Soundtrack For Life, which selects and plays music based on the user’s current activity.
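The two figures measure different things: the first is the share of the system’s predictions that turn out to be correct, while recall is the share of actual human events the system manages to identify. A quick sketch with invented example data makes the distinction concrete:

```python
def precision_recall(predicted, actual):
    """Compute precision (correct predictions / all predictions) and
    recall (identified events / all actual events) over two sets."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Invented evaluation data: activities the system flagged vs. what happened.
predicted = {"cook", "eat", "run", "sleep"}
actual = {"cook", "eat", "sleep", "read", "wash"}

p, r = precision_recall(predicted, actual)
print(p, r)  # 3 of 4 predictions correct; 3 of 5 events identified
```

A system can trade one measure for the other, which is why the paper reports both.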
Functionality as apparently simple as choosing Stravinsky for cooking and something more energetic for intellectual work requires the AI to place scenes and objects in an appropriate context. If a user is sitting down in what appears to be a restaurant, opposite someone who appears to be eating, are they necessarily eating anything themselves? In this sense the AI may need to learn that being ‘at lunch’ and eating are associated but not inevitably synonymous, since many prompts in life appear without being answered – or without being answered as expected.
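One way to capture ‘associated but not synonymous’ is to treat the link between a context and an activity as a conditional probability rather than a rule. A small sketch with invented observations, not the paper’s actual method, illustrates the idea:

```python
from collections import Counter

# Invented observations of (scene context, was the user eating?);
# a real system would mine such associations from fiction at scale.
observations = [
    ("at_lunch", True), ("at_lunch", True), ("at_lunch", False),
    ("at_lunch", True), ("at_desk", False),
]

eating_count = Counter()
context_count = Counter()
for context, eating in observations:
    context_count[context] += 1
    if eating:
        eating_count[context] += 1

def p_eating(context):
    """Estimated probability that the user is eating in a given context."""
    return eating_count[context] / context_count[context]

print(p_eating("at_lunch"))  # strongly associated with eating, but not certain
```

A probability below 1.0 encodes exactly the gap between being at lunch and actually eating.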
Facebook, one of the leading entities in AI research, recently released 1.6GB of children’s stories to the research community with a similar view to deriving real-life insights from ‘false’ accounts, while Google’s DeepMind unit is developing comparable approaches to teach its AI systems to read. Indian researchers are likewise teaching neural networks to understand events in sporting activity by having them analyse text versions of real-time sports commentaries.