Search engines let us easily access a world of information that was hidden just a generation ago – Google alone handles approximately 3.5 billion search queries each day. But while some results are spot-on, the logic behind others can be a bit obscure. This is especially true in the case of “related” results or suggested queries. Sure, those other items are related in some way, but how? And more important, should we care?
Edgar Meij, a senior data scientist on Bloomberg’s news search experience team, is working to solve exactly this problem. He says that improving context on search engine results pages is just the beginning. Eventually, Meij wants our intelligent machines to be able to explain their thinking to us. As algorithms become entrusted with increasingly important decisions, their users, and the people affected by the decisions, will want to know how those decisions were made. “One of the benefits of working with intelligent systems should be that we can explain the reasoning behind the suggestions,” says Meij.
One of the first steps toward that goal is to design an algorithm that can explain the relationship between two terms – called entities – in plain English. In a paper titled “Generating Descriptions of Entity Relationships,” presented on Tuesday, April 11, 2017 at the 39th European Conference on Information Retrieval (ECIR 2017) in Aberdeen, Scotland, Meij, together with two researchers from the University of Amsterdam, Prof. Dr. Maarten de Rijke and Nikos Voskarides, presents a methodology to do just that.
Search engines can determine the relationships between entities by using a specialized database known as a knowledge graph. A knowledge graph can be visualized as a huge and irregular spider web, with each entity represented by an intersection in the web and each relationship by the thread, or edge, connecting two intersections. Meij’s research used Freebase, a knowledge graph with about 50 million entities and half a billion facts. From these, the researchers selected 10 relationship types, mentioned a total of about 90,000 times.
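The entity-and-edge structure described above can be sketched in a few lines of code. Below is a minimal, illustrative model of a knowledge graph as labeled edges between entities; the entity and relation names are examples in the spirit of the article, not actual Freebase identifiers.

```python
from collections import defaultdict

class KnowledgeGraph:
    """A toy knowledge graph: entities are nodes, facts are labeled edges."""

    def __init__(self):
        # adjacency map: subject -> object -> set of relation labels (edges)
        self.edges = defaultdict(lambda: defaultdict(set))

    def add_fact(self, subject, relation, obj):
        self.edges[subject][obj].add(relation)

    def relations_between(self, a, b):
        """Return the relation labels on edges from entity a to entity b."""
        return sorted(self.edges[a][b])

kg = KnowledgeGraph()
kg.add_fact("Brad Pitt", "film.starring", "12 Years a Slave")
kg.add_fact("Brad Pitt", "producer.film", "12 Years a Slave")
kg.add_fact("Brad Pitt", "spouse", "Angelina Jolie")

print(kg.relations_between("Brad Pitt", "12 Years a Slave"))
# → ['film.starring', 'producer.film']
```

A real knowledge graph like Freebase stores such triples at the scale of hundreds of millions of facts, but the underlying shape – nodes joined by labeled edges – is the same.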
An entity might, for example, be “Brad Pitt,” “Angelina Jolie,” or “12 Years a Slave.” Relationships include “producer.film,” “film.starring,” and “spouse.” These descriptions can be enriched with additional relationships such as “genre” or “film.” The researchers’ first task was to attach those modifiers to the entities, a process called entity clustering. Then the researchers chose pairs of entities, scoured Wikipedia for sentences containing both members of a pair, and looked for patterns in those sentences. “If you have 20 sentences with two entities, you are looking for the most frequent sub-patterns in those sentences,” says Meij. Those sub-patterns became candidates for sentence templates.
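The pattern-mining step can be sketched roughly as follows. The real method mines frequent sub-patterns within sentences; this simplified version abstracts each sentence by swapping entity mentions for placeholders and counts the whole abstracted sentences – an assumption made here to keep the example short. The sentences and entity pairs are illustrative.

```python
from collections import Counter

def mine_templates(sentences, entity_pairs):
    """Count candidate templates: sentences with both entities replaced by slots."""
    patterns = Counter()
    for sentence in sentences:
        for person, film in entity_pairs:
            if person in sentence and film in sentence:
                abstracted = sentence.replace(person, "[Person]").replace(film, "[Film]")
                patterns[abstracted] += 1
    return patterns

sentences = [
    "Brad Pitt appeared in 12 Years a Slave.",
    "Angelina Jolie appeared in Maleficent.",
    "Brad Pitt produced 12 Years a Slave.",
]
pairs = [("Brad Pitt", "12 Years a Slave"), ("Angelina Jolie", "Maleficent")]

patterns = mine_templates(sentences, pairs)
print(patterns.most_common())
# "[Person] appeared in [Film]." occurs twice, so it is the stronger candidate
```

The most frequent abstracted patterns become the candidate sentence templates discussed next.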
A sentence template might read: “[Person] appeared in [film].” That could yield the sentence, “Brad Pitt appeared in ‘12 Years a Slave.’” A more complex template might read: “[Person] appeared in the [genre] [type] [film],” generating, “Brad Pitt appeared in the drama film ‘12 Years a Slave.’”
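Filling such a template amounts to substituting entity attributes into the slots. Here is a minimal sketch; the slot syntax and attribute names mirror the examples above and are otherwise assumptions.

```python
def fill_template(template, values):
    """Replace each [slot] in the template with its value."""
    for slot, value in values.items():
        template = template.replace("[" + slot + "]", value)
    return template

simple = fill_template(
    "[Person] appeared in [film].",
    {"Person": "Brad Pitt", "film": "'12 Years a Slave'"},
)
print(simple)  # Brad Pitt appeared in '12 Years a Slave'.

rich = fill_template(
    "[Person] appeared in the [genre] [type] [film].",
    {"Person": "Brad Pitt", "genre": "drama", "type": "film",
     "film": "'12 Years a Slave'"},
)
print(rich)  # Brad Pitt appeared in the drama film '12 Years a Slave'.
```

The richer template draws on the extra relationships (“genre,” “film”) attached during the clustering step, which is what lets the system generate the more descriptive sentence.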
Because of their emphasis on relationships, knowledge graphs allow interactions between disparate, seemingly unrelated data sets that were previously siloed. So how could this apply to the financial industry? “This is not some esoteric paper on machine learning,” says Meij. “This is something incredibly practical with lots of current and potential applications in the financial industry from search to understanding supplier relationships in the supply chain.”