Bloomberg and “The Magic” of Machine Learning

Gary Kazantsev is head of the Machine Learning Engineering team at Bloomberg. Machine learning is an increasingly important area at Bloomberg, a company that manages massive amounts of data in a real-time environment. While machine learning is generally about giving computers the ability to learn by using algorithms to analyze data, find patterns or predict outcomes, much of Bloomberg’s effort in this area today is focused on helping the company’s customers pluck intelligence and insight from the financial information and data coursing through the network that feeds the Bloomberg Terminal. Fresh off his presentations at two key industry events, Gary explains what he and his team are doing and how that work is helping investors and Bloomberg customers make better, more informed decisions.

The conversation with Gary has been edited for length.

What is fueling heavy investment in machine learning in the financial industry and how does it fit into customers’ workflows?
A lot of our customers’ workflows are being automated, entirely or partially. What they’re doing today is more on the cognitive side: strategy and portfolio selection, formulating the investment theses, etc. People are trying to solve many, many problems in finance using these methods, because they allow for the building of more sophisticated intelligence into trading and client-facing workflows. These methods can improve efficiency, or, crucially, allow us to approach problems that were heretofore intractable because of complicated interactions in the data, complexity of the problem, availability of data or computational resources, and so on.

What is different about the machine learning approach to solving problems like that?
The machine learning approach is fundamentally data-driven and, in a certain way, “bottom up.” We can build systems that make very few assumptions, and are flexible in ways that top-down, designed software systems cannot be. Machine learning models can capture very complicated relationships in the input, and can be made to adapt to the changing world very quickly. For example, for one of the systems we built, we routinely train 450,000 logistic regression models daily to accommodate changes in user behavior: if [the user] suddenly started doing something wildly different, their suggestions would change. As another example, when you ask [the system] to show you “news stories about Bloomberg from Bloomberg,” it knows that the first “Bloomberg” is a company or a person, and that the second “Bloomberg” is a news source. That’s called semantic parsing, which is a big, open research area. It would be very hard, or impossible, to encode all the rules and considerations like that into a “traditional” software system.
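
To illustrate the idea of retraining one small model per user every day, here is a minimal sketch in plain Python. It is not Bloomberg's implementation: the two features (whether a story is about equities or about FX), the click labels, the user ID and the function names `train_logistic`, `predict` and `retrain_all` are all assumptions made for this example.

```python
import math

def train_logistic(rows, labels, lr=0.5, epochs=200):
    """Fit a small logistic regression with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

def predict(model, x):
    """Probability that the user engages with an item with features x."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def retrain_all(interactions_by_user):
    """Rebuild one model per user from that user's latest interactions."""
    return {
        user: train_logistic(rows, labels)
        for user, (rows, labels) in interactions_by_user.items()
    }

# Hypothetical features: [is_equity_story, is_fx_story]; label 1 = clicked.
stories = [[1, 0], [1, 0], [0, 1], [0, 1]]
day1 = {"user-1": (stories, [1, 1, 0, 0])}  # reads equities, ignores FX
day2 = {"user-1": (stories, [0, 0, 1, 1])}  # behavior flips overnight

models_day1 = retrain_all(day1)
models_day2 = retrain_all(day2)
```

Because each model is tiny and depends only on one user's data, the daily job is a loop over independent fits, which is one plausible reason a simple model family suits this kind of workload.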

How does machine learning make something like semantic parsing possible?
There are many different approaches being explored in the research community that attempt to learn the mapping from the plain text of the user input to more or less complicated representations of the “meaning” of that input. Purely statistical methods require large data sets of known mappings like that. Our approach to this problem is a hybrid, taking into account both a lot of a priori knowledge about the domain of the users’ queries and large amounts of statistical data about the popularity of entities, words, phrases, etc. for disambiguation purposes.
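
A toy sketch of that hybrid idea, applied to the “news stories about Bloomberg from Bloomberg” query: hand-written slot rules stand in for the a priori domain knowledge, and popularity counts break ties among the candidates the rules allow. The candidate table, the counts and the function name `parse` are invented for illustration and do not describe the production system.

```python
# Hypothetical knowledge base: each surface form maps to candidate entities
# as (entity_type, name, popularity). The popularity counts are made up.
CANDIDATES = {
    "bloomberg": [
        ("COMPANY", "Bloomberg L.P.", 900),
        ("PERSON", "Michael Bloomberg", 600),
        ("SOURCE", "Bloomberg News", 800),
    ],
}

# A priori domain knowledge: the entity types each query slot accepts.
SLOT_TYPES = {
    "about": {"COMPANY", "PERSON"},
    "from": {"SOURCE"},
}

def parse(query):
    """Return (slot, entity_type, entity_name) for each filled slot."""
    tokens = query.lower().split()
    result = []
    for prev, tok in zip(tokens, tokens[1:]):
        if prev in SLOT_TYPES and tok in CANDIDATES:
            # Rule step: keep only candidates whose type fits the slot.
            allowed = [c for c in CANDIDATES[tok] if c[0] in SLOT_TYPES[prev]]
            if allowed:
                # Statistical step: prefer the most popular allowed reading.
                best = max(allowed, key=lambda c: c[2])
                result.append((prev, best[0], best[1]))
    return result

parsed = parse("news stories about Bloomberg from Bloomberg")
```

Here the rules alone resolve the second “Bloomberg” to a news source, while the first remains ambiguous between a company and a person until the popularity counts decide.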

How does the system “learn” to answer these questions?
There’s a fairly deep and involved pipeline that hides behind the user interface. There are named entity recognition and disambiguation systems for companies and people which change daily. There’s topic classification for a very large and involved set of categories, real-time indexing, interactions data, and so on. There’s a layer in the middle that does information retrieval and inference. Then, there is the actual semantic parsing step. Then, answer synthesis and ranking. It is not a simple question to answer, because each of these systems is complex and can evolve independently.
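
One way to picture a pipeline whose stages can evolve independently is as a list of functions that each read and extend a shared state. The sketch below is only a schematic under that assumption: the stage names, the capitalized-word “entity recognizer,” the keyword-based intent and the length-based ranking are placeholders, not the real components.

```python
def recognize_entities(state):
    # Placeholder: treat capitalized words as entity mentions.
    state["entities"] = [w for w in state["query"].split() if w.istitle()]
    return state

def retrieve(state):
    # Placeholder retrieval: keep documents that mention any entity.
    state["candidates"] = [
        doc for doc in state["index"]
        if any(entity in doc for entity in state["entities"])
    ]
    return state

def parse_semantics(state):
    # Placeholder semantic parse: a single keyword-based intent.
    is_news = "news" in state["query"].lower()
    state["intent"] = "news_search" if is_news else "lookup"
    return state

def rank_answers(state):
    # Placeholder ranking: shortest candidate first.
    state["answers"] = sorted(state["candidates"], key=len)
    return state

# Each stage can be replaced without touching the others.
PIPELINE = [recognize_entities, retrieve, parse_semantics, rank_answers]

def run(query, index, stages=PIPELINE):
    state = {"query": query, "index": index}
    for stage in stages:
        state = stage(state)
    return state

index = [
    "Apple beats estimates",
    "Oil falls",
    "Apple supplier warns on demand",
]
out = run("news about Apple", index)
```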

How is this different from a consumer-oriented search algorithm like Google’s?
The approach is substantially different because Google is solving a different set of problems. In our case, the service-level agreement (SLA) is very strict, and there are very high expectations for the correctness of the information being delivered. The results of users’ searches are structured in different ways depending on the semantics of the query. Google has been slowly adding structured workflows to answer specific types of questions, but the things they have been focusing on do not necessarily correspond to our customers’ needs because they are dealing with a different domain.

It sounds like a lot of this work is focused on text analysis; where does that data come from?
We ingest on the order of 1.5 million documents per day, in roughly 30 different languages, from 125,000 different sources. Social media is a big chunk of it, but there is also our own content, plus third-party premium content, regulatory organizations, etc. There are 100,000 different websites that we scrape: third-party providers, governments, data from other legal entities, you name it.

How many engineers work on machine learning at Bloomberg?
More than 100 people. The answer would have been quite different if you had asked me this question five years ago. Since then, the number and diversity of applications of machine learning at Bloomberg have increased dramatically; [the team] has basically been doubling every year.

These problems sound very technologically involved. Can you talk about challenges in developing solutions to some of these problems?
Let’s use sentiment analysis as an example. Sentiment analysis is actually a very complicated topic by itself, in part because a lot depends on the way that you formulate the problem. To give you an example, suppose we ask an editor whether a story is positive or negative, and the story happens to be about a company laying off staff. If that person is a social issues editor, they’re going to say the story is negative, whereas for a trader, it’s going to be positive for the trade. So problem definition is a challenge. This problem is hard even for human beings; to build something that replicates human judgment in this kind of subtle way requires developing new methods. Systems like this also often have to execute in real time, which is a nontrivial infrastructural challenge.

Do different data sources require different machine learning approaches?
Yes. Continuing with sentiment analysis as an example, a lot of information that is actually relevant to the financial markets is purely factual; there’s no statement of opinion or indicative words in the story at all. The fact that [a company] missed quarterly earnings expectations is simply a fact, but it has a material impact on the market. That makes this problem harder than other opinion mining applications. On the other hand, what makes it easier is that you generally don’t need to deal with things like sarcasm and metaphor: The Wall Street Journal is not known for writing in abstract verse. Basically, [human] annotators had to be trained and data had to be produced in-house; in fact all of it was done in-house. Different models had to be built for news and, e.g., social media, because the vocabulary and the structure of the content are completely different. People on Twitter have taken creative misspelling to an art form.
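
The sketch below illustrates why news and social media might be routed to separate scorers. It uses simple lookup tables as a stand-in for trained models, which is an assumption of this example rather than the method described in the interview. The phrase lists, scores, company name and function names are all hypothetical.

```python
import re

# Hypothetical lexicons. News sentiment keys on factual events;
# social sentiment keys on slang.
NEWS_EVENTS = {"missed earnings": -1, "beat earnings": 1, "raised guidance": 1}
SOCIAL_WORDS = {"moon": 1, "bullish": 1, "bearish": -1}

def score_news(text):
    """Sum the scores of factual event phrases found in the text."""
    lowered = text.lower()
    return sum(v for phrase, v in NEWS_EVENTS.items() if phrase in lowered)

def normalize_social(text):
    """Collapse runs of three or more repeated characters down to two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", text.lower())

def score_social(text):
    """Sum the scores of slang words after normalizing misspellings."""
    tokens = re.findall(r"[a-z]+", normalize_social(text))
    return sum(SOCIAL_WORDS.get(tok, 0) for tok in tokens)

# One scorer per source type, since vocabulary and structure differ.
SCORERS = {"news": score_news, "social": score_social}

def score(source_type, text):
    return SCORERS[source_type](text)

news_score = score("news", "Acme missed earnings expectations")
social_score = score("social", "$ACME to the mooooon, so bulllllish")
```

The news headline contains no opinion words at all, yet the event phrase still yields a negative score, while the social post only becomes scorable after its elongated spellings are normalized.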

What is the impact for the Bloomberg user?
In many cases [machine learning] is invisible. You run a search for news, and you suddenly get magically better results. You may not know to ask the question ‘why.’ Let’s say you wanted to know which Chinese companies made investments in the U.S. last year. You could probably answer that question [yourself] but it would take a whole lot of legwork. With the solutions we have implemented, you could just type the question into the search bar and get the answer [on your Terminal screen]. This represents an enormous gain in productivity for people who do this kind of research. More broadly, the impact is automation, both on the Terminal user side and on the Enterprise business side.