Natural Language Processing & Machine Learning at Bloomberg

Across the Bloomberg Terminal, natural language processing and machine learning play a central role.

Throughout its history, Bloomberg has relied on text as a key underlying source of data for our clients. Over the past decade, we have increased our investment in statistical natural language processing (NLP) techniques that extend those capabilities. Our engineering teams have built state-of-the-art NLP technology for core document understanding, recommendation, and customer-facing systems.

At the heart of our NLP program is technology that extracts structured information from documents, a process sometimes known as digitization or normalization. At its core is a proprietary, robust real-time NLP library that performs low-level text resolution tasks such as tokenization, chunking and parsing. On top of this core tool set, we have built named entity extractors that detect people, companies, tickers and organizations in natural text; these extractors are deployed across our news and social text databases. They are also crucial for our sentiment analysis indicators (BSV<GO> and TREN<GO>), which estimate how positive a piece of news is for a particular company. Beyond that, our topic classification engine (e.g., NI OIL<GO>) automatically tags documents with normalized topics to make retrieval and monitoring straightforward. In the legal domain, we have built a legal principles engine that enables lawyers to uncover the underlying case law argumentation that supports a particular decision.
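To make that layering concrete, here is a minimal Python sketch of this kind of pipeline: tokenization at the bottom, gazetteer-based entity tagging on top, and lexicon-driven sentiment and keyword-driven topic tags keyed to the result. It is an illustration only, not Bloomberg's library; the gazetteer, sentiment lexicon, and topic keywords are invented stand-ins for much larger learned models.

```python
import re

# Toy resources standing in for much larger production models and lexicons.
# Every entry below is an illustrative assumption.
GAZETTEER = {"ibm": ("IBM", "COMPANY"),
             "exxon": ("XOM", "COMPANY"),
             "apple": ("AAPL", "COMPANY")}
SENTIMENT_LEXICON = {"beats": 1, "surges": 1, "record": 1,
                     "misses": -1, "falls": -1, "lawsuit": -1}
TOPIC_KEYWORDS = {"NI OIL": {"oil", "crude", "barrel"},
                  "NI EARN": {"earnings", "profit", "revenue"}}

def tokenize(text):
    """Low-level text resolution step: split into lowercase word tokens."""
    return re.findall(r"[a-z0-9$%.']+", text.lower())

def extract_entities(tokens):
    """Gazetteer-based entity tagging: map surface tokens to tickers."""
    return [GAZETTEER[tok] for tok in tokens if tok in GAZETTEER]

def score_sentiment(tokens):
    """Lexicon-based polarity score: positive means good news."""
    return sum(SENTIMENT_LEXICON.get(tok, 0) for tok in tokens)

def classify_topics(tokens):
    """Keyword-rule topic tagging, loosely in the spirit of topic codes."""
    token_set = set(tokens)
    return [topic for topic, kws in TOPIC_KEYWORDS.items() if token_set & kws]

def analyze(headline):
    tokens = tokenize(headline)
    return {"entities": extract_entities(tokens),
            "sentiment": score_sentiment(tokens),
            "topics": classify_topics(tokens)}

if __name__ == "__main__":
    print(analyze("IBM beats earnings estimates as cloud revenue surges"))
    # -> {'entities': [('IBM', 'COMPANY')], 'sentiment': 2, 'topics': ['NI EARN']}
```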

Beyond these core functions, we have built sophisticated fact extractors (also known as relationship extractors) that pick out specific information from documents to streamline our ingestion flow. We have also built out a large suite of tools for structured data, including table detection and segmentation tools that expand the scope of data our analysts can ingest. Additionally, we have built research systems for figure understanding that extract the underlying data from scatter plots. We have also built tools for our reporters that let them create self-service topic streams to find news about the companies or sectors they cover.
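As a rough illustration of what a fact (relationship) extractor does, the sketch below pulls (company, metric, value) triples from sentences using a single hand-written pattern. It is a simplification made for demonstration; production extractors are statistical and handle far more variation, and the pattern and example sentence here are assumptions.

```python
import re

# One illustrative pattern; real fact extractors combine many learned and
# hand-built patterns. The metric names below are assumptions.
FACT_PATTERN = re.compile(
    r"(?P<company>[A-Z][A-Za-z.&]+)\s+reported\s+"
    r"(?P<metric>revenue|net income)\s+of\s+"
    r"\$(?P<value>[\d.]+)\s+(?P<unit>million|billion)"
)

def extract_facts(sentence):
    """Return (company, metric, value_in_usd) triples found in a sentence."""
    scale = {"million": 1e6, "billion": 1e9}
    facts = []
    for m in FACT_PATTERN.finditer(sentence):
        value = float(m.group("value")) * scale[m.group("unit")]
        facts.append((m.group("company"), m.group("metric"), value))
    return facts

if __name__ == "__main__":
    print(extract_facts("IBM reported revenue of $14.5 billion for the quarter."))
    # -> [('IBM', 'revenue', 14500000000.0)]
```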

All of these core NLP tools stay strictly within the domain of text, but we have also built significant functionality that connects text to other artifacts, such as people or stock tickers. Our market moving news indicators (MMN<GO>) automatically detect and tag news headlines that are likely to move markets. Our widely deployed related stories feature surfaces additional relevant articles to readers while they are reading a story.
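One common way to power a related-stories feature is nearest-neighbour search over TF-IDF document vectors. The sketch below shows that generic approach with scikit-learn on an invented four-headline corpus; it is not a description of the production system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the news archive; the headlines are invented.
stories = [
    "Oil prices surge as OPEC announces production cuts",
    "Crude oil futures climb on OPEC production cut talk",
    "Tech shares rally on strong cloud earnings",
    "Central bank holds interest rates steady",
]

def related_stories(query_index, corpus, top_k=2):
    """Return indices of the top_k stories most similar to corpus[query_index]."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(corpus)           # (n_docs, n_terms) sparse matrix
    scores = cosine_similarity(matrix[query_index], matrix).ravel()
    scores[query_index] = -1.0                          # exclude the story itself
    return scores.argsort()[::-1][:top_k]

if __name__ == "__main__":
    # Prints the headlines judged most similar to the first story.
    for idx in related_stories(0, stories):
        print(stories[idx])
```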

Finally, we have invested heavily in tools that simplify client interaction. Our search system (HL<GO>) combines state-of-the-art ranking with sophisticated query understanding. Furthermore, we’ve built a natural language query interface (e.g., ‘What is IBM’s market cap<Search>’) that lets people ask questions in plain English and get precise answers. This search functionality is deployed across many document collections, and our news search and ranking (NSE<GO>) receives particular attention. For our internal help system, we have automatic routing that directs incoming queries to the appropriate internal experts. We have also built automatic answering capabilities that detect and answer frequently occurring client inquiries.
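To illustrate the idea behind a natural language query interface, the sketch below maps a question such as ‘What is IBM’s market cap’ onto a (ticker, field) lookup using one hand-written template. The reference data and figures are invented, and a real query-understanding layer would rely on learned intent and entity models rather than a single pattern.

```python
import re

# Stand-in reference data; every figure and field name here is invented.
REFERENCE_DATA = {
    "IBM": {"market cap": "150B USD", "pe ratio": "22.1"},
    "AAPL": {"market cap": "2.8T USD", "pe ratio": "29.4"},
}

# One simple question template for demonstration purposes only.
QUESTION_PATTERN = re.compile(
    r"what is (?P<ticker>[A-Za-z]+)'s (?P<field>market cap|pe ratio)\??",
    re.IGNORECASE,
)

def answer(question):
    """Parse a plain-English question into a (ticker, field) lookup."""
    m = QUESTION_PATTERN.fullmatch(question.strip())
    if not m:
        return "Sorry, I can't parse that question."
    ticker = m.group("ticker").upper()
    field = m.group("field").lower()
    record = REFERENCE_DATA.get(ticker)
    if record is None or field not in record:
        return f"No data found for {ticker} / {field}."
    return f"{ticker} {field}: {record[field]}"

if __name__ == "__main__":
    print(answer("What is IBM's market cap?"))   # -> IBM market cap: 150B USD
```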

From a staffing perspective, we have many natural language processing and machine learning experts, including former professors and graduates of top programs. As we build out the team, we are also building out the infrastructure that supports it, including a large GPU cluster that speeds up the deep learning/neural network models that increasingly make up a large part of our deployed technology. Every year we publish at top academic conferences; recently, our team has presented papers at ACL, SIGIR, ICML, and ECML-PKDD, among others. Over the past decade, our NLP and ML teams have grown into a formidable force, and we anticipate the next decade will see them develop even further.