Bloomberg supports academic research in data science, broadly construed, including natural language processing (NLP), information retrieval, machine learning, and data mining, through its annual Data Science Research Grant Program, which has run since April 2015 (learn more about prior grant recipients and their research).
Today, we are pleased to announce the winners of our fifth round of grants.
Out of more than 200 applications from faculty members at universities around the world, a committee of Bloomberg’s data scientists from across the organization selected the following five research projects:
Bruce Croft (University of Massachusetts Amherst)
Neural Information Retrieval with Limited Data
Information retrieval, or more generally, text search, is a critical function in many financial applications. Finding key documents, passages, or sentences is part of many financial and legal decision-making processes. Recently, machine learning approaches, and in particular deep neural networks, have yielded significant improvements on several natural language processing and computer vision tasks; however, such breakthroughs have not yet been observed in information retrieval. Besides the complexity of IR tasks, such as understanding user information needs, a main reason for this is the lack of high-quality, large-scale training data for many IR tasks. Professor Croft’s research will study how to design and train machine learning algorithms when no large-scale training data is available. In addition to improving search results in financial domains where large-scale labeled data is not available, effective approaches to learning from limited data would allow search tools to be deployed more quickly in new domains.
Ido Dagan (Bar-Ilan University [Israel])
Interactive Explorative Summarization: Closing the Summarization Gap
Diverse readers consume the abundance of available information differently, so a classic static summary is insufficient in most cases. Professor Dagan’s research will pave the way for the relatively under-explored field of “interactive summarization” by formulating task definitions, devising an evaluation framework, and developing interactive summarization methods and systems.
Kevin Gimpel and Karl Stratos (Toyota Technological Institute [TTI] at Chicago)
Representing Knowledge by Learning to Link
An essential component of language understanding is identifying when a piece of text points to something outside of itself. Entity linking is a version of this problem in which the textual mention is a named entity. Accurate entity linking is a key first step in financial information processing, for instance, to identify which companies or persons are being discussed in a given tweet. But there are many other instances of linking that go beyond named entities. Generalized linking is a version of this problem in which the textual mention is a general concept. Resolving references in generalized linking may help with natural language understanding (NLU) problems such as coreference resolution. In their research, Drs. Gimpel and Stratos seek to develop a unified framework that can capture the rich compositional structure of a knowledge base and efficiently encourage global coherence of linking decisions in a document.
Xi Chen (NYU Stern School of Business)
Cost-effective Learning for Complex Crowdsourcing Tasks
Crowdsourcing – an emerging technology for data-intensive tasks – leverages a “large group of people in the form of an open call” to address problems that neither humans nor machines can solve alone. Despite its popularity, crowdsourcing practice faces two major challenges: how to use a limited querying budget wisely and how to identify reliable workers. This research aims to address these challenges in two ways: by developing a dynamic learning algorithm for crowdsourcing under budget constraints, with online detection of worker quality and task difficulty, and by proposing new bandit learning algorithms for worker quality control that use rich contextual information about workers.
Emine Yilmaz (University College London)
Task Oriented Information Interaction Systems for Proactive Task Assistance Support
Online systems are increasingly used to accomplish complex tasks, such as finding and re-finding information, compiling result summaries, buying products, or doing market research. The back-end technologies powering such systems are mostly machine learning-based methods that are often agnostic to the specific tasks being performed. Prof. Dr. Yilmaz’s research will devise machine learning algorithms that can automatically detect the current task, as well as the sequence of workflow activities likely to follow, and build task-oriented user interfaces that minimize the effort spent on completing such tasks.
“The grant committee selected these five research projects which resonated strongly with key areas of research at Bloomberg,” said data scientist Gillian Chin, a member of the grant committee from the Office of the CTO. “We’re excited to provide support to each of these projects, which exhibit the possibility of significantly advancing the field of data science and machine learning.”