Bloomberg recently welcomed five exceptional doctoral students who are working in broadly-construed data science, including natural language processing (NLP), machine learning, and artificial intelligence, to its Global Headquarters in New York City as part of the company’s inaugural Bloomberg Data Science Ph.D. Fellowship. Bloomberg has benefited significantly from its Data Science Research Grant Program, which builds relationships with academic researchers around the globe, and created the Fellowship program to engage professionals who may be early in their careers. The goal of this fellowship is to support and encourage groundbreaking publications in both academic journals and conference proceedings.
Today, we are very pleased to announce the inaugural class of Bloomberg Data Science Ph.D. Fellows.
A committee of Bloomberg’s data scientists from across the organization selected the Fellows based on the technical strength of their proposals and on recommendation letters from their academic advisors, who also accompanied the Fellows on their visit to Bloomberg. Each Fellow’s research interests are strongly aligned with the innovative work being done at Bloomberg to leverage data science and machine learning in the company’s products and services. During the next year, the Fellows will work to advance their research and explore applications integral to Bloomberg. They will also have the opportunity to participate in an internship during the summer of 2019, during which they will collaborate with a Bloomberg team under the guidance of their research advisor to apply their research to one of the company’s applications.
“It’s important to expose people to real-world business problems because it provides better context for how you can impact people’s work,” said Gillian Chin, a data scientist with the Office of the CTO at Bloomberg and a member of the Fellowship committee. “Many people traditionally do so via summer internships, but at Bloomberg, we wish to invest in the applications of ideas borrowed from academic research on a much larger scale. In academia, you tend to have simple applications that are short-term in nature; the more difficult questions we’d like to investigate are complex, with multiple dependencies and constraints, and with a much longer time horizon in terms of execution and impact on our users.”
As an introduction to Bloomberg, the Fellows traveled to New York City for three days to meet with the company’s data science teams and the mentors they will be working with during their upcoming internship. In addition to learning about the variety of work and depth of data science research being done at Bloomberg, the goals of Bloomberg’s data science group, and how Bloomberg operates, the Fellows also had the opportunity to meet the company’s founder, Michael Bloomberg.
“There are a lot of new, interesting challenges that I haven’t thought about in academia – solving these challenges could have a real impact on Bloomberg’s customers,” said Huazheng Wang, a Ph.D. candidate in the Department of Computer Science at the University of Virginia. “The most interesting thing is I get to work with Bloomberg researchers on these problems, which are rarely seen in academia.”
The fellowship provides exposure to the finance industry and Bloomberg’s broad, high-quality data. Most research in academia is conducted on public data sets, but being able to utilize real-world data presents new challenges and issues that the Fellows will address in their work.
For example, privacy has a different meaning in finance than it does in other industries, and not all data can be shared in collaborations. “In finance, investors are worried that someone can get ahold of their perspective, algorithms, and ideas,” said Yuval Pinter, a Computer Science Ph.D. student in the School of Interactive Computing at Georgia Institute of Technology. “You don’t want people to know what you’re looking at when you’re investing.”
Markets move fast, and data needs to move at the same speed, while maintaining high accuracy. Other industries care about speed and accuracy, but not to the same degree as the finance sector. “We don’t generally talk about efficiency at the same scale as Bloomberg does,” said Tianze Shi, a Ph.D. candidate in Cornell University’s Department of Computer Science. “We care whether the algorithm can finish in a second, but I didn’t imagine that things worked at that pace. Plus, you have to maintain accuracy at the same time, which is a big challenge.”
Out of about 60 applications from doctoral students at universities around the world, the committee selected the following five Fellows for the 2018-2019 academic year:
Hongyuan Mei (Johns Hopkins University)
Modeling Market Events with a Neural Hawkes Process
Market events such as earnings reports, news articles and stock movements can influence one another. Modeling the complicated dependencies among them can help us probabilistically predict future events and impute missing events. The neural Hawkes process is a new machine learning model that is good at capturing such dependencies. It can be scaled up to model real-world settings in which many players across multiple sectors stochastically generate events with detailed properties, according to the roles of these players and their incomplete knowledge of previous events.
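The key idea behind a Hawkes process is self-excitation: each past event temporarily raises the rate at which new events occur. To give a flavor of this (the neural version replaces the fixed decay with a learned recurrent network; the parameters below are purely illustrative), here is a sketch of the classical Hawkes intensity function:

```python
import math

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """Classical Hawkes intensity: a base rate mu plus an exponentially
    decaying excitation of size alpha for each past event."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)

# Timestamps of past events, e.g. a burst of related news around t = 1.
events = [1.0, 1.5, 4.0]

# Intensity is high right after a burst of events, then decays back
# toward the base rate as the excitation fades.
print(hawkes_intensity(1.6, events))  # shortly after two events: high
print(hawkes_intensity(3.9, events))  # long after them: much lower
```

Predicting the next market event then amounts to asking when this intensity, driven by all events seen so far, is likely to fire next.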
Yuval Pinter (Georgia Institute of Technology)
Integrating Distributional, Compositional, and Relational Representations of Language
To understand words from a computing perspective means to represent them using mathematics. Pinter is combining three different approaches to obtain these representations, each based on a different linguistic insight. The first is distributional: determining which words are used in the same context. For example, the word “dog” is often used with “bark,” “doghouse” and “bone.” The second approach views a word as the sum of its parts. Breaking words into their characters and finer-grained units creates sub-word components whose representations can be added up, just as the word “doghouse” is composed of “dog” and “house.” The third approach utilizes semantic networks that encode explicit knowledge. Linguists might say dogs are animals, which can be used to represent the word “dog” in a similar way to other animals, while “doghouse” is a type of structure. Words can then be compared through these dictionary-like networks, leveraging explicit knowledge of how words relate to one another.
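The “sum of its parts” idea can be made concrete with a toy sketch. In subword models of this family, a word's vector is built by adding up vectors for its character n-grams, so unseen words like “doghouse” still get a representation from pieces they share with known words. Everything below (the dimension, the hash-based vectors) is illustrative; real systems learn the subword vectors from data:

```python
import hashlib

DIM = 8  # toy embedding dimension

def ngrams(word, n=3):
    """Character trigrams with boundary markers, e.g. '<do', 'dog', 'og>'."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def subword_vector(word):
    """Toy 'sum of parts' embedding: map each trigram to a fixed
    pseudo-random vector (via a hash) and add them up."""
    vec = [0.0] * DIM
    for g in ngrams(word):
        h = hashlib.md5(g.encode()).digest()
        for i in range(DIM):
            vec[i] += h[i] / 255.0 - 0.5
    return vec

# "doghouse" shares trigrams with "dog" and "house", so its vector
# overlaps with theirs even if "doghouse" was never seen in training.
print(subword_vector("doghouse"))
```

The point of the sketch is only the composition step: any word, however rare, decomposes into subword pieces that already have vectors.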
Shruti Rijhwani (Carnegie Mellon University)
Unsupervised Transfer for Entity Discovery and Linking
Entity linking is a standard natural language processing task that connects mentions in text to entries in a structure like an encyclopedia, helping machines understand documents without having a human read them. Low-resource languages are those for which little data is available, often because they are spoken by small populations with limited presence on the web and social media. Entity linking makes it possible to process documents in these languages automatically. The methods can be used on any language, including Spanish and Chinese, but this research focuses on languages such as Oromo and Tigrinya, which are spoken in Ethiopia; Sinhalese, which is spoken in Sri Lanka; and Kinyarwanda, which is spoken in Rwanda. Since there is a lack of training data available, information from high-resource languages is used to try to develop NLP tools for these low-resource languages.
Tianze Shi (Cornell University)
Simple, Efficient, and Accurate Multi-lingual Parsing
Syntactic parsing analyzes sentence structures as a first step to understanding natural language text. For example, with the sentence “I like to swim,” knowing that “I” is the subject of the verbs “like” and “swim” can help an algorithm understand what people are talking about in their natural language text. This research is about designing simple, robust and fast algorithms for extracting these syntactic structures across many languages. It helps provide a deeper understanding of documents or other texts and can have a significant impact on downstream tasks.
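A dependency parse of “I like to swim” can be written down as a set of word-to-head links. The sketch below is only an illustration of the output such a parser produces, with relation names loosely borrowed from Universal Dependencies conventions:

```python
# A dependency parse as (word, head_index, relation) triples, where
# index 0 is an artificial ROOT token and indices refer to `sentence`.
sentence = ["ROOT", "I", "like", "to", "swim"]
parse = [
    ("I",    2, "nsubj"),  # "I" is the subject of "like"
    ("like", 0, "root"),   # "like" is the main verb of the sentence
    ("to",   4, "mark"),   # "to" marks the infinitive "swim"
    ("swim", 2, "xcomp"),  # "swim" is a clausal complement of "like"
]

def children(head_word):
    """All words attached directly under head_word in the parse tree."""
    idx = sentence.index(head_word)
    return [w for w, h, _ in parse if h == idx]

print(children("like"))  # the words hanging off the main verb
```

Once text is in this tree form, a downstream system can answer questions like “who is doing the liking?” by walking the links rather than guessing from word order.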
Huazheng Wang (University of Virginia)
Collaborative Online Recommendation with Provable Guarantees
Different arms on slot machines give different payouts to players. In the multi-armed bandit problem, the goal is to learn, while playing, which arm is most profitable. Applied to news stories, this means learning one person’s preferences to help refine which stories are recommended to a whole group of readers. Online recommendation systems learn from behavioral feedback to estimate an individual’s preferences, while a collaborative recommendation system also learns from interactions within a person’s network, providing real-time feedback so that the system learns with humans in the loop without compromising their privacy. When two people are connected within a network and one engages frequently with a particular topic, there is a high likelihood of user collaboration and that the other person is interested in the same topic. This learning algorithm leverages connections between users and their interests to identify a user’s interests more quickly. While the system may provide random recommendations at first, it constantly learns and refines itself through human interactions, providing more personalized recommendations over time based on the group’s preferences.
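The explore-versus-exploit trade-off at the heart of the bandit problem can be sketched with a simple epsilon-greedy learner. This is not Wang’s algorithm (which adds collaboration across networked users and provable guarantees); it is the textbook single-user baseline, with illustrative parameters:

```python
import random

class EpsilonGreedy:
    """With probability eps, pick a random arm (explore); otherwise pick
    the arm with the best observed average payout (exploit)."""

    def __init__(self, n_arms, eps=0.1):
        self.eps = eps
        self.counts = [0] * n_arms      # pulls per arm
        self.values = [0.0] * n_arms    # running average payout per arm

    def select(self):
        if random.random() < self.eps:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental update of the arm's running average payout.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulate two "stories": arm 1 gets clicked 70% of the time, arm 0 only 30%.
random.seed(0)
bandit = EpsilonGreedy(n_arms=2)
true_rates = [0.3, 0.7]
for _ in range(2000):
    arm = bandit.select()
    bandit.update(arm, 1.0 if random.random() < true_rates[arm] else 0.0)

print(bandit.values)  # estimate for arm 1 should end up near 0.7
```

A collaborative bandit, as in this research, would additionally share reward observations between connected users, so that one reader’s clicks speed up learning for their neighbors.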
“We received many great applications and saw many worthwhile candidates, and we had a very hard time narrowing this down to five Fellows,” said Chin. “We hope to grow this program in the future, because it’s an important investment in these young students and their work, in addition to the broader data science community.”