Four exceptional doctoral students working in data science, broadly construed to include natural language processing (NLP), vision-and-language tasks, machine learning, and artificial intelligence, visited Bloomberg’s Global Headquarters in New York City in 2019 as part of the Bloomberg Data Science Ph.D. Fellowship. Bloomberg has benefited significantly from its first class of Fellows in 2018-2019, as well as from its Data Science Research Grant Program, which builds relationships with academic researchers around the globe.
The goal of this fellowship is to engage professionals early in their careers and to provide support and encouragement for groundbreaking publications in both academic journals and conference proceedings. Today, we are pleased to announce the second class of Bloomberg Data Science Ph.D. Fellows.
A committee of Bloomberg’s data scientists from across the organization selected the Fellows based on the technical strength of their proposals and on recommendation letters from their academic advisors, some of whom accompanied the Fellows on their visit to Bloomberg. The committee’s decisions were based in part on each candidate’s academic focus, with priority given to machine learning, NLP, information retrieval, knowledge graphs, and quantitative finance; the quality of the ideas presented in the proposal; the candidate’s achievements and experience; and the idea’s potential business impact.
“Each project will advance the state of the art in their respective academic areas,” explained Songyun Duan, Head of Machine Learning Incubation in Bloomberg’s Office of the CTO. “The results will be published in top-tier conferences.”
During their Fellowship, each of the Fellows will work to advance their research and explore real-world applications that contribute to the innovative work leveraging data science and machine learning in Bloomberg’s products and services. The Fellows will also participate in an internship during the summer of 2020, during which they will collaborate with a Bloomberg team, under the guidance of their research advisor, to integrate their research into one of the company’s applications, solve real-world problems, and refine workflows.
As an introduction to Bloomberg, the Fellows traveled to New York City for three days to meet with their mentors and the company’s data science and AI engineering teams. During their visit, they learned more about the variety and depth of the company’s data science research and how the organization operates.
Katherine Keith, a Ph.D. student in Computer Science in the College of Information and Computer Sciences at the University of Massachusetts Amherst, was already familiar with Bloomberg through an internship during the summer of 2018 with Bloomberg’s Data Science team in the Office of the CTO, which offered a hands-on introduction to Bloomberg’s research. Through the relationships she developed during her internship, she published a paper at ACL 2019 titled “Modeling financial analysts’ decision making via the pragmatics and semantics of earnings calls” together with Amanda Stent, NLP Architect in Bloomberg’s Office of the CTO.
During her fellowship, Keith will further her research focused on improving natural language processing methods for computational social science applications. “I’m interested in improving social measurements of text, such as event extraction and extracting semantic and pragmatic signals from text, and improving methods that use these measurements in descriptive and causal inferences,” Keith said.
Contagion risk exists within the financial system when market participants use contracts to interact with each other, and the complex financial products that bind these networks create new ‘spiral’ risks that can potentially be measured and controlled with algorithms. Applying financial network analysis to real-world problems has been a challenge, though, and Ariah Klages-Mundt, a Ph.D. candidate in applied math at Cornell University’s Center for Applied Mathematics, hopes to overcome these hurdles by developing numerical methods and machine learning tools for sensitivity analysis and probabilistic measurement of network risks. “I look forward to working with Bloomberg to see first-hand how these tools could realistically be used within the finance industry,” said Klages-Mundt.
Hao Tan, a fourth-year Ph.D. student in the University of North Carolina at Chapel Hill’s NLP Research Group (which is led by Assistant Professor Mohit Bansal), has worked on building the mapping from words and phrases to visual concepts, like objects and relationships, by designing tasks, building neural models, and developing training methods to learn this connection. At Bloomberg, where billions of data points are processed daily, Tan plans to explore the possibility of including visual information like images, videos, and data plots by adapting methods developed for natural images to structured figures.
Machine learning algorithms can be used to solve sequential decision-making problems. Most interactive systems, such as search engines, are improved through a recurrent loop of learning from new data, improving the features and the model, and then testing the new system.
“My goal is to fundamentally speed up this process through new counterfactual inference techniques that move both learning and evaluation from ‘online’ to ‘offline’,” said Yi Su, a Ph.D. student in the Department of Statistics and Data Science (DSDS) at Cornell University, where she is advised by Professor Thorsten Joachims. “Since logged historical data is both biased and partial, machine learning algorithms on this partial information data can be highly sub-optimal. I’m interested in developing new estimators and algorithms to work on this partial-information setting.”
The 2018-2019 Ph.D. Fellows have all completed their internships and have had their fellowships renewed. Bloomberg has already benefited from the academic collaboration with strong Ph.D. candidates whose interests are similar to those of the company’s Data Science and AI Engineering teams. This included facilitating publications in top-tier conferences and improving the quality of the company’s products, where applicable, said Duan.
Notably, 2018-2019 Fellow Huazheng Wang, a Ph.D. candidate in the Department of Computer Science at the University of Virginia School of Engineering and Applied Science, won the prestigious Best Paper Award at SIGIR 2019 for his work on “Variance Reduction in Gradient Exploration for Online Learning to Rank.” This research, which was supported by Bloomberg, was performed together with three other students and Professor Hongning Wang, his faculty advisor at the University of Virginia.
Out of 79 applications from doctoral students at universities in the United States, the European Union, and the United Kingdom, a committee of Bloomberg’s data scientists from across the organization selected these four Fellows for the 2019-2020 academic year:
Yi Su (Cornell University)
Off-Policy Evaluation and Learning for Interactive Systems
Search engines, recommender systems, and most other user interactive systems go through a recurrent loop of improvement. This loop typically involves learning from newly collected data, making improvements to the features and the model, and then testing the new system in an online A/B test. My goal is to fundamentally speed up this process through new counterfactual inference techniques that move both learning and evaluation from “online” to “offline.”
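As a rough illustration of what moving evaluation "offline" can look like, the sketch below uses inverse propensity scoring (IPS), one standard counterfactual estimator, to estimate how a new policy would perform using only logged interactions. It is not the Fellow's method. The function names, the simulated click model, and the uniform logging policy are all assumptions made up for this example.

```python
import random


def ips_estimate(logs, target_policy):
    """Estimate a target policy's average reward from logged data via IPS.

    logs: list of (context, action, reward, logging_propensity) tuples, where
          logging_propensity is the probability the logging policy chose action.
    target_policy: function (context, action) -> probability under the new policy.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the new policy is to take the logged action.
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logs)


def simulate_logs(n, seed=0):
    """Hypothetical logged data: a uniform-random logging policy over two actions."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.choice(["news", "sports"])
        action = rng.choice(["A", "B"])  # each action shown with probability 0.5
        matched = (context, action) in {("news", "A"), ("sports", "B")}
        click_prob = 0.8 if matched else 0.2  # assumed click model
        reward = 1.0 if rng.random() < click_prob else 0.0
        logs.append((context, action, reward, 0.5))
    return logs


def matched_policy(context, action):
    """Hypothetical new policy: always show the action matched to the context."""
    best = "A" if context == "news" else "B"
    return 1.0 if action == best else 0.0


logs = simulate_logs(20000)
estimate = ips_estimate(logs, matched_policy)
print(f"Estimated click rate of the new policy: {estimate:.3f}")
```

Under the assumed click model, the new policy's true click rate is 0.8, and the estimate should land close to that value without the policy ever being deployed in an online A/B test.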
Katherine Keith (University of Massachusetts Amherst)
Constructing Subjective Knowledge Bases
The field of information extraction (IE) has made great strides in constructing knowledge bases by extracting facts from large collections of unstructured text. IE methods have been used in many applied settings, including my recent work building a database of police fatality victims. However, extracting facts implies discerning between different realities in order to determine what is “true”. Social scientists, journalists, policy makers, and financial investors may be more interested in understanding a population’s shared and conflicting subjective beliefs and how they vary temporally and spatially. This leads to the following research questions:
- Can we extract propositions representing authors’ stated beliefs in order to construct subjective knowledge bases?
- Once we have extracted subjective propositions for individual authors, can we infer belief communities by clustering authors with similar beliefs?
Hao Tan (UNC Chapel Hill)
Summarizing Salient Content in Structured Documents with Figures
Describing content in structured images, i.e., plots, charts, and diagrams, is crucial when summarizing or searching over, for example, complex financial or legal news documents. It helps a layperson to understand the salient information inside these complex figures and also enables visually impaired people to “see” the figure. It also enhances pure-text paragraph summarization systems by providing additional valuable information from figures inside the news/legal document and allows search/retrieval over documents containing such structured figures. Hence, we propose models that learn to generate informative and comprehensive summaries of such structured figures in complex documents, summaries that capture salient and logically entailed (verified) information.
Ariah Klages-Mundt (Cornell University)
Learning Cascade Risks in Complex Economic Networks: Methods to Make Financial Network Analysis Practical for Application
Computational and sensitivity problems currently present a barrier to using network models to quantify risks in networks of interacting firms. For instance, sensitivity from parameter uncertainty is not currently well understood and network computations can be algorithmically hard. I am developing numerical methods and machine learning tools to address these problems. This work will help make financial network analysis practical for application in industry.