Bloomberg proudly announces its newest class of early-career researchers to receive the Bloomberg Data Science Ph.D. Fellowship for 2020-2021. These three doctoral students are engaged in research that complements ongoing artificial intelligence projects at Bloomberg. They will receive financial and professional support to pursue their research interests over the course of the academic year as they work towards the completion of their doctoral degrees.
The fellowship program is an extension of Bloomberg’s Data Science Research Grant Program, which has promoted relationship building between the company and academic researchers around the world in the fields broadly construed as AI, such as machine learning (ML), natural language processing (NLP), information retrieval (IR), recommender systems, time series analyses, and optimization. Working with the company’s CTO Office, more than 50 data scientists, machine learning engineers, and AI researchers participated in the review of more than 100 fellowship applications this year. They selected three recipients on the basis of their outstanding proposals and academic references. Each of this year’s Fellows displayed exceptional novelty in their fields and delivered proposals of relevance to Bloomberg with significant business potential.
Now in its third year, the competitive fellowship has consistently attracted motivated talent. “It is exciting to see the program evolve with more and more exceptional and innovative projects from new cohorts,” said Jing Wang, a senior researcher in Bloomberg’s AI Engineering group who works on enrichment of news and social media content using novel natural language processing methods.
Minjie Xu, a senior researcher in the AI Engineering group who works on clustering and summarization, notes that each year’s crop of Fellows has demonstrated the desire to make a “tangible impact” on the world. Fellows work closely with Bloomberg’s team of AI scientists and engineers, who provide support over the course of the academic year as mentors or co-authors.
“I self-identify more as a collaborator than a mentor when working with them,” Xu said of the Fellows. “On the technical side, we often identify similar problems proactively and even come up with similar solutions. Discussions feel like a breeze and we constantly get inspired by each other’s ideas.”
The Fellows also bring new perspectives and fresh eyes to problems that Bloomberg’s researchers are working on, according to Chen-Tse Tsai, a senior researcher who works on pattern recognition in the AI Engineering group.
“The reviewing process itself is very fulfilling, as we get to use our experience to gauge the soundness of different proposals from the top Ph.D. students and their advisors in the field. We then get to help guide the selected proposals with the experience we’ve gained at Bloomberg working on real products.”
Data Science Fellows may renew their fellowships annually for up to three years. In the summer of 2021, the three winners will participate in an internship program to apply their research to Bloomberg services or products.
Alexander Spangher, a Ph.D. candidate in the Department of Computer Science at the USC Viterbi School of Engineering, notes that the Fellowship offered a unique opportunity to further his work in NLP and computational journalism. “There is no other journalism company that has 2,800 reporters on staff, deep engineering expertise and senior AI/NLP researchers, and vast and reliable high-quality data resources,” he says. “Real-world data is readily available, product pipelines are swift and robust, and the opportunity to have an impact is real.”
Spangher’s interest in journalism dates back to his high-school days, but his current research interests emerged from his work as a data scientist at The New York Times, where he received an inside look at the challenges traditional newsrooms face in sourcing and gathering material under tight budgets and in an evolving media landscape. Spangher’s research involves building content models for stories that highlight gaps in information, identify potential sources, and retrieve relevant background information. As a Data Science Fellow, Spangher will develop human-in-the-loop pipelines to aid journalists in finding new stories, sources, and information throughout the writing process.
With the support of Bloomberg’s real-world financial data, Sheena Panthaplackel of the University of Texas at Austin’s Department of Computer Science seeks to use her fellowship to probe the interplay between software and natural language. For instance, engineers invest significant time and resources in writing source code comments that document crucial information about software usage and errors. As Panthaplackel notes, this task can be streamlined by developing AI-driven tools that harness the natural language elements found in human-generated search queries, comments, and error reports.
Panthaplackel’s research aims to use natural language elements to support software development tasks such as code comprehension, code maintenance, and issue resolution. The goal is to reduce confusion, minimize software vulnerabilities, and ultimately save time in the software development process. “With Bloomberg technology being used across the financial markets, Bloomberg engineers and researchers take on an extremely important role, which requires delivering efficient and reliable software products and also quickly adapting software for evolving market conditions,” Panthaplackel says. “I hope to evaluate how my research can be applied to facilitate the workflow for developing this financial software.”
As a Data Science Fellow, Difan Zou intends to employ machine learning tools to enhance Bloomberg’s financial products and to collaborate with other researchers on real-world problems of importance to the company. Zou is a graduate student in the UCLA Statistical Machine Learning Lab (UCLA-ML), which is part of the Department of Computer Science at the UCLA Samueli School of Engineering. Zou’s research focuses on Graph Convolutional Networks, or GCNs, which produce representations that support recommender systems. However, GCNs typically do not scale to large graphs. Over the course of the fellowship, Zou hopes to build efficient training systems that bring GCNs to scale, with the intention of surpassing the current leading models.
Each of these three projects offers a unique contribution to its field and demonstrates significant business viability. Further information about each project can be found below.
Looking back at our earlier Ph.D. Fellows’ work
The four 2019-2020 Ph.D. Fellows all completed their internships. They and the five 2018-2019 Ph.D. Fellows have all had their fellowships renewed. To date, the Fellows have published more than 20 papers during their fellowships, many of them with Bloomberg mentors as co-authors. Some of this past year’s examples include:
- Cornell University’s Tianze Shi, who worked with Senior Research Scientists Igor Malioutov and Ozan Irsoy on “Semantic Role Labeling as Syntactic Dependency Parsing,” a paper accepted to the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020).
- The University of Massachusetts Amherst’s Katherine Keith, who worked with her mentors Christoph Teichmann and Edgar Meij, as well as her Ph.D. advisor Brendan O’Connor, on “Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty,” a paper which was accepted to the NLP+CSS 2020 Workshop at EMNLP 2020.
- The University of Virginia’s Huazheng Wang, who worked with Bloomberg’s Qian Zhao and Shubham Chopra and UVA’s Professor Hongning Wang on “Global and Local Differential Privacy for Collaborative Bandits,” which was published at the Fourteenth ACM Conference on Recommender Systems (RecSys 2020).
- Johns Hopkins University’s Hongyuan Mei, whose paper “Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification” was written with Bloomberg’s Minjie Xu and JHU’s Professor Jason Eisner. Published during the Thirty-Seventh International Conference on Machine Learning (ICML 2020), this paper was recently highlighted in FORTUNE’s Eye on A.I. newsletter.
- Shruti Rijhwani of the Language Technologies Institute at Carnegie Mellon University published “Temporally-informed Analysis of Named Entity Recognition” at the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020) with her co-author Daniel Preoţiuc-Pietro.
Another unique aspect of collaborating with the Data Science Fellows is that Bloomberg’s researchers also get to work closely with the Fellows’ Ph.D. advisors throughout the process, from idea generation to paper writing, so the program benefits Bloomberg’s researchers as professionals in the field as well.
“During my collaboration with Hongyuan, I found the interactions I had with Professor Jason Eisner, his Ph.D. advisor who is a well-respected pioneering thinker and renowned researcher in the field, to be hugely insightful and eye-opening,” noted Xu. “Having such an opportunity to work closely with leading figures in the field is certainly a very rewarding experience for AI researchers like me.”
In recognition of this – and the current challenges facing academia in light of the pandemic – Bloomberg offered each of the Fellows’ academic advisors an unrestricted gift of $15,000 to help support their student and the doctoral work which Bloomberg is funding.
Bloomberg Data Science Fellowship Recipients, 2020-2021
Out of 102 applications from doctoral students at universities in the United States, Canada, the European Economic Area (EEA), the European Union (EU), the United Kingdom, or Switzerland, a committee of Bloomberg’s data scientists from across the organization selected these three new Fellows for the 2020-2021 academic year:
Sheena Panthaplackel (University of Texas at Austin)
Learning to Facilitate the Interplay between Natural Language and Software
Software developers use various forms of natural language throughout the development process, such as search queries, comments, issue reports, and commit messages. Broadly speaking, my research is focused on understanding and supporting the role of such natural language elements in order to help developers with tasks like program comprehension, code maintenance, and issue resolution. Recently, I have been working on a machine-learning-based system that aims to reduce time-consuming confusion and vulnerability to bugs by keeping developers informed with up-to-date documentation in software projects.
Alexander Spangher (University of Southern California)
Rich Human-in-the-Loop News Article Generation
Computational journalism is an emerging field seeking to model traditional journalistic processes (in story finding, production, distribution, and evaluation) and offer enhancements using computational approaches. Such advances come at a critical time: journalists’ ability to play a watchdog role in society is severely endangered by industry contraction and budget shortfalls. I seek to use NLP to approach problems in computational journalism in three ways. First, I aim to help journalists find interesting documents and sources using corpora not typically analyzed in NLP research, like government filings, court cases, and law text. Second, I aim to help editors analyze and improve the current news report by monitoring aspects not typically considered, for example source diversity and breadth of coverage. Third, I aim to advance the field by defining problems, collecting datasets, and inspiring other AI/NLP researchers to embrace this emerging, socially beneficial field.
Difan Zou (University of California, Los Angeles)
Towards More Efficient and Robust Training of Graph Convolutional Networks for Large-Scale Recommender Systems
Graph convolutional networks (GCNs) have achieved state-of-the-art performance on many recommender system benchmarks. Their rapidly increasing size, however, makes training very difficult because of high computational and memory costs. Additionally, the graph structure and node features are typically noisy, incomplete, or even corrupted. The goal of my research is to address these problems by developing scalable, efficient, and robust GCN training algorithms for large-scale recommender systems.