As a researcher in the Office of the CTO at Bloomberg, Amanda Stent uses her doctorate in computer science and her expertise in Natural Language Processing (NLP) and machine learning to engineer products that parse, identify and respond to human language. Just ahead of WSDM 2017 (February 6-10, 2017, in Cambridge, England), a top international conference sponsored by Bloomberg that highlights the latest web-inspired research on search and data mining, we spoke with Amanda, who joined Bloomberg in August 2016. The conversation, which covers the growth of NLP efforts at Bloomberg, the challenges women in tech face, and how she once tried to program a computer to comprehend cartoons in The New Yorker, has been edited for length and clarity.
Why is it important to engage with the NLP community at conferences like WSDM and ACL (Association for Computational Linguistics), which Bloomberg also sponsors?
These are prestigious events where we can learn about what everyone is up to in the field and share what we’re working on. At WSDM, for example, my London-based colleague, Edgar Meij, will lead a tutorial session about information retrieval in knowledge graphs. We will also have a booth set up at the event for potential candidates interested in speaking with us about working at Bloomberg. I’m also personally excited that some of my prior research into lightweight multilingual entity linking is being published at WSDM.
Beyond promoting NLP and machine learning within Bloomberg, we’ve developed a Data Science Speaker Series and the Bloomberg Data Science Research Grant Program, both of which focus on these topics. We’ve also launched an official internal Guild around NLP to help tackle NLP problems throughout the company, contribute to internal NLP software, and encourage coding and design best practices for NLP solutions.
What are some innovative NLP systems now available at Bloomberg?
Bloomberg uses NLP techniques in several areas to extract relevant, meaningful and tradable information (pricing, earnings, major events) in real time for our clients. For example, we can automatically identify references to people, organizations or locations in news stories and social media, and use them to power news search and recommendations. We apply similar technology to extract financial information, such as earnings per share or revenue, from corporate filings such as earnings reports.
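To make entity identification and figure extraction a little more concrete, here is a minimal Python sketch. It is not Bloomberg’s system: it assumes a tiny hand-built gazetteer (the hypothetical `GAZETTEER` dictionary) matched longest-name-first, plus a hypothetical regular expression (`EPS_PATTERN`) that covers just one phrasing of earnings per share. Real systems learn from data and must resolve ambiguous names, which this sketch ignores.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical gazetteer mapping surface names to entity types.
# A production system would learn these from data; this is an illustration only.
GAZETTEER: Dict[str, str] = {
    "Amanda Stent": "PERSON",
    "Bloomberg": "ORGANIZATION",
    "London": "LOCATION",
    "Cambridge": "LOCATION",
}

# Hypothetical pattern for one common phrasing, e.g. "earnings per share of $1.25".
EPS_PATTERN = re.compile(
    r"earnings per share (?:of|was|were) \$(\d+(?:\.\d+)?)", re.IGNORECASE
)


def tag_entities(text: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (name, entity type, start offset) for every gazetteer match in text."""
    # Longest names first, so a multi-word name wins over any shorter name it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(0), gazetteer[m.group(0)], m.start()) for m in pattern.finditer(text)]


def extract_eps(sentence: str) -> Optional[float]:
    """Return the earnings-per-share figure if the sentence matches EPS_PATTERN."""
    match = EPS_PATTERN.search(sentence)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(tag_entities("Amanda Stent joined Bloomberg in London.", GAZETTEER))
    print(extract_eps("The company reported earnings per share of $1.25."))
```

Sorting names by length matters because Python’s regex alternation takes the first alternative that matches at a given position, so a longer name must be tried before any shorter name it contains.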
What are some of the challenges facing NLP and machine learning scientists at Bloomberg?
One challenge is handling the unique types of language, unique entities and unique relationships in the financial industry. There is very little publicly available data to train statistical models for this kind of language use. So we are always looking for intelligent ways to adapt our models from “general-purpose” language to financial language, and to leverage the human effort of our Bloomberg data scientists to improve our NLP systems.
Another challenge is speed: in the financial world, the faster you have information, the more leverage you have. Our goal is to have the fastest, most precise financial text analytics platform in the world.
What’s your role in advancing NLP systems at Bloomberg?
I partner with our Engineering teams on their strategy and approach to NLP. In addition to helping them explore new technologies, I consult and work closely with our Product teams to understand how we can deploy NLP to serve our clients’ needs.
We are always working on scaling our NLP platform and coming up with cool applications to meet the needs of users of the Terminal. For efficiency, we allow engineers to plug in open-source solutions; we also build state-of-the-art, deep learning-based models in-house.
Explain deep learning.
It’s a fancy new term for an old technology known as neural networks, which have become increasingly popular due to advances in hardware (specifically, GPUs and GPU clusters) and processing speed. Think of it as a way to model more complex non-linear relationships within data sets. With deep learning, we can create very precise models for NLP with much less feature engineering than before.
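The point about non-linear relationships can be illustrated with XOR, a textbook function that no single linear unit can represent. The plain-Python sketch below hard-codes the weights of a tiny two-layer network with a ReLU hidden layer that computes XOR exactly. The weights are chosen by hand purely for illustration (an assumption of this example); in deep learning they would instead be learned from data by gradient descent, and real models stack many more layers.

```python
from typing import Callable, List, Optional, Sequence


def relu(z: float) -> float:
    """Rectified linear unit: the non-linearity applied in the hidden layer."""
    return max(0.0, z)


def dense(
    inputs: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    activation: Optional[Callable[[float], float]] = None,
) -> List[float]:
    """One fully connected layer: each output is activation(w . x + b)."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation is not None else z)
    return outputs


# Hand-picked (not learned) weights, for illustration only.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def xor_net(x1: float, x2: float) -> float:
    """Two-layer network whose output equals XOR(x1, x2) for inputs 0 and 1."""
    hidden = dense([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES, activation=relu)
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))
```

The hidden ReLU layer is what makes this work: without `relu`, the two layers collapse into a single linear function of `x1` and `x2`, which cannot fit XOR.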
Women are under-represented and face unique hurdles in building a career in tech. What is your perspective, and what has your experience been at Bloomberg as a female computer scientist?
Within IT and computer science overall, we need more diversity in terms of gender, ethnicity and background. It is a shame that such a small percentage of the population makes decisions about the future of technology in our world. I believe that the more voices we have in the room, the more likely companies will be able to innovate and succeed.
At Bloomberg, for only the second time in my career outside of academia, I find that I’m not the only woman in the room. It’s also encouraging that the company supports a healthy work-life balance for everyone, allowing us time to attend tech meetups and conferences, and to further our education.
Imagine Bloomberg in 2020. What will NLP and data scientists have accomplished?
I think there will be more natural ways to communicate with the Bloomberg Professional Service (aka the Terminal) and engage with workflows. Instead of having to remember multiple commands, a customer might be able to query different functions simply by chatting, or by typing a single request for exactly the information they’re seeking. There might even be a spoken interface to the Terminal!
We already offer traders a rich source of information on which to base smarter decisions and transactions, but we can provide even more information about correlations between financial events and entities, and explanations for financial phenomena. Deep learning and NLP will have an impact on all of these developments, and because humans will always be in the loop at Bloomberg, we can provide truly world-class financial analytics, data, and natural language interfaces.
A few years ago, you wrote a research paper about The New Yorker’s popular weekly cartoon caption contest to see if NLP could help select a winner. What did you discover?
The New Yorker currently has a single person sift through the roughly 5,000 entries each cartoon receives to find the best three captions for readers to vote on. Given the significant amount of data available, we wondered whether we could simplify this task by clustering and ranking captions to automatically pick the three finalists. Although machine learning and NLP can help rank captions, we learned that humor remains an aspect of human intelligence whose code nobody has yet cracked, especially when it comes to cartoons in The New Yorker. I’m sure that person is still sitting there, likely wearing a t-shirt that reads “Show Me The Data.”
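The cluster-then-rank idea can be sketched roughly as follows. This is an illustrative Python outline, not the method from the paper: it groups near-duplicate captions with a greedy Jaccard-similarity pass over word sets (the `threshold` of 0.5 is an arbitrary assumption), keeps the best caption from each cluster, and returns the top three. The per-caption `scores` are hypothetical inputs standing in for whatever ranking model one would actually use; producing a good score is precisely the hard, humor-understanding part.

```python
import re
from typing import Dict, List, Set


def tokens(caption: str) -> Set[str]:
    """Lower-cased word set of a caption; punctuation is ignored."""
    return set(re.findall(r"[a-z']+", caption.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two word sets, |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_captions(captions: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy single pass: join the first cluster whose first caption is similar enough."""
    clusters: List[List[str]] = []
    for caption in captions:
        words = tokens(caption)
        for cluster in clusters:
            if jaccard(words, tokens(cluster[0])) >= threshold:
                cluster.append(caption)
                break
        else:  # no similar cluster found, so start a new one
            clusters.append([caption])
    return clusters


def pick_finalists(
    captions: List[str], scores: Dict[str, float], k: int = 3, threshold: float = 0.5
) -> List[str]:
    """Best-scoring caption from each cluster, then the top k of those."""
    clusters = cluster_captions(captions, threshold)
    best = [max(cluster, key=lambda c: scores.get(c, 0.0)) for cluster in clusters]
    best.sort(key=lambda c: scores.get(c, 0.0), reverse=True)
    return best[:k]
```

Because clustering happens before ranking, a caption that merely rephrases a higher-scoring one (say, "We should have taken the stairs." next to "I told you we should have taken the stairs.") cannot take a second finalist slot, so the finalists tend to be three different jokes.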