Data Science Q&A with David Rosenberg, Office of the CTO

Q: Tell us about your team…

David Rosenberg: I’m on the data science team in the Office of the CTO. One of our primary responsibilities is to help set the technical direction of Bloomberg as it relates to data science. This involves coordination, consulting, running projects, and both internal and external communication.

A major measure of our success is the extent to which Bloomberg maintains its technical expertise at state-of-the-art levels. We build strategic advantages that will let us continue to outpace the competition.

Q: What’s an example of a project that has “set direction” for Bloomberg?

DR: When I started at Bloomberg about a year ago, there was a lot of excitement about trying out neural network techniques, but not much was actually happening yet.

Neural networks were clearly the best technologies for computer vision and speech recognition problems, but there was some skepticism at the time about whether neural network technologies would prove beneficial for Bloomberg’s use cases. So exploring neural networks at Bloomberg became my first major project.

Q: And how did it go?

DR: Successfully! But first we had to overcome some challenges.

Neural networks usually rely on graphics processing units (GPUs), which are not yet part of Bloomberg’s standard compute configurations. They are also quite expensive. So there was a catch-22 for teams that wanted to try out GPUs: to justify the expense and the hassle of this non-standard hardware, they needed to make the case that GPUs were worth the money and effort. But it’s difficult to make that case without actually running experiments on GPUs.

So the first part of the project was to get the hardware we would need for engineering teams to run their experiments, and the second part was to develop a community of people at Bloomberg interested in doing this type of work. I think we’ve been successful in both aspects.

We currently have a cluster of machines, each with six GPUs, which are available to anybody with a reasonable use-case. And we recently ordered additional machines, each with eight of the latest-generation GPUs. We expect these to be between two and five times faster than the previous generation.

I think we are already, or will soon be, at the point where computing power is not a limiting factor in our neural network research.

Simultaneously, I worked closely with a few Engineering teams to read papers about current work in this field and learn the methods with them. As they began to develop neural network projects, I acted as a consultant and served as someone they could bounce ideas off.

As a result of their work, we’re now up to over 15 active projects that all started in the last year. We have already had several compelling successes with neural networks.

Q: How do you ensure engineers have what they need?

DR: First, we’re in close communication with them on the problems they’re facing in their day-to-day efforts, and we try to find common threads in the challenges that teams are facing. For example, a common request we heard was that there should be a more consistent and coherent set of machine learning tools for our engineers to use. This led to a major project: for the last several months, CTO has worked closely with Engineering and Global Data to plan a data science platform for internal users. The platform will be a set of tools and software libraries that will significantly accelerate the creation of new products that use machine learning and natural language processing (NLP).

Beyond helping ensure that our engineering teams have the tools they need, we try to collaborate and consult with teams where we think we can provide assistance. To be helpful, it’s important for us to keep up with the latest research trends and developments by attending conferences, maintaining connections with academic researchers and reading the latest research publications.

Beyond keeping ourselves current, education is a big part of our mission: we want to ensure that our machine learning engineers are learning about the best of the latest research and have opportunities to form their own connections with academics. We do a whole lot in this direction. For example, we organize a Data Science Seminar Series, in which we have a top-flight data science researcher visit us for a day, at least once per month. We also build connections with academics by giving yearly research grants.

Q: What machine learning problems are you working with Engineering on?

DR: We want to make it easier for computers to pull information out of documents, and to get that information into the Terminal in a structured format. Right now, there’s a lot of manual effort in the process because accuracy is so important. We want to decrease the amount of human effort so people can work on other tasks. So automation is a big focus area.

Another major objective is improving discoverability across the Terminal. We want to make using the Terminal intuitive and easy, so a new user can be effective right away. As part of that, we want people to be able to search using “natural language.” This will let users write exactly what they are trying to get information about, instead of having to remember mnemonics. Our current system allows them to go beyond keywords and use phrases or sentences. The challenge is to make sure users find exactly what is pertinent to them.

Increased discoverability allows Terminal users to type exactly what they want news about, without mnemonics.

Q: You teach machine learning and computational statistics as part of the master’s in data science program at NYU. You were named the Center for Data Science Professor of the Year in 2015 and 2016, and the students chose you! What lessons do you take from Bloomberg and deliver to your students?

DR: We look at how machine learning problems present themselves in practice and the real-life steps one takes to solve a problem.

In a class homework assignment or project, you keep trying methods until your performance on a particular problem is at some acceptable level. But in real life, in a business setting, the problem itself may not be completely defined. The way you state the problem evolves as you understand it better. The way you evaluate your solution evolves as you understand the problem better.

So it’s a much more dynamic and fluid cycle. Presenting that broader viewpoint is useful. And on a more technical level, it’s helpful to share first-hand experience about the successes I’ve had or not had with various methods.

Q: What do you do when you’re not in front of a computer?

DR: I am always enjoying some kind of physical hobby. These days I’m into handstands and other gymnastics strength training. I also enjoy a number of TV shows: “Game of Thrones,” “Archer” and “Veep,” to name a few.

This interview with David Rosenberg was conducted and condensed by Michael Smith, who works in Bloomberg Digital Video Programming.