Spark Summit East featured Bloomberg executives presenting the following talk.
Spark at Bloomberg
Bloomberg has a strong reputation in the financial industry for providing lightning-fast analytics on vast quantities of data. In this presentation, we talk about Bloomberg's analytics stack and how Spark, with its formidable computational model for distributed, high-performance analytics, helps take it to the next level. We discuss the kinds of analytics being expressed in Spark and the challenges they pose for what Spark is currently capable of, both in functionality and in performance.

At Bloomberg, instead of building isolated Spark applications for individual problem domains, we are implementing a framework-based approach to registering, discovering, and querying RDDs/DataFrames and real-time data streams. RDDs/DataFrames in the framework are cataloged in a registry, which captures data provenance (backing stores and real-time streams) as well as analytical and domain-specific metadata. This allows for composable analytics over continuously updated data, with significantly less boilerplate code for data plumbing. The results of these analytics can be registered back in the catalog, to be leveraged in higher-order analytics. With such a data catalog, connectors to various internal data systems, and standardized serverization runtimes for hosted Spark applications, Spark can allow for seamless integration between disparate datastores and data domains.

We round out the talk by discussing a few challenges of building analytics infrastructure over Spark: the need for dynamic topic registration, efficient stream reconciliation with updateStateByKey, and context sharing for low-latency analytics while achieving efficient resource utilization.
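To make the registry idea concrete, here is a minimal pure-Python sketch of a dataset catalog that records provenance and metadata and lets derived analytics be registered back for reuse. All names here (`DatasetRegistry`, `DatasetEntry`, `register`, `discover`) are hypothetical illustrations of the pattern described above, not Bloomberg's actual framework or API, and the real system operates on Spark RDDs/DataFrames rather than plain Python objects.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """One cataloged dataset: its lineage and descriptive metadata."""
    name: str
    provenance: list                                  # backing stores / real-time streams / parent datasets
    metadata: dict = field(default_factory=dict)      # analytical and domain-specific tags

class DatasetRegistry:
    """Hypothetical catalog for registering and discovering datasets."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: DatasetEntry):
        # Catalog a dataset along with its provenance and metadata.
        self._entries[entry.name] = entry

    def lookup(self, name: str) -> DatasetEntry:
        return self._entries[name]

    def discover(self, **tags) -> list:
        # Find all datasets whose metadata matches every given tag.
        return [e for e in self._entries.values()
                if all(e.metadata.get(k) == v for k, v in tags.items())]

registry = DatasetRegistry()

# Register a raw streaming dataset...
registry.register(DatasetEntry(
    name="trades_raw",
    provenance=["kafka://trades"],
    metadata={"domain": "equities", "freshness": "streaming"},
))

# ...and a derived analytic registered back into the catalog, so it can
# feed higher-order analytics without re-plumbing the raw source.
registry.register(DatasetEntry(
    name="trades_vwap",
    provenance=["trades_raw"],
    metadata={"domain": "equities", "analytic": "vwap"},
))

print([e.name for e in registry.discover(domain="equities")])
```

A consumer can discover datasets by domain tags alone, without knowing which backing store or stream produced them; that decoupling is what cuts the boilerplate data-plumbing code mentioned above.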
About Spark Summit
Spark Summit, the largest big data event dedicated to Apache Spark, was back in NYC in 2016. Presentations were given by leading production users of Spark, Spark SQL, Spark Streaming, and related projects.