Designing Better Approximations for Machine Learning

In the era of Machine Learning (ML), Natural Language Processing (NLP), and Artificial Intelligence (AI), high-dimensional matrices are ubiquitous. In these fields (and in many others), matrices are used to represent both data and models. For example, search engines can model documents as rows in a matrix. In genome-wide association studies, an individual’s genome (approximately 3 billion base pairs) is modeled as a row, resulting in matrices that have up to millions of rows and billions of columns.

The determinant of a matrix – the signed volume of the n-dimensional parallelepiped spanned by its columns – is one of its most important quantities. Since its early development by Cardano in the 16th century and Leibniz in the late 17th century, the determinant has been a fundamental mathematical concept with countless applications in numerical linear algebra and scientific computing. For example, the log-determinant of the covariance matrix appears in the density of the widely used multivariate Gaussian distribution. With the popularity of ML/NLP/AI, there is increased need for algorithms that compute – exactly or approximately – matrix determinants. Determinant computation is compute intensive: it takes O(n³) time with standard methods to calculate the determinant of a (n, n)-dimensional matrix. Therefore, there is significant demand for methods that rapidly approximate the determinant of a matrix while keeping the final results accurate with high probability.
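To make this concrete, here is a minimal sketch (written for this post, not taken from the papers discussed below) of how the log-determinant shows up when evaluating a multivariate Gaussian density with NumPy. The Cholesky factorization that produces it is the O(n³) step that the approximation work described below aims to avoid.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, assuming cov is SPD."""
    n = x.shape[0]
    chol = np.linalg.cholesky(cov)                 # O(n^3) factorization
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))   # log det(cov) from the Cholesky factor
    diff = x - mean
    # Solve the two triangular systems instead of forming cov^{-1} explicitly.
    sol = np.linalg.solve(chol.T, np.linalg.solve(chol, diff))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + diff @ sol)
```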

Senior research scientist Anju Kambadur, who leads Bloomberg’s NLP group, has been investigating the use of randomization to approximate a matrix’s determinant and its logarithm, and to quantify the effect of such approximations in different ML applications. Two research articles he co-authored are being published this year: one in the journal Linear Algebra and its Applications (LAA) and one at the 34th International Conference on Machine Learning (ICML 2017).

In the first paper, titled A Randomized Algorithm for Approximating the Log Determinant of a Symmetric Positive Definite Matrix, he and his colleagues devise a fast approximation algorithm based on a truncated Taylor series expansion. Their algorithm runs in:

O(nnz(A) · m · log(1/δ) / ε²)

Here nnz(A) is the number of non-zeros in the matrix A, which is a (n, n)-dimensional symmetric positive-definite (SPD) matrix, 0 < δ < 1 is the failure probability, and the integer m and the real number ε are accuracy parameters. The paper also provides a rigorous theoretical convergence analysis, and the algorithm works for all SPD matrices. Detailed experiments with dense and sparse matrices show that the algorithm accurately approximates the logarithm of the determinant of SPD matrices and runs much faster than the exact algorithm.
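The analysis lives in the paper, but the overall recipe, Taylor-expand the matrix logarithm and estimate the traces of matrix powers with random probe vectors, is short enough to sketch. The version below is an illustrative simplification: the scaling factor alpha, the number of Taylor terms m, and the number of probe vectors p are chosen crudely here, whereas the paper sets them carefully to obtain the guarantees above.

```python
import numpy as np

def approx_logdet_spd(A, m=30, p=20, rng=None):
    """Sketch of a Taylor-series log-determinant approximation for an SPD matrix A.

    Writes A = alpha * (I - C) with alpha > lambda_max(A), so that
        log det(A) = n*log(alpha) - sum_{k>=1} trace(C^k) / k,
    estimates each trace(C^k) with p Rademacher probe vectors
    (Hutchinson's estimator), and truncates the series after m terms.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    # Crude upper bound on the largest eigenvalue; the paper estimates it
    # with a power method instead.
    alpha = 1.01 * np.linalg.norm(A, 1)
    logdet = n * np.log(alpha)
    G = rng.choice([-1.0, 1.0], size=(n, p))   # Rademacher probe vectors
    V = G.copy()
    for k in range(1, m + 1):
        V = V - (A @ V) / alpha                # V <- C @ V, so V = C^k G after k steps
        trace_est = np.sum(G * V) / p          # trace(C^k) ~ mean_i g_i^T C^k g_i
        logdet -= trace_est / k
    return logdet
```

Each of the m iterations touches A only through a product with p probe vectors, which is why the cost scales with nnz(A) rather than n³, and why the method shines on sparse matrices.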

As Anju says, “the true power of our approximation algorithm is visible when working with sparse matrices, such as those in NLP. Computing exact determinants is not only computationally expensive, but also can introduce non-zeros in previously zeroed out positions of the matrix, thereby increasing memory requirements by an order of magnitude.” This paper is available online and will appear in print in the journal later this fall.

In the second paper, titled Faster Greedy MAP Inference for Determinantal Point Processes, Anju and his colleagues devise approximation algorithms for maximum a posteriori (MAP) inference in determinantal point processes (DPPs), building on ideas from the aforementioned work on approximating the logarithm of the determinant of SPD matrices. A DPP is a probabilistic model for selecting subsets from a finite ground set of n items in a way that captures both the quality and the diversity of the selected items. Their algorithm runs in O(n³) time, a substantial gain compared to the O(n⁴) time taken by the exact algorithm. Their work is the first known to develop faster greedy algorithms specialized for DPP inference.
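For intuition, the standard greedy baseline that the paper accelerates looks roughly like the sketch below (illustrative only, not the paper's algorithm): every candidate item requires a fresh log-determinant of the enlarged submatrix, and that repeated exact computation is where the log-determinant approximation ideas from the first paper come in.

```python
import numpy as np

def greedy_dpp_map(L, max_items):
    """Sketch of the standard greedy MAP baseline for a DPP with kernel L:
    repeatedly add the item that most increases log det of the selected
    submatrix, stopping when no item gives a positive gain. Every candidate
    evaluation here is a fresh O(k^3) log-determinant, which is what makes
    the exact greedy approach expensive for large n.
    """
    n = L.shape[0]
    selected = []
    for _ in range(max_items):
        current = (np.linalg.slogdet(L[np.ix_(selected, selected)])[1]
                   if selected else 0.0)
        best_item, best_gain = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            trial = selected + [j]
            gain = np.linalg.slogdet(L[np.ix_(trial, trial)])[1] - current
            if gain > best_gain:
                best_item, best_gain = j, gain
        if best_item is None or best_gain <= 0.0:
            break
        selected.append(best_item)
    return selected
```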

In their experiments, they demonstrate the speed and accuracy of their approximations on synthetic and real-world tasks and datasets, including the NLP task of “matched summarization.” This task, which aims to extract pairs of similar sentences from two documents and use them to summarize the documents, is useful for comparing texts delivered at different times by the same speaker. In this case, they looked at transcripts of debates from the 2016 U.S. Republican presidential primaries. The results demonstrated that their approximations lost little accuracy compared to the exact method, while achieving significant speedups. This paper is being presented at ICML 2017 in Sydney, Australia this week.

Anju expects to continue working in the area of fast and accurate approximations. He says, “With an increase in the amount of data that is available and with machine learning models getting bigger, redundancies are unavoidable. These redundancies are also expensive: they can cost unnecessary compute time and memory. Using approximations is one way to cut down labeling, computation, storage and memory costs.”