Publications
Data, modelling and inference in road traffic networks
Richard J. Gibbens, Yunus Saatçi, June 2008. (Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences). DOI: 10.1098/rsta.2008.0020.
Abstract:
In this paper, we study UK road traffic data and explore a range of modelling and inference questions that arise from them. For example, loop detectors on the M25 motorway record speed and flow measurements at regularly spaced locations as well as the entry and exit lanes of junctions. An exploratory study of these data helps us to better understand and quantify the nature of congestion on the road network. From a traveller’s perspective it is crucially important to understand the overall journey times and we look at methods to improve our ability to predict journey times given access jointly to both real-time and historical loop detector data. Throughout this paper we will comment on related work derived from US freeway data.
Scaling Multidimensional Gaussian Processes using Projected Additive Approximations
E. Gilboa, Yunus Saatçi, John P. Cunningham, 2013. (In 30th International Conference on Machine Learning).
Abstract:
Exact Gaussian Process (GP) regression has O(N³) runtime for data size N, making it intractable for large N. Many algorithms for improving GP scaling approximate the covariance with lower rank matrices. Other work has exploited structure inherent in particular covariance functions, including GPs with implied Markov structure, and equispaced inputs (both enable O(N) runtime). However, these GP advances have not been extended to the multidimensional input setting, despite the preponderance of multidimensional applications. This paper introduces and tests novel extensions of structured GPs to multidimensional inputs. We present new methods for additive GPs, showing a novel connection between the classic backfitting method and the Bayesian framework. To achieve optimal accuracy-complexity tradeoff, we extend this model with a novel variant of projection pursuit regression. Our primary result, projection pursuit Gaussian process regression, shows orders-of-magnitude speedup while preserving high accuracy. The natural second and third steps include non-Gaussian observations and higher-dimensional equispaced grid methods. We introduce novel techniques to address both of these necessary directions. We thoroughly illustrate the power of these three advances on several datasets, achieving performance close to the naive full GP at orders of magnitude less cost.
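To make the backfitting connection above concrete, here is a minimal sketch of cyclic backfitting for the posterior mean of an additive GP. It is an illustration only: it uses naive O(N³) one-dimensional GP solves with an assumed squared-exponential kernel, whereas the paper exploits structured solvers per dimension; all names and hyper-parameter values are illustrative.

```python
# A minimal sketch of backfitting for an additive GP, assuming a squared-
# exponential kernel per input dimension. The paper accelerates each 1-D
# solve; here plain O(N^3) solves keep the idea visible.
import numpy as np

def rbf(x, z, ell=1.0, sf2=1.0):
    """1-D squared-exponential kernel matrix."""
    d = x[:, None] - z[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

def additive_gp_backfit(X, y, noise=0.1, n_sweeps=20):
    """Posterior mean of f(x) = sum_d f_d(x_d) by cyclic backfitting."""
    N, D = X.shape
    f = np.zeros((N, D))                          # per-dimension mean functions
    for _ in range(n_sweeps):
        for d in range(D):
            resid = y - f.sum(axis=1) + f[:, d]   # residual seen by f_d
            K = rbf(X[:, d], X[:, d])
            # 1-D GP posterior mean on the residuals (smoother matrix)
            f[:, d] = K @ np.linalg.solve(K + noise * np.eye(N), resid)
    return f.sum(axis=1)

# toy usage on synthetic additive data
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 200)
print(np.mean((additive_gp_backfit(X, y) - y) ** 2))
```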
Scaling Multidimensional Inference for Structured Gaussian Processes
E. Gilboa, Yunus Saatçi, John P. Cunningham, 2015. (IEEE Transactions on Pattern Analysis and Machine Intelligence). DOI: 10.1109/TPAMI.2013.192.
Abstract:
Exact Gaussian process (GP) regression has O(N³) runtime for data size N, making it intractable for large N. Many algorithms for improving GP scaling approximate the covariance with lower rank matrices. Other work has exploited structure inherent in particular covariance functions, including GPs with implied Markov structure, and inputs on a lattice (both enable O(N) or O(N log N) runtime). However, these GP advances have not been well extended to the multidimensional input setting, despite the preponderance of multidimensional applications. This paper introduces and tests three novel extensions of structured GPs to multidimensional inputs, for models with additive and multiplicative kernels. First we present a new method for inference in additive GPs, showing a novel connection between the classic backfitting method and the Bayesian framework. We extend this model using two advances: a variant of projection pursuit regression, and a Laplace approximation for non-Gaussian observations. Lastly, for multiplicative kernel structure, we present a novel method for GPs with inputs on a multidimensional grid. We illustrate the power of these three advances on several data sets, achieving performance equal to or very close to the naive GP at orders of magnitude less cost.
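As a rough illustration of the multiplicative-kernel grid method mentioned above, the following sketch solves (K + σ²I)α = y exactly when K is a Kronecker product over the grid axes, using per-axis eigendecompositions and a Kronecker matrix-vector product. The kernel choice, grid sizes, and names are assumptions for the example, not the paper's code.

```python
# A minimal sketch of exact GP inference on a Cartesian grid with a tensor
# product kernel, exploiting K = K_1 kron ... kron K_D. Cost is
# O(sum_d G_d^3 + N sum_d G_d), far below the O(N^3) dense solve.
import numpy as np
from functools import reduce

def kron_mv(As, x):
    """(A_1 kron ... kron A_D) x without materialising the Kronecker product."""
    N = x.size
    for A in As:
        G = A.shape[0]
        x = (A @ x.reshape(G, N // G)).T.reshape(N)  # apply A, rotate index
    return x

def grid_gp_alpha(Ks, y, noise=0.1):
    """Solve (K + noise*I) alpha = y for K = kron(Ks)."""
    eig = [np.linalg.eigh(K) for K in Ks]            # per-axis eigensystems
    lam = reduce(np.kron, [e[0] for e in eig])       # eigenvalues of K
    Qs = [e[1] for e in eig]
    QTs = [Q.T for Q in Qs]
    return kron_mv(Qs, kron_mv(QTs, y) / (lam + noise))

# toy usage: a 30 x 40 grid with an RBF kernel per axis
def rbf(x, ell=0.3):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

g1, g2 = np.linspace(0, 1, 30), np.linspace(0, 1, 40)
y = (np.sin(6 * g1)[:, None] * np.cos(4 * g2)[None, :]).ravel()
alpha = grid_gp_alpha([rbf(g1), rbf(g2)], y)
print(alpha.shape)  # (1200,)
```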
Comment: also available on arXiv.
Scalable Inference for Structured Gaussian Process Models
Yunus Saatçi, 2011. PhD thesis, University of Cambridge, Department of Engineering, Cambridge, UK.
Abstract:
The generic inference and learning algorithm for Gaussian Process (GP) regression has O(N³) runtime and O(N²) memory complexity, where N is the number of observations in the dataset. Given the computational resources available to a present-day workstation, this implies that GP regression simply cannot be run on large datasets. The need to use non-Gaussian likelihood functions for tasks such as classification adds even more to the computational burden involved. The majority of algorithms designed to improve the scaling of GPs are founded on the idea of approximating the true covariance matrix, which is usually of rank N, with a matrix of rank P, where P ≪ N. Typically, the true training set is replaced with a smaller, representative (pseudo-) training set such that a specific measure of information loss is minimized. These algorithms typically attain O(P²N) runtime and O(PN) space complexity. They are also general in the sense that they are designed to work with any covariance function. In essence, they trade off accuracy against computational complexity.

The central contribution of this thesis is to improve scaling instead by exploiting any structure that is present in the covariance matrices generated by particular covariance functions. Instead of settling for a kernel-independent accuracy/complexity trade-off, as is done in much of the literature, we often obtain accuracies close to, or exactly equal to, the full GP model at a fraction of the computational cost. We define a structured GP as any GP model that is endowed with a kernel which produces structured covariance matrices. A trivial example of a structured GP is one with the linear regression kernel: given inputs living in R^D, the covariance matrices generated have rank D, which results in significant computational gains in the usual case where D ≪ N. Another case arises when a stationary kernel is evaluated on equispaced, scalar inputs. This results in Toeplitz covariance matrices, and all necessary computations can be carried out exactly in O(N log N).

This thesis studies four more types of structured GP. First, we comprehensively review the case of kernels corresponding to Gauss-Markov processes evaluated on scalar inputs. Using state-space models we show how (generalised) regression, including hyperparameter learning, can be performed in O(N log N) runtime and O(N) space. Second, we study the case where we introduce block structure into the covariance matrix of a GP time-series model by assuming a particular form of nonstationarity a priori. Third, we extend the efficiency of scalar Gauss-Markov processes to higher-dimensional input spaces by assuming additivity. We illustrate the connections between the classical backfitting algorithm and approximate Bayesian inference techniques including Gibbs sampling and variational Bayes. We also show that it is possible to relax the rather strong assumption of additivity without sacrificing O(N log N) complexity, by means of a projection-pursuit style GP regression model. Finally, we study the properties of a GP model with a tensor product kernel evaluated on a multivariate grid of input locations. We show that for an arbitrary (regular or irregular) grid the resulting covariance matrices are Kronecker products, and full GP regression can be implemented in O(N) time and memory usage.

We illustrate the power of these methods on several real-world regression datasets which satisfy the assumptions inherent in the structured GP employed. In many cases we obtain performance comparable to the generic GP algorithm. We also analyse the performance degradation when these assumptions are not met, and in several cases show that it is comparable to that observed for sparse GP methods. We provide similar results for regression tasks with non-Gaussian likelihoods, an extension rarely addressed by sparse GP techniques.
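The Toeplitz case mentioned in the abstract is easy to illustrate: a stationary kernel on equispaced scalar inputs yields a symmetric Toeplitz covariance whose matrix-vector product can be computed in O(N log N) by circulant embedding and the FFT; pairing this matvec with conjugate gradients gives fast GP solves. A minimal sketch, assuming a squared-exponential kernel and illustrative names:

```python
# A minimal sketch of the O(N log N) Toeplitz matvec: embed the N x N
# symmetric Toeplitz matrix in a circulant matrix of size 2N-2, whose
# matvec diagonalises under the FFT.
import numpy as np

def toeplitz_matvec(c, v):
    """K v for symmetric Toeplitz K with first column c, in O(N log N)."""
    n = len(c)
    circ = np.concatenate([c, c[-2:0:-1]])        # circulant embedding
    v_pad = np.concatenate([v, np.zeros(n - 2)])  # zero-pad to match
    return np.fft.irfft(np.fft.rfft(circ) * np.fft.rfft(v_pad), len(circ))[:n]

# toy check against a dense squared-exponential covariance
x = np.linspace(0, 1, 512)
c = np.exp(-0.5 * ((x - x[0]) / 0.1) ** 2)        # first column of K
v = np.random.default_rng(0).normal(size=512)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)
print(np.max(np.abs(toeplitz_matvec(c, v) - K @ v)))  # ~1e-12
```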
Gaussian Process Change Point Models
Yunus Saatçi, Ryan Turner, Carl Edward Rasmussen, June 2010. (In 27th International Conference on Machine Learning). Haifa, Israel.
Abstract:
We combine Bayesian online change point detection with Gaussian processes to create a nonparametric time series model which can handle change points. The model can be used to locate change points in an online manner and, unlike other Bayesian online change point detection algorithms, is applicable when temporal correlations within a regime are expected. We show three variations on how to apply Gaussian processes in the change point context, each with its own advantages. We present methods to reduce the computational burden of these models and demonstrate them on several real-world data sets.
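The core quantity such a model needs is the GP one-step-ahead predictive density for the observations since the last change point, which the change point machinery then weights by the run-length posterior. A minimal sketch, assuming a zero-mean squared-exponential GP and illustrative hyper-parameters (not the paper's exact formulation):

```python
# A minimal sketch of a GP one-step-ahead predictive for the data in the
# current run segment; this density feeds the run-length recursion.
import numpy as np
from scipy.stats import norm

def gp_predictive(t_run, y_run, t_next, ell=5.0, sf2=1.0, noise=0.1):
    """Mean/variance of p(y_next | data since last change point)."""
    def k(a, b):
        return sf2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)
    if len(t_run) == 0:
        return 0.0, sf2 + noise                   # prior predictive
    Kn = k(t_run, t_run) + noise * np.eye(len(t_run))
    ks = k(t_run, np.array([t_next]))[:, 0]
    mu = ks @ np.linalg.solve(Kn, y_run)
    var = sf2 + noise - ks @ np.linalg.solve(Kn, ks)
    return mu, var

mu, var = gp_predictive(np.arange(10.0), np.sin(0.3 * np.arange(10)), 10.0)
print(norm.pdf(0.5, mu, np.sqrt(var)))  # predictive density of next point
```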
Adaptive Sequential Bayesian Change Point Detection
Ryan Turner, Yunus Saatçi, Carl Edward Rasmussen, December 2009. (In NIPS Workshop on Temporal Segmentation). Edited by Zaïd Harchaoui. Whistler, BC, Canada.
Abstract:
Real-world time series are often nonstationary with respect to the parameters of some underlying prediction model (UPM). Furthermore, it is often desirable to adapt the UPM to incoming regime changes as soon as possible, necessitating sequential inference about change point locations. A Bayesian algorithm for online change point detection (BOCPD) was recently introduced by Adams and MacKay (2007). In this algorithm, uncertainty about the last change point location is updated sequentially, and is integrated out to make online predictions robust to parameter changes. BOCPD requires a set of fixed hyper-parameters which allow the user to fully specify the hazard function for change points and the prior distribution over the parameters of the UPM. In practice, finding the “right” hyper-parameters can be quite difficult. We therefore extend BOCPD by introducing hyper-parameter learning, without sacrificing the online nature of the algorithm. Hyper-parameter learning is performed by optimizing the marginal likelihood of the BOCPD model, a closed-form quantity which can be computed sequentially. We illustrate performance on three real-world datasets.
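For reference, here is a minimal sketch of the BOCPD run-length recursion of Adams and MacKay (2007) with a constant hazard and a conjugate Gaussian UPM (unknown mean, known variance). The hazard and UPM prior parameters below are exactly the kind of hyper-parameters the paper proposes to learn, and the sketch accumulates the sequential log marginal likelihood that would be optimised; all names and values are illustrative.

```python
# A minimal sketch of the BOCPD run-length recursion with a constant hazard
# and a conjugate Gaussian UPM (unknown mean, known variance sigma2).
import numpy as np
from scipy.stats import norm

def bocpd(y, hazard=0.01, mu0=0.0, kappa0=1.0, sigma2=1.0):
    """Return run-length posteriors P(r_t | y_1:t) and the log marginal lik."""
    T = len(y)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    mu, kappa = np.array([mu0]), np.array([kappa0])
    log_ml = 0.0                                   # sequential evidence
    for t, yt in enumerate(y):
        # predictive prob of y_t under each current run length
        pred = norm.pdf(yt, mu, np.sqrt(sigma2 * (1 + 1 / kappa)))
        R[t + 1, 1:t + 2] = R[t, :t + 1] * pred * (1 - hazard)  # run grows
        R[t + 1, 0] = (R[t, :t + 1] * pred * hazard).sum()      # change point
        evidence = R[t + 1].sum()                  # p(y_t | y_1:t-1)
        log_ml += np.log(evidence)
        R[t + 1] /= evidence
        # conjugate posterior update of the UPM for each run length
        mu = np.concatenate([[mu0], (kappa * mu + yt) / (kappa + 1)])
        kappa = np.concatenate([[kappa0], kappa + 1])
    return R, log_ml

# toy usage: one change point halfway through
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
R, log_ml = bocpd(y)
print(R[-1].argmax(), log_ml)  # MAP run length at t=200; model evidence
```

In a hyper-parameter learning extension, `log_ml` would be maximised over `hazard`, `mu0`, `kappa0`, and `sigma2`, since it is exactly the closed-form sequential marginal likelihood the abstract describes.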