GSoC 2011 student

A while back, Bob L. Sturm blogged about a similar implementation of OMP to the one in scikit-learn. Instead of using the Cholesky decomposition like we did, his Matlab code uses the QR decomposition, to a similar (or maybe even identical) outcome, in theory. So lucky that Alejandro pointed out to him the existence of […]

*September 19, 2011*

Thanks to Olivier, Gaël and Alex, who reviewed the code heavily the last couple of days, and with apologies for my lack of activity during a sequence of conferences, Dictionary learning has officially been merged into scikit-learn master, and just in time for the new scikit-learn 0.9 release. Here are some glimpses of the examples […]

*August 11, 2011*

EDIT: There was a bug in the final version of the code presented here. It is fixed now, for its backstory, check out my blog post on it. When we last saw our hero, he was fighting with the dreaded implementation of least-angle regression, knowing full well that it was his destiny to be faster. […]

*August 7, 2011*

After intense code optimization work, my implementation of OMP finally beat least-angle regression! This was the primary issue discussed during the pull request, so once performance was taken care of, the code was ready for merge. Orthogonal matching pursuit is now available in scikits.learn as a sparse linear regression model. OMP is a key building […]

*July 10, 2011*

One of the simplest, and yet most heavily constrained form of matrix factorization, is vector quantization (VQ). Heavily used in image/video compression, the VQ problem is a factorization where (our dictionary) is called the codebook and is designed to cover the cloud of data points effectively, and each line of is a unit vector. This […]

*July 7, 2011*

I am presenting an image denoising example that fully runs under my local scikits-learn fork. Coming soon near you! The 400 square pixels area covering Lena’s face was distorted by additive gaussian noise with a standard deviation of 50 (pixel values are ranged 0-256.) The dictionary contains 100 atoms of shape 4×4 and was trained […]

*May 23, 2011*

I have been working on the integration into the scikits.learn codebase of a sparse principal components analysis (SparsePCA) algorithm coded by Gaël and Alexandre and based on [1]. Because the name “sparse PCA” has some inherent ambiguity, I will describe in greater depth what problem we are actually solving, and what it can be used for. […]

*April 15, 2011*

My GSoC proposal is titled “Dictionary learning in scikits.learn” and in the project, I plan to implement methods used in state of the art research and industry applications in signal and image processing. In this post, I want to clarify the terminology used. Usually the terms dictionary learning and sparse coding are used interchangably. Also […]

November 18, 20111