Browsing All Posts filed under »scikits.learn«

The nasty bug crawling in my Orthogonal Matching Pursuit code

November 18, 2011


A while back, Bob L. Sturm blogged about an implementation of OMP similar to the one in scikit-learn. Instead of using the Cholesky decomposition like we did, his MATLAB code uses the QR decomposition, to a similar (or maybe even identical) outcome, in theory. Luckily, Alejandro pointed out to him the existence of […]

Dictionary learning in scikit-learn 0.9

September 19, 2011


Thanks to Olivier, Gaël and Alex, who reviewed the code heavily over the last couple of days, and with apologies for my lack of activity during a sequence of conferences, dictionary learning has officially been merged into scikit-learn master, just in time for the new scikit-learn 0.9 release. Here are some glimpses of the examples […]

Optimizing Orthogonal Matching Pursuit code in Numpy, part 2

August 11, 2011


EDIT: There was a bug in the final version of the code presented here. It is fixed now; for the backstory, check out my blog post on it. When we last saw our hero, he was fighting with the dreaded implementation of least-angle regression, knowing full well that it was his destiny to be faster. […]

Optimizing Orthogonal Matching Pursuit code in Numpy, part 1

August 7, 2011


After intense code optimization work, my implementation of OMP finally beat least-angle regression! This was the primary issue discussed during the pull request, so once performance was taken care of, the code was ready for merge. Orthogonal matching pursuit is now available in scikits.learn as a sparse linear regression model. OMP is a key building […]

Progress on Orthogonal Matching Pursuit

August 2, 2011


Since orthogonal matching pursuit (OMP) is an important part of signal processing and therefore crucial to the image processing aspect of dictionary learning, I am currently focusing on optimizing the OMP code and making sure it is stable. OMP is a forward method like least-angle regression, so it is natural to bench them against one […]
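To make the "forward method" idea concrete, here is a minimal NumPy sketch of the textbook greedy loop: at each step, pick the atom most correlated with the current residual, then refit all selected atoms by least squares. This is not the optimized implementation discussed in these posts; the names `omp_naive`, `D`, `y` and `n_nonzero` are hypothetical, and the dictionary columns are assumed to have unit norm.

```python
import numpy as np


def omp_naive(D, y, n_nonzero, tol=1e-12):
    """Naive orthogonal matching pursuit (illustrative sketch only).

    D: (n_features, n_atoms) dictionary, columns assumed unit-norm.
    y: (n_features,) signal to approximate.
    Returns a coefficient vector with at most n_nonzero nonzero entries.
    """
    coef = np.zeros(D.shape[1])
    residual = y.astype(float)
    support = []
    for _ in range(n_nonzero):
        # Forward step: select the atom most correlated with the residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        support.append(k)
        # Orthogonal step: least-squares refit on all selected atoms.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        coef = np.zeros(D.shape[1])
        coef[support] = sol
        residual = y - D[:, support] @ sol
        if np.linalg.norm(residual) < tol:
            break  # signal already explained, stop early
    return coef


# Toy check on a dictionary with orthonormal columns (an assumption that
# makes exact recovery easy to reason about).
rng = np.random.RandomState(0)
Q, _ = np.linalg.qr(rng.randn(8, 5))
y = 2.0 * Q[:, 1] - 1.5 * Q[:, 4]
coef = omp_naive(Q, y, n_nonzero=2)
```

Refitting from scratch with `lstsq` at every step is the simple but slow choice; updating a matrix factorization incrementally avoids that repeated work.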

SparsePCA in scikits.learn-git

July 19, 2011


I am happy to announce that the Sparse PCA code has been reviewed and merged into the main scikits.learn repository. You can use it if you install the bleeding edge scikits.learn git version, by first downloading the source code as explained in the user’s guide, and then running python setup.py install. To see what code […]

K-Means for dictionary learning

July 10, 2011


One of the simplest, and yet most heavily constrained, forms of matrix factorization is vector quantization (VQ). Heavily used in image/video compression, the VQ problem is a factorization X = WH where H (our dictionary) is called the codebook and is designed to cover the cloud of data points effectively, and each line of W is a unit vector. This […]
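As a rough illustration of VQ seen as a factorization, the sketch below encodes each data point by its nearest codeword, so that every row of the code matrix is a one-hot unit vector. The names `vq_encode`, `X`, `W` and `H` and the toy data are assumptions made for this example, not code from the post.

```python
import numpy as np


def vq_encode(X, H):
    """Return a one-hot code matrix W such that W @ H approximates X.

    X: (n_samples, n_features) data points.
    H: (n_codewords, n_features) codebook, i.e. the dictionary.
    """
    # Squared Euclidean distance from every sample to every codeword.
    d2 = ((X[:, None, :] - H[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    # Each row of W is a unit vector selecting exactly one codeword.
    W = np.zeros((X.shape[0], H.shape[0]))
    W[np.arange(X.shape[0]), nearest] = 1.0
    return W


X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]])
H = np.array([[0.0, 0.0], [5.0, 5.0]])  # two hand-picked codewords
W = vq_encode(X, H)
X_hat = W @ H  # every sample is replaced by its nearest codeword
```

In K-Means terms, the rows of `H` play the role of the cluster centers and `W` holds the hard cluster assignments.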

Image denoising with dictionary learning

July 7, 2011


I am presenting an image denoising example that fully runs under my local scikits.learn fork. Coming soon near you! The 400-square-pixel area covering Lena’s face was distorted by additive Gaussian noise with a standard deviation of 50 (pixel values range from 0 to 256). The dictionary contains 100 atoms of shape 4×4 and was trained […]

Sparse PCA

May 23, 2011


I have been working on the integration into the scikits.learn codebase of a sparse principal components analysis (SparsePCA) algorithm coded by Gaël and Alexandre and based on [1]. Because the name “sparse PCA” has some inherent ambiguity, I will describe in greater depth what problem we are actually solving, and what it can be used for. […]

Customizing scikits.learn for a specific text analysis task

April 29, 2011


Scikits.learn is a great general library, but machine learning has so many different applications that it is often very helpful to be able to extend its API to better integrate with your code. With scikits.learn, this is extremely easy to do using inheritance and the pipeline module. The problem: While continuing the morphophonetic analysis […]