The nasty bug crawling in my Orthogonal Matching Pursuit code

November 18, 2011


A while back, Bob L. Sturm blogged about a similar implementation of OMP to the one in scikit-learn. Instead of using the Cholesky decomposition like we did, his Matlab code uses the QR decomposition, to a similar (or maybe even identical) outcome, in theory. So lucky that Alejandro pointed out to him the existence of […]

Sampling Gamma random variates through the ratio-of-uniforms method

October 9, 2011


One year ago I had the chance to take a class on Monte Carlo simulation with Prof. Ion Văduva, and my assignment for the class was to implement exactly what it says in the title of the blog post. I am going to walk you through the idea behind this. General formulation The ratio-of-uniforms is […]

Posted in: python
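
For readers who want a feel for the ratio-of-uniforms idea before opening the full post, here is a minimal sketch in plain Python. It is not the code from the post: the function name `gamma_ratio_of_uniforms`, the restriction to shape > 1 with unit scale, and the simple bounding rectangle are all assumptions made for illustration. It uses the standard fact that if (u, v) is uniform on the region 0 < u <= sqrt(f(v/u)), then v/u has density proportional to f.

```python
import math
import random


def gamma_ratio_of_uniforms(shape, rng=random):
    """Draw one Gamma(shape, scale=1) variate, assuming shape > 1.

    Hypothetical helper for illustration: rejection sampling from the
    rectangle [0, u_max] x [0, v_max] that encloses the acceptance region
    {(u, v): 0 < u <= sqrt(f(v / u))}, with f(x) = x**(shape - 1) * exp(-x).
    """
    if shape <= 1.0:
        raise ValueError("this sketch assumes shape > 1")
    # sup of sqrt(f(x)), attained at x = shape - 1
    u_max = ((shape - 1.0) / math.e) ** ((shape - 1.0) / 2.0)
    # sup of x * sqrt(f(x)), attained at x = shape + 1
    v_max = ((shape + 1.0) / math.e) ** ((shape + 1.0) / 2.0)
    while True:
        u = rng.uniform(0.0, u_max)
        v = rng.uniform(0.0, v_max)
        if u <= 0.0 or v <= 0.0:
            continue  # avoid log(0) and division by zero
        x = v / u
        # Accept when u**2 <= f(x), compared on the log scale.
        if 2.0 * math.log(u) <= (shape - 1.0) * math.log(x) - x:
            return x


rng = random.Random(0)
samples = [gamma_ratio_of_uniforms(3.0, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # should be close to the shape parameter
```

The rectangle bounds come from maximizing sqrt(f(x)) and x * sqrt(f(x)); tighter regions or a shifted variant would raise the acceptance rate, but that is beyond this sketch.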

RANLP 2011 in Hissar, BG

September 20, 2011


Last week was marked by the international RANLP (Recent Advances in Natural Language Processing) conference, taking place in a nice spa in Hissar, Bulgaria. The excellent folks from the computational linguistics group at the University of Wolverhampton were behind it, together with the Institute of Information and Communication Technologies from the Bulgarian Academy of Sciences. […]

Posted in: conferences, nlp

Dictionary learning in scikit-learn 0.9

September 19, 2011


Dictionary learned from Lena patches

Thanks to Olivier, Gaël and Alex, who reviewed the code heavily over the last couple of days, and with apologies for my lack of activity during a sequence of conferences, dictionary learning has officially been merged into scikit-learn master, just in time for the new scikit-learn 0.9 release. Here are some glimpses of the examples […]

Long overdue update. EuroScipy and SSLST 2011

September 5, 2011


Anybody reading my blog should have expected me to blog about the end of my GSoC. Sorry to disappoint, but I simply did not experience anything similar to an ending. On the contrary, I feel like things have barely started. Also, I apologize for one of the few posts here without pretty pictures! 🙂 For […]

Posted in: Uncategorized

Optimizing Orthogonal Matching Pursuit code in Numpy, part 2

August 11, 2011


EDIT: There was a bug in the final version of the code presented here. It is fixed now; for the backstory, check out my blog post on it. When we last saw our hero, he was fighting with the dreaded implementation of least-angle regression, knowing full well that it was his destiny to be faster. […]

Optimizing Orthogonal Matching Pursuit code in Numpy, part 1

August 7, 2011


After intense code optimization work, my implementation of OMP finally beat least-angle regression! This was the primary issue discussed during the pull request, so once performance was taken care of, the code was ready for merge. Orthogonal matching pursuit is now available in scikits.learn as a sparse linear regression model. OMP is a key building […]

Progress on Orthogonal Matching Pursuit

August 2, 2011


Since orthogonal matching pursuit (OMP) is an important part of signal processing and therefore crucial to the image processing aspect of dictionary learning, I am currently focusing on optimizing the OMP code and making sure it is stable. OMP is a forward method like least-angle regression, so it is natural to bench them against one […]

Posted in: scikits.learn

SparsePCA in scikits.learn-git

July 19, 2011


I am happy to announce that the Sparse PCA code has been reviewed and merged into the main scikits.learn repository. You can use it if you install the bleeding edge scikits.learn git version, by first downloading the source code as explained in the user’s guide, and then running python setup.py install. To see what code […]

Posted in: scikits.learn

K-Means for dictionary learning

July 10, 2011


Dictionary learned with K-Means on the LFW dataset with whitening PCA

One of the simplest, and yet most heavily constrained, forms of matrix factorization is vector quantization (VQ). Heavily used in image/video compression, the VQ problem is a factorization of the data matrix into a code matrix and a dictionary, where the dictionary is called the codebook and is designed to cover the cloud of data points effectively, and each row of the code matrix is a unit vector. This […]
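
To make the "VQ as matrix factorization" view concrete, here is a small sketch in plain Python (kept dependency-free on purpose). The helper names `quantize` and `matmul` and the toy data are hypothetical, and the codebook is fixed by hand rather than learned with K-Means. Each point is encoded as a one-hot unit vector selecting its nearest codebook entry, so the product of the code matrix and the codebook simply replaces every point with its nearest centroid.

```python
def quantize(points, codebook):
    """Return a one-hot code matrix: row i selects the codebook entry nearest to points[i]."""
    codes = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in codebook]
        nearest = dists.index(min(dists))
        codes.append([1.0 if j == nearest else 0.0 for j in range(len(codebook))])
    return codes


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Toy data: four 2-D points and a hand-picked two-entry codebook (illustrative only).
points = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [-0.1, 0.2]]
codebook = [[0.0, 0.0], [1.0, 1.0]]

codes = quantize(points, codebook)
approx = matmul(codes, codebook)  # each point replaced by its nearest codebook entry
print(codes)
print(approx)
```

The one-hot constraint on the codes is what makes VQ so restrictive: every point is represented by exactly one dictionary atom, with no mixing.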