I’ve recently learned a couple of neat tricks for processing large text files more efficiently from my new co-worker @nicolaskruchten. Our use case is going through tens of gigabytes of logs to extract specific lines and perform some operation on them. Here are a couple of things we’ve done to speed things up.
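As a baseline, here is a minimal sketch of the kind of task we're speeding up: streaming a huge log file line by line and filtering it, without ever loading the whole thing into memory. The file name and match string are illustrative, not from our actual logs.

    # Stream a multi-gigabyte log file and keep only matching lines;
    # iterating the file object reads it lazily, one line at a time.
    def matching_lines(path, needle):
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                if needle in line:  # cheap substring test before any parsing
                    yield line.rstrip("\n")

    for line in matching_lines("app.log", "ERROR"):
        print(line)  # stand-in for "do some operation on them"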
With every New Year come new resolutions, and mine was to revamp my blog. So here it is, shiny and new, and all flat files. It's generated with the awesome little blogging engine called Blogofile, and updated with Emacs and git, using a hook that automatically rebuilds every page from the source files and Mako templates.
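The hook itself isn't shown here, but a minimal sketch of the idea could look like this: a post-receive hook (written in Python, since git hooks can be any executable) that refreshes a checkout of the pushed sources and runs Blogofile's build command. The paths and repository layout below are assumptions of mine, not my actual setup.

    #!/usr/bin/env python3
    # Hypothetical post-receive hook: refresh a working copy of the
    # blog sources from the bare repo, then rebuild the site.
    import subprocess

    WORK_TREE = "/srv/blog/src"  # assumed checkout of the blog sources

    # Force-update the working copy to the newly pushed revision.
    subprocess.run(["git", "--work-tree", WORK_TREE, "checkout", "-f"], check=True)

    # Blogofile regenerates the whole site from the sources and templates.
    subprocess.run(["blogofile", "build"], cwd=WORK_TREE, check=True)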
I migrated from a Zine blog that was leaking memory to death (probably due to a bad/weird paster setup of mine). The whole operation took about 5 hours, including migrating my 20 other blog posts.
I was wondering how long it would take to write a multivariate classifier in Python. With Python and NumPy, not long at all. We simply need to be able to compute the covariance matrix, its determinant, and its inverse. Even if the covariance matrix is singular, which means it can't be inverted, you can easily compute the Moore-Penrose pseudo-inverse instead (via numpy.linalg.pinv). As expected, assuming too much about the data leads to poor classification.
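Here is a minimal sketch of such a classifier, assuming one NumPy array of samples per class (the names below are mine, not from the original code): each class is modelled as a multivariate Gaussian, and a point is assigned to the class with the smallest Mahalanobis distance, using the pseudo-inverse so a singular covariance matrix doesn't break anything.

    import numpy as np

    def fit_class(samples):
        """Estimate the mean and (pseudo-)inverse covariance of one class."""
        mean = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False)
        # pinv works even when cov is singular and has no true inverse.
        return mean, np.linalg.pinv(cov)

    def mahalanobis(x, mean, cov_inv):
        """Squared Mahalanobis distance from x to the class mean."""
        diff = x - mean
        return diff @ cov_inv @ diff

    def classify(x, classes):
        """Assign x to the class with the smallest Mahalanobis distance."""
        return min(classes, key=lambda c: mahalanobis(x, *classes[c]))

    # Toy usage: two 2-D classes drawn from well-separated Gaussians.
    rng = np.random.default_rng(0)
    classes = {
        "a": fit_class(rng.normal(0.0, 1.0, size=(100, 2))),
        "b": fit_class(rng.normal(3.0, 1.0, size=(100, 2))),
    }
    print(classify(np.array([2.8, 3.1]), classes))  # -> b

The full Gaussian log-density also needs the covariance determinant mentioned above; dropping it here amounts to assuming equally-spread classes, which is exactly the kind of extra assumption about the data that can hurt classification.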
I am pretty busy these days with my new job (I am heading a start-up in the telecommunications market), but I took a couple of hours to craft a couple of scripts.

The first script (lastfm_gettracks) downloads all of a user's song track information from Last.fm (http://last.fm/).
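The script itself isn't included here, but a minimal sketch of the idea, assuming the public Last.fm web API's user.getRecentTracks method and an API key of your own (the real lastfm_gettracks may work differently), could look like this:

    import json
    import urllib.request

    API_KEY = "your-api-key"  # hypothetical placeholder; register your own

    def get_tracks(user, page=1):
        """Fetch one page of a user's recent tracks from the Last.fm API."""
        url = (
            "https://ws.audioscrobbler.com/2.0/"
            f"?method=user.getrecenttracks&user={user}"
            f"&api_key={API_KEY}&format=json&page={page}"
        )
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["recenttracks"]["track"]

    for track in get_tracks("some_user"):
        print(track["artist"]["#text"], "-", track["name"])

Downloading every track for a heavy listener means walking the paginated results page by page, which is most of what such a script ends up doing.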