Google Summer of Code 2008

GSoC '08 - Life is Good!

I was amongst the few (ok, a bit over 1000 folks) who got selected for the Google Summer of Code (GSoC) program for 2008 … and well W00T! =)

So GSoC is basically a way for Google to give back to the open source community (which it leverages in its own products and services) by creating a win-win (non-zero-sum) situation for students and open source organizations.

Google hand-picks the open source organizations (175 this time around) which have contributed the most to the community. These organizations then list some ideas they’d like developers to work on. Students (like me) submit proposals based on these ideas, or on what we think would be good for the community. The proposals get evaluated and ranked. Google then allots a quota of slots to each organization, and the top-ranked proposals get selected. Google pays a stipend to the student developers and also gives away some Google swag, like a neat GSoC tee. Upon successful completion of the project, you get a certificate from Google.

So I am of course working for “Creative Commons”. After doing research on the community for 2 years, it’s only fair that I get involved and use what I’ve learnt to help the community the best I can. I wanted to code a licensing application for them, which would’ve been technically challenging but not analytical. They already had such an application under development. So my second idea was to automate web publishing of dynamically calculated metrics based on data (on CC usage) collected by Python scripts. Again, all technical and not really analytical, like most GSoC projects. (Well, not analytical since I already did these metrics last academic term, so this would just be a matter of automating it.)

Finally Mike (the Vice-President of CC) suggested the idea of looking at the various CC logs (license chooser, deeds, image logs etc). So my final proposal was to look at the logs and analyze them. If there were interesting metrics to be found, I’d automate the analysis process and publish them to the CC Metrics portal.

This project is deceptively simple. Although much simpler to implement than a licensing application, this project also has an analytical component and relies on my finding the metrics and correlations in the myriad of logs. The analysis shouldn’t only be meaningful, but also useful. Remember correlation is *not* causation.

And since it still seemed all too easy, I went ahead and turned it up a notch. The analysis will be granular, down to the level of per-jurisdiction, per-version and per-license type wherever applicable. Now given that there are 35+ jurisdictions, 6+ license types and up to 5 different versions, this leads to a great many ways of looking at the same data and breaking it up.
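To make that slicing a bit more concrete, here is a minimal sketch of how hits could be counted per (jurisdiction, license type, version). It assumes the logs record request paths shaped like CC license URLs (/licenses/by-nc-sa/2.5/in/), with a missing jurisdiction segment meaning the unported license. The function name `breakdown` and the sample paths are made up for illustration; the real log format may well differ.

```python
import re
from collections import Counter

# Path of a CC license URL, e.g. /licenses/by-nc-sa/2.5/in/
# The trailing jurisdiction segment is optional (assumed to mean "unported").
LICENSE_PATH = re.compile(
    r"/licenses/(?P<license>[a-z\-]+)/(?P<version>\d+\.\d+)/"
    r"(?:(?P<jurisdiction>[a-z\-]+)/)?"
)

def breakdown(paths):
    """Count hits per (jurisdiction, license type, version)."""
    counts = Counter()
    for path in paths:
        match = LICENSE_PATH.search(path)
        if match is None:
            continue  # not a license URL, so skip it
        jurisdiction = match.group("jurisdiction") or "unported"
        key = (jurisdiction, match.group("license"), match.group("version"))
        counts[key] += 1
    return counts

# Hypothetical request paths, invented purely for this example.
sample = [
    "/licenses/by/3.0/",
    "/licenses/by-nc-sa/2.5/in/",
    "/licenses/by-nc-sa/2.5/in/",
    "/licenses/by-sa/2.0/uk/",
    "/about/",
]

for key, hits in breakdown(sample).most_common():
    print(key, hits)
```

Taking the figures above at face value, 35 x 6 x 5 already gives roughly a thousand possible combinations, so keying a single counter on the whole tuple seemed simpler than keeping one table per dimension.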

Also, I’ll probably have to use Python to automate the process, as the rest of their scripts are in Python.

I have little experience with Python. The one and only time I used it was to code the CC_Totals_Estimate method for Creative Commons! It was a Python implementation of the method Prof. Giorgos came up with to estimate the total number of CC-licensed items by using figures from Yahoo, Google, Flickr (and other large repositories of CC content). So that should be fun.

This isn’t a typical GSoC project, in that it is not technically daunting, but it’ll take just as much intellectual effort, and this is what the community needs. If through my work we can find “where the hiccups in the ‘license chooser’ are”, or “how the deeds are influencing license choice”, or anything else that makes CC better, I’d say it’s a summer well spent.

Hoping to find interesting stuff in the logs and come up with good, indicative, never-heard-of-before metrics!
