Computational Journalism Challenges

Kind of like last year's Search-Script-Scrape, except more open-ended and more demanding, but with more context. The main purpose of these challenges is not just to keep honing your programming skills, but to acquaint you with the many datasets, and their many possibilities and limitations.

I'll add more context to this page, but basically, every week you'll have to pick several problems to work on. You can work on them with a partner, but do not team up as a class, and do not pick the same topics over and over. Your ignorance of data is going to be the main limitation of your project work.

Tentative schedule

         Tier 1   Tier 2   Tier 3+
Week 2   2        -        -
Week 3   3        -        -
Week 4   -        1        -
Week 5   2        1        -
Week 6   1        1        -
Week 7   -        -        1

Note: You can choose to do a single Tier 2 exercise rather than 2 Tier 1 exercises, or a single Tier 3 exercise just for the fun of it. You don't get extra credit, though. And you can't trade downwards, i.e. you can't do 2 Tier 1 exercises in place of a single Tier 2.

Tier challenge levels

Tier 1

These are meant to be one-off exercises that can generally be done in a single script. And the dataset to fetch and analyze is generally pretty small so that you don't have to worry too much about memory or disk space or bandwidth constraints. The output may be a chart or a few lines of text.

Tier 2

Think of it as a Tier 1 task, except made scalable. Instead of counting the number of crimes per precinct in a given year, you have to fetch a decade's worth of data, collate and wrangle it, and create 10 charts instead of just 1 – or at least, design a chart that can convey 10 times the data. You might also have to work with multiple datasets and APIs and find something interesting in their intersection.
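To make the jump from Tier 1 to Tier 2 a little more concrete, here's a rough sketch, not a prescribed solution. The CSV text, its column names (`year`, `precinct`, `offense`), and the function names are all made up for illustration; a real crime dataset will look different and will need to be fetched and cleaned first.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample data: the columns and values are invented for illustration.
SAMPLE_CSV = """year,precinct,offense
2013,1,burglary
2013,1,assault
2013,2,burglary
2014,1,burglary
2014,2,assault
2014,2,burglary
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per reported crime."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by_precinct(rows, year):
    """Tier 1 style: tally crimes per precinct for one given year."""
    return Counter(r["precinct"] for r in rows if int(r["year"]) == year)


def count_by_year_and_precinct(rows):
    """Tier 2 style: tally every year at once, e.g. to drive one chart per year."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[int(r["year"])][r["precinct"]] += 1
    return counts


rows = load_rows(SAMPLE_CSV)
single_year = count_by_precinct(rows, 2013)
all_years = count_by_year_and_precinct(rows)

# Print one line per (year, precinct) pair, sorted for readability.
for year in sorted(all_years):
    for precinct, n in sorted(all_years[year].items()):
        print(year, precinct, n)
```

The counting logic barely changes between the two functions. What grows at Tier 2 is everything around it: fetching many files, reconciling columns that change from year to year, and deciding how to present ten times the data.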

Tier 3

These aren't harder than Tier 2 in a software engineering sense. But the answers are more open-ended and involve more explanation and thoughtfulness on your part. You'll need to be comfortable compartmentalizing and organizing your information, because you might have a lot of it in some of these challenges. Some of these challenges are also an opportunity to try more advanced tools and libraries, such as those for statistical learning, that we don't really focus on in the core curriculum.

Tier 4

I don't know if all of these are necessarily harder than Tier 3 challenges. It's just that I generally have a good idea of how to efficiently solve all of the Tier 3 challenges, whereas the Tier 4 challenges have various unknown unknowns, or at least some kind of quantification that cannot be solved through math alone. So you might end up doing as much research and reporting as actual programming to figure out the right heuristics. Most of these probably aren't (satisfactorily) doable in a week.

Tier 1

Tier 2

Tier 3

Tier 4