Dicts, Lists, and JSON Quiz Part 1

Deliverables

Due date: Friday, April 17

By the end of this assignment, these things will have been done:

  • Inside your Github repo named compjour-hw, create a subfolder named: json-quiz/
  • Inside json-quiz/ will be five files, named and numbered as 1.py, 2.py, etc, each corresponding to the problems listed below.

Each problem has a list of tasks (lettered A through G, or so) and the expected output. When you run each script, it should generate the exact output as expected, i.e. you will be using print() calls.

Some of the problems have partial solutions. The first problem is completely solved for you as an example. Copy that into your repo (as json-quiz/1.py) and run it and see that you get the expected output.

Quick intro to JSON, Dicts, Lists, etc.

Understanding how text turns into data structures is pretty much fundamental to doing anything useful in programming, especially in the journalism/scientific domain.

Lists are how ordered collections of objects (numbers, strings, other data objects) are represented:

[1, 2, 3, 'bingo']

And Dictionaries are how Python represents objects with attributes, i.e. a car is an "object" and that "object" has attributes such as color, mpg, and make:

{"color" : "red", "mpg" : 20.5, "make": "Honda"}

In other words, lists and dictionaries can be used to describe pretty much every real-world object there is. Hence, a lot of programming involves turning real-life data objects into lists and dictionaries, so that they can be processed with code. And that's why we learn these data structures.

Why JSON

While there are several different textual data formats, JSON is the most ubiquitous among modern services, and the most versatile. And just as importantly, in Python (as well as Ruby and JavaScript), the dict (dictionary) and list structures look the same in code as they do in JSON text files.

From Python data object to JSON

For instance, this is a Python dictionary object:

d = {"apples": 20, "pears": 40, "kiwis": 90}
print(d['pears'])
# 40

This is how you write that data structure into a JSON formatted text file, which is referred to as, encoding the data object as JSON.

import json
d = {"apples": 20, "pears": 40, "kiwis": 90}
txt = json.dumps(d, indent = 2)
f = open("/tmp/test.json", "w")
f.write(txt)
f.close()

The resulting text file looks like this (i.e. pretty much identical):

{
  "apples": 20,
  "pears": 40,
  "kiwis": 90
}

From JSON to Python data object

A more frequent situation is that you'll have data in a JSON text file and you want to bring it into your Python program. This is referred to as parsing the JSON text file, or: decoding the JSON text file.

For example, the current Github status is found at this URL:

https://status.github.com/api/status.json

Its contents look something like this:

{"status":"good","last_updated":"2015-04-13T16:36:48Z"}

If we want to write a program that tells us if Github's status is good or not (i.e. a istheLtrainf—ed.com, but for Github), we have to:

  1. Fetch the text file from the URL
  2. Use Python's JSON parser to convert that text into a Python data object (in this case, a dict)

Here's one way to do it:

First, we have to get the contents of that URL (https://status.github.com/api/status.json) as a text string:

import requests
import json
xfile = requests.get("https://status.github.com/api/status.json")
# xfile is not a text file, but a Requests object, so 
#  we need this intermediary step:
txt = xfile.text
print(txt)
# {"status":"good","last_updated":"2015-04-13T16:36:48Z"}

At this point, txt is just text. To make it a dict data object, we use Python's json library to decode (or parse) that text string:

data = json.loads(txt)
print(data['last_updated'])
# 2015-04-13T16:36:48Z

print "Is Github f___ed?"
if data['status'] == 'good':
    print("Github is not f___ed")
else:
    print( "Github may be f___ed")

Helpful references

Zed Shaw's Learn Python the Hard Way has a good chapter on dictionaries. And on lists (and how to loop through them).

Check out the Codecademy lesson on lists and dictionaries.

Problems

1. Realtime visitors to USA.GOV websites

The data is displayed on analytics.usa.gov

Data URL

http://2015.compjour.org/files/code/json-examples/analyticsgov-realtime.json

Original source: https://analytics.usa.gov/data/live/top-pages-realtime.json

Tasks

A. Print the value of the name attribute of the top-level object.

B. Print the value of the taken_at attribute of the top-level object.

C. Print the name attribute of the meta object.

D. Print the number of active_visitors, which is part of the data attribute.

E. Print a comma-separated list of the top-level object's keys (i.e. attributes).*

Note: For E, you may get a slightly different answer because of the fact that the keys/attributes in a Python dict are inherently unordered. In other words, these two dicts are equivalent:

a = {'apples': 10, 'oranges': 50}
b = {'oranges': 50, 'apples': 10}
print(a == b)
# True

Expected output

A. realtime
B. 2015-04-13T03:20:02.401Z
C. Active Users Right Now
D. 78000
E. name, query, meta, data, totals, taken_at

Answer

import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/analyticsgov-realtime.json"
# fetch the data file
response = requests.get(data_url)
text = response.text
# parse the data
data = json.loads(text)

print('A.', data['name'])
print('B.', data['taken_at'])
print('C.', data['meta']['name'])
print('D.', data['data'][0]['active_visitors'])
print('E.', ', '.join(data.keys()))

2. Facebook Graph Object for The White House

Facebook's documentation of the Graph API

Data URL

http://2015.compjour.org/files/code/json-examples/graph.facebook-whitehouse.json

Original source: https://graph.facebook.com/WhiteHouse

Tasks

A. Print the value of the checkins attribute of the top-level object.

B. Print the value of the likes attribute of the top-level object.

C. Print the value of the longitude attribute of the location object.

D. Print the value of the name of the last object inside the category_list list.

Expected Output

A. 1474208
B. 3405748
C. -77.035704501759
D. Government Organization

3. Google Maps Geocoder Response

This is the API that powers the location data behind Google Maps and many other location-focused apps (such as Uber).

Documentation for the Geocoding API Request Format.

Data URL

http://2015.compjour.org/files/code/json-examples/maps.googleapis-geocode-mcclatchy.json

Original source: https://maps.googleapis.com/maps/api/geocode/json?address=McClatchy+Hall+Stanford,CA

Tasks

Note that the top-level object contains a list of results; this is because for any given address-lookup, the address given may have been vague enough to require more than one result (see the result for "Paris". For this problem, though, results contains exactly one object, which I refer to as the "result object".

A. Print the value of the formatted_address attribute for the result object.

B. For the top-level object, print out the value of the status attribute.

C. For the result object, print out the location_type, which is part of the geometry object.

D. Print the value of the lat attribute of the location object that is part of the geometry object (which is, again, part of the result object)

E. Again, inside the geometry of the result object, print the value of the southwest lng attribute, which is inside the geometry's viewport object.

F. For the result object, print the long_name values for the first 2 address_components, joined (by a commma-and-space).

Expected Output

A. Stanford University, Main Quad, 450 Serra Mall McClatchy Hall, Stanford, CA 94305, USA
B. OK
C. ROOFTOP
D. 37.4283125
E. -122.1708429802915
F. McClatchy Hall, Stanford University

Partial answer

import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/maps.googleapis-geocode-mcclatchy.json"
response = requests.get(data_url)
text = response.text
data = json.loads(text)

result_obj = data['results'][0]
print('A.',  result_obj['formatted_address'])
print('C.', result_obj['geometry']['location_type'])

The Spotify music service has an API that will return a list of artists, in descending order of "similarity", that are considered similar to a given artist, in this case, Beyoncé.

img

The documentation for Spotify's Related Artists endpoint.

The "similarity" between artists is "based on analysis of the Spotify community's listening history". From Spotify's 2010 press release on this feature:

Previously we’ve used genre and artist tagging from AllMusic for related artists which worked well, but did not cover a large portion of our catalogue. What we’ve done now is to go through months and months of listening data and look closely at what people listen to.

This allows us to see that users who listen to a lot of The Rolling Stones, for example, are also big fans of Iggy Pop or The Byrds. The new feature pulls some of this information together to show you a range of related artists in one tab.

Data URL

http://2015.compjour.org/files/code/json-examples/spotify-related-to-beyonce.json

Original source: https://api.spotify.com/v1/artists/6vWDO969PvNqNYHIOW5v0m/related-artists

Tasks

For this problem set, I'm being more colloquial and less code-specific in referring to the values that you need to find.

A. How many related artists are in Spotify's response?

B. What is the name of the 5th most-similar artist to Beyoncé?

C. How many followers does the 12th most-similar artist to Beyoncé have?

D. Print a comma-separated list of the genres of the artist most related to Beyoncé.

E. What is the URL for the largest image file for the artist that is least-related (at least in this result set) to Beyoncé?

Expected Output

A. 20
B. Ciara
C. 74258
D. pop christmas,r&b,urban contemporary
E. https://i.scdn.co/image/7e8849593bcf16705c62128ed749de0e543c2e4e

Partial answer

import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/spotify-related-to-beyonce.json"
data = json.loads(requests.get(data_url).text)

print('A.', len(data['artists']))

5. A single tweet from the Library of Congress

See the Twitter documentation on get/statuses/show endpoint.

Data URL

http://2015.compjour.org/files/code/json-examples/single-tweet-librarycongress.json

Original source: https://twitter.com/librarycongress/status/586227094285225984

Fetching the data programmatically, using the Tweepy library to handle the authentication:

# assuming that client is an authenticated instance of tweepy.api.API
tweet = client.statuses_lookup([586227094285225984])[0]
tweet_dict = tweet._json
print(json.dumps(tweet_dict, indent = 2))

Tasks

A. Print the timestamp of when the tweet was created at.

B. Print the timestamp of when the account that sent the tweet was created at.

C. Print the text of the tweet.

D. Print the Twitter account name of the tweet's author.

E. Print the id of the tweet.

F. Print the number of users mentioned

G. Print a comma-separated list of the hashtags used in the tweet.

H. Print a comma-separated list of the displayed URLs in the tweet. Note: your answer should be similar to the answer for G., in that if there were 0 or 5 URLs, the code would still work the same.

Expected Output

A. Thu Apr 09 18:00:05 +0000 2015
B. Fri Jun 29 14:23:25 +0000 2007
C. Passing by one vote, US Senate ratifies a treaty to purchase Alaska #OTD 1867. More: #ChronAm http://t.co/2RXNSMmOkf
D. librarycongress
E. 586227094285225984
F. 0
G. OTD,ChronAm
H. go.usa.gov/3YD7V

Partial answer

import requests
import json
data_url = 'http://www.compjour.org/files/code/json-examples/single-tweet-librarycongress.json'
data = json.loads(requests.get(data_url).text)


### For G.
hashtag_objs = data['entities']['hashtags']
hashtag_texts = []
for h in hashtag_objs:
    hashtag_texts.append(h['text'])
print('G.', ','.join(hashtag_texts))
# alternatively, you could also use the list comprehension syntax:
# hashtag_texts = [h['text'] for h in data['entities']['hashtags']]

All Solutions

1.

import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/analyticsgov-realtime.json"
# fetch the data file
response = requests.get(data_url)
text = response.text
# parse the data
data = json.loads(text)

print('A.', data['name'])
print('B.', data['taken_at'])
print('C.', data['meta']['name'])
print('D.', data['data'][0]['active_visitors'])
print('E.', ', '.join(data.keys()))
        
File found at: /files/code/answers/json-quiz/1.py

2.

import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/graph.facebook-whitehouse.json"
response = requests.get(data_url)
data = json.loads(response.text)

print('A.', data['checkins'])
print('B.', data['likes'])
print('C.', data['location']['longitude'])
print('D.', data['category_list'][-1]['name'])
        
File found at: /files/code/answers/json-quiz/2.py

3.

import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/maps.googleapis-geocode-mcclatchy.json"
response = requests.get(data_url)
data = json.loads(response.text)

obj = data['results'][0]
print('A.', obj['formatted_address'])
print('B.', data['status'])
print('C.', obj['geometry']['location_type'])
print('D.', obj['geometry']['location']['lat'])
print('E.', obj['geometry']['viewport']['southwest']['lng'])

a = obj['address_components'][0]['long_name']
b = obj['address_components'][1]['long_name']
print('F.', a + ', ' + b)
# or, to use list comprehensions:
# print('F.', ', '.join(a['long_name'] for a in obj['address_components'][0:2]))
        
File found at: /files/code/answers/json-quiz/3.py

4.

import requests
import json
data_url = "http://www.compjour.org/files/code/json-examples/spotify-related-to-beyonce.json"
data = json.loads(requests.get(data_url).text)
artists = data['artists']

print('A.', len(artists))
print('B.', artists[4]['name'])
print('C.', artists[11]['followers']['total'])
print('D.', ','.join(artists[0]['genres']))
print('E.', artists[-1]['images'][0]['url'])

# Note: the answer to E depends on that images array being sorted by size
# ...if that weren't the case, we'd have to sort it like so:

# from operator import itemgetter
# images = sorted(artists[-1]['images'], key = itemgetter('width', 'height'))
# print('E.', images[-1]['url'])


        
File found at: /files/code/answers/json-quiz/4.py

5.

import requests
import json
data_url = 'http://www.compjour.org/files/code/json-examples/single-tweet-librarycongress.json'
data = json.loads(requests.get(data_url).text)

print('A.', data['created_at'])
print('B.', data['user']['created_at'])
print('C.', data['text'])
print('D.', data['user']['screen_name'])
print('E.', data['id'])
print('F.', len(data['entities']['user_mentions']))

### For G.
hashtag_objs = data['entities']['hashtags']
hashtag_texts = []
for h in hashtag_objs:
    hashtag_texts.append(h['text'])
print('G.', ','.join(hashtag_texts))
# alternatively, you could also use the list comprehension syntax:
# hashtag_texts = [h['text'] for h in data['entities']['hashtags']]


### For H
urls = data['entities']['urls']
urltxts = []
for u in urls:
    urltxts.append(u['display_url'])
print('G.', ','.join(urltxts))
        
File found at: /files/code/answers/json-quiz/5.py