Sorting fun with JSON Quiz Part 2

A continuation of the first part of this quiz

Due date: Tuesday, April 28

Deliverables

By the end of this assignment, you will have worked through the following exercises:

6. Congressmembers' Twitter accounts, as curated by C-SPAN

The Twitter list feature is mostly used by power users, which makes such lists a decent starting point for compiling the social media accounts of a given group (because anyone who takes the trouble to put a list together is at least going to get it half-correct).

The Sunlight Foundation keeps a spreadsheet of social media contacts (it's part of the unitedstates/congress public data project). Note that its count differs slightly from C-SPAN's, likely because C-SPAN's list doesn't purport to be an up-to-date official roster.

Twitter API documentation for get/lists/members endpoint

Data URL

http://2015.compjour.org/files/code/json-examples/twitter-cspan-congress-list.json

Original source: https://twitter.com/cspan/lists/members-of-congress/members

# assuming that client is an authenticated instance of tweepy.api.API
import json

members = client.list_members(owner_screen_name = 'cspan', slug = 'members-of-congress', count = 1000)
data = [m._json for m in members]
print(json.dumps(data, indent = 2))

Tasks

A. What is the total number of accounts in the list?

B. Find the number of accounts that have more than 10000 followers.

C. Find the number of accounts that are "verified".

D. Find the highest number of followers among all the accounts.

E. Find the highest number of tweets among all the accounts.

F. Find the account with the highest number of followers, then print: "{account's screen_name} has {account's followers_count} followers"

G. Find the account that has the highest number of tweets and is also not "verified", then print: "{account's screen_name} has {account's statuses_count} tweets"

H. Print the average number (rounded to nearest integer) of followers among all the accounts.

I. Print the median number of followers among all the accounts.

Partial answer

import requests
import json
import os
data_url = 'http://www.compjour.org/files/code/json-examples/twitter-cspan-congress-list.json'
tempfilename = "/tmp/congresslist.json"
# if you're on Windows, do this:
# tempfilename = os.path.expandvars('%TEMP%\\congresslist.json')

# Because this file is relatively large, let's save it to a tempfile, so that
# subsequent runs read from that file
if os.path.exists(tempfilename):
    with open(tempfilename, "r") as tfile:
        j = tfile.read()
else:
    j = requests.get(data_url).text
    with open(tempfilename, "w") as tfile:
        tfile.write(j)
accounts = json.loads(j)
## woof, that was a lot of lines just to load a file...
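The same read-or-fetch caching pattern can be done in fewer lines with pathlib. A sketch with a made-up cache path and stand-in data (in the real script, the write would use requests.get(data_url).text):

```python
import json
import tempfile
from pathlib import Path

# hypothetical cache path, just for this sketch
cache = Path(tempfile.gettempdir()) / 'congresslist-demo.json'

if not cache.exists():
    # stand-in for: requests.get(data_url).text
    cache.write_text(json.dumps([{"screen_name": "example", "followers_count": 42}]))

accounts = json.loads(cache.read_text())
print(accounts[0]['screen_name'])   # prints 'example'
```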


############# 
## Task B:
x = 0
for a in accounts:
    if a['followers_count'] > 10000:
        x += 1

## or more concisely:
# x = len([a for a in accounts if a['followers_count'] > 10000])
print("B.", x)


#############
## Task D:
counts = []
for a in accounts:
    counts.append(a['followers_count'])
maxval = sorted(counts, reverse = True)[0]
# alternatively:
# maxval = sorted([a['followers_count'] for a in accounts], reverse = True)[0]

## or:
# counts = []
# for a in accounts:
#    counts.append(a['followers_count'])
# maxval = max(counts)

## or:
# maxval = max(a['followers_count'] for a in accounts)
print("D.", maxval)



##############
## Task F:
from operator import itemgetter
y = sorted(accounts, key = itemgetter('followers_count'), reverse = True)
x = y[0]
# alternatively:
# x = max(accounts, key = itemgetter('followers_count'))
print("F.", x['screen_name'], 'has', x['followers_count'], 'followers')
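operator.itemgetter('followers_count') just builds a function equivalent to lambda a: a['followers_count']. A toy example with made-up account dicts:

```python
from operator import itemgetter

# made-up accounts for illustration
rows = [{'screen_name': 'a', 'followers_count': 5},
        {'screen_name': 'b', 'followers_count': 9}]
top = max(rows, key = itemgetter('followers_count'))
print(top['screen_name'], 'has', top['followers_count'], 'followers')   # b has 9 followers
```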


###############
## Task H:
totes = 0
for a in accounts:
    totes += a['followers_count']

# alternatively
# totes = sum([a['followers_count'] for a in accounts])
print('H.', round(totes / len(accounts)))
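For Tasks H and I, the standard library's statistics module computes mean and median directly. A sketch on hypothetical follower counts:

```python
import statistics

counts = [3, 9, 4, 7, 100]   # made-up follower counts
print('mean:', round(statistics.mean(counts)))   # mean of 24.6 rounds to 25
print('median:', statistics.median(counts))      # middle of sorted values: 7
```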

Expected Output

A. 571
B. 231
C. 543
D. 1955200
E. 47169
F. SenJohnMcCain has 1955200 followers
G. reppittenger has 3668 tweets
H. 28909
I. 8385

7. A month's worth of significant earthquakes

The USGS Earthquake Hazards Program provides several feeds of varying time lengths (last hour, last week, last month) for earthquakes of a specified magnitude. Their GeoJSON data format is documented here. If you're interested in making a "QuakeBot", this would be the source.

Data URL

http://2015.compjour.org/files/code/json-examples/earthquake.usgs-significant_month.json

Original source: http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/significant_month.geojson

Tasks

A. Print the title of this particular feed.

B. Print the number of earthquakes contained in the sample feed.

C. Print the largest magnitude value of the earthquakes.

D. Print the number of earthquakes that occurred in "oceanic regions" (Hint: read the documentation here).

E. Print the title of the earthquake with the smallest magnitude

F. Print the title of the earthquake with the most number of "felt" reports.

G. Print the date of the most recent earthquake in YYYY-MM-DD HH:MM format, e.g. "2015-02-22 17:10" (note: for this, and subsequent tasks, the answers should be in reference to Greenwich Mean Time, i.e. UTC)

H. Print the date of the oldest earthquake in WEEKDAYNAME, MONTHNAME DD format, e.g. "Tuesday, February 22"

I. Print the number of earthquakes that occurred on a weekday.

J. Print the number of earthquakes that happened between 5AM and 9AM.

K. Print the title of the earthquake farthest away from Stanford, California

L. Print the title of the earthquake farthest away from Paris, France

M. Print the URL for a Google Static Map that marks the locations of the earthquakes in orange markers on a world map (i.e. having a zoom factor of 1) that is 500 pixels wide by 400 pixels high.

N. Same as above, but use red markers to denote earthquakes with magnitudes 6.0 or stronger.

Expected Output

A. USGS Significant Earthquakes, Past Month
B. 6
C. 7.5
D. 3
E. M 3.6 - 1km NNW of San Ramon, California
F. M 3.6 - 1km NNW of San Ramon, California
G. 2015-04-02 07:06
H. Wednesday, March 18
I. 5
J. 3
K. M 7.5 - 56km SE of Kokopo, Papua New Guinea
L. M 6.5 - 99km ENE of Hihifo, Tonga
M. https://maps.googleapis.com/maps/api/staticmap?zoom=1&size=500x400&markers=color:orange%7C37.792,-121.9868333%7C-15.5149,-172.9402%7C-15.388,-172.9038%7C-4.7632,152.5606%7C-18.3534,-69.1663%7C-36.0967,-73.6259
N. https://maps.googleapis.com/maps/api/staticmap?zoom=1&size=500x400&markers=color:orange%7C37.792,-121.9868333&markers=color:red%7C-15.5149,-172.9402%7C-15.388,-172.9038%7C-4.7632,152.5606%7C-18.3534,-69.1663%7C-36.0967,-73.6259

Partial solution

import requests
import json
durl = 'http://www.compjour.org/files/code/json-examples/earthquake.usgs-significant_month.json'
data = json.loads(requests.get(durl).text)
quakes = data['features']


#######################
# Task C
print("C.", max([q['properties']['mag'] for q in quakes]))


#######################
# Task E
def get_mag(quake):
    return quake['properties']['mag']

q = min(quakes, key = get_mag)
print("E.", q['properties']['title'])



#######################
# Task G
import time
# the USGS time attribute is precise to the millisecond
# but we just need seconds:
qsecs = [q['properties']['time'] / 1000 for q in quakes]
# the feed was probably sorted in reverse chronological order, but
# just to make sure...
qsecs = sorted(qsecs, reverse = True)
tsec = qsecs[0] 
timeobj = time.gmtime(tsec)
print('G.', time.strftime('%Y-%m-%d %H:%M', timeobj))
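The same epoch-seconds-to-UTC conversion can be done with the datetime module. A sketch with a hard-coded example timestamp (chosen to correspond to 2015-04-02 07:06 UTC):

```python
from datetime import datetime, timezone

tsec = 1427958360    # example epoch seconds
d = datetime.fromtimestamp(tsec, tz = timezone.utc)
print(d.strftime('%Y-%m-%d %H:%M'))   # 2015-04-02 07:06
```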

#######################
# Task I
# assuming qsecs is the same as from Task G
tobjs = [time.gmtime(s) for s in qsecs]
wdays = [s.tm_wday for s in tobjs]
x = [d for d in wdays if d in range(0, 5)]  # tm_wday 0-4 is Monday through Friday
print('I.', len(x))
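struct_time's tm_wday field runs from Monday (0) through Sunday (6). A quick check against a known date:

```python
import time

# 1420070400 is 2015-01-01 00:00 UTC, which fell on a Thursday
t = time.gmtime(1420070400)
print(t.tm_wday)   # 3, i.e. Thursday
```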



#########################
# Task K
from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat /2 ) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers.
    return c * r

def distance_from_stanford(quake):
    stanford_lng = -122.166
    stanford_lat = 37.424
    coords = quake['geometry']['coordinates']
    lng = coords[0]
    lat = coords[1]
    return haversine(lng, lat, stanford_lng, stanford_lat)

q = max(quakes, key = distance_from_stanford)
print('K.', q['properties']['title'])
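A quick sanity check on the haversine helper: one degree of latitude spans about 111.2 km anywhere on the globe. This self-contained sketch repeats the same formula:

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # great-circle distance in kilometers (same formula as above)
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 6371

# one degree of latitude along the prime meridian
print(round(haversine(0, 0, 0, 1), 1))   # 111.2
```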


#########################
# Task M
basemap_url = 'https://maps.googleapis.com/maps/api/staticmap?zoom=1&size=500x400'
markers_str = 'markers=color:orange'
for q in quakes:
    coords = q['geometry']['coordinates']
    lng = str(coords[0])
    lat = str(coords[1])
    s = '%7C' + lat + ',' + lng
    markers_str += s

print('M.', basemap_url + '&' + markers_str)  
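The '%7C' strings above are just URL-encoded pipe characters (the Static Maps markers parameter separates coordinates with '|', which must be percent-encoded in a URL). urllib.parse can do the encoding for you; a sketch with made-up lat,lng pairs:

```python
from urllib.parse import quote

print(quote('|'))   # %7C

# made-up lat,lng pairs for illustration
pairs = ['37.792,-121.99', '-15.51,-172.94']
markers_str = 'markers=color:orange' + ''.join(quote('|') + p for p in pairs)
print(markers_str)  # markers=color:orange%7C37.792,-121.99%7C-15.51,-172.94
```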

8. NYT Best Sellers List

The New York Times offers a variety of APIs, including its own collections of Congressional vote and campaign finance data. It also offers APIs for its own content, including its articles and its best sellers lists.

Data URL

http://2015.compjour.org/files/code/json-examples/nyt-books-bestsellers-hardcover-fiction.json

Original source: http://api.nytimes.com/svc/books/v3/lists/2015-01-01/hardcover-fiction.json?sort-by=rank&api-key=YOURAPIKEY

Tasks

A. Count the number of books published by Scribner

B. Find the number of books with the word "detective" (case-insensitive) in their descriptions.

C. Find the book with the most weeks on the list and print its title and the number of weeks it's been listed (as pipe-separated values, i.e. PSV).

D. Find the book that had the lowest rank (i.e. highest rank numerically) last week. Print its title, current rank, and last week's rank, as PSV.

E. Count the books that are new this week (i.e. had a rank of 0 last week)

F. Print the title and rank (as PSV) of the highest-ranked book that is new this week.

G. Find the book that was ranked last week and had the biggest increase in rank this week.

H. Find the book that was ranked last week and had the biggest drop in rank this week. Print its title, current rank, and change in rank (as PSV).

I. Among books ranked last week, find and print the sum of the positive changes in rank.

J. Among books ranked last week, find the sum of the negative changes in rank. Print the number of books that dropped rank and the sum of their rank changes (as PSV).

K. Print the number of characters in the longest title.

L. Print the average number of characters for titles (rounded to the nearest integer).

Expected Output

A. 3
B. 3
C. THE GOLDFINCH|56
D. SOMEWHERE SAFE WITH SOMEBODY GOOD|16|14
E. 6
F. REDEPLOYMENT|9
G. THE GOLDFINCH|11|2
H. THE BOSTON GIRL|15|-3
I. 4
J. 6|-12
K. 33
L. 16

Partial answer

import requests
import json
data_url = 'http://www.compjour.org/files/code/json-examples/nyt-books-bestsellers-hardcover-fiction.json'
data = json.loads(requests.get(data_url).text)
books = data['results']['books']

################## 
# Task G.
# define a helper function
def calc_rank_change(book_obj):
    return book_obj["rank_last_week"] - book_obj["rank"]

books_ranked_last_week = [b for b in books if b['rank_last_week'] > 0]
x = max(books_ranked_last_week, key = calc_rank_change)
s = "|".join([x['title'], str(x['rank']), str(calc_rank_change(x))])
print("G.", s)
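Note the sign convention in calc_rank_change: a positive value means the book moved up the list. A toy check with a made-up book dict:

```python
def calc_rank_change(book_obj):
    # same helper as above: last week's rank minus this week's rank,
    # so positive = climbed the list
    return book_obj["rank_last_week"] - book_obj["rank"]

book = {"title": "EXAMPLE", "rank": 2, "rank_last_week": 11}   # made-up
print(calc_rank_change(book))   # 9, i.e. climbed nine spots
```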

################## 
# Task I
# (assuming books_ranked_last_week and calc_rank_change() have 
#    been defined as above)
changes = [calc_rank_change(b) for b in books_ranked_last_week]
x = [v for v in changes if v > 0]
s = sum(x)
print("I.", s)

###################
# Task K
print('K.', max([len(b['title']) for b in books]))

9. Instagrams of a Congressmember


Inspired by this February 2015 Instagram-informed investigation by the Associated Press's Jack Gillum and Stephen Braun.

The AP tracked Schock's reliance on the aircraft partly through the congressman's penchant for uploading pictures and videos of himself to his Instagram account. The AP extracted location data associated with each image then correlated it with flight records showing airport stopovers and expenses later billed for air travel against Schock's office and campaign records.

Note: Since the AP's revelations, Rep. Schock has made his Instagram account invite-only. He has also resigned from Congress.

Note 2: This problem was never officially assigned, but I include it here for fun.

Data URL

http://2015.compjour.org/files/code/json-examples/instagram-aaron-schock.json

Original source: https://api.instagram.com/v1/users/472797365/media/recent?access_token=YOURACCESSTOKEN&count=50

Tasks

A. Print the number of images versus number of videos in the feed.

B. Print the top 3 filters most frequently used, in "name:count", separated by a pipe symbol.

C. Print the number of items that are geocoded.

D. Print the number of items that have a listed location of "United States Capitol"

E. Print the top 3 named locations, as "name:count", separated by a pipe symbol.

F. Find the geocoded item that took place farthest from the "United States Capitol" and print the name of the item's location.

G. Print the first 19 characters of the caption, the URL of the thumbnail, the likes count, and the comments count (as pipe-separated values) of the item with the highest sum of comments and likes.

H. Print the number of days (rounded to the nearest integer) between the oldest and newest item in this feed.

I. Calculate the rate of items posted per week (rounded to one decimal place) in this feed.

J. Find the longest gap in days (rounded to nearest day) between consecutively posted images.

K. Find the largest difference in comment count (as an absolute number) between consecutively posted images.

L. Print the month (as a number, e.g. 4 for April) of the most recently posted item.

M. Print the ratio of items posted on weekends to the total number of items (rounded to two decimal places).

N. Find the day of the week on which the most items were posted. Then print the name of that day (e.g. Monday) and the fraction of total items posted on that day (rounded to two decimal places), as pipe-separated values.

Expected Output

A. 30|3
B. Normal:25|Ludwig:3|Mayfair:2
C. 30
D. 3
E. United States Capitol:3|Perito Moreno Glacier - Patagonia :2|Pulenta Vineyard, Mendoza, Argentina:1
F. Yangon, Myanmar
G. Lunch with this old|http://scontent-b.cdninstagram.com/hphotos-xap1/t51.2885-15/s150x150/e15/10809784_562189993924641_1683766928_n.jpg|961|26
H. 86
I. 2.7
J. 13
K. 55
L. 2
M. 0.24
N. Monday|0.24

All Solutions

6.

import requests
import json
import os
data_url = 'http://www.compjour.org/files/code/json-examples/twitter-cspan-congress-list.json'
tempfilename = "/tmp/congresslist.json"
# if you're on Windows, do this:
# tempfilename = os.path.expandvars('%TEMP%\\congresslist.json')

# Because this file is relatively large, let's save it to a tempfile, so that
# subsequent runs read from that file
if os.path.exists(tempfilename):
    with open(tempfilename, "r") as tfile:
        j = tfile.read()
else:
    j = requests.get(data_url).text
    with open(tempfilename, "w") as tfile:
        tfile.write(j)
accounts = json.loads(j)
## woof, that was a lot of lines just to load a file...
##################################################

#############
## Task A:
print('A.', len(accounts))

#############
## Task B:
x = 0
for a in accounts:
    if a['followers_count'] > 10000:
        x += 1

## or more concisely:
# x = len([a for a in accounts if a['followers_count'] > 10000])
print("B.", x)


#############
## Task C:
x = len([a for a in accounts if a['verified'] == True])
print("C.", x)


#############
## Task D:
counts = []
for a in accounts:
    counts.append(a['followers_count'])
maxval = sorted(counts, reverse = True)[0]
# alternatively:
# maxval = sorted([a['followers_count'] for a in accounts], reverse = True)[0]

## or:
# counts = []
# for a in accounts:
#    counts.append(a['followers_count'])
# maxval = max(counts)

## or:
# maxval = max(a['followers_count'] for a in accounts)
print("D.", maxval)


###############
## Task E:
print("E.", max(a['statuses_count'] for a in accounts))


##############
## Task F:
from operator import itemgetter
y = sorted(accounts, key = itemgetter('followers_count'), reverse = True)
x = y[0]
# alternatively:
# x = max(accounts, key = itemgetter('followers_count'))
print("F.", x['screen_name'], 'has', x['followers_count'], 'followers')

##############
## Task G:
from operator import itemgetter
vaccs = sorted(accounts, key = itemgetter('statuses_count'), reverse = True)
accs = [a for a in vaccs if a['verified'] == False]
x = accs[0]
print("G.", x['screen_name'], 'has', x['statuses_count'], 'tweets')


###############
## Task H:
totes = 0
for a in accounts:
    totes += a['followers_count']

# alternatively
# totes = sum([a['followers_count'] for a in accounts])
print('H.', round(totes / len(accounts)))


###############
## Task I:
from operator import itemgetter
z = sorted(accounts, key = itemgetter('followers_count'))
m = z[len(z) // 2]
print("I.", m['followers_count'])
        
File found at: /files/code/answers/json-quiz/6.py

7.

import requests
import json
durl = 'http://www.compjour.org/files/code/json-examples/earthquake.usgs-significant_month.json'
data = json.loads(requests.get(durl).text)
quakes = data['features']

#######################
# Task A
print("A.", data['metadata']['title'])

#######################
# Task B
print("B.", len(quakes))

#######################
# Task C
print("C.", max([q['properties']['mag'] for q in quakes]))

#######################
# Task D
print("D.", len([q for q in quakes if q['properties']['tsunami'] == 1]))

#######################
# Task E
def get_mag(quake):
    return quake['properties']['mag']

q = min(quakes, key = get_mag)
print("E.", q['properties']['title'])

#######################
# Task F
def get_felts(quake):
    return quake['properties']['felt']

q = max(quakes, key = get_felts)
print("F.", q['properties']['title'])

#######################
# Task G
import time
# the USGS time attribute is precise to the millisecond
# but we just need seconds:
qsecs = [q['properties']['time'] / 1000 for q in quakes]
# the feed was probably sorted in reverse chronological order, but
# just to make sure...
qsecs = sorted(qsecs, reverse = True)
tsec = qsecs[0]
timeobj = time.gmtime(tsec)
print('G.', time.strftime('%Y-%m-%d %H:%M', timeobj))


#######################
# Task H
# assuming qsecs is the same as from Task G
x = time.strftime('%A, %B %d', time.gmtime(qsecs[-1]))
print('H.', x)


#######################
# Task I
# assuming qsecs is the same as from Task G
tobjs = [time.gmtime(s) for s in qsecs]
wdays = [s.tm_wday for s in tobjs]
x = [d for d in wdays if d in range(0, 5)]  # tm_wday 0-4 is Monday through Friday
print('I.', len(x))

#######################
# Task J
# assuming tobjs is the same as from Task I
hrs = [s.tm_hour for s in tobjs]
print('J.', len([h for h in hrs if h in range(5,9)]))


#########################
# Task K
from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat /2 ) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers.
    return c * r

def distance_from_stanford(quake):
    stanford_lng = -122.166
    stanford_lat = 37.424
    coords = quake['geometry']['coordinates']
    lng = coords[0]
    lat = coords[1]
    return haversine(lng, lat, stanford_lng, stanford_lat)

q = max(quakes, key = distance_from_stanford)
print('K.', q['properties']['title'])

#########################
# Task L
# assuming haversine has been defined as above
def d_paris(quake):
    paris_lng = 2.3522
    paris_lat = 48.8566
    coords = quake['geometry']['coordinates']
    lng = coords[0]
    lat = coords[1]
    return haversine(lng, lat, paris_lng, paris_lat)

q = max(quakes, key = d_paris)
print('L.', q['properties']['title'])

#########################
# Task M
basemap_url = 'https://maps.googleapis.com/maps/api/staticmap?zoom=1&size=500x400'
markers_str = 'markers=color:orange'
for q in quakes:
    coords = q['geometry']['coordinates']
    lng = str(coords[0])
    lat = str(coords[1])
    s = '%7C' + lat + ',' + lng
    markers_str += s

print('M.', basemap_url + '&' + markers_str)

#########################
# Task N
orange_str = 'markers=color:orange'
red_str = 'markers=color:red'

for q in quakes:
    coords = q['geometry']['coordinates']
    lng = str(coords[0])
    lat = str(coords[1])
    s = '%7C' + lat + ',' + lng

    if q['properties']['mag'] >= 6:
        red_str += s
    else:
        orange_str += s

print('N.', basemap_url + '&' + orange_str + '&' + red_str)
        
File found at: /files/code/answers/json-quiz/7.py

8.

import requests
import json
data_url = 'http://www.compjour.org/files/code/json-examples/nyt-books-bestsellers-hardcover-fiction.json'
data = json.loads(requests.get(data_url).text)
books = data['results']['books']

#################
# Task A
print('A.', len([b for b in books if b['publisher'] == "Scribner"]))

#################
# Task B
print('B.', len([b for b in books if "detective" in b['description'].lower()]))

#################
# Task C
from operator import itemgetter
x = max(books, key = itemgetter('weeks_on_list'))
print('C.', '%s|%s' % (x['title'], x['weeks_on_list']))

#################
# Task D
x = max(books, key = itemgetter('rank_last_week'))
print('D.', '%s|%s|%s' % (x['title'], x['rank'], x['rank_last_week']))

##################
# Task E
books_unranked_last_week = [b for b in books if b['rank_last_week'] == 0]
print('E.', len(books_unranked_last_week))

#################
# Task F
x = min(books_unranked_last_week, key = itemgetter('rank'))
print('F.', '%s|%s' % (x['title'], x['rank']))


##################
# Task G.
books_ranked_last_week = [b for b in books if b['rank_last_week'] > 0]
# define a helper function
def calc_rank_change(book_obj):
    return book_obj["rank_last_week"] - book_obj["rank"]

x = max(books_ranked_last_week, key = calc_rank_change)
s = "|".join([x['title'], str(x['rank']), str(calc_rank_change(x))])
print("G.", s)

##################
# Task H.
x = min(books_ranked_last_week, key = calc_rank_change)
s = "|".join([x['title'], str(x['rank']), str(calc_rank_change(x))])
print("H.", s)


##################
# Task I
changes = [calc_rank_change(b) for b in books_ranked_last_week]
x = [v for v in changes if v > 0]
s = sum(x)
print("I.", s)

###################
# Task J
changes = [calc_rank_change(b) for b in books_ranked_last_week]
x = [v for v in changes if v < 0]
s = sum(x)
print("J.", "%s|%s" % (len(x), s))


###################
# Task K
print('K.', max([len(b['title']) for b in books]))

###################
# Task L
x = round(sum([len(b['title']) for b in books]) / len(books))
print('L.', x)
        
File found at: /files/code/answers/json-quiz/8.py

9.

import requests
import json
data_url = 'http://www.compjour.org/files/code/json-examples/instagram-aaron-schock.json'
data = json.loads(requests.get(data_url).text)

items = data['data']

#################
# Task A
ix = len([i for i in items if i['type'] == 'image'])
vx = len([i for i in items if i['type'] == 'video'])
print("A.", "%s|%s" % (ix, vx))


###########################
# Task B
from operator import itemgetter
filter_dict = {}
for i in items:
    fname = i['filter']
    if filter_dict.get(fname):
        filter_dict[fname] += 1
    else:
        filter_dict[fname] = 1

## alternatively:
## from collections import Counter
# filters = [i['filter'] for i in items]
# filter_dict = Counter(filters)

### now create a list of lists:
filter_list = list(filter_dict.items())
### now sort that list, get top 3
top3 = sorted(filter_list, key = itemgetter(1), reverse = True)[0:3]
## or, if you made filter_dict a Counter() object
# top3 = filter_dict.most_common(3)

### and now map each tuple as a properly formatted string
top3_strs = []
for t in top3:
    x = str(t[0]) + ':' + str(t[1])
    top3_strs.append(x)

## alternatively:
# top3_strs = ["%s:%s" % t for t in top3]

## FINALLY:
print("B.", '|'.join(top3_strs))
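collections.Counter (mentioned in the comments above) collapses the tally-and-sort dance into two calls. A toy example with made-up filter names:

```python
from collections import Counter

# made-up filter names for illustration
filters = ['Normal', 'Normal', 'Normal', 'Ludwig', 'Ludwig', 'Mayfair']
top2 = Counter(filters).most_common(2)
print('|'.join('%s:%s' % t for t in top2))   # Normal:3|Ludwig:2
```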



#################
# Task C
located_items = [i for i in items if i['location']]
geocoded_items = [i for i in located_items if i['location'].get('latitude')]
print("C.", len(geocoded_items))

#################
# Task D
capitol_items = [i for i in geocoded_items if i['location'].get('name') == 'United States Capitol']
print("D.", len(capitol_items))

#################
# Task E
# Same as task B
from collections import Counter
locations = [i['location']['name'] for i in located_items if i['location'].get('name')]
top_locs = Counter(locations).most_common(3)
top_loc_strs = ["%s:%s" % t for t in top_locs]

## FINALLY:
print("E.", '|'.join(top_loc_strs))



#################
# Task F
# using this haversine formula: http://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points
cap = capitol_items[0]
cap_lat = cap['location']['latitude']
cap_lng = cap['location']['longitude']

from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat /2 ) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers.
    return c * r

def distance_to_cap(item):
    lng = item['location']['longitude']
    lat = item['location']['latitude']
    return haversine(lng, lat, cap_lng, cap_lat)

i = max(geocoded_items, key = distance_to_cap )
print('F.', i['location']['name'])

#################
# Task G
def sum_likes_and_comments(item):
    return item['comments']['count'] + item['likes']['count']

i = max(items, key = sum_likes_and_comments)
print('G.', '|'.join([i['caption']['text'][0:19],
  i['images']['thumbnail']['url'],
  str(i['likes']['count']),
  str(i['comments']['count'])
]))

#################
# Task H
from operator import itemgetter
y = max(items, key = itemgetter('created_time'))
x = min(items, key = itemgetter('created_time'))
span_seconds = int(y['created_time']) - int(x['created_time'])
span_days = span_seconds / (60 * 60 * 24)
print('H.', round(span_days))

#################
# Task I
span_weeks = span_seconds / (7 * 60 * 60 * 24)
print('I.', round(len(items) / span_weeks, 1))



#################
# Task J
prev_time = int(items[-1]['created_time'])
max_diff = 0
for i in reversed(items):
    # on the first iteration (the oldest item), the difference is 0...
    # which is fine, even if it's a meaningless, redundant operation
    this_time = int(i['created_time'])
    max_diff = max(max_diff, this_time - prev_time)
    prev_time = this_time

max_days = round(max_diff / (60 * 60 * 24))
print('J.', max_days)
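The rolling-comparison loop can also be written with zip over adjacent pairs. A sketch on made-up ascending timestamps:

```python
# hypothetical epoch seconds, oldest to newest
times = [100, 400, 950, 1000]
# pair each timestamp with its successor and take the largest gap
max_gap = max(b - a for a, b in zip(times, times[1:]))
print(max_gap)   # 550
```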

#################
# Task K
pv = int(items[-1]['comments']['count'])
mdiff = 0
for i in reversed(items):
    # on the first iteration (the oldest item), the difference is 0...
    # which is fine, even if it's a wasted op:
    nv = int(i['comments']['count'])
    mdiff = max(mdiff, abs(nv - pv))
    pv = nv

print('K.', mdiff)


#################
# Task L
import time
x = int(items[0]['created_time'])
timeobj = time.localtime(x)
print('L.', timeobj.tm_mon)

##################
# Task M
import time
from collections import Counter
def footime(item):
    s = int(item['created_time'])
    return time.localtime(s)

w = [footime(i).tm_wday for i in items]
ct = Counter(w)
weekend_count = ct[5] + ct[6]
print("M.", round(weekend_count / len(items), 2))

##################
# Task N
# assuming the same setup from Task M
# starting from:
# ct = Counter(w)
dayslist = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
d = ct.most_common(1)[0]
dayname = dayslist[d[0]]
dpct = round(d[1] / len(items), 2)
print("N.", '%s|%s' % (dayname, dpct))
        
File found at: /files/code/answers/json-quiz/9.py