USAJobs Midterm Part 1

Reference

Directions

The exercises with asterisks already have the solution. A problem with a solution usually contains hints for subsequent problems, or sometimes for the problem directly preceding it.

All the problems come with expected results, but your results may be slightly different, because the job listings change on an hourly/daily basis.

For the problems that are already done, you must still turn them in. The expectation is that you at least type them out, line by line, into iPython so each step and its effect are made clear to you.

Deliverables

Due date: Monday, May 4

Create a folder in your compjour-hw repo named: usajobs-midterm-1.

For each exercise, create a separate file, e.g.

|-- compjour-hw/
    |-- usajobs-midterm-1/
       |-- 1-1.py
       |-- 1-2.py
       |-- 1-3.py 
       (etc)

Some exercises may require you to produce a file. For example, exercise 1-6 requires that you create a webpage. Store that webpage in the same directory as the script files, e.g.

|-- compjour-hw/
    |-- usajobs-midterm-1/
       |-- 1-6.py
       |-- 1-6.html
       (etc)

Exercises

1. * Find the total number of job listings in New York

Query the data.usajobs.gov API for job listings for the state of New York and print the total number of job listings in this format:

New York has XYZ job listings.

Takeaway

This is just a warmup exercise that requires little more than knowing the basics of executing Python code and how to use an external library like Requests. Pretty much every exercise from here on out will require this pattern of:

  1. Decide what kind of query you want to make to data.usajobs.gov
  2. Use requests.get()
  3. Parse the response as JSON and do something with it.
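Step 3 doesn't need a network connection, so you can try it out in iPython right away. Here is a minimal sketch using the standard library's json module on a hypothetical, heavily trimmed response body; resp.json() does the equivalent parsing on the real response text. Only the TotalJobs key comes from the solutions on this page; a real response contains much more.

```python
import json

# Hypothetical, trimmed-down response body; a real USAJobs response
# contains more fields than just TotalJobs
response_text = '{"TotalJobs": "384"}'

# Parse the JSON text into a Python dictionary,
# which is what resp.json() does for a real response
data = json.loads(response_text)

state_name = 'New York'
print("%s has %s job listings." % (state_name, data['TotalJobs']))
# prints: New York has 384 job listings.
```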

In the posted solution, observe how variables are used from a stylistic point of view. I set up my requests.get() call with:

state_name = 'New York'
atts = {"CountrySubdivision": state_name, 'NumberOfJobs': 1}
resp = requests.get(BASE_USAJOBS_URL, params = atts)

But I could've done it in one line:

resp = requests.get(BASE_USAJOBS_URL, params = {"CountrySubdivision": 'New York', 'NumberOfJobs': 1})

However, that one line is now hard to read because of its width. And while state_name = 'New York' may seem overly verbose, look at how state_name is re-used in the final print() statement, which saves me from having to type out "New York" twice.
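Under the hood, requests.get() encodes the params dictionary into a query string and appends it to BASE_USAJOBS_URL. As a rough offline sketch, the standard library's urllib.parse.urlencode performs the same kind of encoding (spaces become +), so you can preview the URL the 1-1 request asks for. Treat the exact string as illustrative; after a real request, resp.url shows the URL Requests actually used.

```python
from urllib.parse import urlencode

BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
state_name = 'New York'
atts = {"CountrySubdivision": state_name, 'NumberOfJobs': 1}

# Encode the parameters into query-string form: key=value pairs joined by &
query = urlencode(atts)
full_url = BASE_USAJOBS_URL + "?" + query
print(full_url)
# prints: https://data.usajobs.gov/api/jobs?CountrySubdivision=New+York&NumberOfJobs=1
```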

Result

New York has 384 job listings.

Solution

import requests
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
state_name = 'New York'
atts = {"CountrySubdivision": state_name, 'NumberOfJobs': 1}
resp = requests.get(BASE_USAJOBS_URL, params = atts)
data = resp.json()
print("%s has %s job listings." % (state_name, data['TotalJobs']))
        
File found at: /files/code/answers/usajobs-midterm-1/1-1.py

2. Find the total number of job listings in Alaska and Hawaii.

Same as problem 1-1, except print one line for each state. And print a third line that contains the sum of the two states' total job counts:

Alaska has XXX job listings.
Hawaii has YYY job listings.
Together, they have ZZZ total job listings.

Takeaway

You can almost get by with copying the solution for Exercise 1-1, pasting it in twice, and then changing the variables for Alaska and Hawaii, respectively. And that's fine (for now). But suppose you hadn't followed my posted solution and taken the time to do:

state_name = 'New York'

And had instead done:

resp = requests.get(BASE_USAJOBS_URL, params = {"CountrySubdivision": 'New York', 'NumberOfJobs': 1})

Then for this exercise, you would have to make 4 manual changes (2 for each state) in both the requests.get() call and the corresponding print() statement. Imagine how much of a pain that becomes when you have to repeat a snippet of code 10 or 10,000 times, and you should get a better sense of why variables are useful.

Quick note: To total the two job counts up, examine in specific detail how the job counts are represented in the API text response; just because they look like numbers doesn't mean that Python, when parsing the JSON, will treat them like numbers.
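Here is a minimal sketch of that pitfall, assuming (as the int() calls in the posted solutions suggest) that TotalJobs arrives as a JSON string. The two response bodies are hypothetical, trimmed to the one key that matters, and use the counts from this exercise's Result.

```python
import json

# Hypothetical, trimmed-down responses; the counts are JSON strings here,
# which is what the int() calls in the posted solutions imply
ak_data = json.loads('{"TotalJobs": "207"}')
ha_data = json.loads('{"TotalJobs": "204"}')

# Adding two strings concatenates them...
wrong_total = ak_data['TotalJobs'] + ha_data['TotalJobs']
print(wrong_total)  # prints: 207204

# ...so convert each one to an integer before summing
right_total = int(ak_data['TotalJobs']) + int(ha_data['TotalJobs'])
print(right_total)  # prints: 411
```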

Result

Alaska has 207 job listings.
Hawaii has 204 job listings.
Together, they have 411 total job listings.

3. Using a for-loop, find the total number of job listings in China, South Africa, and Tajikistan.

The output should be in the same format as Exercise 1-2, but you must use a for-loop.

Takeaway

Pretty much the same code as exercises 1-1 and 1-2, except with a for-loop. Your code should end up being slightly shorter (in terms of line count) compared to 1-2, and it should just feel a little more elegant than copy-pasting the same snippet 3 times over.
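The shape of that loop is an accumulator: start a total at 0 and add to it on each pass. In this sketch, the API call is replaced by a hypothetical fetch_total_jobs() function that just looks up the counts from this exercise's Result, so it runs offline; in your solution, that is where requests.get() and resp.json() go. (Note that the posted 1-3 solution queries countries with the Country parameter rather than CountrySubdivision.)

```python
# Hypothetical stand-in for the API call: returns counts (as strings)
# from this exercise's Result instead of querying data.usajobs.gov
SAMPLE_COUNTS = {'China': '13', 'South Africa': '4', 'Tajikistan': '7'}


def fetch_total_jobs(country):
    return SAMPLE_COUNTS[country]


countries = ['China', 'South Africa', 'Tajikistan']
total_jobs = 0  # the accumulator
for cname in countries:
    tjobs = int(fetch_total_jobs(cname))
    print("%s currently has %s job listings." % (cname, tjobs))
    total_jobs += tjobs  # add this country's count to the running total

print("Together, they have %s total job listings." % total_jobs)
```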

Result

China currently has 13 job listings.
South Africa currently has 4 job listings.
Tajikistan currently has 7 job listings.
Together, they have 24 total job listings.

4. Get and store job listing counts in a dictionary

For each of the U.S. states of California, Florida, New York, and Maryland, get the total job listing count and store the result in a dictionary, using the name of the state as the key and the total job count – as an integer – for the corresponding value:

{'StateName1': 100, 'StateName2': 42}

Takeaway

It is rarely useful to create a program or function that just spits out human-readable report text like "Alabama has 12 total jobs." More realistically, you create programs that output or return a data structure (in this case, a dictionary), so that other programs can easily use the result.
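One way to make the 1-4 result reusable is to wrap the loop in a function that returns the dictionary instead of printing sentences. The function names below are hypothetical, and fetch_total_jobs() is a stand-in that returns the counts from this exercise's Result (as strings) rather than calling the API.

```python
# Hypothetical stand-in for the API call: returns the counts from the
# Exercise 1-4 Result as strings instead of querying data.usajobs.gov
SAMPLE_COUNTS = {'California': '755', 'Florida': '361',
                 'Maryland': '380', 'New York': '356'}


def fetch_total_jobs(state):
    return SAMPLE_COUNTS[state]


def job_counts_by_state(names):
    """Return a dict mapping each state name to its job count as an int."""
    thedict = {}
    for name in names:
        thedict[name] = int(fetch_total_jobs(name))
    return thedict


counts = job_counts_by_state(['California', 'Florida', 'Maryland', 'New York'])

# Another program can use the dictionary directly, e.g. to find the top state
top_state = max(counts, key=counts.get)
print(top_state, counts[top_state])  # prints: California 755
```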

Result

{'California': 755, 'New York': 356, 'Maryland': 380, 'Florida': 361}

5. * Get and store job listing counts as a list

For the same states as Exercise 1-4, get their total job listing counts, but store the result in a list. More specifically, a list in which each member is itself a list, e.g.

[['StateName1', 100], ['StateName2', 42]]

Takeaway

Same concept as Exercise 1-4. It's worth noting how the exact same data can be sufficiently represented either as a dictionary or a list. However, think about the difference in how an end user accesses the data members. For example, compare how you would get Maryland's number of jobs if the result (as in 1-4) is a dictionary:

result['Maryland']

versus how you would access that same data point from the list:

result[2][1]

(hint: one data structure is more human-friendly than the other, in this situation)
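To compare the two directly, here is a sketch using the counts from the Results of Exercises 1-4 and 1-5. It also shows that a list of two-item [key, value] lists converts straight into a dictionary with dict().

```python
# Counts taken from the Result sections of Exercises 1-4 and 1-5
result_dict = {'California': 755, 'New York': 356, 'Maryland': 380, 'Florida': 361}
result_list = [['California', 755], ['Florida', 361], ['Maryland', 380], ['New York', 356]]

# Dictionary: look up by name
print(result_dict['Maryland'])  # prints: 380

# List of lists: you must know Maryland is the third row (index 2)
# and that its count is the second item in that row (index 1)
print(result_list[2][1])  # prints: 380

# Each inner list is a [key, value] pair, so dict() converts between the two
print(dict(result_list) == result_dict)  # prints: True
```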

Result

[['California', 755], ['Florida', 361], ['Maryland', 380], ['New York', 356]]

Solution

import requests
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
names = ['California', 'Florida', 'Maryland', 'New York']
thelist = []
for name in names:
    atts = {'CountrySubdivision': name, 'NumberOfJobs': 1}
    resp = requests.get(BASE_USAJOBS_URL, params = atts)
    jobcount = int(resp.json()['TotalJobs'])  # store the count as an integer, as in 1-4
    thelist.append([name, jobcount])

print(thelist)
        
File found at: /files/code/answers/usajobs-midterm-1/1-5.py

6. Create an interactive Google Bar Chart showing the job counts

For the same 4 states in Exercise 1-4, produce the HTML needed to display the job count data as an interactive Google Bar Chart.

Takeaway

Learning the front-end stack of web development (e.g. HTML, CSS, JavaScript, the Document Object Model, asynchronous programming) is beyond the scope of this class. However, if you can accept that the code for a webpage is itself just text, then it should seem possible that, given a working template (even one with an interactive element), you could create your own customized webpage just by replacing the parts specific to your data. (And isn't that what most programming consists of?)

Also, take note of how the output format in Exercise 1-5 is directly relevant to making this exercise (as well as 1-7, 1-8, and 1-9) trivially easy.
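The posted solutions for 1-6 and 1-7 drop the data into the chart HTML with Python's % string formatting: the template has a %s where the JavaScript data array goes. This sketch uses a hypothetical one-line stand-in for that template. It also shows json.dumps from the standard library as an alternative, since its output is a valid JavaScript array literal for data like this.

```python
import json

# A header row followed by [state, count] rows, as in the 1-6/1-7 solutions
thelist = [["State", "Job Count"], ["California", 755], ["Florida", 361]]

# Hypothetical one-line stand-in for the full chart template
template = "var data = %s;"

# Wrapping thelist in a 1-tuple makes % insert str(thelist) as one value
print(template % (thelist,))
# prints: var data = [['State', 'Job Count'], ['California', 755], ['Florida', 361]];

# json.dumps produces double-quoted JSON, which JavaScript also accepts
print(template % json.dumps(thelist))
# prints: var data = [["State", "Job Count"], ["California", 755], ["Florida", 361]];
```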

(Hint: read on to Exercise 1-7 if you have no clue how to start on this)

Copy the HTML from this example file and adapt it as necessary:

http://2015.compjour.org/files/code/answers/usajobs-midterm-1/sample-barchart-1.html

(Note: If you open the sample webpage in your browser, it will render the chart instead of showing you the code behind it, which is not what you want. Instead, use requests.get() to fetch the bare HTML, or use View Source (but not Inspect Element).)

Your program must create an HTML file named: 1-6.html

Result

http://2015.compjour.org/files/code/answers/usajobs-midterm-1/1-6.html

7. * Create an interactive Google Pie Chart for the 4 states

For the same 4 states in Exercise 1-4, produce the HTML needed to display the job count data as an interactive Google Pie Chart.

Takeaway

This exercise belabors the point that, with a working template and the ability to read instructions, you can create a variety of charts and pages to your liking. The code to solve this problem should be virtually identical to Exercise 1-6.

Copy the HTML from this example file and adapt it as necessary:

http://2015.compjour.org/files/code/answers/usajobs-midterm-1/sample-barchart-1.html

(Hint: Besides replacing the data, you also have to make the change necessary to draw a pie chart instead of a bar chart.)

(Note: If you open the sample webpage in your browser, it will render the chart instead of showing you the code behind it, which is not what you want. Instead, use requests.get() to fetch the bare HTML, or use View Source (but not Inspect Element).)

Your program must create an HTML file named: 1-7.html
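Because the chart type is set in a single line of the template, one way to turn the 1-6 HTML into a pie chart is a plain string replacement. This is a sketch on a shortened, hypothetical excerpt of the template; the posted 1-7 solution instead pastes the full template with PieChart already in place.

```python
# Shortened, hypothetical excerpt of the 1-6 bar chart template
bar_template = (
    "var chart = new google.visualization.BarChart("
    "document.getElementById('mychart'));"
)

# Swap the chart class; everything else in the template stays the same
pie_template = bar_template.replace("BarChart", "PieChart")
print(pie_template)
```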

Result

http://2015.compjour.org/files/code/answers/usajobs-midterm-1/1-7.html

Solution

import requests
# same code from problem 5
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
names = ['California', 'Florida', 'Maryland', 'New York']
thelist = []
thelist.append(["State", "Job Count"])
for n in names:
    atts = {'CountrySubdivision': n, 'NumberOfJobs': 1}
    resp = requests.get(BASE_USAJOBS_URL, params = atts)
    jobcount = int(resp.json()['TotalJobs'])
    thelist.append([n, jobcount])


# Throw the boilerplate HTML into a variable:

chartcode = """
<!DOCTYPE html>
<html>
  <head>
    <title>Sample Chart</title>
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css">

  </head>
  <body>
    <script type="text/javascript">
      google.load("visualization", '1.1', {packages:['corechart']});
      google.setOnLoadCallback(drawChart);
      function drawChart() {
        var data = %s

        var datatable = google.visualization.arrayToDataTable(data);
        var options = {
          width: 600,
          height: 400,
          legend: { position: 'none' },
        };
        var chart = new google.visualization.PieChart(document.getElementById('mychart'));
        chart.draw(datatable, options);
    }
    </script>

      <div class="container">
        <h1 style="text-align:center">Hello chart</h1>
        <div id="mychart"></div>
      </div>
  </body>
</html>
"""


htmlfile = open("1-7.html", "w")
htmlfile.write(chartcode % thelist)
htmlfile.close()
        
File found at: /files/code/answers/usajobs-midterm-1/1-7.py

8. Create an interactive Google Geochart for all 4 states

Same setup as exercises 1-6 and 1-7, except create a Geochart visualization.

As per the Google documentation, you must translate state names to their corresponding ISO 3166-2:US codes; e.g. California is US-CA. (Each code is just the state's standard postal abbreviation, prefixed with US-.)

For your convenience, I've produced this JSON file which contains a dictionary that maps each full state name to its corresponding postal abbreviation:

http://stash.compjour.org/data/usajobs/us-statecodes.json
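Building the Geochart rows is then a matter of prefixing each abbreviation with US-. A minimal sketch, assuming the JSON has been loaded into a dictionary like the four-state excerpt below (the full mapping is in us-statecodes.json), and reusing the job counts from the Exercise 1-4 Result:

```python
# Hypothetical four-state excerpt of the name-to-abbreviation mapping
statecodes = {'California': 'CA', 'Florida': 'FL', 'Maryland': 'MD', 'New York': 'NY'}

# Job counts from the Exercise 1-4 Result
counts = {'California': 755, 'Florida': 361, 'Maryland': 380, 'New York': 356}

thelist = [["State", "Job Count"]]  # header row for arrayToDataTable
for name, abbrev in statecodes.items():
    # ISO 3166-2:US code = "US-" + postal abbreviation
    thelist.append(["US-" + abbrev, counts[name]])

print(thelist)
```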

For the chart, copy the HTML from this example file and adapt it as necessary:

http://2015.compjour.org/files/code/answers/usajobs-midterm-1/sample-geochart-1.html

Your program must create an HTML file named: 1-8.html

Takeaway

Again, observe the wide variety of charts you can make using the same data-gathering/processing code from the previous exercises. A map is not ideal for this kind of data, but since Google makes it so easy, we might as well try it out.

Result

http://2015.compjour.org/files/code/answers/usajobs-midterm-1/1-8.html

9. Create an interactive Google Geochart for all 50 states

Same setup as Exercise 1-8, except repeat for all 50 states and Washington, D.C. You will re-use virtually all of the code from 1-8, but you need to add code to generate a list of all the states.

(Do not hand-type all 51 names in; doing so misses the point of this exercise and will result in zero credit for this problem.)

For your convenience, I've produced this JSON file which contains a dictionary that maps each full state name to its corresponding postal abbreviation:

http://stash.compjour.org/data/usajobs/us-statecodes.json

Use the same sample chart HTML as per Exercise 1-8:

http://2015.compjour.org/files/code/answers/usajobs-midterm-1/sample-geochart-1.html

Takeaway

This exercise follows the exact same pattern as Exercise 1-8. The only difference is the amount of data to process: 51 names as opposed to 4. The only significant increase in our work is getting those names into a form we can feed into our existing program; everything else, from calling the API to making the map, takes no more effort from us whether the data has 10 names or 10,000 names.

And in this case, the challenge of getting those 51 names is yet another challenge made relatively easy with an understanding and appreciation of for-loops and data structures. I've given you a machine-readable list of state names as JSON; extracting the names (and their abbreviations) is no different than the process of extracting data from the USAJobs API itself.
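As a sketch of that extraction step: once the JSON text is parsed, the dictionary's keys are the full names to send as CountrySubdivision, and its values are the abbreviations. The JSON string below is a hypothetical three-state excerpt; in your solution, calling requests.get() on the statecodes URL and then .json() on the response gives you the full dictionary.

```python
import json

# Hypothetical three-state excerpt of us-statecodes.json
statecodes_text = '{"Alaska": "AK", "Hawaii": "HI", "Texas": "TX"}'

# Parse the text into a dict of full name -> postal abbreviation
statecodes = json.loads(statecodes_text)

# Iterating over a dict yields its keys, i.e. the full state names
names = sorted(statecodes)
print(len(names), names)  # prints: 3 ['Alaska', 'Hawaii', 'Texas']

# .items() gives (name, abbreviation) pairs, ready for the API/chart loop
for name, abbrev in statecodes.items():
    print(name, "->", abbrev)
```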

Result

http://2015.compjour.org/files/code/answers/usajobs-midterm-1/1-9.html

All Solutions

1-1.

import requests
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
state_name = 'New York'
atts = {"CountrySubdivision": state_name, 'NumberOfJobs': 1}
resp = requests.get(BASE_USAJOBS_URL, params = atts)
data = resp.json()
print("%s has %s job listings." % (state_name, data['TotalJobs']))
        
File found at: /files/code/answers/usajobs-midterm-1/1-1.py

1-2.

import requests
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
atts = {"CountrySubdivision": 'Alaska', 'NumberOfJobs': 1}
ak_resp = requests.get(BASE_USAJOBS_URL, params = atts)
ak_data = ak_resp.json()

atts = {"CountrySubdivision": 'Hawaii', 'NumberOfJobs': 1}
ha_resp = requests.get(BASE_USAJOBS_URL, params = atts)
ha_data = ha_resp.json()

print("Alaska has %s job listings." % ak_data['TotalJobs'])
print("Hawaii has %s job listings." % ha_data['TotalJobs'])
t = int(ak_data['TotalJobs']) + int(ha_data['TotalJobs'])
print("Together, they have %s total job listings." % t)
        
File found at: /files/code/answers/usajobs-midterm-1/1-2.py

1-3.

import requests
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
countries = ['China', 'South Africa', 'Tajikistan']
total_jobs = 0
for cname in countries:
    atts = {'Country': cname, 'NumberOfJobs': 1}
    resp = requests.get(BASE_USAJOBS_URL, params = atts)
    tjobs = int(resp.json()['TotalJobs'])
    print("%s currently has %s job listings." % (cname, tjobs))
    total_jobs += tjobs

print("Together, they have %s total job listings." % total_jobs)

        
File found at: /files/code/answers/usajobs-midterm-1/1-3.py

1-4.

import requests
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
names = ['California', 'Florida', 'Maryland', 'New York']
thedict = {}
for c in names:
    resp = requests.get(BASE_USAJOBS_URL, params = {'CountrySubdivision': c, 'NumberOfJobs': 1})
    thedict[c] = int(resp.json()['TotalJobs'])


print(thedict)
        
File found at: /files/code/answers/usajobs-midterm-1/1-4.py

1-5.

import requests
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
names = ['California', 'Florida', 'Maryland', 'New York']
thelist = []
for name in names:
    atts = {'CountrySubdivision': name, 'NumberOfJobs': 1}
    resp = requests.get(BASE_USAJOBS_URL, params = atts)
    jobcount = int(resp.json()['TotalJobs'])  # store the count as an integer, as in 1-4
    thelist.append([name, jobcount])

print(thelist)
        
File found at: /files/code/answers/usajobs-midterm-1/1-5.py

1-6.

import requests
# same code from problem 5
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
names = ['California', 'Florida', 'Maryland', 'New York']
thelist = []
thelist.append(["State", "Job Count"])
for n in names:
    atts = {'CountrySubdivision': n, 'NumberOfJobs': 1}
    resp = requests.get(BASE_USAJOBS_URL, params = atts)
    jobcount = int(resp.json()['TotalJobs'])
    thelist.append([n, jobcount])


# Throw the boilerplate HTML into a variable:

chartcode = """
<!DOCTYPE html>
<html>
  <head>
    <title>Sample Chart</title>
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css">

  </head>
  <body>
    <script type="text/javascript">
      google.load("visualization", '1.1', {packages:['corechart']});
      google.setOnLoadCallback(drawChart);
      function drawChart() {
        var data = %s

        var datatable = google.visualization.arrayToDataTable(data);
        var options = {
          width: 600,
          height: 400,
          legend: { position: 'none' },
        };
        var chart = new google.visualization.BarChart(document.getElementById('mychart'));
        chart.draw(datatable, options);
    }
    </script>

      <div class="container">
        <h1 style="text-align:center">Hello chart</h1>
        <div id="mychart"></div>
      </div>
  </body>
</html>
"""


htmlfile = open("1-6.html", "w")
htmlfile.write(chartcode % thelist)
htmlfile.close()
        
File found at: /files/code/answers/usajobs-midterm-1/1-6.py

1-7.

import requests
# same code from problem 5
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
names = ['California', 'Florida', 'Maryland', 'New York']
thelist = []
thelist.append(["State", "Job Count"])
for n in names:
    atts = {'CountrySubdivision': n, 'NumberOfJobs': 1}
    resp = requests.get(BASE_USAJOBS_URL, params = atts)
    jobcount = int(resp.json()['TotalJobs'])
    thelist.append([n, jobcount])


# Throw the boilerplate HTML into a variable:

chartcode = """
<!DOCTYPE html>
<html>
  <head>
    <title>Sample Chart</title>
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css">

  </head>
  <body>
    <script type="text/javascript">
      google.load("visualization", '1.1', {packages:['corechart']});
      google.setOnLoadCallback(drawChart);
      function drawChart() {
        var data = %s

        var datatable = google.visualization.arrayToDataTable(data);
        var options = {
          width: 600,
          height: 400,
          legend: { position: 'none' },
        };
        var chart = new google.visualization.PieChart(document.getElementById('mychart'));
        chart.draw(datatable, options);
    }
    </script>

      <div class="container">
        <h1 style="text-align:center">Hello chart</h1>
        <div id="mychart"></div>
      </div>
  </body>
</html>
"""


htmlfile = open("1-7.html", "w")
htmlfile.write(chartcode % thelist)
htmlfile.close()
        
File found at: /files/code/answers/usajobs-midterm-1/1-7.py

1-8.

# nothing here yet
        
File found at: /files/code/answers/usajobs-midterm-1/1-8.py

1-9.

import requests
BASE_USAJOBS_URL = "https://data.usajobs.gov/api/jobs"
STATECODES_URL = "http://stash.compjour.org/data/usajobs/us-statecodes.json"
names = requests.get(STATECODES_URL).json()
## Everything from 1-8 on is the same:
thelist = []
thelist.append(["State", "Job Count"])
for name, abbrev in names.items():
    print("Getting: ", name)
    atts = {'CountrySubdivision': name, 'NumberOfJobs': 1}
    resp = requests.get(BASE_USAJOBS_URL, params = atts)
    jobcount = int(resp.json()['TotalJobs'])
    label = "US-" + abbrev
    thelist.append([label, jobcount])



chartcode = """
<html>
  <head>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css">
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
  </head>
  <body>
    <script type="text/javascript">
      google.load("visualization", "1", {packages:["geochart"]});
      google.setOnLoadCallback(drawRegionsMap);

      function drawRegionsMap() {

        var data = %s
        var datatable = google.visualization.arrayToDataTable(data);
        var options = {'region': 'US', 'width': 600, 'height': 400, 'resolution': 'provinces'};

        var chart = new google.visualization.GeoChart(document.getElementById('mychart'));

        chart.draw(datatable, options);
      }
    </script>


      <div class="container">
        <h1 style="text-align:center">Hello chart</h1>
        <div id="mychart"></div>
      </div>
  </body>
</html>
"""



htmlfile = open("1-9.html", "w")
htmlfile.write(chartcode % thelist)
htmlfile.close()
        
File found at: /files/code/answers/usajobs-midterm-1/1-9.py