The project is due on Tuesday, June 7. It is worth 35 percent of your grade. You are allowed to work with one partner.
Your project must follow all of the requirements listed in the Requirements section and document your work in your project repo's README.md.
Sample TEST_README.md
Jeff Barrera's BikeCrashMapper
My quickie sample app: Congressmembers and their fancy colleges
This final project requires you to tell a compelling story based on data you've researched, wrangled, and analyzed. You will also be expected to create and deploy your work as a public-facing web application.
By now, you've written enough Flask app boilerplate to be able to go through the motions of creating and deploying a web app that reads a data file and outputs a basic list/table of the data.
For this final project, I want you to take the time to come up with a thoughtful story, one important enough that you want to tell it, based on data that you find interesting to explore. You don't have much time to overcome technical obstacles, but you do have time to research a limited-scope topic and then to think about how you can build a data app that better informs the world.
Create a new GitHub repo for this final project; name it as you see fit.
This section explains what your app must have to be considered complete.
In addition, your final project repo must contain a README.md file that explains how you've fulfilled each of the requirements. For the Deployment requirements, that explanation can be as short as:
I decided to use Frozen Flask to deploy my app as static HTML to GitHub Pages, because it was easy enough to render and cache all of the possible pages and routes.
Must contain at least one join
The records in your multiple datasets should be related somehow, either by an exact shared key or by a combination of fields.
Purpose | Table A (dataset; join key) | Table B (dataset; join key) |
---|---|---|
Match person to votes | Congressmember biographies; bioguide_id | Voting record; bioguide_id |
Calculate SAT scores by poverty rate | School SAT scores; school_ID | School free lunch eligibility ratio; school_ID |
Correlate constituent wealth and representative's party | Census per-capita income; state_abbrev + house_district | Elected congressmembers; state_abbrev + house_district |
Correlate location of business with types of health violations | Restaurant licenses (with locations); license_id | Restaurant health violations; restaurant_license_id |
Link medical industry payments to doctors employed by med schools | Open Payments; employer name | College Scorecard; school_name |
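
For illustration only, here is a minimal plain-Python sketch of an inner join on a composite key (state_abbrev + house_district, as in the third row above). The records, their values, and the helper names inner_join and district_key are hypothetical; if you are already using pandas, DataFrame.merge covers the same ground.

```python
# Hypothetical sample records; field names follow the table above, values are invented.
members = [
    {"bioguide_id": "A000001", "state_abbrev": "IA", "house_district": 1, "party": "R"},
    {"bioguide_id": "B000002", "state_abbrev": "IA", "house_district": 2, "party": "D"},
    {"bioguide_id": "C000003", "state_abbrev": "TX", "house_district": 7, "party": "R"},
]
income = [
    {"state_abbrev": "IA", "house_district": 1, "per_capita_income": 28000},
    {"state_abbrev": "IA", "house_district": 2, "per_capita_income": 30500},
]


def district_key(row):
    # Composite key: neither field is unique on its own, but the pair is.
    return (row["state_abbrev"], row["house_district"])


def inner_join(left, right, key):
    """Merge every left/right pair of rows whose key(row) values are equal."""
    # Index the right-hand rows by join key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(key(row), []).append(row)
    joined = []
    for row in left:
        # Rows with no match in `right` are dropped (inner-join semantics).
        for match in index.get(key(row), []):
            # On a field-name collision, the right-hand value wins.
            joined.append({**row, **match})
    return joined


rows = inner_join(members, income, district_key)
for r in rows:
    print(r["bioguide_id"], r["party"], r["per_capita_income"])
```

Note that the TX record silently drops out because it has no matching income row. With an inner join, it is worth checking how many records fail to match before you build a story on what remains.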
Must create at least one new categorical variable
A categorical variable, in a nutshell, is one whose values are labels or groups rather than numerical quantities.
An example: adding a field called 'lifephase' to the Congressmembers dataset, which is set to 'old' if the Congressmember is 60 years or older, 'middleage' if the member is at least 40 but younger than 60, and 'youngin' otherwise.
Not a very useful category in this case, but you get the idea…
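
One way the 'lifephase' rule might look in Python, assuming each hypothetical record already carries an integer age field (computing age from a birthdate is a separate step):

```python
def lifephase(age):
    """Bucket an age in years into the example's three (not very useful) categories."""
    if age >= 60:
        return "old"
    if age >= 40:
        return "middleage"  # at least 40, but younger than 60
    return "youngin"


# Hypothetical records; "age" is assumed to be precomputed as an integer.
members = [
    {"last_name": "Smith", "age": 64},
    {"last_name": "Lee", "age": 45},
    {"last_name": "Cruz", "age": 33},
]
for m in members:
    m["lifephase"] = lifephase(m["age"])

print([m["lifephase"] for m in members])  # ['old', 'middleage', 'youngin']
```

Checking the boundaries explicitly (exactly 60, exactly 40) is a cheap way to make sure no age falls into two buckets or into none.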
Good | Bad |
---|---|
Congressmember's state of birth | Congressmember's state represented |
Categorize crime report date as Weekend or Weekday | Categorize crime report date as day of month, e.g. 1-31 |
Categorize crime report date by Morning/Evening | Categorize crime report by hour |
Categorize Congressmember's alma mater | Categorize Congressmember as "old" or "young" |
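
A sketch of the first two "Good" date categorizations above, using only the standard library. The function names are hypothetical, and the noon cutoff for Morning vs. Evening is an arbitrary assumption; pick boundaries that make sense for your data.

```python
from datetime import datetime


def day_type(reported_at):
    """Label a datetime as 'Weekend' or 'Weekday'."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return "Weekend" if reported_at.weekday() >= 5 else "Weekday"


def time_of_day(reported_at):
    """Label a datetime as 'Morning' or 'Evening'."""
    # Assumption: anything before noon is Morning; noon or later is Evening.
    return "Morning" if reported_at.hour < 12 else "Evening"


# Hypothetical report timestamps: Saturday night and Monday morning.
saturday_night = datetime(2016, 6, 4, 23, 15)
monday_morning = datetime(2016, 6, 6, 8, 30)
print(day_type(saturday_night), time_of_day(saturday_night))  # Weekend Evening
print(day_type(monday_morning), time_of_day(monday_morning))  # Weekday Morning
```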
Must create at least one new continuous variable
A continuous variable is typically numeric; dates and timestamps can also be treated as continuous.
In the previous example, I explained how to turn a birthdate field into a categorical variable. The birthdate, when it's just a string such as '1960-12-12', is technically categorical. Turning it into a timestamp would make it continuous. Or, it could be turned into an integer: the number of days between the birthdate and today.
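
A minimal sketch of the days-since-birthdate conversion, assuming birthdates are stored as ISO 'YYYY-MM-DD' strings. The helper name days_since is hypothetical; passing today in explicitly keeps results reproducible.

```python
from datetime import date


def days_since(birthdate_str, today):
    """Convert an ISO 'YYYY-MM-DD' string into whole days elapsed before `today`."""
    born = date.fromisoformat(birthdate_str)  # requires Python 3.7+
    # Subtracting two dates yields a timedelta; .days is an int (a continuous measure).
    return (today - born).days


print(days_since("1960-12-12", date(2016, 6, 7)))
# In the app itself you would likely pass date.today() instead of a fixed date.
```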
Good | Bad |
---|---|
Age when first elected into Congress | Number of characters in Congressmember's last name |
Number of days served in Congress | Number of seconds lived |
Average amount of campaign contributions received by cycle | Total campaign contributions received |
Rate of change of population from 2010 to 2015 | Absolute population change from 2010 to 2015 |
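
A sketch of why a rate of change often beats an absolute change, using invented population figures: both places below gain 5,000 residents between 2010 and 2015, but their rates of change differ by a factor of 40. The helper name pct_change is hypothetical.

```python
def pct_change(old, new):
    """Percent change from `old` to `new`, relative to `old`."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    # Multiplying before dividing keeps results exact for integer inputs like these.
    return (new - old) * 100 / old


# Invented 2010 and 2015 populations: both places gain 5,000 residents.
small_town = pct_change(50_000, 55_000)
big_city = pct_change(2_000_000, 2_005_000)
print(small_town, big_city)  # 10.0 0.25
```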
Must contain at least one summarization
Subject | Summarization | Group by |
---|---|---|
Congressmembers, number of terms elected | record count | bioguide_id |
Congressmembers, average age by state and party | average(today - birthdate) | state, party |
Stop-and-frisks | record count | race, year, police_district |
Average score of restaurant inspection results | average(score), sum(score) | restaurant_id |
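
A plain-Python sketch of the "record count" and "average" summarizations grouped by one or more fields, roughly what SQL's GROUP BY or pandas' groupby does. The sample records and the helper names count_by and average_by are hypothetical, and age is assumed to be precomputed.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records; "age" is assumed to be precomputed from birthdate.
members = [
    {"state": "IA", "party": "R", "age": 64},
    {"state": "IA", "party": "R", "age": 52},
    {"state": "IA", "party": "D", "age": 45},
    {"state": "TX", "party": "R", "age": 58},
]


def group_key(record, group_fields):
    # A tuple of the grouping values, e.g. ("IA", "R").
    return tuple(record[f] for f in group_fields)


def count_by(records, group_fields):
    """Record count per group."""
    return Counter(group_key(r, group_fields) for r in records)


def average_by(records, group_fields, value_field):
    """Mean of value_field per group."""
    groups = defaultdict(list)
    for r in records:
        groups[group_key(r, group_fields)].append(r[value_field])
    return {key: mean(values) for key, values in groups.items()}


print(count_by(members, ["state"]))
for (state, party), avg_age in sorted(average_by(members, ["state", "party"], "age").items()):
    print(state, party, avg_age)
```

Keying each group by a tuple lets the same helpers handle a single grouping field or several (state and party, or race, year, and police_district).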
R and I, or just D
.bioguide_id
last_name
<last_name>, <first_name>
birthdate
The URL paths and rendered webpages that your app must contain.
| Jane Smith |
| Rep. | Jane Smith | R | Iowa |
| Rep. | Jane Smith | R | Iowa | $952,000 |