HW03: Wrangling data
Overview
Due Thursday, October 24th (11:59 PM).
In the previous assignment you practiced working with dplyr
and ggplot2
on a clean and small-sized dataset. The goal of this assignment is to further develop your dplyr
and ggplot2
skills and practice cleaning, wrangling, and exploring social science data in a realistic research context (e.g., dealing with messy and larger datasets).
Accessing and cloning your hw03
repository
- Go at this link to accept the invitation and create your private
hw03
repository on GitHub. Once you do so, your repo will be built in a few seconds. It follows the naming conventionhw03-<USERNAME>
- Once your repository has been created, click on the provided link to access it.
- Finally, follow the same steps you completed for
hw01
to clone the repository to your R Workbench.
General workflow
See Homework 1 for details.
Assignment description
Explore the Data
The Supreme Court Database contains detailed information of every published decision of the U.S. Supreme Court since its creation in 1791. It is one the most utilized database in the study of judicial politics.
In the hw03
repository, you will find two data files:
scdb-case.csv
scdb-vote.csv
These contain the exact same data you would obtain if you downloaded the files from the original website, but reformatted to be stored as relational data files. That is, scdb-case.csv
contains all case-level variables, whereas scdb-vote.csv
contains all vote-level variables.
The data is structured in as follows:
scdb-case.csv
contains one row for every case and one column for every variablescdb-vote.csv
contains one row for every vote by a justice in every case and one column for every variable
The current data contain information on every case decided from the 1791-2020 terms.1 There are several variables to explore, and I encourage you to do so, but these are the main ones you will need to answer most prompts:
caseIssuesId
chief
dateDecision
decisionDirection
decisionType
declarationUncon
direction
issueArea
justice
justiceName
majority
majVotes
minVotes
term
Answer the questions
Your repository for this assignment includes an .Rmd
file with three tasks, each containing a set of questions, some specific and others more open-ended. Please, answer all of them.
- Task 1: Recode variables
- Task 2: Answer specific questions, and formulate (and answer) a question according to your interests
- Task 3: Answer more open-ended questions, and formulate (and answer) a question according to your interests
Please note:
- The questions, especially the open-ended ones, are designed to help you think and plan before diving into coding, similar to challenges you might face in real-world research settings. They won’t lay out exact step-by-step instructions, as they encourage you to apply and expand on the code you’ve learned so far. One of the main goals of this assignment (and future ones) is for you to tackle these challenges independently. We know you can do it!
- All assignments are designed to be completed in multiple coding sessions. Start early, and save, stage, commit, and push often!
- You are not allowed to use AI tools to generate R code to complete this and future assignments. The only acceptable uses of AI tools are the following two: debugging (but only after you have made an attempt on your own) and generating examples of how to use a specific function (but also check the function documentation, and the course materials – if there are many ways to code something, you are expected prioritize what we have learned vs. what we have not). While AI tools can be helpful outside of class, this is a coding course, and it’s all about learning to code in R, not learning to use AI!
- Do not submit code that you cannot fully explain to yourself and someone else.
Formatting Guide
Refer to HW2 guidelines for formatting tables and graphs. Ensure you follow proper R coding style and make appropriate use of comments in your code (refer to the readings and lecture on this topic).
Submit the assignment
To submit the assignment, follow these steps:
First push to your repository the last version of your assignment before the deadline. Make sure to stage-commit-push the following files:
scouts.Rmd
: main file you will add your code toscotus.md
: you will generate this file from the .Rmd by knitting it- the folders with all the graphs you generated in your
.Rmd
(we need this file to be able to see your graphs and grade your homework, if you do not submit it, we will mark down on reproducibility)
When you are ready to submit, copy your repository URL (e.g.
https://github.com/css-fall24/hw3-brinasab
) and submit it on Canvas under HW03 before the deadline. Do not submit files on Canvas, we only need the link to your repositoryAs part of your submission, at the end of the
.Rmd
for this homework, include a few reflections on your experience with this assignment, list any help you might have received (from both humans and AI), as specified in the instructions.
Assessment
All homework assignments are evaluated using a rubric which you can access from the assignment submission folder on Canvas. See also here for more.
Further guidelines for this specific homework to help you assess your work before submitting it:
In the past, “Excellent” or “Very Good” work included submissions that completed all components of the assignment correctly and accurately. Code ran correctly, followed proper style, and was well-documented without excessive comments. The code was appropriately complex and aligned with the prompt and course material (e.g., demonstrating a nuanced understanding of concepts and using them effectively throughout the assignment). Variables are correctly re-coded, possibly with the use of different approaches that we have learned in the course. Graphs and tables were well-executed and carefully chosen (e.g., matching variable types with graph types), with appropriate labels, colors, and enhanced default settings. The analysis was clear and easy to follow, with graphs properly interpreted. Explanation of the student’s logic to solve a task, when required, was clear and on-point. There was a strong understanding of required packages, extending beyond the basics. Additionally, the repository showed a history of multiple informative commits, reflecting the progression and backup of work. Command of R Markdown syntax was evident, with no errors.
Acknowledgments
The initial version of this homework was developed by Benjamin Soltoff (“Computing for the Social Sciences” licensed under the CC BY-NC 4.0 Creative Commons License). Further implementations have been developed by Sabrina Nardin.
Terms run from October through June, so the 2020 term contains cases decided from October 2020 - June 2021. ↩︎