HW02: Exploring data
Overview
Due Thursday, October 17th (11:59 PM).
In HW01 you demonstrated knowledge of your software setup, Git/GitHub via RStudio, and Markdown. The goal of this second assignment is to practice transforming and visually exploring data with dplyr and ggplot2.
Accessing and cloning your hw02 repository
- Go to this link to accept the invitation and create your private hw02 repository on GitHub. Your repository will be built in a few seconds and follows the naming convention hw02-<USERNAME>.
- Once your repository has been created, click on the provided link to access it.
- Finally, follow the same steps you completed for hw01 to clone the repository to your R Workbench.
General workflow
See Homework 1 for details.
Assignment description
Your goal for this assignment is to apply what you have learned so far by answering a set of questions using a cleaned dataset that we provide: the “mass shooting” dataset.
The United States experiences far more mass shooting events than any other developed country in the world. Policymakers, politicians, the media, activists, and the general public acknowledge the widespread prevalence of these tragic events. However, effective policies to prevent such incidents should be grounded in empirical data.
In July 2012, in the aftermath of a mass shooting in a movie theater in Aurora, Colorado, Mother Jones published a report on mass shootings in the United States since 1982. Importantly, they provided the underlying data set as an open-source database for anyone interested in studying and understanding this criminal behavior.
Obtain the data
This dataset is included in the rcis package on GitHub.
If you are working on Workbench, you should already have everything installed. Simply load the package by typing library(rcis) in your console, then load the dataset by typing data("mass_shootings"). Type ?mass_shootings for detailed information on the variables and other coding information. I suggest working with R version 4.2 on Workbench.
If you are using R on your local computer, you first need to install the rcis package by typing remotes::install_github("css-materials/rcis") in your console. If you don’t already have the remotes package installed, you will get an error: go back and install it first using install.packages("remotes"), then install rcis. Finally, load the package with library(rcis) and the dataset with data("mass_shootings"). Use the help function ?mass_shootings for detailed information on the variables and other coding information.
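For a local setup, the steps above can be collected into a short script. This is a minimal sketch: the requireNamespace() checks, which skip an installation when the package is already present, are our addition rather than a course requirement. Run it in the console, not inside your .Rmd, so that knitting does not try to reinstall packages.

```r
# Install remotes (from CRAN) only if it is missing
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install rcis (from GitHub) only if it is missing
if (!requireNamespace("rcis", quietly = TRUE)) {
  remotes::install_github("css-materials/rcis")
}

# Load the package and the mass shootings dataset
library(rcis)
data("mass_shootings")
```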
Answer the questions
Your repository for this assignment includes a set of questions, some very specific and others more open-ended. Answer all of them.
Please note:
- The questions, especially the open-ended ones, are designed to help you think and plan before diving into coding, similar to challenges you might face in real-world research settings. They won’t lay out exact step-by-step instructions, as they encourage you to apply and expand on the code you’ve learned so far. One of the main goals of this assignment (and future ones) is for you to tackle these challenges independently. We know you can do it!
- All assignments are designed to be completed in multiple coding sessions. Start early, and save, stage, commit, and push often!
- You are not allowed to use AI tools to generate R code to complete this and future assignments. The only acceptable uses of AI tools are the following two: debugging (but only after you have made an attempt on your own) and generating examples of how to use a specific function (but also check the function documentation, and the course materials). While AI tools can be helpful outside of class, this is a coding course, and it’s all about learning to code in R, not learning to use AI!
- Do not submit code that you cannot fully explain to yourself and someone else.
Formatting Guide
Formatting graphs
While you are practicing Exploratory Data Analysis, your final graphs should be appropriate for sharing with outsiders. That means your graphs should have:
- A title
- Labels on the axes (type ?labs in your Console for details)
Consider adopting your own color scales, taking control of your legends (if any), playing around with themes, and generally customizing your graphs to improve their visual appeal and clarity.
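As an illustration of these requirements, the sketch below plots the location counts shown in the table example in this guide with a title, axis labels, a source caption, and a non-default theme. The data frame location_counts is hypothetical: its numbers are typed in from that table, whereas in your homework you would work from mass_shootings directly.

```r
library(ggplot2)

# Hypothetical data: incident counts copied from the location table in this guide
location_counts <- data.frame(
  location_type = c("Airport", "Military", "Other", "Religious", "School", "Workplace"),
  n = c(1, 6, 49, 6, 18, 45)
)

# Bar chart sorted by count, with a title, axis labels, and a source caption
p <- ggplot(location_counts, aes(x = reorder(location_type, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Mass shootings in the United States by location",
    x = "Location",
    y = "Number of incidents",
    caption = "Source: Mother Jones"
  ) +
  theme_minimal()

p
```

Here coord_flip() moves the location names to the vertical axis so long labels do not overlap, and reorder() sorts the bars by count; both are stylistic choices you can drop or replace.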
Formatting tables
When presenting tabular data (for example, the output of dplyr::summarize()), use the kable() function from the knitr package to format the table in the final document. Keep reading for an example of how to use this function.
The code below displays a basic table summarizing where mass shootings occurred:
library(dplyr)
# count mass shootings (rows) by location type
count(mass_shootings, location_type)
## # A tibble: 6 × 2
## location_type n
## <chr> <int>
## 1 Airport 1
## 2 Military 6
## 3 Other 49
## 4 Religious 6
## 5 School 18
## 6 Workplace 45
Instead, use kable() to format the table, add a caption, and label the columns:
library(knitr)
count(mass_shootings, location_type) %>%
  kable(
    caption = "Mass shootings in the United States by location",
    col.names = c("Location", "Number of incidents")
  )
Table 1: Mass shootings in the United States by location
|Location  | Number of incidents|
|:---------|-------------------:|
|Airport   |                   1|
|Military  |                   6|
|Other     |                  49|
|Religious |                   6|
|School    |                  18|
|Workplace |                  45|
Run ?kable in the console for additional options. We expect you to use this function for formatting tabular data in this and future assignments.
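A few of the options documented in ?kable are especially useful for summary tables. The sketch below is an illustration, not a required pattern: it adds a percentage column to a hypothetical copy of the location counts (typed in from the table above), then uses digits to round numeric columns and align to set column alignment.

```r
library(dplyr)
library(knitr)

# Hypothetical input: the location counts from the table above
location_counts <- tibble(
  location_type = c("Airport", "Military", "Other", "Religious", "School", "Workplace"),
  n = c(1, 6, 49, 6, 18, 45)
)

# Add each location's share of all incidents, sort, then format the table
location_table <- location_counts %>%
  mutate(pct = 100 * n / sum(n)) %>%
  arrange(desc(n)) %>%
  kable(
    digits = 1,                # round numeric columns to one decimal place
    align = c("l", "r", "r"),  # left-align text, right-align numbers
    caption = "Share of mass shootings in the United States by location",
    col.names = c("Location", "Number of incidents", "Percent of incidents")
  )

location_table
```

The digits argument also accepts one value per column (for example, digits = c(0, 0, 1)) if different columns need different rounding.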
Submit the assignment
To submit the assignment, follow these steps:
Before the deadline, push the final version of your assignment to your repository. Make sure to stage, commit, and push the following files:
- mass-shootings.Rmd: you will add your code to this file
- mass-shootings.md: you will generate this file from the .Rmd by knitting it, as you did for HW01. We need this file to see your graphs and grade your homework; if you do not submit it, we will deduct points for reproducibility.
- mass-shootings_files/: this folder contains all the graphs generated by your .Rmd
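Before pushing, you may want to confirm that all three items exist in your repository folder. The helper below, missing_submission_files(), is hypothetical (it is not part of rcis or any course package), and it only checks that the files exist, not that they are up to date, so knit the .Rmd first (with the Knit button or rmarkdown::render()).

```r
# Hypothetical helper: returns the required HW02 files that are missing from `dir`
missing_submission_files <- function(dir = ".") {
  required <- c("mass-shootings.Rmd", "mass-shootings.md", "mass-shootings_files")
  # file.exists() returns TRUE for both files and folders
  required[!file.exists(file.path(dir, required))]
}

# Run from your repository folder; character(0) means nothing is missing
missing_submission_files()
```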
When you are ready to submit, copy your repository URL (e.g. https://github.com/css-fall24/hw2-brinasab) and submit it on Canvas under HW02 before the deadline. Do not submit files on Canvas; we only need the link to your repository.
As part of your submission, at the end of the .Rmd for this homework, include a few reflections on your experience with this assignment and list any help you received (from both humans and AI), as specified in the instructions.
Assessment
All homework assignments are evaluated using a rubric: see here.
Below are further guidelines for this specific homework to help you assess your work before submitting it.
In the past, “Excellent” or “Very Good” work included submissions that completed all components of the assignment correctly and accurately. Code ran correctly, followed proper style, and was well-documented without excessive comments. The code was appropriately complex and aligned with the prompt and course material (e.g., demonstrating a nuanced understanding of concepts and using them effectively throughout the assignment). Graphs and tables were well-executed and carefully chosen (e.g., matching variable types with graph types), with appropriate labels, colors, and enhanced default settings. The analysis was clear and easy to follow, with graphs properly interpreted. There was a strong understanding of required packages, extending beyond the basics. Additionally, the repository showed a history of multiple informative commits, reflecting the progression and backup of work. Command of R Markdown syntax was evident, with no errors.
Acknowledgments
The initial version of this homework was developed by Benjamin Soltoff (“Computing for the Social Sciences” licensed under the CC BY-NC 4.0 Creative Commons License). Further implementations have been developed by Sabrina Nardin.