class: center, middle, inverse, title-slide .title[ # MACS 30500 LECTURE 1 ] .author[ ### Topics: Course Logistics. Markdown. Git/GitHub from RStudio. ] --- class: inverse, middle # Agenda * Welcome to the course! * R and R Markdown * Git and GitHub from RStudio <!-- Three main topics, each 35ish minutes Official class time is 3:30 to 5:20 pm. Explain: 80 min too rushed, now we have 110 plenty. Probably need around 90 or 100 so plan to finish most days by 5:10ish Today we will talk about course objectives, topics, expectations, evaluation, software setup, and do a group activity on your expectations and tips to succeed in this course, etc. Let me start by introducing myself and the TA... then will go around and introduce each other SUGGESTIONS FOR NEXT TIME: add tips to be successfull pen and paper! make a plan before you code get into the habit of checking the function docs keep things organized! e.g. all your slides and RMd files, customize them know where to find what and when we covered it (vs chatGPt...) past students said: say everyone who puts effort and works hard on this class has gotten at least a B+ and usually in the A range --> --- class: inverse, middle # Part 1: Welcome to the course! --- ## Course Objectives The goal of this course is to acquire basic computational skills. **We focus on two things:** 1. Learn R (especially the "tidyverse") 2. Learn Best Practices for Reproducible Research **Expectations:** * We won't learn coding to become Computer Scientists. We learn it as a means to an end: analyze social science data! * You won't become an R expert in a few weeks. But you will learn some computational skills and techniques and, MOST IMPORTANTLY, you will gain the confidence needed to learn new techniques as you encounter them in your own research. --- ## Course Topics Main topics of this course: * Git/GitHub, R Markdown * Data Visualization * Data Transformation * Exploratory Data Analysis * R Programming: Control Structures, Functions, Data Structures * Debugging and Defensive Programming * Reproducible Research * Web-Scraping --- ## How we will do this We will start small and progressively build more complex code ```r print("Hello world!") ``` ``` ## [1] "Hello world!" ``` --- ## How we will do this We will have a mix of lectures, in-class exercises, and weekly homework assignments (most of your learning will happen there!) There are a lot of moving parts in this course (Website, Canvas, Ed Discussion, Workbench, Git/GitHub, R Markdown, in-class code, slides, etc.)... things won't always work smoothly. That's normal and part of the process. You will encounter errors while coding, many errors! And you will spend time debugging them. That's normal and part of learning. You will spend time finding resources to help: * Google, ChatGPT are OK (within limits, see Syllabus Policies on Plagiarism)... * ...but the documentation and the class materials should be the first places to go! There will be moments of frustration. That's, again, normal. Be patient with yourself! --- ## Some tips that students shared... * "One of the hardest things for me was remembering where I read a concept or saw specific code." *Advice: Organization is key (e.g., create your own version of the slides and in-class materials; your own system for retrieving concepts; etc.).* <br> * "Homework assignments are longer and more challenging than I originally anticipated." *Advice: Start early! You don’t need to know everything before you begin: just get started, and you'll learn (review book, slides, and in-class code) as needed.* <br> * "I’m in week 4, and I feel lost." *Advice: Remember that the material builds on itself. It's normal to feel overwhelmed at first, but the bigger picture will become clearer over time IF you stay consistent with your efforts every week. Also: come see us!* --- ## Some tips that students shared... * "There were times when I found functions that seemed to work, but weren’t covered in the slides or class materials, so I wasn’t sure if they were appropriate to use." *Advice: Generally, stick to the functions taught in class, as there's a reason for focusing on them. However, you can always demonstrate both approaches—comparing the function used in class with the one you wrote.* <br> * I wish I had reviewed the in-class exercises afterward: they would have helped me prepare for the homework assignments, which were generally more challenging." *Advice: Play with the in-class exercises right after class or the day after when they are still fresh in your memory!* <!-- Add to point above a version of the following: HWs assignments are take-home with no limited time and open-books, as such they are designed to be more challenging that in-class exercises Add one question on Make at habit to go to Office Hours and post on Ed this is the type of class for which OHs are a key element. That's why we offer more OHs opportunities than the average class Use the drop-in and by appointment model as in Fall 2024 and ask TAs to schedule OHs before HW are due and aim at covering all days --> --- ## Is this course a good fit for me? See [Should I take this course?](https://computing-soc-sci.netlify.app/faq/whom-course-for/) under FAQ in our website --- ## Course Expectations See [Course Expectations](https://computing-soc-sci.netlify.app/faq/course-obj-expectations/) under FAQ: * Attend classes and bring a laptop * Complete the readings * Ask questions and collaborate * 15-minute rule * Resources: AI, Google, StackOverflow, your peers (study group!) * Office Hours and ED Discussion * See the [how to ask for help](https://computing-soc-sci.netlify.app/faq/asking-questions/) page --- ## Collaboration is good *to a point* Researchers usually collaborate with one another on projects. Developers work in teams to write programs. Learning from others and/or the internet is good... BUT you are expected to complete your own work, understand what the code is doing, and be prepared to explain it line-by-line to someone else... --- ## Plagiarism Examples: DON'T copy code from your peers. DO ask a classmate to help you debug your code (*help you*, not do it for you) DON'T copy chunks of code from the internet. DO rewrite it and make sure you understand every line you write DON'T ask ChatGPT to do it for you. DO use AI tools to debug your code or to show examples to help you better understand -- In your homework, we will ask you to list the resources you consulted. For example: "I collaborated with student First and Last Name," "I asked ChatGPT to help with plotting a bar chart," etc. --- ## Homework and Evaluation See [Evaluation Guidelines](https://computing-soc-sci.netlify.app/faq/homework-evaluations/) under FAQ in the course website and the course Schedule (linked under Lectures): * Weekly assignments (the last one will serve as final project) * Submission policies for missed assignments, late work, etc. * Evaluation Philosophy and Rubric <!-- * **Weekly programming assignments and Final Project**: see [here](https://computing-soc-sci.netlify.app/homework/) * **Submission Policies** for missed assignments, late work, etc.: see [here](https://computing-soc-sci.netlify.app/faq/homework-evaluations/#missed-assignments-resubmission-late-work-etc) * **Evaluation Philosophy and Rubric**: see [here](https://computing-soc-sci.netlify.app/faq/homework-evaluations/#evaluation-philosophy) **Tips**: * do the readings before class, even if you do not understand everything * start early to work on the homework assignments * assignments (starting from the 2nd) will be a bit challenging, especially if you have never programmed before: they won't tell you exactly what to do to solve the problem * submit code that runs * form a study group, and ask for help if you get stuck --> --- ## Software Setup Go to our course website under [Setup](https://computing-soc-sci.netlify.app/setup/), and follow the instructions in the provided order Two options: 1. Use R Studio Workbench (recommended, it minimizes problems) 2. Install the software locally f you're having trouble with the setup, please stay after class for assistance. Only students officially enrolled in the course have access to R Workbench—let me know if you're not enrolled. **Make it a priority to have everything setup and running before next lecture!** --- ## Activity: get to know each other and share expectations <!-- In [this shared document]() --> In groups of 4-5 people, please share: * your name, and what you study * your expectations or goals for this course * one (or more) concerns you might have and strategies you plan to adopt to be successful in this course * any questions you have for us --- class: inverse, middle # Part 2: R scripts and R Markdown documents --- ## Let's begin by opening and describing our Software R Workbench: https://macss-r.uchicago.edu/ <!-- Open Workbench link: log in, start new session Describe the panels: Start from console, type in simple code, show it shows up in environment. Move to file, say this is where you communicate btw your computer and workbench, for example import a file from my machine into R workbench (e.g. with class roster on desktop), create a class-materials folder to keep things orgnaized Then create a R script, save it in class materials: show difference compared to previously typed code in console (now is saved) Create a Rmd as empty doc, stop here, move back to slides say: let's talk about R markdown first! I might go a bit fast on some of these slides but we ll be there for your reference. We ll do an in-class activity that ll review same content taught here --> --- ## R Scripts, R Markdown, and Markdown **R Scripts**: extension `.R` **R Markdown**: extension `.Rmd` **Markdown**: extension `.md` Remember: * R Scripts: only support code and comments * R Markdown and Markdown: support code, comments, and text! * R Markdown: an extension of Markdown designed for R We will use all of them, but mostly `.Rmd` and `.md`: `.Rmd` for the bulk of your homework assignments, while `.md` will become familiar to you as README in the Git/GitHub repo! *Since we are going to use mostly `.Rmd` files, let's review them more in-depth...* <!-- ## R Markdown: Knitting process <img src="rmarkdownflow.png" width="80%" style="display: block; margin: auto;" /> When you knit the document: 1. R Markdown sends the .Rmd file to knitr: http://yihui.name/knitr/ 1. Knitr executes all of the code chunks and creates a new plain markdown (.md) file which includes the code and its output 1. This plain markdown file is then converted by pandoc into any number of output types including html, PDF, Word document, etc.: http://pandoc.org/ --> --- ## R Markdown: Three main components 1. A **YAML header** surrounded by `---` at the top of the file 1. **Text** 1. **Chunks** of R code surrounded by ` ``` ` To insert them, you have different options: * Use the green “Insert” button icon in the editor toolbar * Manually type the chunk delimiters ` ```{r} and ``` ` * Keyboard shortcut: * Mac: Press Cmd + Option + I * Windows: Press Ctrl + Alt + I --- ## YAML header specifications ```default --- title: "Homework 1" author: "Sabrina Nardin" output: html_document --- ``` * **Y**et **A**nother **M**arkup **L**anguage * Standardized format for storing hierarchical data in a human-readable syntax * Provide basic info of your `.Rmd` file and decided how it is rendered See the readings for more on YAML header (e.g., parameters, bibliogrpahies, citations, etc.)! --- ## R Markdown: Basic Syntax ```` _Italics_ and *Italics* ```` *Italics* and _Italics_ -- ```` __Bold__ and **Bold** ```` **Bold** and __Bold__ <!-- double underscore here and tilda ~ --> -- ```` ~~Strikethrough~~ ```` ~~Strikethrough~~ ```` `inline code` ```` `inline code` --- ## R Markdown: Basic Syntax **Use a backslash `\` to make R Markdown special characters visible, e.g., to interpret them literally:** -- ```` I want to use \* for emphasis, not for italics: \*great\* ```` I want to use \* for emphasis, not for italics: \*great\* -- ```` I do not want a list here, I want the literal number followed by a dot: 1\. ```` I do not want a list here, I want the literal number followed by a dot: 1\. --- ## R Markdown: Unordered lists Use either `*`, `-`, or `+`, then a space, then the text: .pull-left[ ```` + item 1 + sub + sub - item 2 - sub - sub + item 3 - sub * sub - item 4 * sub * sub ```` ] .pull-right[ + item 1 + sub + sub - item 2 - sub - sub + item 3 - sub * sub - item 4 * sub * sub ] --- ## R Markdown: Ordered lists Write the number 1, followed by a period or a round bracket, then a space, then the text. For nested lists, indent once and use `+`, `*`, or `-` followed by a space: .pull-left[ ```` 1. item 1 + sub + sub sub + sub 1. item 2 * sub * sub 1) item 3 - sub + sub ```` ] .pull-right[ 1. item 1 + sub + sub sub + sub 1) item 2 * sub * sub sub * sub 1. item 3 - sub + sub sub ] --- ## R Markdown: Headers Use `#` to add headers. The more `#`, the smallest the header: .pull-left[ ```` # Main title, 1st level ## Section title, 2nd level ### Subsection title, 3rd level #### Subsection title, 4th level Write regular-sized sentences without `#` ```` ] .pull-right[ # Main title, 1st level ## Section title, 2nd level ### Subsection title, 3rd level #### Subsection title, 4th level Write regular-sized sentences without `#` ] -- NB: in R scripts or in `.Rmd` code chunks, `#` are used for comments and the number of `#` does not matter for comments (but it does for section titles!) --- ## R Markdown: links, tables, pictures **Link**: write the linked text in brackets `[]`, followed immediately by the URL in parentheses `()`. ```` [R Studio](https://www.rstudio.com/) ```` **Picture**: make sure the picture is in the correct folder, then type `![text](picture link, "title")`. The title is optional. **Table**: use `-----` for rows and `|` for columns. See https://www.markdownguide.org/extended-syntax/ --- ## R Markdown: Code chunks options **`eval = FALSE`** code is not evaluated; code appears in the report, results do not appear in the report. Useful to show an example code in your report, or when code has an error you want to show. Default is `eval = TRUE` **`include = FALSE`** code is evaluated; code does not appear in the report, nor do results. Useful when you do not want to clutter your report with too much code. Default is `include = TRUE` **`echo = FALSE`** code is evaluated; code does not appear in the report, results appear in the report. Useful to show your output to people that are not interested in the code that produced it. Default is `echo = TRUE` **`error = TRUE`** code is evaluated and output appears in the report even if there is an error. Useful to knit with errors. Default is `error = FALSE` **`message = FALSE`** or **`warning = FALSE`** prevents messages or warnings from appearing in the report. Default is `message = TRUE` or `warning = TRUE` --- ## R Markdown: Code chunks options <img src="chunk_options.jpg" width="80%" style="display: block; margin: auto;" /> Source: https://r4ds.had.co.nz/r-markdown.html#chunk-options Full list: http://yihui.name/knitr/options/ --- ## R Markdown: the Knitting process R Markdown files allow to generate well-formatted documents (md, pdf, word, html, etc.) that include text, code, and output. To create such documents, you “Knit” or “render” them. You have three options: 1. by clicking the “Knit” button in the script editor panel of your R Markdown file and select the desired output 1. by adding the desired output in the YAML header such as: `github_document`, `pdf_document`, `word_document`, `html_document`, etc. (notice: without `\(\LaTeX\)` installed, the pdf won't work) 1. by using `render()`, as explained [here](https://pkgs.rstudio.com/rmarkdown/reference/render.html). For example, run in your console: `rmarkdown::render("my-document.Rmd", output_format: html_document`) --- ## R Markdown: Knitting process When you "knit" your document, the following things happen: 1. R Markdown sends the `.Rmd` file to `knitr`: http://yihui.name/knitr/ 1. Knitr executes all of the code chunks and creates a new plain Markdown `.md` file which includes the code and its output 1. This plain Markdown file is then converted by `pandoc` into any number of output types including html, PDF, Word document, etc.: http://pandoc.org/ <img src="rmarkdownflow.png" width="80%" style="display: block; margin: auto;" /> --- class: inverse, middle # Practice basic R Markdown syntax * Go to our website under Lecture 1 to download the class materials for today, and complete the in-class practice * Craving more practice? Check out the interactive tutorial linked in Lecture 1 on our website! --- class: inverse, middle # Part 3: Git and GitHub --- ## Motivation #### What are the advantages of learning to program and using version control to track our progress VERSUS using a GUI software and manually saving different versions of our file? --- ## Two different approaches **TASK**: Write a report analyzing the relationship between ice cream consumption and crime rates in New York City. **APPROACH**: Jane and Sally approach this task differently -- .pull-left[ ### Jane: a GUI workflow 1. Searches for data files online 1. Cleans the files in Excel 1. Analyzes the data in Stata 1. Writes her report in Google Docs ] -- .pull-right[ ### Sally: a programmatic workflow 1. Creates a folder specifically for this project * `data` * `graphics` * `output` 1. Searches for data files online 1. Cleans the files in R 1. Analyzes the files in R 1. Writes her report in R Markdown ] --- class: middle <img src="https://i.imgflip.com/1szkun.jpg" width="70%" height="70%" style="display: block; margin: auto;" /> --- ## Automation * Jane forgets how she transformed and analyzed the data * Extension of analysis will be difficult! * Sally uses *automation* * Re-run programs * No mistakes * Much easier to implement *in the long run* --- ## Reproducibility Reproducibility is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them Also allows the researcher to precisely replicate their analysis --- ## Version control * Tracking revisions with multiple copies * `analysis-1.r` * `analysis-2.r` * `analysis-3.r` * Repository on your computer for which the Version Control Software tracks all changes (who, when, what, etc.). Example: Git * Usually implemented in conjunction with remote serves to store backups of your repository. Example: GitHub *About Git and GitHub...* --- ## Git and GitHub This is a short summary. Check the readings for today for a more in-depth explanation! Often these two terms are used together, but they are not the same thing (see [here](https://computing-soc-sci.netlify.app/faq/what-are-git-github/) for more): * **Git:** version control system used to track changes in repositories (which are like folders in a project) over time * **GitHub:** cloud-based hosting service that hosts and lets you manage your Git repositories (GitHub is where you store / distribute repositories) <img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7MQClrRp--/c_imagga_scale,f_auto,fl_progressive,h_420,q_auto,w_1000/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7ce4oq75sia6ni6q46s3.png" width="35%" style="display: block; margin: auto;" /> --- ## General Git workflow Steps to memorize: * Make changes locally (your machine or Workbench) to your files * Save your changes locally * Pull-Stage-Commit-Push: * Pull from GitHub (this is like "refreshing" your copy of the files on your machine, especially important when you collaborate with others) * Stage and Commit your changes to your local Git repo * Push your changes online to GitHub While 99% of the time you'll use Git to add, commit, push, and pull, there are many more Git commands available. After this class, you'll have enough exposure to Git to quickly learn new commands and become an expert Git user! --- class: inverse, middle ## Git and GitHub in-class practice * Now we practice this workflow within RStudio by completing the [Using Git with RStudio](https://computing-soc-sci.netlify.app/setup/git/git-with-rstudio/) tutorial from our website * Once Homework 1 is released on the website (tomorrow) accept it, it will have instructions to complete the homework-specific workflow. Remember to provide us your GitHub username and accept the invitation to join our GitHub organization! --- class: inverse, middle ## More Git... Come back to this section for further tips after you have familiarized yourself with the basic workflow! --- ## Git Reminders **Stage and Commit Often:** Think of it as taking a snapshot of your project. Save, stage, and commit multiple times during a work session. **Write Meaningful Commit Messages:** Keep your messages concise but informative to remind yourself (and others) of the changes. There are many tips online for "commit message best practices." **Push Often, But Less Frequently Than You Commit:** Some people push every time they commit, while others push multiple commits at once. Find what works best for you. **On Git Interfaces:** In this class, we'll use the integrated RStudio interface. Other popular options include GitHub Desktop and the terminal/shell. --- ## What to commit / not to commit **What files to commit:** * Source files (code, README, etc.) * Markdown files * Data files **What files to not commit:** * Temporary files * Log files * Files with private details * Any file greater than 100 megabytes -- `.gitignore` * It is a system file that tells Git which files/directories to ignore * Examples of gitignore files [here](https://github.com/github/gitignore): look for the gitignore file for R (standard file that will take care of most of your needs) * For more info to create a .gitignore file see [here](https://www.w3schools.com/git/git_ignore.asp?remote=github) --- ## Git Large File Storage What if I have a large file I want to track with Git? * See [Git Large File Storage](https://git-lfs.github.com/) * This is a separate software for tracking large files * Integrates with Git * Generally charges a fee --- ## Git and merge conflicts Typically these occur when you haven't pulled before pushing changes in a collaborative project. If you work alone on a given repo this is less relevant, unless you modify a file directly from GitHub (which is not a good practice!). When you have a merge conflict there are two 'current' versions of a file: one in your local Git repo and one in the online GitHub repo! These two should always mirror each other, when not, a conflict arises So, they need to be reconciled: * sometimes they can be automatically resolved * sometimes you have to solve them manually (review the code side-by-side and confirm the changes) --- ## Avoiding (most) Git problems 1. Change your file locally, then stage and commit them (early and often) 1. Push regularly to GitHub 1. Pull before you push To test your understanding and for further practice using Git with the integrated R Studio Git GUI, complete the ["Using Git with RStudio"](https://computing-soc-sci.netlify.app/setup/git/git-with-rstudio/) tutorial on our course website --- ## Burn it all down <img src="https://imgs.xkcd.com/comics/git.png" width="35%" style="display: block; margin: auto;" /> .footnote[Source: [xkcd](https://xkcd.com/1597/)]