Course Material
Overview
0.1
Software Requirements and Installation Instructions
0.1.1
Git
0.1.2
Spreadsheet Editor
0.1.3
SQLite
0.1.4
R
0.1.5
RStudio
0.1.6
Required R Packages
1
Project Organization
1.1
Directory structure
1.1.1
Overlap
1.1.2
Minimizing duplication
1.1.3
Self-containedness
1.1.4
Tradeoffs
1.2
Golden rules
1.3
Be flexible
1.4
Documentation
1.5
Naming files
1.5.1
Computer-readable file names
1.5.2
Human-readable file names
1.5.3
File names that work well with default ordering
1.6
RStudio Projects
1.7
Relative and absolute paths
1.7.1
Path separators
1.7.2
References
2
Version Control with Git
2.1
Command line basics
2.2
Configuring Git
2.3
Getting help
2.4
Creating a repository
2.5
Tracking files
2.6
Ignoring files
2.7
The 3 states
2.8
Un-staging files
2.9
Recovering a previous version of a file
2.10
Removing Git
2.11
Git everyday
2.12
References
3
Collaborative Science with GitHub
3.1
Adding a remote to a local repository
3.2
Pushing to a remote repository
3.3
Cloning a repository
3.4
Synchronizing changes among collaborators
3.5
Resolving conflicts
3.6
Avoiding conflicts
3.7
Working with branches
3.8
Forking a repository
3.9
Pull requests
3.10
References
4
Best Practices in the Use of Spreadsheets
4.1
What are spreadsheets good for?
4.2
Human-readable vs. computer-readable data
4.3
Tidy data
4.4
Problematic practices
4.4.1
Multiple variables in a single column
4.4.2
Rows for variables, columns for observations
4.4.3
Multiple tables in a single spreadsheet
4.4.4
Multiple sheets
4.4.5
Using formatting to convey information
4.4.6
Putting units in cells
4.4.7
Using problematic column names
4.4.8
Conflating zeros and missing values
4.4.9
Using problematic null values
4.5
Document, document, document!
4.6
References
5
Relational Databases
5.1
What is a relational database?
5.2
Why bother with relational databases?
5.3
Database components
5.4
Database design and architecture
5.5
Data types
5.6
A first look at our practice database
6
Basics of SQL Language
6.1
The SQL language
6.2
Writing SQL queries
6.2.1
Limiting results
6.2.2
Sorting results
6.2.3
Finding unique values
6.2.4
Filtering
6.2.5
Calculations
6.2.6
Aggregate functions
6.2.7
Aliases
6.2.8
Grouping
6.2.9
Filtering based on computed values
6.2.10
Null values
6.2.11
Joins
6.2.12
Nested SELECT statements
6.2.13
Order of operations
6.3
Building a database
6.3.1
Creating a new database in SQLite Studio
6.3.2
Creating tables
6.3.3
Adding constraints
6.3.4
Order of table creation
6.3.5
Populating the tables
6.3.6
Autoincrements as primary keys
6.3.7
Foreign keys
6.3.8
Crossing existing information to derive new tables
7
Interfacing Databases in R with RSQLite
7.1
The RSQLite package
7.2
Establishing a database connection
7.3
Sending queries to the database
7.4
A note on reproducibility
8
Dynamic Documents with RMarkdown
8.1
What is a dynamic document?
8.2
Markdown and RMarkdown
8.3
Installing RMarkdown
8.4
Writing an RMarkdown document
8.4.1
Creating the document
8.4.2
YAML headers
8.4.3
Code chunks
8.4.4
Inline code
8.4.5
Text formatting
8.4.6
Embedding images
8.4.7
Adding a table of contents
8.4.8
Knitting the document
8.4.9
The sky is the limit!
9
Automatically Generated Websites with GitHub Pages
9.1
Bookdown
9.1.1
Creating a book
9.2
Publishing a book with GitHub Pages
9.2.1
Step 1: Set up compatibility with GitHub Pages
9.2.2
Step 2: Set up Git repository
9.2.3
Step 3: Link Git repository to remote GitHub repository
9.2.4
Step 4: Enable GitHub Pages
9.3
Maintaining the website
9.4
References
10
Introduction to R
10.1
Getting started
10.2
Basics of R programming
10.2.1
Assigning objects
10.2.2
Adding comments
10.2.3
Headers and description
10.2.4
Navigating through the script
10.2.5
Functions and their arguments
10.2.6
Writing functions
10.2.7
Data types
10.2.8
Vectors
10.2.9
Factors
10.2.10
Data frames
10.2.11
Dimensions
10.2.12
Subsetting
10.2.13
Logical conditions
10.2.14
Lists
10.2.15
Control structures
10.2.16
Repeating operations
10.3
References
11
Troubleshooting in R
11.1
How to interpret error messages
11.1.1
Locating the problematic piece of code
11.1.2
Isolating the error
11.1.3
Deciphering error messages
11.1.4
Syntax errors
11.1.5
Errors from using package functions
11.2
How to look for help
11.2.1
R documentation
11.2.2
Google
11.2.3
StackOverflow et al.
11.3
How to troubleshoot a loop
11.4
How to troubleshoot a function
11.5
Reproducible examples
11.6
The rubber duck
12
Dependency Management in R
12.1
What’s the need for dependency management?
12.2
The renv package
12.2.1
Usage
12.3
References
13
Data Wrangling with tidyverse
13.1
Welcome to the tidyverse
13.2
Tidyverse functions
13.2.1
Subsetting columns
13.2.2
Subsetting rows with logical conditions
13.2.3
Concatenating operations with pipes
13.2.4
Creating new columns
13.2.5
Limiting results
13.2.6
Tibbles
13.2.7
Joining tables
13.2.8
Changing the order of columns
13.2.9
Calculations by group
13.2.10
Sorting results
13.2.11
Extracting columns as vectors
13.2.12
Conditional value assignment
13.3
Style
13.4
References
14
Data Visualization with ggplot2
14.1
A Grammar of Graphics
14.2
Building plots
14.2.1
Basic scatterplot
14.2.2
Basic histogram
14.2.3
Basic density plot
14.2.4
Other geom functions
14.3
Customizing plots
14.3.1
Changing axis labels
14.3.2
Changing colors
14.3.3
Modifying legends
14.3.4
Themes
14.3.5
Adding transparency
14.3.6
Overlaying plots with different aesthetics
14.3.7
Modifying symbol appearance
14.3.8
Fill vs. color
14.3.9
Modifying axis values
14.3.10
Adding an extra dimension with facets
14.3.11
Error bars
14.3.12
Plot model predictions with uncertainty
14.3.13
Plot paths of ordered observations
14.4
Colorblind-friendly plots with viridis
14.5
Arranging multiple plots together with patchwork
14.6
Saving plots
14.7
References
15
Dates and Times in R
15.1
What makes dates and times special
15.2
Definitions
15.3
Under the hood
15.4
The lubridate package
15.4.1
Parsing dates and times
15.4.2
Extracting components of a date-time object
15.4.3
Time zones
15.4.4
Time spans
15.5
References
16
Introduction to Geospatial Data in R
Reproducible Data Science
Chapter 16
Introduction to Geospatial Data in R
Coming soon