Your computer should not be a mess.
To work efficiently with data, you want to organize your computer in a certain way. The same applies to version control. To work efficiently with version control applications (don’t panic if you ignore what version control is—We’ll cover it soon), you want to organize your computer in a certain way.
Organizing your folders and files in that certain way will make everything much easier for you.
RStudio projects and working directories
You’ll crunch numbers on a daily basis during your doctoral studies. Most SGPP folks are versed in R, a programming language for data analysis.
R users organize their work around a thing called RStudio projects. We’ll show how to create RStudio projects soon, but let’s begin by saying that RStudio projects are files with a particular extension: .Rproj files.
R users are also familiar with the concept of working directory. A working directory is just a folder on your computer where an .Rproj file is located.
To be precise, the working directory is the folder where you should keep all the files (e.g., the CSV file containing some survey data), scripts (e.g., the text files where you write and save your code), and any input that is relevant to the RStudio project you’re working on.
Once again, the working directory is just a folder with an RStudio project inside. The RStudio project, an .Rproj file, and the working directory, a folder, share the same name.
R has a function that prints the current working directory’s path: getwd()
.
If you execute the getwd()
function in the R console, R would print the path of the current working directory, say, /Users/edwinalvarado/Documents/r_projects/unofficial_primer is the folder where I keep all the elements that are required to render this website, which is itself an RStudio project under the name unofficial_primer.Rproj.
Computer organization
Regarding folders and files organization, you should create two main folders on your computer, namely, r_projects and github_projects.
You can locate both folders right in your Documents folder. In my computer, these two folder paths look something like this:
/Users/edwinalvarado/Documents/r_projects.
/Users/edwinalvarado/Documents/github_projects.
You should gather all RStudio projects’ working directories (i.e., folders) in the r_projects folder.
In r_projects, you want to create one specific working directory (i.e., one specific folder) for each course or project (paper, study notes, dissertation, etc.) you work on.
At some point, you’ll have a large number of working directories:
/Users/edwinalvarado/Documents/r_projects/pol601.
/Users/edwinalvarado/Documents/r_projects/pol602.
/Users/edwinalvarado/Documents/r_projects/pol603.
/Users/edwinalvarado/Documents/r_projects/second_year_paper.
/Users/edwinalvarado/Documents/r_projects/comps.
/Users/edwinalvarado/Documents/r_projects/prospectus.
And so on… Hopefully, you’ll become this systematic and professional when organizing your computer!
The RStudio projects in the r_projects folder are local; that is, they exist in—and only in—your computer.
On the other hand, if you want to load an RStudio project to GitHub so that you can share it with others or track changes using GitHub’s version control tools, the working directory in question should instead go to the github_projects folder.
Creating RStudio projects
To create an RStudio project—and, thus, a working directory—for, say, POL 601, you follow these steps:
Open RStudio.
Click on File > New Project, which will open a new window.
Select New Directory.
Choose an appropriate directory_name (use underscores instead of white spaces), say, pol_601 or pol601.
Click on Browse and make sure to set the r_projects folder in the Create project as subdirectory of field.
Click on Create Project.
By completing the steps above, you’ll create a new RStudio project and, hence, a new working directory in r_projects.
Bear in mind that a working directory is just a folder containing an RStudio project.
In my case, as the example goes, the working directory’s path would be /Users/edwinalvarado/Documents/r_projects/pol601. There, in that folder, I’ll find a pol601.Rproj file, which is an RStudio project.
The next time I have to do some programming related to POL 601, I will simply click on its corresponding RStudio project: pol601.Rproj. RStudio will then open a new session rooted in the working directory corresponding to POL 601.
Importantly, the good habit of organizing our work around RStudio projects guarantees that we don’t need to worry about manually setting the working directory, which tends to be taxing for most beginners.
Here’s a reference with further details about RStudio projects.
And you’ll simply repeat this tutorial for every course or project that comes along the way in your new life as an R user.
To sum up, we have learned that R users organize all their programming work around RStudio projects. We described how to create RStudio projects. We showed that creating an RStudio project results in a new working directory (i.e., a new folder) that we should locate in the r_projects folder to keep them all together.
It is key to notice that the new working directory must also have a certain structure. You should create at least these three sub-folders:
r_projects/pol601/data, where you store the raw data.
r_projects/pol601/scripts, where you store the text files containing all the programming that loads, wrangles, tidies, models, or visualizes the data.
r_projects/pol601/output, where you store wrangled data, model output (modeling data might be time-consuming, so you don’t want to re-run models again and again), visualizations, etc.
When you create a Quarto document to produce a PDF report (I have a post about it), instead of creating a sub-folder for reports, I suggest keeping that .qmd file right at the root of the working directory.
I shared a model working directory via GitHub. It looks like this:
As you see, rendering a PDF document like assignment_1.pdf yielded a host of ancillary files with the same prefix, assignment_1, making the working directory seem packed. However, I still suggest that beginners keep the .qmd file at the root—rather than moving it to a sub-folder—for convenience when setting relative paths within the Quarto document. I don’t want to get technical about it, so please just trust me!
Finally, it is sensible to avoid file deletion. You never know whether that trashy script you’re about to delete will have a last dance in the future. Instead of deleting, you create an old sub-folder in the same folder where the junk file is and move it there along with any other old stuff.
Creating GitHub repositories
Often, you want your RStudio projects to be GitHub repositories as well. Why do we care about GitHub this much? Because GitHub is such a great asset!
GitHub is a version control tool, meaning that you can recover previous coding versions. Because GitHub is in the cloud, you don’t have to stress about your computer getting stolen or messed up. All your programming is safe at GitHub! Furthermore, GitHub is handy to share code with instructors, co-authors, and colleagues.
We all love GitHub! It has a relatively steep learning curve, though.
To keep all your GitHub repositories together, as mentioned earlier, you must create a folder on your computer, github_projects.
Of course, you must also create a GitHub account and download GitHub Desktop.1
1 I suggest that you get used to running GitHub commands in the RStudio’s Terminal. Using the Terminal approximates better the real-world jobs in the data science industry.
The next steps are to create a GitHub repository and populate it with an RStudio project:
Log in to the GitHub website.
Click on your profile picture (upper right corner).
Click on Your repositories.
Click on New (a green button) to create a new repository.
Choose an appropriate name (say, pol601) and make sure that the repository is set to be private.
Click on Create repository, which will lead you to the main page of the new repository.
Click on Code (a green button).
Click on Local.
Click on Open with GitHub Desktop, which will lead you to GitHub Desktop.
In GitHub Desktop, click on Browse and make sure that the repository will be located in the github_projects folder.
The GitHub repository now exists locally on your computer and, more specifically, is located in the github_projects folder.
Now, it is time to populate that github_projects/pol601 folder with an RStudio project.
So you follow the following steps:
Open RStudio.
Click on File > New Project, which will open a new window.
Select Existing Directory (instead of New Directory, which we chose when creating an RStudio on your computer only).
Click on Browse and make sure to choose the github_projects/pol601 folder (an existing directory as you created it in advance).
Click on Create Project.
Thus, you’ll have created a new folder in github_projects that is both an RStudio project and a GitHub repository.
Job done! You are all set to program great stuff and exploit GitHub’s strenghts in version control and cloud sharing. Here’s where you need a solid Intro to GitHub.