Being a first-year PhD student is not the easiest. You’ll have to learn lots of new stuff in no time, and it is OK to feel anxious about it.

To me, it is sensible to invest serious time in honing the programming skill set. A decent fluency in R or Python can make your doctoral journey way more pleasant.

But programming is not the only flank you have to protect in the years to come. There are many other technicalities that you must pick up as quick as possible.

You’ll hear about \(\LaTeX\). Many of your future co-authors write their manuscripts in \(\LaTeX\), so you will suffer the misery of rendering \(\LaTeX\) documents with no previous knowledge of it…

You’ll hear about GitHub. Version control tools are a must for collaborative work, so you’ll soon be grappling with all these arcane GitHub commands…

You’ll hear about the Unix shell. Advanced tasks like interacting with UA’s High Performance Computing (HPC) demand some computational dexterity, so you’ll need to master how humans communicate with computers via the command line…

Where to start?

Open Science

Every skill mentioned above pertains to Open Science, a hugely relevant paradigm in scientific research nowadays. The sooner you learn about Open Science, the better.

I pretty much like the definition coined by Jeffrey Gillan (UA’s Data Science Institute):

“A series of reforms that interrogate every step in the research life cycle to make it more efficient, powerful and accountable in our emerging digital society.”

Open Science interrogates the way we do science. UNESCO’s definition is wordier but points in the same direction:

“Open Science is defined as an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community.”

In a nutshell, we—scientists—do science, but we do a very particular type of science: reproducible science.

Reproducibility means that our data and code are accessible, transparent, and executable. Importantly, reproducibility allows others to fully replicate our work’s results.

So we do not merely publish our research findings. Rather, we share all the research materials and explicitly say how we utilized them.

Of course, Open Science does not happen by accident. We must take purposeful actions across numerous areas to comply with Open Science standards.

We must excel at documentation. Our code must be well-annotated, and each step leading to our conclusions must be documented. We must also provide guidelines for code execution. Open-source applications like OSF are great for documentation. In the context of GitHub repositories, a README file should be included with a detailed overview of the project workflow.

We must excel at project management. Our code must be contained in a GitHub repository with appropriate version control functionalities, and we must also share every necessary input to execute the code. Of course, project management includes data management. UA provides access to the DMPTool to prepare data management plans that observe funding agencies’ requirements.

We must excel at data infrastructure. Our data and other research materials must be made accessible through digital storage platforms; any third person should be able to understand how raw data inputs turn into research outputs. The conditions for gathering, sharing, and archiving data must be clearly stated. Moreover, the data must be Findable, Accessible, Interoperable, and Reusable (see: FAIR Principles). UA’s institutional repository is known as ReDATA.

We must excel at communication. Our publications must follow the best practices and standards in scientific writing, which involves using the correct software. Overleaf is an online \(\LaTeX\) editor for collaborative and real-time writing. Quarto allows you to combine formatted text, code chunks, code output, \(\LaTeX\) conventions, and references. Read my blog post on Quarto documents.

UA’s excellent data literacy ecosystem

As I mentioned earlier, you’ll have so much ground to cover in the coming years! It is easy to get overwhelmed.

The good news is that UA’s ecosystem supplies so many opportunities to hone your computational skills while engaging with folks doing very cool stuff in “this R1 institution that, quite literally, puts things into space” (as Professor Peter A. Jansen said in one lecture).

Here, my top recommendations to make the most of the UA’s Data Literacy and Open Science landscape:

Take advantage of University Libraries’ Data Cooperative

University Libraries’ Data Cooperative organizes weekly workshops during the semester. These lovely workshops cover everything from introductory programming to machine learning methods, data management, data visualization, etc. Plus, the instructor of these R workshops, Jeff Olivier, is such a great guy.

The Data Cooperative holds open office hours, where you can bring that ugly line of code that is not running, as well as workshops on supported platforms such as the DMPTool, OSF, and ReDATA, which I mentioned previously in this post. Upcoming events, here.

Attend Data Science Institute’s events

Subscribe to the Data Science Institute’s email list. It is the best channel for information about data science events held across the campus. Have a look at the calendar!

The Data Science Institute also has an official newsletter, The Data Drip, and leads great outreach projects such as Data Science Ambassadors, Data Science Fellows, or Roots for Resilience.

Get in touch with Data Science Ambassadors

The College of Social & Behavioral Sciences likely has a Data Science Ambassador who can give you a hand with your programming questions. You can make an appointment at any time.

Once you have gained some experience, nominate yourself to serve as a Data Science Ambassador.

Have a look at Kelsey Gonzalez’s R programming course

Sociologist Kelsey Gonzalez earned her PhD from UA. A former Data Science Ambassador, Kelsey is now a Lead Data Scientist and AI Product Manager in the Chief Analytics Office at IBM, demonstrating that social scientists can land excellent jobs in the industry.

Furthermore, Kelsey developed an excellent course: R programming with Tidyverse. It is free and available on YouTube. You definitely want to have a look!

Exploit UA’s education partnerships

The Data Science Ambassadors program gives you access to the Carpentries Instructor Certification. The Roots for Resilience program provides the Foundational Open Science Skills (FOSS) workshop series.

These education partnerships are a massive asset. Take advantage of it.

Master the High Performance Computing

UA has enormous computational power. We, UA grad students, can run jobs in the High Performance Computing. This infrastructure is nothing to sneeze at. Use it to leverage your work.

This very good collection of videos will introduce you to UA’s supercomputers.

Am I missing something? Let me know.