Final Project

The Final Project provides you a means to explore a topic in depth, and to practice the skills you have developed this semester. You will use the scientific method (refer to Chapter 6: Research Design), a data set, and statistical analysis to answer a research question of your choice.

You are expected to acquire a data set (more on this below), write a paper describing your project, and share a revealjs Quarto deck that provides a high-level look at your work. I have provided you with a content template for your paper and presentation, but please be creative and submit something unique. Below is a detailed description of the project and all required criteria.


The final project consists of two parts and should have 4 submissions (2 PDF, 2 QMD):

  1. Paper (1500-2000 words). Must include 5-7 graphs. PDF and QMD
  2. Presentation (10-12 slides). PDF and QMD

Paper

A 1500-2000 word research paper (excluding abstract, figures, tables, and references). Use APA formatting, or a style that is appropriate for your academic/professional field. Submit this paper in the appropriate D2L assignment dropbox.

Paper content template

The template also has some commented instructions in the different sections. Consider those and the following when writing. Beyond using APA formatting for your paper, citations, tables, and figures, your paper should have the following content sections:

Introduction

Provide information about your paper, what the reader can expect, and the construction and format of what’s to come. Give the reader whatever they need to understand the context of your project.

Review of the literature

Give more detailed information about your topic of interest and examples of previous research that has been conducted on your topic. What impact did this research have in helping further understand your topic AND/OR what further questions about your topic did these prior research findings create? Be sure to properly cite the literature you found.

Research problem

Describe the area that needs to be further explored related to your topic of interest. We would generally call that the “gap in the literature” that you’re filling with your work. State the research question that your statistical test can help answer. (This doesn’t need to be earth-shattering or paradigm-shifting. It’s a class exercise, after all.)

Design

Give information about where your data set came from and explain what inferential statistical test you are going to use. Describe the dataset structure and any relevant variables and scales. Leave nothing for the reader to assume.

Results

Describe the relevant descriptive statistics and the outcome of your inferential statistical test, and be sure to name the variables in your description. Don’t rely on tables and figures to do the explaining for you.

Discussion

What are the implications of your results? Did they surprise you? Do they support or contradict prior research findings related to your topic?

References

Your list of cited references. Your document does this automatically when your citations are in the .bib file and you’ve cited them appropriately. Using a citation manager like those listed on the resources page is highly encouraged as it will help you (a) keep your citations organized and accurate, and (b) give you a BibTeX entry automatically.

Statistics

Your project must provide at the very least descriptive statistics, correlations, and t-tests. You may substitute a regression model (not just a scatterplot with a trendline!) or ANOVA for t-tests if you prefer and if the data is appropriate.
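As a rough illustration of the minimum set of statistics, here is a minimal sketch in base R. It uses the built-in mtcars data set purely as a stand-in (an assumption for illustration; your own analysis should use your TidyTuesday data), and the variable choices are arbitrary.

```r
# Stand-in data: the built-in mtcars data set (an assumption for illustration;
# substitute the data frame you loaded from TidyTuesday).
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Descriptive statistics for the outcome variable (miles per gallon)
desc <- c(
  n      = nrow(cars),
  mean   = mean(cars$mpg),
  sd     = sd(cars$mpg),
  median = median(cars$mpg)
)
print(round(desc, 2))

# Correlation between two continuous variables (Pearson by default)
cor_result <- cor.test(cars$mpg, cars$wt)
print(cor_result)

# Independent-samples t-test comparing mpg across transmission types
# (t.test() runs Welch's unequal-variance test by default)
t_result <- t.test(mpg ~ am, data = cars)
print(t_result)

# Optional substitute for the t-test: a simple linear regression model
model <- lm(mpg ~ wt, data = cars)
print(summary(model))
```

Whichever tests you choose, report the statistic, degrees of freedom, and p-value in your own words in the Results section rather than pasting the console output.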

Slide Deck

Your slide deck should be created with Quarto in revealjs format. You have a template for this in your Final Posit Cloud project. This presentation should be submitted to the Final Project D2L dropbox alongside your research paper.


Where to start?

We’ll be using the TidyTuesday datasets for this project. TidyTuesday is a “weekly data project in R from the R4DataScience learning community” and provides participants with clean, interesting data for exploratory analytic practice, specifically with “an emphasis… on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem.”

Before sitting down and throwing tests and code at the project, consider the following:

  • What is my research project? What do I want to learn? What do I find interesting?
  • Where is my data coming from?
  • How will this data be analyzed? What’s appropriate?
  • How will I interpret my findings?
  • What are the implications and findings of my data?
  • What new insights or models of understanding does the data suggest?
  • How do I communicate what I have learned?

The process for writing the paper may seem a bit backward, since you’re beginning with the data you’re analyzing and finding supporting/foundational research to inform on the topic afterwards. So, the process you’ll want to follow looks something like this:

  1. Explore the TidyTuesday datasets and find one that is interesting to you. Examples are listed below.
  2. Begin researching that topic outside the data using Google Scholar or Academic Search Ultimate. You should be looking for scholarly journal articles that relate to your topic. Remember to begin this process with the library’s website to ensure you’re getting access to the content as a student. Do not rely on popular websites, though it is acceptable to use one or two if, for example, pointing out a newsworthy event.
  3. Report the background of your chosen data topic as a literature review.
  4. Do exploratory data analysis on your data, always keeping in mind what the literature has told you about the field.
  5. Share your results and any discoveries you’ve made. If the data supports the literature you found, say so. If you find that the literature you located and the data don’t seem to match up, state that!
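The exploratory step in the process above might look something like the following sketch. It uses base R only and the built-in ToothGrowth data set as a stand-in (both are assumptions made to keep the example dependency-free; in your project you would more likely use dplyr and ggplot2 on your TidyTuesday data).

```r
# Stand-in data: built-in ToothGrowth (an assumption for illustration)
tg <- ToothGrowth

# Structure and variable types
str(tg)

# Summary of every column, plus a count of missing values
print(summary(tg))
missing_total <- sum(is.na(tg))
print(missing_total)

# Group means of the outcome by a categorical variable
group_means <- aggregate(len ~ supp, data = tg, FUN = mean)
print(group_means)

# A quick boxplot saved to a temporary PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
boxplot(len ~ supp, data = tg,
        xlab = "Supplement type", ylab = "Tooth length")
dev.off()
```

Looking at structure, missing values, and group summaries first makes it easier to judge which inferential test is appropriate for the data.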

TidyTuesday details

The reason for choosing TidyTuesday data for this project is that it is relatively easy to work with. Datasets will typically come ready to use, either already “clean” or providing you with a cleaning script. This will save you considerable amounts of time, as data cleaning is often the most arduous part of a project like this. Pick your data wisely as you don’t want to make it harder on yourself than it needs to be!

This also allows you to use a standard coding syntax for your analysis, so everyone is speaking the same language, as it were.

Using TidyTuesday also allows you to use the tidytuesdayR package to load data, rather than relying on downloading CSV or Excel files. Each project readme file will have the syntax for loading the data, such as: tuesdata <- tidytuesdayR::tt_load('2021-02-09') (keep in mind this requires you to install the package first: install.packages("tidytuesdayR")). When you do need to load a CSV, the data loading syntax is also provided in the readme.

Don’t get locked out!

IMPORTANT: Remember your dataload.R script! You don’t want to load data from GitHub every time you knit your document! Your template is designed to help you load data from GitHub once and write it to a local CSV file. Run dataload.R once after updating it.
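The load-once idea can be sketched as follows. This is not the contents of the template's dataload.R; load_once() and fake_fetch() are hypothetical names used only for illustration, and the demonstration runs offline with a stand-in data frame and a temporary file.

```r
# Hypothetical helper: fetch a data set once, then reuse the local CSV copy.
load_once <- function(path, fetch) {
  if (!file.exists(path)) {
    data <- fetch()                           # runs only when no local copy exists
    write.csv(data, path, row.names = FALSE)
  }
  read.csv(path)
}

# Offline demonstration with a stand-in fetch function
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1                         # count how often we "download"
  data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
}

csv_path <- tempfile(fileext = ".csv")
first  <- load_once(csv_path, fake_fetch)     # fetches, then writes the CSV
second <- load_once(csv_path, fake_fetch)     # reads the CSV; no second fetch
print(calls)
```

In a real script, the fetch function would wrap the tidytuesdayR::tt_load() call from your dataset's readme and return the table you need, and the path would point to a CSV inside your project folder. Your document then reads only the local file, so rendering it never contacts GitHub.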

Also, make sure to cite the TidyTuesday project properly. Using the Quarto Insert citation function, you can easily add an R package to your list of references. If you want to try editing your bibliography manually, you can include the following in your .bib file and cite it in-text as [@tidytuesday]:

  @misc{tidytuesday, 
    title = {Tidy Tuesday: A weekly data project aimed at the R ecosystem}, 
    author = {Mock, Thomas}, 
    url = {https://github.com/rfordatascience/tidytuesday}, 
    year = {2021} 
  }
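For the R packages you use in your analysis, base R can also produce a BibTeX entry for you. The sketch below uses the stats package only as an example; swap in whichever package you want to cite, and check that the resulting entry has a citation key (add one by hand if it is missing) before pasting it into your .bib file.

```r
# Generate a BibTeX entry for an installed package (stats is just an example)
entry <- toBibtex(citation("stats"))
print(entry)
```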

Example Topics

Here are examples of areas of interest to guide your projects. You are encouraged to pick topics that have some societal meaning and will allow you to, at least within the scope of this project, provide suggestions and solutions to problems.

Useful R packages

Here are some useful R packages and instructions to use in the final paper. Given what you are required to include (descriptives and inferential statistics), you may want to look at the following:


Rubric

Gateway requirements: after you complete the grading declaration for the project, the submitted document must meet the length requirement (the equivalent of 6-8 content pages excluding abstract, figures, tables, and references), use scholarly writing standards such as properly formatted in-text citations and references, and address the criteria in the following rubric.

Criteria | Does not meet | Approaches | Exceeds
Knowledge: Demonstrates full understanding of all of the topics and accurately applies concepts learned in the course. More than one original conclusion made or idea suggested. | 0 | 10 | 20
Depth of Analysis: In-depth analysis and elaboration in all sections of the paper. Critical thinking is evident throughout. Supporting material is used accurately. Conflicting material or opinions are addressed satisfactorily. Conclusions are supported in the body of the paper. | 0 | 10 | 20
Research: Study Design, Methods, Statistics (descriptives, correlations, & t-tests), and Results are all present and correctly explained. Required statistics and tests are present. | 0 | 10 | 20
Organization, Focus, Clarity: Writing is logically organized and tightly focused. Main ideas are clearly stated and supported with details. Information is on-topic and flows from one logically presented, clearly explained topic to the next. | 0 | 10 | 20
Slide Deck: Deck is presented in a way that demonstrates understanding approaching mastery of the intersection between statistical tests and Quarto output to provide meaningful dissemination of ideas and results. | 0 | 10 | 20