Final Project
The Final Project provides you a means to explore a topic in depth, and to practice the skills you have developed this semester. You will use the scientific method (refer to Chapter 6: Research Design), a data set, and statistical analysis to answer a research question of your choice.
You are expected to acquire a data set (more on this below), write a paper describing your project, and share a presentation slide deck that provides a high-level look at your work. I have provided you with a content template for your paper, but please be creative and submit something unique. Below is a detailed description of the project and all required criteria.
The final project consists of two parts and requires three submissions (two PDFs and one QMD):
- Paper (1500-2000 words) with 5-7 graphs: PDF and QMD
- PowerPoint presentation (10-12 slides): PDF
Paper
A 1500-2000 word research paper (excluding abstract, figures, tables, and references). Use APA formatting, or a style that is appropriate for your academic/professional field. Submit this paper in the appropriate D2L assignment dropbox.
Paper content template
The template also includes commented instructions in each section; consider those and the guidance below when writing. Beyond using APA formatting for your paper, citations, tables, and figures, your paper should have the following content sections:
Introduction
Introduce your paper: tell the reader what to expect and how the rest of the paper is structured. Give the reader whatever they need to understand the context of your project.
Research problem
Describe the area related to your topic of interest that needs further exploration. Describe the gap that you’re filling with your work. State the research question that your statistical test can help answer. (This doesn’t need to be earth-shattering or paradigm-shifting. It’s a class exercise, after all.)
Hypothesis
Clearly state both the null hypothesis and the alternative hypothesis. Together, these two hypotheses are the foundation for determining whether the sample data provide sufficient evidence to draw conclusions about the population.
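As an illustration, suppose your test compares the mean of a numeric outcome between two groups (a two-sample t-test). Writing $\mu_1$ and $\mu_2$ for the two population means, the pair of hypotheses would be:

$$
H_0\colon \mu_1 = \mu_2 \qquad\qquad H_1\colon \mu_1 \neq \mu_2
$$

A correlation test follows the same pattern for the population correlation $\rho$: $H_0\colon \rho = 0$ versus $H_1\colon \rho \neq 0$. Your own hypotheses should name your actual variables and population.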
Design
Give information about where your data set came from and explain which inferential statistical test you are going to use. Describe the dataset structure and any relevant variables and their measurement scales. Leave nothing for the reader to assume.
Review of the literature
Give more detailed information about your topic of interest and examples of previous research that has been conducted on your topic. What impact did this research have in helping further understand your topic AND/OR what further questions about your topic did these prior research findings create? Be sure to properly cite the literature you found.
Statistics
Your project must provide at the very least descriptive statistics, correlations, and t-tests. You may substitute a regression model (not just a scatterplot with a trendline!) or ANOVA for t-tests if you prefer and if the data is appropriate.
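To make the three required analyses concrete, here is a minimal base R sketch. It uses R's built-in `mtcars` data as a stand-in for your TidyTuesday dataset (an assumption for illustration only): `mpg` is the outcome, `am` (0 = automatic, 1 = manual) is the grouping variable, and `wt` (weight) is the second variable for the correlation. Called with a formula, `t.test()` runs a Welch two-sample t-test by default.

```r
# Minimal sketch: mtcars stands in for your TidyTuesday data.
data(mtcars)

# Descriptive statistics for the outcome (mpg), overall and by group (am: 0 = automatic, 1 = manual)
print(summary(mtcars$mpg))
desc <- aggregate(mpg ~ am, data = mtcars,
                  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
print(desc)

# Pearson correlation between weight (wt) and fuel economy (mpg), with a significance test
cor_res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")
print(cor_res)

# Welch two-sample t-test: does mean mpg differ between automatic and manual cars?
t_res <- t.test(mpg ~ am, data = mtcars)
print(t_res)
```

Swap in your own variables. The packages listed under Useful R packages (for example summarytools, corrr, and infer) offer tidyverse-friendly versions of the same analyses.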
Results
Describe the relevant descriptive statistics and the outcome of your inferential statistical test, naming the variables involved. Don’t rely on tables and figures to do the explaining for you.
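One way to keep the prose in charge is to build the results sentence directly from the test output. The sketch below uses a hypothetical helper, `report_t()`, with `mtcars` again as assumed stand-in data. The APA-style number formatting (no leading zero on p, and `p < .001` for very small values) is a common convention, so check it against your style guide.

```r
# Hypothetical helper: format an htest object from t.test() as "t(df) = t, p = ...".
report_t <- function(res) {
  p_txt <- if (res$p.value < 0.001) {
    "p < .001"
  } else {
    # drop the leading zero, APA style: 0.012 -> .012
    paste0("p = ", sub("^0", "", sprintf("%.3f", res$p.value)))
  }
  sprintf("t(%.2f) = %.2f, %s", res$parameter, res$statistic, p_txt)
}

# Welch t-test of fuel economy (mpg) by transmission type (am)
res <- t.test(mpg ~ am, data = mtcars)
means <- res$estimate  # group means: am = 0 first, then am = 1

cat(sprintf(
  "Mean fuel economy was %.1f mpg for automatic cars (am = 0) and %.1f mpg for manual cars (am = 1), %s.\n",
  means[1], means[2], report_t(res)
))
```

The t statistic is negative here because R subtracts the second group's mean (am = 1) from the first (am = 0); say which direction the difference runs in your own sentence.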
Discussion
What are the implications of your results? Did they surprise you? Do they support or contradict prior research findings related to your topic?
Conclusion
In the conclusion, summarize the findings from the analysis. Reflect on the results and discuss whether the evidence supports or refutes your initial hypotheses. Specifically, state whether you rejected or failed to reject the null hypothesis based on the statistical outcomes, and whether the data provided sufficient evidence to support the alternative hypothesis. (A statistical test cannot prove a hypothesis true, and failing to reject the null hypothesis is not evidence that it is true.) This reflection should offer insights into the overall significance of your findings and their implications in the context of your research.
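The decision itself follows a simple rule: compare the p-value with a significance level (alpha) that you fix before running the test. The sketch below assumes the conventional alpha = 0.05, uses a hypothetical `decide()` helper, and reuses `mtcars` as stand-in data. Note the wording "fail to reject" rather than "accept": a non-significant result is not proof that the null hypothesis is true.

```r
# Hypothetical helper: turn a p-value into a hypothesis-test decision.
# alpha = 0.05 is an assumed, conventional significance level; choose yours before testing.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "Reject H0: the data provide evidence for the alternative hypothesis."
  } else {
    "Fail to reject H0: the data do not provide sufficient evidence for the alternative hypothesis."
  }
}

# Example: Welch t-test of mpg by transmission type (am)
res <- t.test(mpg ~ am, data = mtcars)
print(decide(res$p.value))
```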
References
Your list of cited references. Quarto builds this list automatically when your sources are in the .bib file and you’ve cited them appropriately in the text. Using a citation manager like those listed on the resources page is highly encouraged, as it will help you (a) keep your citations organized and accurate and (b) generate BibTeX entries automatically.
Slide Deck
Your slide deck should be created with PowerPoint or with Quarto in revealjs format. A revealjs template is available in your Final Posit Cloud project if you decide to use it. Submit this presentation to the Final Project D2L dropbox alongside your research paper (PDF and QMD).
Where to start?
We’ll be using the TidyTuesday datasets for this project. TidyTuesday is a “weekly data project in R from the R4DataScience learning community” and provides participants with clean, interesting data for exploratory analytic practice, specifically with “an emphasis… on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem.”
Before sitting down and throwing tests and code at the project, consider the following:
- What is my research project? What do I want to learn? What do I find interesting?
- Where is my data coming from?
- How will this data be analyzed? What’s appropriate?
- How will I interpret my findings?
- What are the implications and findings of my data?
- What new insights or models of understanding does the data suggest?
- How do I communicate what I have learned?
The process for writing the paper may seem a bit backward, since you’re beginning with the data you’re analyzing and finding supporting, foundational research on the topic afterward. So, the process you’ll want to follow looks something like this:
- Explore the TidyTuesday datasets and find one that is interesting to you. Examples are listed below.
- Begin researching that topic outside the data using Google Scholar or Academic Search Ultimate. You should be looking for scholarly journal articles that relate to your topic. Remember to begin this process with the library’s website to ensure you’re getting access to the content as a student. Do not rely on popular websites, though it is acceptable to use one or two if, for example, pointing out a newsworthy event.
- Report the background of your chosen data topic as a literature review.
- Do exploratory data analysis on your data, always keeping in mind what the literature has told you about the field.
- Share your results and any discoveries you’ve made. If the data supports the literature you found, say so. If you find that the literature you located and the data don’t seem to match up, state that!
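For the exploratory data analysis step, a few base R calls give a quick first look at any data frame. This is a minimal sketch with `mtcars` standing in for your dataset; `cyl` and `mpg` are just example columns.

```r
# Minimal first-pass EDA in base R; mtcars stands in for your chosen TidyTuesday data.
df <- mtcars

str(df)                       # column names, types, and a preview of the values
print(summary(df))            # min, quartiles, mean, and max for each column
print(colSums(is.na(df)))     # number of missing values per column
print(table(df$cyl))          # frequencies for a categorical-like variable
hist(df$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
```

Check anything surprising here (missing values, odd ranges, skewed distributions) against what the literature led you to expect before choosing your tests.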
TidyTuesday details
TidyTuesday data were chosen for this project because they are relatively easy to work with. Datasets typically come ready to use, either already “clean” or with a cleaning script. This will save you considerable time, as data cleaning is often the most arduous part of a project like this. Pick your data wisely; you don’t want to make this harder on yourself than it needs to be!
This also allows you to use a standard coding syntax for your analysis, so everyone is speaking the same language, as it were.
Using TidyTuesday also allows you to load data with the tidytuesdayR package rather than downloading CSV or Excel files. Each project’s readme file includes the syntax for loading the data, such as `tuesdata <- tidytuesdayR::tt_load('2021-02-09')` (keep in mind that you must install the package first with `install.packages("tidytuesdayR")`). When you do need to load a CSV, the readme also provides the data loading syntax.
IMPORTANT: Remember your dataload.R script! You don’t want to load data from GitHub every time you knit your document. Your template is designed to help you load data from GitHub once and write it to a local CSV file. Run dataload.R once after updating it.
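The idea behind dataload.R can be sketched as a small caching function. This is a hypothetical `load_cached()` helper written for illustration, not the contents of your template, and the file paths are assumptions. The offline demonstration uses `mtcars` in place of a real download.

```r
# Hypothetical helper, not the course template's dataload.R: fetch data once,
# cache it as a local CSV, and read the cached copy on every later run.
load_cached <- function(path, fetch) {
  if (!file.exists(path)) {
    dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
    data <- fetch()                           # slow step, e.g. a download from GitHub
    write.csv(data, path, row.names = FALSE)  # save a local copy
  }
  read.csv(path)                              # fast step, runs on every render
}

# Offline demonstration: mtcars stands in for a downloaded dataset.
path <- file.path(tempdir(), "cache_demo", "mtcars.csv")
df <- load_cached(path, function() mtcars)

# With TidyTuesday (commented out: needs tidytuesdayR and an internet connection).
# tuesdata[[1]] takes the first dataset for that week; check the readme for the one you want.
# df <- load_cached("data/tidytuesday.csv", function() {
#   tuesdata <- tidytuesdayR::tt_load("2021-02-09")
#   tuesdata[[1]]
# })
```

On the second and later calls the `fetch` function is never run, so rendering your document does not touch the network.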
Also, make sure to cite the TidyTuesday project properly. Using the Quarto Insert citation function, you can easily add an R package to your list of references. If you want to try editing your bibliography manually, you can include the following in your .bib file and cite it in-text as [@tidytuesday]:
```bibtex
@misc{tidytuesday,
  title = {Tidy Tuesday: A weekly data project aimed at the R ecosystem},
  author = {Mock, Thomas},
  url = {https://github.com/rfordatascience/tidytuesday},
  year = {2021}
}
```
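As an alternative to typing entries by hand, base R’s `citation()` and `toBibtex()` can print a BibTeX entry for any installed package. This is a minimal sketch: the package name and the `references.bib` file name are assumptions, and the generated entry may have an empty citation key that you’ll need to fill in (for example `tidytuesdayR`) before citing it.

```r
# Print a BibTeX entry for an installed package using base R (utils) functions.
pkg <- "base"                   # assumption: replace with "tidytuesdayR" once it is installed
bib <- toBibtex(citation(pkg))
print(bib)

# Optionally append the entry to your bibliography (file name is an assumption):
# cat(bib, file = "references.bib", sep = "\n", append = TRUE)
```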
Example Topics
Here are examples of areas of interest to guide your projects. You are encouraged to pick topics that have some societal meaning and will allow you to, at least within the scope of this project, provide suggestions and solutions to problems.
- Wealth inequality in America
- HBCU college enrollment
- Food consumption and CO2 emissions (food carbon footprint)
- Measles vaccine rates from 46,412 schools across 32 US states
- College tuition, diversity, and pay
- American slavery and Juneteenth
- Federal research and development spending by agency
- Women in the workforce
- The Stanford Open Policing Project
Useful R packages
Here are some useful R packages and guides for your final paper. Given what you are required to include (descriptive and inferential statistics), you may want to look at the following:
- Tables: Create Awesome LaTeX Table with knitr::kable and kableExtra
- Frequencies, cross-tabs, and descriptives: summarytools
- Smoothed conditional means: geom_smooth (ggplot2)
- Correlation matrices and visualizations: Using corrr (corrr)
- Tables of correlation tests: Tidy correlation tests in R
- Simple linear regression: Statology: How to Perform Simple Linear Regression in R
- Inferential statistics (t-tests and ANOVA): infer
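If you substitute a regression model or ANOVA for the t-tests, base R’s `lm()` and `aov()` run both without extra packages; the packages above mostly add tidier output. This minimal sketch again uses `mtcars` as assumed stand-in data, with cylinder count converted to a factor for the ANOVA.

```r
# Base R sketch of a regression and a one-way ANOVA; mtcars stands in for your data.
cars <- transform(mtcars, cyl_f = factor(cyl))  # treat cylinder count as a categorical factor

# Simple linear regression: does weight (wt, in 1000 lbs) predict fuel economy (mpg)?
fit <- lm(mpg ~ wt, data = cars)
print(summary(fit))     # slope, standard errors, R-squared, overall F-test

# One-way ANOVA: does mean mpg differ across 4-, 6-, and 8-cylinder cars?
aov_fit <- aov(mpg ~ cyl_f, data = cars)
print(summary(aov_fit)) # F statistic and p-value for the group effect
```

A significant ANOVA says only that the group means are not all equal; a post hoc test such as `TukeyHSD(aov_fit)` shows which pairs of groups differ.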
Rubric
Gateway requirements: after you complete the grading declaration for the project, the submitted document must meet the length requirement (the equivalent of 6-8 content pages, excluding abstract, figures, tables, and references), follow scholarly writing standards such as properly formatted in-text citations and references, and address the criteria in the following rubric.
Criteria | Does not meet | Approaches | Exceeds
---|---|---|---
Knowledge: Demonstrates full understanding of all of the topics and accurately applies concepts learned in the course. More than one original conclusion made or idea suggested. | 0 | 10 | 20
Depth of Analysis: In-depth analysis and elaboration in all sections of the paper. Critical thinking is evident throughout. Supporting material is used accurately. Conflicting material or opinions are addressed satisfactorily. Conclusions are supported in the body of the paper. | 0 | 10 | 20
Research: Study Design, Methods, Statistics (descriptives, correlations, & t-tests), and Results are all present and correctly explained. Required statistics and tests are present. | 0 | 10 | 20
Organization, Focus, Clarity: Writing is logically organized and tightly focused. Main ideas are clearly stated and supported with details. Information is on-topic and flows from one logically presented, clearly explained topic to the next. | 0 | 10 | 20
Slide Deck: Deck is presented in a way that demonstrates understanding approaching mastery of the intersection between statistical tests and Quarto output to provide meaningful dissemination of ideas and results. | 0 | 10 | 20