302-specific
There are a lot of resources that will benefit you in a class like 302. Here is a nice, long list.
R, RStudio, and Quarto
We’re using an incredibly powerful statistical software package in this class. Unfortunately, it does come with a bit of a learning curve. You won’t be required to learn the nitty-gritty ins and outs of the R scripting language but you will need to know the basics of what R can do. Luckily, we have RStudio to really help out with that. Below are some introductory articles and videos.
-
We’ll be using Posit Cloud in this class. See the Posit Cloud page for instructions on just how to get started with it. If you also want to have RStudio on your local machine, you’ll need to do the following. (Remember, for this class, all you need is https://rstudio.cloud!)
If you wish to also have the Desktop version to use after class is over: Download R and RStudio
Pick the version for your operating system. If you’re using a Chromebook or otherwise can’t get RStudio to work on your machine, you’ll need to use Posit Cloud anyway. When your class is over, your access to the unrestricted Posit Cloud version will be reduced, so having the desktop version is highly encouraged.
-
A quick (obviously) introduction to some of the most basic aspects of R.
The built-in help in RStudio
Inside RStudio, simply type help.start() in the console and voila, the manuals and reference materials appear in the Help module.
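Beyond help.start(), a few other console commands get you to the same documentation. This is just a small sketch: `mean` and the search word are stand-ins for whatever you happen to be looking up.

```r
# Look up a function you already know by name.
# Both forms point at the same help page; the second stores it instead of showing it.
?mean
h <- help("mean")

# Search every installed help page when you only know a keyword.
help.search("deviation")

# Run the examples listed at the bottom of a help page.
example(mean)
```

In RStudio, the help pages show up in the same Help pane that help.start() uses.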
-
In case you like cats.
The art of R programming: a tour of statistical software design
Made by the same group that created the Manga guide, this text walks you through the vast majority of the basics to get you well on your way to being an R guru.
-
Print them out and keep them at your desk. The RStudio IDE and RMarkdown cheatsheets are particularly useful.
-
swirl is a fantastic collection of courses (ranging between 10 and 20 minutes each) designed to help you learn R programming while immersed in R!
-
This workshop is designed for those who have no or little prior experience with R Markdown and who want to learn Quarto. Quarto is the next generation of RMarkdown for publishing, including dynamic and static documents and multi-lingual programming language support. With Quarto you can create documents, books, presentations, blogs or other online resources.
-
Comprehensive guide to using Quarto. If you are just starting out, you may want to explore the tutorials to learn the basics.
-
When using Posit Cloud, you don’t need to install this. RStudio now comes with a version built-in, as well. That said, if you need to convert files from one markup format into another, pandoc is your Swiss Army knife. (It converts from everything to everything and you’ll never need to touch it; everything happens through RStudio. —Dr S)
LaTeX (variations)
Again, we’re using Posit Cloud, so this is pre-installed. You will want to save your R results as PDFs at some point. To do this on your own machine, you will need to install LaTeX, a mathematical typesetting system. It’s required to convert code into symbols. Here’s a good but brief introduction to LaTeX with RStudio. Ideally you would install TinyTeX by Yihui, the brain behind much of what you see in RStudio. Failing this, you can install MiKTeX in Windows and MacTeX on a Mac. That said, you should absolutely start with the TinyTeX R package!
You can include this kind of math inline (NOT to be confused with including R code inline) by using code like this: $\sigma = 0$. This will display as \(\sigma = 0\). Note the lack of spaces between the $ and the code!
Likewise, you can put an equation between double dollar signs to have it appear on its own line:
$$ y_{ij} = b_{ij} + \beta_{0} + \beta_{1} $$
gives you the following displayed in your document:
\[ y_{ij} = b_{ij} + \beta_{0} + \beta_{1} \]
Big Data
What Is Big Data? A Super Simple Explanation For Everyone
The term “Big Data” may have been around for some time now, but there is still quite a lot of confusion about what it actually means. In truth, the concept is continually evolving and being reconsidered, as it remains the driving force behind many ongoing waves of digital transformation, including artificial intelligence, data science and the Internet of Things. But what exactly is Big Data and how is it changing our world?
-
Big data encompasses a wide range of analytics and data-gathering strategies. Essentially, it’s the ability to capture, store and analyze data on a mass scale to inform business decisions. It follows basic logic: The more you know about a problem or issue, the more reliable the solution.
Data Mining
-
Data Mining is an analytic process designed to explore data (usually large amounts of data - typically business or market related - also known as “big data”) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal of data mining is prediction - and predictive data mining is the most common type of data mining and one that has the most direct business applications. The process of data mining consists of three stages: (1) the initial exploration, (2) model building or pattern identification with validation/verification, and (3) deployment (i.e., the application of the model to new data in order to generate predictions).
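The three stages described above can be sketched in a few lines of base R. This is only an illustration under some assumptions of my own: it uses the built-in mtcars data, a plain linear model, an 8-row hold-out set standing in for "new subsets of data", and a made-up `new_car` for the deployment step. Real data mining projects use far larger data and usually fancier models.

```r
# Built-in data set, so nothing needs to be downloaded.
data(mtcars)

# Stage 1: initial exploration.
summary(mtcars[, c("mpg", "wt", "hp")])
cor(mtcars$mpg, mtcars$wt)

# Hold out a few rows to play the role of "new" data (size chosen arbitrarily).
set.seed(302)
test_rows <- sample(nrow(mtcars), size = 8)
train <- mtcars[-test_rows, ]
test  <- mtcars[test_rows, ]

# Stage 2: model building, then validation on the held-out rows.
fit <- lm(mpg ~ wt + hp, data = train)
test_pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - test_pred)^2))  # typical size of a prediction error
rmse

# Stage 3: deployment, i.e. a prediction for a case the model has never seen.
new_car <- data.frame(wt = 3.0, hp = 120)     # hypothetical car
predict(fit, newdata = new_car)
```

The hold-out error in stage 2 is the part that keeps you honest: a model that only looks good on the rows it was fit to has not been validated.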
-
(Abstract) This study was conducted with data mining (DM) techniques to analyze various patterns of online learning behaviors, and to make predictions on learning outcomes. Statistical models and machine learning DM techniques were conducted to analyze 17,934 server logs to investigate 98 undergraduate students’ learning behaviors in an online business course in Taiwan. The study scientifically identified students’ behavioral patterns and preferences in the online learning processes, differentiated active and passive learners, and found important parameters for performance prediction. The results also demonstrated how data mining techniques might be utilized to help improve online teaching and learning with suggestions for online instructors, instructional designers and courseware developers.
How to Catch a Liar on the Internet
Technology makes it easier than ever to play fast and loose with the truth—but easier than ever to get caught.
-
This website presents documents, examples, tutorials and resources on R and data mining.
Text Mining
-
This book serves as an introduction of text mining using the tidytext package and other tidy tools in R. The functions provided by the tidytext package are relatively simple; what is important are the possible applications. Thus, this book provides compelling examples of real text mining problems.
Text Mining (Big Data, Unstructured Data)
The purpose of Text Mining is to process unstructured (textual) information, extract meaningful numeric indices from the text, and, thus, make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms. Information can be extracted to derive summaries for the words contained in the documents or to compute summaries for the documents based on the words contained in them. Hence, you can analyze words, clusters of words used in documents, etc., or you could analyze documents and determine similarities between them or how they are related to other variables of interest in the data mining project. In the most general terms, text mining will “turn text into numbers” (meaningful indices), which can then be incorporated in other analyses such as predictive data mining projects, the application of unsupervised learning methods (clustering), etc. These methods are described and discussed in great detail in the comprehensive overview work by Manning and Schütze (2002), and for an in-depth treatment of these and related topics as well as the history of this approach to text mining, we highly recommend that source.
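To make "turn text into numbers" concrete, here is a minimal sketch in base R that builds a document-by-word count table. The three sentences, the object names, and the very crude tokenizer (split on anything that isn't a letter) are all my own assumptions for illustration; packages such as tidytext handle this far more carefully.

```r
# Three tiny made-up "documents".
docs <- c(
  doc1 = "The cat sat on the mat.",
  doc2 = "The dog sat on the log.",
  doc3 = "Cats and dogs are pets."
)

# Lower-case each document and split it into words on any non-letter.
tokens <- strsplit(tolower(docs), "[^a-z]+")

# The vocabulary is every distinct word seen across all documents.
vocab <- sort(unique(unlist(tokens)))

# Count how often each vocabulary word appears in one document.
count_words <- function(words) {
  as.integer(table(factor(words, levels = vocab)))
}

# One row per document, one column per word: a document-term matrix.
dtm <- t(vapply(tokens, count_words, integer(length(vocab))))
dimnames(dtm) <- list(names(docs), vocab)
dtm
```

Once the text is a numeric matrix like `dtm`, it can be handed to the clustering or predictive methods mentioned above. Note that this crude version treats "cat" and "cats" as different words.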
Why Text Mining May Be the Next Big Thing
“Big Data” is a hot topic in the business world these days. But there’s a subset of this broad field that has yet to take a turn in the spotlight. It’s called “text mining,” and you’re probably going to be hearing a lot more about it over the coming months and years. Basically, text mining is the process of combing through countless pages of plain-language digitized text to find useful information that’s been hiding in plain sight.
Where to start with text mining
This post is an outline of discussion topics I’m proposing for a workshop at NASSR2012 (a conference of Romanticists). I’m putting it on the blog since some of the links might be useful for a broader audience.
Text mining: what do publishers have against this hi-tech research tool?
Researchers push for end to publishers’ default ban on computer scanning of tens of thousands of papers to find links between genes and diseases
Cluster Analysis
-
Clustering can be considered the most important unsupervised learning problem; so, as every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters.
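As a rough sketch of that idea in R, the code below groups the built-in iris flowers using only their measurements. The choice of k-means, the three clusters, and the standardizing step are assumptions made for the example, not the only way to cluster.

```r
data(iris)

# Keep only the four measurements; the species labels are set aside so the
# data are treated as unlabeled.
measurements <- scale(iris[, 1:4])  # standardize so no variable dominates

# Ask k-means for 3 groups (an assumption); nstart tries several random starts.
set.seed(302)
km <- kmeans(measurements, centers = 3, nstart = 20)

# Afterwards, compare the discovered groups with the labels we hid.
table(cluster = km$cluster, species = iris$Species)
```

The cluster numbers themselves are arbitrary labels; what matters is which flowers end up grouped together.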
Cluster Analysis Introduction (StatSoft)
The term cluster analysis (first used by Tryon, 1939) encompasses a number of different algorithms and methods for grouping objects of similar kind into respective categories. A general question facing researchers in many areas of inquiry is how to organize observed data into meaningful structures, that is, to develop taxonomies.
Hierarchical Clustering Algorithms
An introduction to hierarchical clustering algorithms.
Analytics and Business Intelligence
-
Every business has a treasure trove of data, from customer and transaction information to manufacturing and shipping statistics. The key is figuring out how to use past data to better the business’ future.
Strategy for building a “good” predictive model
Step-by-step guide.
-
The one and only. (It’s even used on this site. Note that the Analytics site is not accessible while using some VPNs.)