Lecture 01
Attendance is expected
Opportunity to work on course assignments with TA support
Labs will begin this week
This course is graded 100% on your coursework (there are no exams).
We will be assessing you based on the following assignments,
| Assignment | Type | Value | n | Assigned | 
|---|---|---|---|---|
| Homeworks | Team | 30% | 5/6 | ~ Every other week | 
| Midterms | Individual | 40% | 2 | ~ Week 6 and 14 | 
| Project | Team | 10% | 1 | ~ Week 10 | 
| Quizzes | Individual | 20% | ~15 | | 
Only work that is clearly assigned as team work should be completed collaboratively (Homeworks + Project).
Individual assignments (Midterms) must be completed individually; you may not directly share or discuss answers / code with anyone other than myself and the TAs.
On Homeworks you should not directly share answers / code with other teams; however, you are welcome to discuss the problems in general and ask for advice.
We are aware that a huge volume of code is available on the web, and many tasks may have solutions posted.
Unless explicitly stated otherwise, this course’s policy is that you may make use of any online resources (e.g. Google, StackOverflow, etc.) but you must explicitly cite where you obtained any code you directly use or use as inspiration in your solution(s).
Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source.
The same applies to the use of LLMs like ChatGPT, Claude, or GitHub Copilot: you are welcome to make use of these tools as the basis for your solutions, but you must cite the tool when using it for significant amounts of code generation.
AI tools are not a replacement for understanding the material, but they can be a tool to help you understand it.
Reading code and writing code are both skills that take time and practice to develop, and both are still essential.
The nature of these tools is changing rapidly: autocomplete vs. chatbots vs. agentic tools.
To uphold the Duke Community Standard:
- I will not lie, cheat, or steal in my academic endeavors;
- I will conduct myself honorably in all my endeavors; and
- I will act if the Standard is compromised.
To reduce friction, the preferred method is to use the department’s RStudio server(s).
To access RStudio/Posit Workbench:
If you cannot access RStudio via the DSS servers:
Make sure you are on an authenticated Duke network (e.g. DukeBlue or VPN if off campus)
Make sure you are not using a custom DNS server (e.g. 1.1.1.1 or 8.8.8.8)
Use a Docker container from Duke OIT
Reserve a Container and find a container for Sta 313
If working locally you should make sure that your environment meets the following requirements:
latest R (4.5.1)
latest RStudio (2025.05.1+513)
working git installation
ability to create ssh keys (for GitHub authentication)
All R packages updated to their latest version from CRAN
Support policy for local installs - we will try to help you troubleshoot if we can but reserve the right to tell you to use the dept server.
We will be using a GitHub organization specific to this course: github.com/sta523-sp25
All assignments will be distributed and collected via GitHub
All of your work and your membership (enrollment) in the organization is private
We will be distributing a survey this weekend to collect your GitHub account names
All course related repositories will be created for you
Create a GitHub account if you don’t have one
Complete the course survey
Make sure you can log in to the Department’s RStudio server: https://rstudio.stat.duke.edu
Set up ssh key authentication with GitHub, see https://github.com/DukeStatSci/github_auth_guide
The fundamental building blocks of data in R are vectors (collections of related values, objects, etc).
R has two types of vectors (that everything is built on):
atomic vectors (vectors) - every element has the same type (e.g. all true/false values, all numbers, or all character strings).
generic vectors (lists)
R has six atomic vector types; we can check the type of any object in R using the typeof() function. mode() is a higher-level abstraction used to group similar types together.
| typeof() | mode() | 
|---|---|
| logical | logical | 
| double | numeric | 
| integer | numeric | 
| character | character | 
| complex | complex | 
| raw | raw | 
logical - boolean values (TRUE and FALSE)
character - text strings
Either single or double quotes are fine; the opening and closing quote must match.
double - floating point values (these are the default numerical type)
Atomic vectors can be constructed using the combine c() function.
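As a small sketch of the types and modes described above, the following builds one vector of each common type with c() and inspects it. The variable names are illustrative only, and the `L` suffix for integer literals is standard R syntax that is not shown elsewhere in these notes.

```r
# Each atomic vector holds values of a single type
lgl <- c(TRUE, FALSE, TRUE)
dbl <- c(1, 2.5, 3)        # numbers are doubles by default
int <- c(1L, 2L, 3L)       # the L suffix requests integers
chr <- c("a", 'b', "c")    # single or double quotes both work

types <- c(typeof(lgl), typeof(dbl), typeof(int), typeof(chr))
modes <- c(mode(lgl), mode(dbl), mode(int), mode(chr))

print(types)  # "logical" "double" "integer" "character"
print(modes)  # "logical" "numeric" "numeric" "character"
```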
typeof(x) - returns a character vector (length 1) of the type of object x.
mode(x) - returns a character vector (length 1) of the mode of object x.
is.logical(x) - returns TRUE if x has type logical.
is.character(x) - returns TRUE if x has type character.
is.double(x) - returns TRUE if x has type double.
is.integer(x) - returns TRUE if x has type integer.
is.numeric(x) - returns TRUE if x has mode numeric.
is.atomic(x) - returns TRUE if x is an atomic vector.
is.list(x) - returns TRUE if x is a list (generic vector).
is.vector(x) - returns TRUE if x is either an atomic or generic vector.
R is a dynamically typed language – it will automatically convert between most types without raising warnings or errors. Keep in mind that atomic vectors must always contain values of the same type.
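The type-testing functions listed above can be tried on a double vector and a list. This is a minimal sketch; `x` and `y` are illustrative names.

```r
x <- c(1.5, 2, 3)          # a double vector
y <- list(1, "a", TRUE)    # a generic vector (list)

x_tests <- c(
  double  = is.double(x),   # TRUE
  integer = is.integer(x),  # FALSE: numeric literals default to double
  numeric = is.numeric(x),  # TRUE: the mode of a double is numeric
  atomic  = is.atomic(x),   # TRUE
  list    = is.list(x),     # FALSE
  vector  = is.vector(x)    # TRUE
)

y_tests <- c(
  atomic = is.atomic(y),    # FALSE
  list   = is.list(y),      # TRUE
  vector = is.vector(y)     # TRUE: lists are vectors too
)

print(x_tests)
print(y_tests)
```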
Builtin operators and functions (e.g. +, &, log(), etc.) will generally attempt to coerce values to an appropriate type for the given operation (numeric for math, logical for logical, etc.)
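A few hedged examples of this operator coercion, using illustrative variable names: logical values are treated as 0/1 in arithmetic, and numbers are treated as FALSE (zero) or TRUE (non-zero) in logical operations.

```r
num_sum  <- TRUE + 1                   # logical coerced to numeric: 2
n_true   <- sum(c(TRUE, FALSE, TRUE))  # counts the TRUE values: 2
log_true <- log(TRUE)                  # TRUE treated as 1, so the result is 0
both     <- 3 & 0                      # TRUE & FALSE, so FALSE
negated  <- !0                         # 0 treated as FALSE, so TRUE

print(c(num_sum, n_true, log_true))
print(c(both, negated))
```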
Most of the is functions we just saw have an as variant which can be used for explicit coercion.
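As a short sketch of explicit coercion with the as variants (variable names are illustrative), note that a conversion that cannot succeed produces NA along with a warning, which is suppressed here to keep the output clean.

```r
lgl <- as.logical(c(0, 1, 5))         # FALSE TRUE TRUE (non-zero is TRUE)
int <- as.integer(c(1.9, -1.9))       # 1 -1 (truncates toward zero)
dbl <- as.double("3.14")              # 3.14
chr <- as.character(c(TRUE, FALSE))   # "TRUE" "FALSE"

# "abc" cannot be parsed as a number, so the result is NA (with a warning)
bad <- suppressWarnings(as.numeric("abc"))

print(lgl); print(int); print(dbl); print(chr); print(bad)
```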
R uses NA to represent missing values in its data structures, what may not be obvious is that there are different NAs for the different atomic types.
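The typed NAs can be inspected directly. In this sketch the plain `NA` is logical, and the constants `NA_integer_`, `NA_real_`, and `NA_character_` are the built-in typed versions; inside a vector, `NA` takes on the type of its neighbours.

```r
na_types <- c(
  typeof(NA),             # "logical"
  typeof(NA_integer_),    # "integer"
  typeof(NA_real_),       # "double"
  typeof(NA_character_)   # "character"
)
print(na_types)

# NA is coerced to match the other values in the vector
in_dbl <- typeof(c(1, NA))     # "double"
in_chr <- typeof(c("a", NA))   # "character"
print(c(in_dbl, in_chr))
```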
Because NAs represent missing values it makes sense that most calculations using them will also be missing.
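A minimal sketch of this propagation, with illustrative names; many summary functions accept `na.rm = TRUE` to drop missing values before computing.

```r
x <- c(1, 2, NA)

plus_na  <- 1 + NA                 # NA
total    <- sum(x)                 # NA: the missing value could be anything
avg      <- mean(x, na.rm = TRUE)  # 1.5: NA removed first
equal_na <- NA == 1                # NA: we cannot know if the values match

print(c(plus_na, total, avg))
print(equal_na)
```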
A useful mental model for NAs is to consider them as an unknown value that could take any of the possible values for a type.
For numbers or characters this isn’t very helpful, but for a logical value we know that the value must either be TRUE or FALSE and we can use that when deciding what value to return.
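Applying that mental model to logical operators gives the following sketch (names are illustrative): when the result is the same whether the NA is TRUE or FALSE, R returns that result rather than NA.

```r
or_true   <- TRUE  | NA   # TRUE: true regardless of the unknown value
or_false  <- FALSE | NA   # NA: depends on the unknown value
and_true  <- TRUE  & NA   # NA: depends on the unknown value
and_false <- FALSE & NA   # FALSE: false regardless of the unknown value

print(c(or_true, or_false, and_true, and_false))
```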
R has several other special values, which are defined as part of the IEEE floating point standard (not unique to R):
NaN - Not a number
Inf - Positive infinity
-Inf - Negative infinity
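These values arise naturally from ordinary arithmetic. A brief sketch, with illustrative names:

```r
pos_inf <- 1 / 0       # Inf
neg_inf <- -1 / 0      # -Inf
not_num <- 0 / 0       # NaN
diff    <- Inf - Inf   # NaN: the result is undefined
tiny    <- 1 / Inf     # 0

print(c(pos_inf, neg_inf, not_num, diff, tiny))
print(typeof(pos_inf))  # "double"
```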
Inf and NaN
For NaN and Inf there are convenience functions for testing for these types of values.
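The base R testing functions is.finite(), is.infinite(), is.nan(), and is.na() can be compared side by side. This is a small sketch on an illustrative vector; note that is.na() is TRUE for NaN, while is.nan() is FALSE for NA.

```r
x <- c(1, Inf, -Inf, NaN, NA)

finite   <- is.finite(x)    # TRUE  FALSE FALSE FALSE FALSE
infinite <- is.infinite(x)  # FALSE TRUE  TRUE  FALSE FALSE
nan      <- is.nan(x)       # FALSE FALSE FALSE TRUE  FALSE
missing  <- is.na(x)        # FALSE FALSE FALSE TRUE  TRUE

print(rbind(finite, infinite, nan, missing))
```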
First remember that Inf, -Inf, and NaN are doubles; however, their coercion behavior is not the same as for other doubles.
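A hedged sketch of how these special doubles coerce, using illustrative names. Warnings are suppressed because the integer type has no way to represent these values, so the conversions yield NA.

```r
inf_lgl <- as.logical(c(Inf, -Inf))    # TRUE TRUE: non-zero values
nan_lgl <- as.logical(NaN)             # NA
as_chr  <- as.character(c(Inf, NaN))   # "Inf" "NaN"

# Integers cannot represent Inf or NaN, so the result is NA
inf_int <- suppressWarnings(as.integer(Inf))
nan_int <- suppressWarnings(as.integer(NaN))

print(inf_lgl); print(nan_lgl); print(as_chr)
print(c(inf_int, nan_int))
```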
What is the type of the following vectors? Explain why they have that type.
Considering only the four (common) data types, what is R’s implicit type conversion hierarchy (from highest priority to lowest priority)?
| Operator | Operation | Vectorized? | 
|---|---|---|
| x \| y | or | Yes | 
| x & y | and | Yes | 
| !x | not | Yes | 
| x \|\| y | or | No | 
| x && y | and | No | 
| xor(x, y) | exclusive or | Yes | 
& and | are almost always going to be the right choice; the only time you need && or || is when you want to take advantage of short-circuit evaluation.
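A minimal sketch of the difference, where `boom()` is a hypothetical helper that errors if it is ever evaluated: && and || stop as soon as the answer is known, while & and | work element by element.

```r
# Hypothetical helper: raises an error if it is ever called
boom <- function() stop("should not be evaluated")

sc_and <- FALSE && boom()   # FALSE: the right-hand side is never run
sc_or  <- TRUE  || boom()   # TRUE: the right-hand side is never run

# A common use: guard a test that only makes sense for non-NULL input
x <- NULL
is_positive <- !is.null(x) && x > 0   # FALSE, and x > 0 is skipped

# The vectorized forms compare element by element
vec_and <- c(TRUE, FALSE) & c(TRUE, TRUE)     # TRUE FALSE
vec_or  <- c(TRUE, FALSE) | c(FALSE, FALSE)   # TRUE FALSE

print(c(sc_and, sc_or, is_positive))
print(vec_and); print(vec_or)
```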
Almost all of the basic mathematical operations (and many other functions) in R are vectorized.
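For instance, in this small sketch with illustrative names, each operation is applied to every element without an explicit loop.

```r
x <- c(1, 4, 9)

roots   <- sqrt(x)            # 1 2 3
doubled <- x * 2              # 2 8 18
summed  <- x + c(10, 20, 30)  # 11 24 39 (element by element)

print(roots); print(doubled); print(summed)
```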
If the lengths of the vector do not match, then the shorter vector has its values recycled to match the length of the longer vector.
The same length coercion (recycling) rules apply for most basic mathematical operators.
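Recycling can be sketched as follows, with illustrative names. When the longer length is not a multiple of the shorter one, R still recycles but emits a warning, which is suppressed here.

```r
# Logical operators: c(TRUE, FALSE) is reused as TRUE FALSE TRUE FALSE
lgl_rec <- c(TRUE, TRUE, FALSE, FALSE) & c(TRUE, FALSE)  # TRUE FALSE FALSE FALSE

# Math operators: c(10, 20) is reused as 10 20 10 20
num_rec <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# A length-1 vector is recycled across every element
scaled <- c(1, 2, 3, 4) * 2            # 2 4 6 8

# Lengths 3 and 2: partial recycling, with a warning
partial <- suppressWarnings(c(1, 2, 3) + c(10, 20))  # 11 22 13

print(lgl_rec); print(num_rec); print(scaled); print(partial)
```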
| Operator | Comparison | Vectorized? | 
|---|---|---|
| x < y | less than | Yes | 
| x > y | greater than | Yes | 
| x <= y | less than or equal to | Yes | 
| x >= y | greater than or equal to | Yes | 
| x != y | not equal to | Yes | 
| x == y | equal to | Yes | 
| x %in% y | contains | Yes (over x)\(^*\) | 
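A brief sketch of the comparison operators in the table above, using illustrative values. Note that the result of %in% always has the length of its left-hand side, which is what "over x" refers to.

```r
x <- c(1, 5, 10)

less    <- x < 5    # TRUE  FALSE FALSE
equal   <- x == 5   # FALSE TRUE  FALSE
unequal <- x != 5   # TRUE  FALSE TRUE

# For each element of x: does it appear anywhere in the right-hand side?
x_in_y <- x %in% c(5, 10, 20, 40)   # FALSE TRUE TRUE (length 3)
y_in_x <- c(5, 10, 20, 40) %in% x   # TRUE TRUE FALSE FALSE (length 4)

print(less); print(equal); print(unequal)
print(x_in_y); print(y_in_x)
```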
Using > & < with characters
While maybe somewhat unexpected, these comparison operators can be used with character values.
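A small sketch of character comparison, with illustrative names. Strings are compared in sort order, which can depend on the current locale (for example, in how upper and lower case letters are ordered), so only simple cases are shown. When a number is compared to a string, the number is coerced to character first.

```r
letters_cmp <- "a" < "b"            # TRUE
words_cmp   <- "apple" < "banana"   # TRUE: compared character by character
prefix_cmp  <- "abc" < "abd"        # TRUE: first difference is c vs d

# Strings of digits are compared as text, not as numbers
digits_cmp  <- "10" < "9"           # TRUE: "1" sorts before "9"
mixed_cmp   <- 10 < "9"             # TRUE: 10 is coerced to "10" first

print(c(letters_cmp, words_cmp, prefix_cmp, digits_cmp, mixed_cmp))
```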
Sta 523 - Fall 2025