Lecture 02
Conditional execution of code blocks is achieved via if statements.
if is not vectorized
Using a condition vector with more than one element in an if statement throws an error. This behavior was added in R 4.2; previous versions only emitted a warning (and used the first value of the condition vector).
There are a couple of helpful functions for collapsing logical vectors: any, all
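As a minimal sketch of collapsing a logical vector before using it as a condition (the vector x and its values are made up for illustration):

```r
x <- c(1, 3)   # hypothetical example values

# x >= 2 is a logical vector of length 2, so it cannot be used directly in if ()
cond <- x >= 2

# any() is TRUE when at least one element is TRUE
any_ge_2 <- any(cond)

# all() is TRUE only when every element is TRUE
all_ge_2 <- all(cond)

if (any_ge_2) {
  print("at least one value is >= 2")
}
if (!all_ge_2) {
  print("not every value is >= 2")
}
```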
else if and else
if blocks return a value
R’s if conditional statements return a value (invisibly); the two following implementations are equivalent.
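One way to illustrate this equivalence is sketched here; the variable names and the arithmetic are assumptions chosen for the example, not taken from the original slide.

```r
x <- 5   # hypothetical input

# Version 1: assign inside each branch
if (x %% 2 == 0) {
  s1 <- x / 2
} else {
  s1 <- 3 * x + 1
}

# Version 2: assign the value returned by the if expression itself
s2 <- if (x %% 2 == 0) {
  x / 2
} else {
  3 * x + 1
}

identical(s1, s2)
```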
Take a look at the code on the left. Without running it in R, what do you expect the outcome to be for each call on the right?
NAs can be particularly problematic for control flow: a condition that evaluates to NA causes if to throw an error.
NA
To explicitly test if a value is missing it is necessary to use is.na (often along with any or all).
Note that is.na() tests a property of each value, not a property of the vector as a whole, so it is vectorized.
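A short sketch of testing for missing values, using a made-up vector x:

```r
x <- c(1, NA, 3)   # hypothetical values

# Comparing with == NA always yields NA, so it cannot detect missing values
eq_na <- x == NA

# is.na() is vectorized: one TRUE/FALSE per element
miss <- is.na(x)

# Collapse to a single value before using it as a condition
if (any(miss)) {
  print("x contains at least one missing value")
}
```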
stop and stopifnot
Often we want to validate user input, function arguments, or other assumptions in our code. If our assumptions are not met then we often want to report (throw) an error and stop execution.
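A minimal sketch of input validation with both functions; safe_sqrt is a hypothetical function written only for this example.

```r
# Hypothetical function used only to illustrate input validation
safe_sqrt <- function(x) {
  # stopifnot() throws an error unless every condition is all TRUE
  stopifnot(is.numeric(x))

  # stop() throws an error with a custom message
  if (any(x < 0)) {
    stop("x must be non-negative")
  }

  sqrt(x)
}

safe_sqrt(4)

# try() lets the script continue after an error is thrown
res <- try(safe_sqrt(-1), silent = TRUE)
```

stopifnot() is compact when the default error message is good enough; stop() is the better fit when the message should explain what went wrong.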
Do stuff:
R has a variety of different output “methods” that can be used:
Printed output - cat(), print()
Diagnostic messages - message()
Warnings - warning()
Errors - stop(), stopifnot()
Each of these provides text output while also providing signals which can be interacted with programmatically (e.g. catching errors or treating warnings as errors).
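The sketch below shows these signals being produced and then handled programmatically; the function f and all message texts are made up for illustration.

```r
f <- function() {
  cat("printed with cat()\n")       # printed output
  message("a diagnostic message")   # message condition
  warning("a warning")              # warning condition; execution continues
  "done"
}

# suppressMessages() and suppressWarnings() silence the corresponding signals
out <- suppressWarnings(suppressMessages(f()))

# tryCatch() intercepts a condition and runs a handler instead
caught <- tryCatch(
  stop("an error"),
  error = function(e) conditionMessage(e)
)

# A warning can be caught the same way
caught_w <- tryCatch(
  warning("careful"),
  warning = function(w) conditionMessage(w)
)
```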
Functions are abstractions in programming languages that allow us to modularize our code into small “self contained” units.
In general, the goals of writing functions are to:
Simplify a complex process or task into smaller sub-steps
Allow for the reuse of code without duplication
Improve the readability of your code
Improve the maintainability of your code
Functions are first-class objects in R and have a mode of function. They are assigned names like other objects using = or <-.
In R functions are defined by two components:
the arguments (formals)
the code / expression (body).
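Both components can be inspected directly. A small sketch, where add is a hypothetical function defined just for this purpose:

```r
# Hypothetical function for illustration
add <- function(x, y = 1) {
  x + y
}

mode(add)             # functions have mode "function"
names(formals(add))   # the argument names
formals(add)$y        # the default value of y
body(add)             # the code of the function, unevaluated
```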
As with most other languages, functions are most often used to process inputs and return a value as output. There are two approaches to returning values from functions in R - explicit and implicit returns.
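The two approaches can be sketched as follows (the function names are hypothetical):

```r
# Explicit return: return() exits the function immediately with the given value
f_explicit <- function(x) {
  return(x * 2)
}

# Implicit return: the value of the last evaluated expression is returned
f_implicit <- function(x) {
  x * 2
}

f_explicit(3)
f_implicit(3)
```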
Many functions in R make use of an invisible return value.
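A minimal sketch of an invisible return, using a hypothetical function f_invisible:

```r
f_invisible <- function(x) {
  invisible(x * 2)
}

f_invisible(3)        # nothing is printed at the console
y <- f_invisible(3)   # but the value can still be assigned
y
(f_invisible(3))      # wrapping the call in parentheses forces printing
```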
If we want a function to return more than one value we can group results using atomic vectors or lists.
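As a sketch of both options, with hypothetical helper functions:

```r
# An atomic vector works when all results have the same type
summarize <- function(x) {
  c(min = min(x), max = max(x))
}

# A list can hold results of different types and lengths
describe <- function(x) {
  list(n = length(x), range = range(x), type = typeof(x))
}

s <- summarize(c(3, 1, 2))
d <- describe(c(3, 1, 2))

s["max"]
d$range
```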
When defining a function we explicitly define names for the arguments, which become variables within the scope of the function.
When calling a function we can use these names to pass arguments in an alternative order.
It is also possible to give function arguments default values, so that they don’t need to be provided every time the function is called.
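Argument names and default values can be sketched together; pow is a hypothetical function used only here.

```r
# 'exponent' has a default value, 'base' does not
pow <- function(base, exponent = 2) {
  base ^ exponent
}

pow(3)                        # exponent falls back to its default
pow(2, 3)                     # arguments matched by position
pow(exponent = 3, base = 2)   # named arguments can be given in any order
```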
R has generous scoping rules: if it can’t find a variable in the current scope (e.g. a function’s body) it will look for it in the next higher scope, and so on until an object with that name is found or it runs out of environments.
Additionally, variables defined within a scope only persist for the duration of that scope, and do not overwrite variables at higher scope(s).
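Both rules can be seen in a small sketch (all names are hypothetical):

```r
y <- 1   # defined in the outer scope

f <- function() {
  # y is not defined inside f, so R finds it in the enclosing scope
  y + 1
}

g <- function() {
  # This creates a new local y; the outer y is left unchanged
  y <- 100
  y + 1
}

f_val <- f()
g_val <- g()
y
```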
Another interesting / unique feature of R is that function arguments are lazily evaluated, which means they are only evaluated when needed.
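A minimal sketch of lazy evaluation, with a hypothetical function f whose second argument is never used:

```r
f <- function(x, y) {
  # y is never used, so the expression passed as y is never evaluated
  x * 2
}

# stop() would throw an error if it were evaluated, but it never is
res <- f(1, stop("this error is never triggered"))
res
```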
The previous example is not particularly useful. A more common use of lazy evaluation is that it enables us to define arguments as expressions of other arguments.
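For instance, default values can refer to other arguments, as in this sketch (mean_of and its arguments are hypothetical):

```r
mean_of <- function(x, n = length(x), total = sum(x)) {
  # Defaults are evaluated inside the function when first used,
  # so they can refer to other arguments such as x
  total / n
}

m1 <- mean_of(c(2, 4, 6))          # uses n = 3 and total = 12
m2 <- mean_of(c(2, 4, 6), n = 2)   # overrides n, total is still 12
```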
In R, operators are actually a special type of function - using backticks around the operator we can write them as functions.
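A few illustrative calls, with values chosen arbitrarily:

```r
a <- `+`(1, 2)               # same as 1 + 2
b <- `[`(c(10, 20, 30), 2)   # same as c(10, 20, 30)[2]

# Operators can be passed to other functions like any other function
d <- sapply(1:3, `-`, 1)     # subtracts 1 from each element
```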
Prefixing any function name with a ? will open the related help file for that function.
For functions not in the base package, you can generally see their implementation by entering the function name without parentheses (or using the body function).
function (formula, data, subset, weights, na.action, method = "qr", 
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
    contrasts = NULL, offset, ...) 
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action", 
        "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- quote(stats::model.frame)
    mf <- eval(mf, parent.frame())
    if (method == "model.frame") 
        return(mf)
    else if (method != "qr") 
        warning(gettextf("method = '%s' is not supported. Using 'qr'", 
            method), domain = NA)
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (!is.null(w) && !is.numeric(w)) 
        stop("'weights' must be a numeric vector")
    offset <- model.offset(mf)
    mlm <- is.matrix(y)
    ny <- if (mlm) 
        nrow(y)
    else length(y)
    if (!is.null(offset)) {
        if (!mlm) 
            offset <- as.vector(offset)
        if (NROW(offset) != ny) 
            stop(gettextf("number of offsets is %d, should equal %d (number of observations)", 
                NROW(offset), ny), domain = NA)
    }
    if (is.empty.model(mt)) {
        x <- NULL
        z <- list(coefficients = if (mlm) matrix(NA_real_, 0, 
            ncol(y)) else numeric(), residuals = y, fitted.values = 0 * 
            y, weights = w, rank = 0L, df.residual = if (!is.null(w)) sum(w != 
            0) else ny)
        if (!is.null(offset)) {
            z$fitted.values <- offset
            z$residuals <- y - offset
        }
    }
    else {
        x <- model.matrix(mt, mf, contrasts)
        z <- if (is.null(w)) 
            lm.fit(x, y, offset = offset, singular.ok = singular.ok, 
                ...)
        else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
            ...)
    }
    class(z) <- c(if (mlm) "mlm", "lm")
    z$na.action <- attr(mf, "na.action")
    z$offset <- offset
    z$contrasts <- attr(x, "contrasts")
    z$xlevels <- .getXlevels(mt, mf)
    z$call <- cl
    z$terms <- mt
    if (model) 
        z$model <- mf
    if (ret.x) 
        z$x <- x
    if (ret.y) 
        z$y <- y
    if (!qr) 
        z$qr <- NULL
    z
}
<bytecode: 0x10a8d7350>
<environment: namespace:stats>
We can define our own infix functions like + or *. The only requirement is that the function name must start and end with a %.
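A minimal sketch of a user-defined infix function; the operator %or% is hypothetical and is not part of base R.

```r
# Hypothetical infix operator: returns y when x is NULL, otherwise x
`%or%` <- function(x, y) {
  if (is.null(x)) y else x
}

r1 <- NULL %or% 1
r2 <- 2 %or% 1
```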
We can also define replacement functions that allow for ‘in-place’ modification, like attr<- or names<-.
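A sketch of such a function follows; second<- is a hypothetical name. The name must end in <-, the last argument must be named value, and the modified object must be returned.

```r
# Hypothetical replacement function that sets the second element of x
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- 1:5
second(x) <- 10L   # evaluated as x <- `second<-`(x, value = 10L)
x
```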
for loops are the most common type of loop in R. Given a vector, a for loop iterates through the elements and evaluates the code expression for each value.
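A minimal sketch with made-up values:

```r
total <- 0
for (v in c(2, 4, 6)) {
  # v takes each value of the vector in turn
  total <- total + v
}
total
```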
while loops
This loop repeats evaluation of the code expression until the condition is not met (i.e. evaluates to FALSE).
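A minimal sketch; the starting value and the limit are arbitrary.

```r
i <- 1
while (i < 100) {
  # The condition is checked before every iteration
  i <- i * 2
}
i
```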
repeat loops
Equivalent to a while(TRUE){} loop, it repeats until a break statement is encountered.
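A sketch of a repeat loop that exits with break. It also uses next, which skips the rest of the current iteration; the values are made up.

```r
i <- 0
odds <- c()
repeat {
  i <- i + 1
  if (i > 10) break       # leave the loop entirely
  if (i %% 2 == 0) next   # skip the rest of this iteration
  odds <- c(odds, i)      # only reached for odd i
}
odds
```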
break and next
These are special actions that only work inside of a loop:
break - ends the current loop
next - ends the current iteration
Often we want to use a loop across the indexes of an object and not the elements themselves. There are several useful functions to help you do this:
:, length, seq, seq_along, seq_len, etc.
1:length(x)
A common loop construction you’ll see in a lot of R code is using 1:length(x) to generate a vector of index values for the vector x.
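The text above only describes the construction. The sketch below adds a well-known caveat as an assumption about why it matters: with an empty vector, 1:length(x) counts down from 1 to 0, while seq_along(x) yields no indexes at all.

```r
x <- c()   # an empty vector (hypothetical edge case)

idx_colon <- 1:length(x)    # 1:0, i.e. the two values 1 and 0
idx_along <- seq_along(x)   # integer(0): nothing to iterate over

n_colon <- 0
for (i in 1:length(x)) n_colon <- n_colon + 1   # body runs twice

n_along <- 0
for (i in seq_along(x)) n_along <- n_along + 1  # body never runs
```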
To the right is a vector containing all prime numbers between 2 and 100 and a separate vector x containing some values we would like to check for primality.
Write the R code necessary to print only the values of x that are not prime (without using subsetting or the %in% operator).
Your code will need to use nested loops to iterate through the vector of primes and x.
Sta 523 - Fall 2025