[1] 4
[1] "integer"
[1] "double"
Lecture 26
The Rcpp package integrates R and C++ via R functions and a (header-only) C++ library.
All underlying R types and objects, i.e., everything a
SEXPrepresents internally in R, are matched to corresponding C++ objects. This covers anything from vectors, matrices or lists to environments, functions and more. EachSEXPvariant is automatically mapped to a dedicated C++ class. For example, numeric vectors are represented as instances of theRcpp::NumericVectorclass, environments are represented as instances ofRcpp::Environment, functions are represented asRcpp::Function, etc …
From “Extending R with C++ -”A Brief Introduction to Rcpp”:
R has always provided an application programming interface (API) for extensions. Based on the C language, it uses a number of macros and other low-level constructs to exchange data structures between the R process and any dynamically-loaded component modules authors added to it. With the introduction of the Rcpp package, and its later refinements, this process has become considerably easier yet also more robust. By now, Rcpp has become the most popular extension mechanism for R.
| Type | Size (bits) | Description | Value Range |
|---|---|---|---|
bool |
1* | Logical value: true or false |
true or false |
char |
8 | Character (ASCII or UTF8) | \(\pm 127\) |
short int |
16 | Small integers | \(\pm 3.27 \cdot 10^4\) |
int |
32 | Medium integers | \(\pm 2.14 \cdot 10^9\) |
long int |
64 | Large integers | \(\pm 9.22 \cdot 10^18\) |
float |
32 | Small floating point value | \(\pm 10^{-38}\) to \(\pm 10^{38}\) |
double |
64 | Large floating point value | \(\pm 10^{-308}\) to \(\pm 10^{308}\) |
+ many more
All of the basic types in R are vectors by default, the C++ types just shown are all scalars. So it is necessary to have one more level of abstraction to translate between the two. Rcpp provides for this with several built in classes:
| C++ type (scalar) | Rcpp Class | R type (typeof) |
|---|---|---|
int |
Rcpp::IntegerVector |
integer |
double |
Rcpp::NumericVector |
numeric |
bool |
Rcpp::LogicalVector |
logical |
std::string |
Rcpp::CharacterVector |
character |
char |
Rcpp::RawVector |
raw |
std::complex<double> |
Rcpp::ComplexVector |
complex |
Rcpp::List |
list |
|
Rcpp::Environment |
environment |
|
Rcpp::Function |
function |
|
Rcpp::XPtr |
externalptr |
|
Rcpp::S4 |
S4 |
Rcpp provides some helpful functions for compiling and running simple C++ expressions (evalCpp), functions (cppFunction), or cpp files (sourceCpp). It is even possible to include C++ code in Rmd / qmd documents using the Rcpp engine.
Generated code for function definition:
--------------------------------------------------------
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP get_value(){ return wrap( 2+2 ) ; }
Generated extern "C" functions
--------------------------------------------------------
#include <Rcpp.h>
#ifdef RCPP_USE_GLOBAL_ROSTREAM
Rcpp::Rostream<true>& Rcpp::Rcout = Rcpp::Rcpp_cout_get();
Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
#endif
// get_value
SEXP get_value();
RcppExport SEXP sourceCpp_1_get_value() {
BEGIN_RCPP
Rcpp::RObject rcpp_result_gen;
Rcpp::RNGScope rcpp_rngScope_gen;
rcpp_result_gen = Rcpp::wrap(get_value());
return rcpp_result_gen;
END_RCPP
}
Generated R functions
-------------------------------------------------------
`.sourceCpp_1_DLLInfo` <- dyn.load('/private/var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/Rtmp5XTr30/sourceCpp-aarch64-apple-darwin20-1.1.0/sourcecpp_67b26af491ad/sourceCpp_5.so')
get_value <- Rcpp:::sourceCppFunction(function() {}, FALSE, `.sourceCpp_1_DLLInfo`, 'sourceCpp_1_get_value')
rm(`.sourceCpp_1_DLLInfo`)
Building shared library
--------------------------------------------------------
DIR: /private/var/folders/v7/wrxd7cdj6l5gzr0191__m9lr0000gr/T/Rtmp5XTr30/sourceCpp-aarch64-apple-darwin20-1.1.0/sourcecpp_67b26af491ad
/Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB --preclean -o 'sourceCpp_5.so' 'file67b24ddefdfb.cpp'
[1] 4
sourceCppThis allows for an entire .cpp source file to be compiled and loaded into R. This is generally the preferred way of working with C++ code and is well supported by RStudio (i.e. provides syntax highlights, tab completion, etc.)
Rcpp:: everywhere, include the namespace
> bench::mark(cpp_mean(1, 2), mean(c(1, 2)))
# A tibble: 2 × 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 cpp_mean(1… 164ns 245.99ns 2944640. 0B 0 10000 0 3.4ms
2 mean(c(1, … 779ns 1.23µs 792794. 0B 0 10000 0 12.6ms
# ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
for loopsIn C & C++ for loops are traditionally constructed as,
Since the adoption of the C++11 standard there is an alternative for loop syntax,
Rcpp also attempts to provide many of the base R functions within the C++ scope, generally these are referred to as Rcpp Sugar. More details can be found here or by looking at the Rcpp source.
#include <Rcpp.h>
//[[Rcpp::plugins(cpp11)]]
//[[Rcpp::export]]
double cpp_imean(Rcpp::IntegerVector x) {
double sum = 0.0;
for(int i=0; i != x.size(); i++) {
sum += x[i];
}
return sum/x.size();
}
//[[Rcpp::export]]
double cpp11_imean(Rcpp::IntegerVector x) {
double sum = 0.0;
for(auto v : x) {
sum += v;
}
return sum/x.size();
}
//[[Rcpp::export]]
double rcpp_imean(Rcpp::IntegerVector x) {
return Rcpp::mean(x);
}[1] "double"
[1] Inf
Warning in cpp_imean(y): NAs introduced by coercion to integer range
[1] -195225781
Warning in cpp11_imean(y): NAs introduced by coercion to integer range
[1] -195225781
Warning in rcpp_imean(y): NAs introduced by coercion to integer range
[1] NA
From Hadley’s Adv-R Rcpp chapter,
# A tibble: 5 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 mean(y) 950.42µs 966.49µs 1013. 0B 0
2 r_mean(y) 6.03ms 6.38ms 156. 19.5KB 0
3 cpp_mean(y) 4.63ms 4.83ms 208. 0B 0
4 cpp11_mean(y) 467.48µs 488.02µs 2009. 0B 0
5 rcpp_mean(y) 1.06ms 1.09ms 901. 0B 0
bench::presslist$norm
[1] 1.9488167 -0.8285304 -1.1097544 -0.5550965 1.2926654 1.2270469
[7] -2.9024726 -0.1560244 -0.7628897 -0.9276166
$beta
[1] 0.9387534068 0.5661279624 0.4984754021 0.4521797032 0.7545363114
[6] 0.0000225238 0.0341585791 0.1516264656 0.1550719612 0.5204840924
[[3]]
[1] 1 2 3 4 5 NA
data.frame#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::DataFrame make_tbl(int n) {
Rcpp::DataFrame df = Rcpp::DataFrame::create(
Rcpp::Named("norm") = Rcpp::rnorm(n, 0, 1),
Rcpp::Named("beta") = Rcpp::rbeta(n, 1, 1)
);
df.attr("class") = Rcpp::CharacterVector::create("tbl_df", "tbl", "data.frame");
return df;
}R has some weird behavior when it comes to printing text from C++, use Rcout rather than std::cout.
NAsRcpp attributes provides a bunch of convenience tools that handle much of the conversion from R SEXP’s to C++ / Rcpp types and back. Some times it is necessary to handle this directly.
Developed by Dr. Conrad Sanderson and Dr Ryan Curtin
Template based linear algebra library with high level syntax (like R or Matlab)
Heavy lifting is (mostly) handled by LAPACK (i.e. benefits from OpenBLAS)
Supports vectors, matrices, and cubes in dense or sparse format
Some builtin expression optimization via template meta-programming
Header only or shared library versions available
Armadillo has 4 basic (dense) templated types:
These types can be specialized using one of the following data types:
For convenience the following typedefs are defined:
arma::vec = arma::colvec = arma::Col<double>
arma::dvec = arma::dcolvec = arma::Col<double>
arma::fvec = arma::fcolvec = arma::Col<float>
arma::cx_vec = arma::cx_colvec = arma::Col<cx_double>
arma::cx_dvec = arma::cx_dcolvec = arma::Col<cx_double>
arma::cx_fvec = arma::cx_fcolvec = arma::Col<cx_float>
arma::uvec = arma::ucolvec = arma::Col<uword>
arma::ivec = arma::icolvec = arma::Col<sword>Written and maintained by Dirk Eddelbuettel, Romain Francois, Doug Bates and Binxiang Ni
Provides the header only version of Armadillo along with additional wrappers
Include the following in your C++ code:
| Attribute | Description |
|---|---|
.n_rows |
number of rows; present in Mat, Col, Row, Cube, field and SpMat |
.n_cols |
number of columns; present in Mat, Col, Row, Cube, field and SpMat |
.n_elem |
total number of elements; present in Mat, Col, Row, Cube, field and SpMat |
.n_slices |
number of slices; present in Cube and field |
For an arma::vec v,
| Call | Description |
|---|---|
v(i) |
Access the i-th element with bounds checking |
v.at(i) |
Access the i-th element without bounds checking |
v[i] |
Access the i-th element without bounds checking |
For an arma::mat m,
| Call | Description |
|---|---|
m(i) |
Access the i-th element, treating object as flat and in column major order |
m(i,j) |
Access the element in i-th row and j-th column with bounds checking |
m.at(i,j) |
Access the element in i-th row and j-th column without bounds checking |
For an arma::cube c,
| Call | Description |
|---|---|
c(i) |
Access the i-th element, treating object as flat and in column major order |
c(i,j,k) |
Access the element in i-th row, j-th column, and k-th slice with bounds checking |
c.at(i,j,k) |
Access the element in i-th row, j-th column, and k-th slice without bounds checking |
fastLm example// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
Rcpp::List fastLm(const arma::mat& X, const arma::colvec& y) {
int n = X.n_rows, k = X.n_cols;
arma::colvec coef = arma::solve(X, y); // fit model y ~ X
arma::colvec res = y - X*coef; // residuals
// std.errors of coefficients
double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0)/(n - k);
arma::colvec std_err = arma::sqrt(s2 * arma::diagvec(arma::pinv(arma::trans(X)*X)));
return Rcpp::List::create(
Rcpp::Named("coefficients") = coef,
Rcpp::Named("stderr") = std_err,
Rcpp::Named("df.residual") = n - k
);
}For an \(n\)-dimension multivate normal distribution with covariance \(\boldsymbol{\Sigma}\) can be written as
\[ \underset{n \times 1}{\boldsymbol{Y}} \sim N(\underset{n \times 1}{\boldsymbol{\mu}}, \; \underset{n \times n}{\boldsymbol{\Sigma}}) \]
where \(\big\{\boldsymbol{\Sigma}\big\}_{ij} = \rho_{ij} \sigma_i \sigma_j\)
\[ \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1\\ \mu_2\\ \vdots\\ \mu_n \end{pmatrix}, \, \begin{pmatrix} \rho_{11}\sigma_1\sigma_1 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1n}\sigma_1\sigma_n \\ \rho_{21}\sigma_2\sigma_1 & \rho_{22}\sigma_2\sigma_2 & \cdots & \rho_{2n}\sigma_2\sigma_n\\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1}\sigma_n\sigma_1 & \rho_{n2}\sigma_n\sigma_2 & \cdots & \rho_{nn}\sigma_n\sigma_n \\ \end{pmatrix} \right) \]
To generate draws from an \(n\)-dimensional multivate normal with mean \(\underset{n \times 1}{\boldsymbol{\mu}}\) and covariance matrix \(\underset{n \times n}{\boldsymbol{\Sigma}}\),
Find a matrix \(\underset{n \times n}{\boldsymbol{L}}\) such that \(\boldsymbol{\Sigma} = \boldsymbol{L}\,\boldsymbol{L}^t\)
Draw \(n\) iid unit normals, \(N(0,1)\), as \(\underset{n \times 1}{\boldsymbol{z}}\)
Obtain multivariate normal draws using \[ \underset{n \times 1}{\boldsymbol{y}} = \underset{n \times 1}{\boldsymbol{\mu}} + \underset{n \times n}{\boldsymbol{L}} \, \underset{n \times 1}{\boldsymbol{z}} \]
For the \(n\) dimensional multivate normal given on the last slide, its density is given by
\[ f\big(\boldsymbol{Y} | \boldsymbol{\mu}, \boldsymbol{\Sigma}\big) = (2\pi)^{-n/2} \; \det(\boldsymbol{\Sigma})^{-1/2} \; \exp{\left(-\frac{1}{2} \underset{1 \times n}{(\boldsymbol{Y}-\boldsymbol{\mu})'} \underset{n \times n}{\boldsymbol{\Sigma}^{-1}} \underset{n \times 1}{(\boldsymbol{Y}-\boldsymbol{\mu})}\right)} \]
and its log density is given by
\[ \log f\big(\boldsymbol{Y} | \boldsymbol{\mu}, \boldsymbol{\Sigma}\big) = -\frac{n}{2} \log 2\pi - \frac{1}{2} \log \det(\boldsymbol{\Sigma}) - \frac{1}{2} \underset{1 \times n}{(\boldsymbol{Y}-\boldsymbol{\mu})'} \underset{n \times n}{\boldsymbol{\Sigma}^{-1}} \underset{n \times 1}{(\boldsymbol{Y}-\boldsymbol{\mu})} \]
Sta 523 - Fall 2025