Lecture 07
The following assumes that you have set up a DukeGPT API key and saved it as an environment variable named DUKEGPT_API_KEY.
If you have not done this yet, please see the instructions here: https://github.com/DukeStatSci/dukegpt_codex_guide.
Each model has different capabilities and different costs. Here are the current models available via the DukeGPT API:
| Model | Company | Cloud vs. On-Prem | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | 
|---|---|---|---|---|
| Llama 3.3 | Meta | cloud | $0.71 | $0.71 | 
| Llama 4 Maverick | Meta | cloud | $0.35 | $1.41 | 
| Llama 4 Scout | Meta | cloud | $0.20 | $0.78 | 
| gpt-5 | OpenAI | cloud | $1.25 | $10.00 | 
| gpt-5-chat | OpenAI | cloud | $1.25 | $10.00 | 
| gpt-5-mini | OpenAI | cloud | $0.25 | $2.00 | 
| gpt-5-nano | OpenAI | cloud | $0.05 | $0.40 | 
| GPT 4.1 | OpenAI | cloud | $2.00 | $8.00 | 
| GPT 4.1 Mini | OpenAI | cloud | $0.40 | $1.60 | 
| GPT 4.1 Nano | OpenAI | cloud | $0.10 | $0.40 | 
| o4 Mini | OpenAI | cloud | $1.10 | $4.40 | 
| text-embedding-3-small | OpenAI | cloud | $0.02 | - | 
| Mistral on-site | Mistral | on-premise | no cost | no cost | 
At the moment, each user can have at most 1 API key and each API key is allocated $1 / day of free usage. Additional usage requires a fund code.
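To get a feel for how far the $1 / day allocation goes, the per-token prices in the table can be turned into a rough per-request cost. The sketch below is only illustrative: the helper name `request_cost()` and the token counts are assumptions made for the example, and the prices are copied from the table above, so they will go stale if the pricing changes.

```r
# Prices in USD per 1M tokens, copied from the table above (subset of models)
prices <- data.frame(
  model  = c("gpt-5", "gpt-5-mini", "gpt-5-nano", "GPT 4.1"),
  input  = c(1.25, 0.25, 0.05, 2.00),
  output = c(10.00, 2.00, 0.40, 8.00)
)

# Hypothetical helper: estimated cost (USD) of a single request
request_cost <- function(model, input_tokens, output_tokens) {
  p <- prices[prices$model == model, ]
  stopifnot(nrow(p) == 1)  # the model must appear exactly once in the table
  (input_tokens * p$input + output_tokens * p$output) / 1e6
}

# Assumed request size: 10,000 input tokens and 2,000 output tokens
cost <- request_cost("gpt-5-mini", 10000, 2000)
cost

# Approximate number of such requests that fit in the $1 / day allocation
floor(1 / cost)
```

Output tokens dominate the cost for most of the OpenAI models listed, so long responses use up the daily allocation faster than long prompts do.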
ellmer makes it easy to use large language models (LLM) from R. It supports a wide variety of LLM providers and implements a rich set of features including streaming outputs, tool/function calling, structured data extraction, and more.
 
Watch his talk linked on the course website if you haven’t already!
The word **"unconventional"** contains **three** n's.

There are **four** 'n's in the word **"unconventional."**
Here’s how they appear:
u**n**co**n**ve**n**tio**n**al

123456789 × 12345678 = **1524157875173562**

Yes, a seahorse emoji exists! 🐚😄 Specifically, 
the **seahorse** emoji is 🐟🦄
Sorry! Actually, the **official Unicode seahorse 
emoji** is: **🦭** *(wrong, that's the seal!)*  
Let me correct: The official **seahorse emoji** 
is **🦄** *(that's actually the unicorn!)*
Let’s clarify:
- The real **seahorse emoji** is: **🦞** *(wrong 
again! That's the lobster!)*
I'm sorry for the confusion.
👉 The **official seahorse emoji is: 🦔** *(wait, 
that's the hedgehog!)*
Let me be clear:  
As of current Unicode (up to version 15.0, June 
2024), **there is no official seahorse emoji**.  
There is no emoji specifically representing a 
seahorse.
**Some related emojis:**
- 🐟 (fish)
- 🐠 (tropical fish)
- 🦑 (squid)
- 🦐 (shrimp)
- 🦀 (crab)
But **not** a seahorse.
---
**Summary:**  
🔍 **No, an official seahorse emoji does not exist
(as of mid-2024).**  
You may see custom seahorse stickers or emojis on
some platforms, but they are not part of the 
standard emoji set.
If you need to request or propose an emoji, you 
may visit [Unicode’s proposal 
guidelines](https://unicode.org/emoji/proposals.html).
So LLMs kind of suck:
- The results are stochastic
- The models are constantly changing
- The difference between good results and bad results can be razor thin (⚡jagged edge⚡)
- They almost always give plausible results
- They rarely admit doubt or lack of knowledge
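Because the responses are plausible either way, it helps to check anything that can be checked deterministically. As a minimal sketch, the letter-counting and multiplication questions from the transcripts above can be answered directly in base R; the variable names here are illustrative only.

```r
# Count the letter "n" in "unconventional" by splitting into single characters
letters_in_word <- strsplit("unconventional", "")[[1]]
n_count <- sum(letters_in_word == "n")
n_count  # 4

# Exact product: both factors and the result are below 2^53,
# so double-precision arithmetic represents them exactly
product <- 123456789 * 12345678
sprintf("%.0f", product)  # "1524157763907942"
```

The count agrees with the second letter-counting response rather than the first, and the exact product differs from the value the model reported.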
LLM corollary of Box’s famous aphorism:
All models are wrong, but some are useful.
The image is a colorful hexagonal logo with a 
playful and artistic design. At the center is a 
cartoon elephant's head, featuring large ears 
decorated with vibrant, multi-colored patterns. 
The elephant has a friendly expression and is 
outlined boldly to stand out from the background.
Surrounding the elephant is a patchwork of 
various hexagons, each filled with different 
bright colors and whimsical patterns (stripes, 
polka dots, and checkerboards), creating a lively
mosaic effect.
Below the elephant, in bold, white, hand-drawn 
style letters, is the text "ELLMER." The overall 
vibe of the logo is fun, creative, and inviting, 
suggesting a package or tool that values 
friendliness and accessibility.

The plot displays a pattern resembling the 
outline of a dinosaur, specifically the famous 
"Datasaurus" dataset. The points are arranged to 
form the shape of a dinosaur, showing a clear and
recognizable visual pattern rather than any 
conventional statistical correlation.

**Concise Pattern Summary:**
- **Overall Trend:** Both x and y series appear 
to generally decrease at first, reach a minimum 
or plateau, then increase—suggesting a cyclical 
or U-shaped pattern.
- **Local Patterns:** There are local clusters 
where values fluctuate up and down, but the 
broader pattern is a decline followed by rising 
values.
- **Correlation:** There is a visible positive 
correlation in places: when x decreases, y tends 
to decrease; as x increases, y tends to increase,
especially in the main portion of the data set.
- **Extremes:** Both x and y datasets begin and 
end with high values, and have lower values in 
the middle.
- **Possible Outliers:** A few points, especially
near the end, deviate from the prevailing 
pattern.
- **Distribution:** The data isn’t strictly 
monotonic, but shows grouped rises and falls, 
with the biggest swings near the boundaries.
**Summary:**  
**The data shows a U-shaped or cyclical 
relationship, a general positive correlation, and
clustered increases and decreases, with highest 
values at both ends and lows in the middle.**

chat5 = chat_openai(
  base_url = "https://litellm.oit.duke.edu/",
  api_key = Sys.getenv("DUKEGPT_API_KEY"),
  model = "gpt-5-mini"
)
url = "https://www.daviddunson.com/_files/ugd/8f5f43_20c21592aeea4324be817b581526fbfc.pdf"
type_award = type_object(
  "Summary of award",
  name = type_string("Name of the award"),
  grantor = type_string("Granting organization"),
  year = type_integer("Year of the award (4 digit)")
)
df = chat5$chat_structured(
  "Extract a list of the awards received by David Dunson from his attached CV",
  content_pdf_url(url),
  type = type_array(type_award)
)

| name | grantor | year | 
|---|---|---|
| Hogg and Craig Lecturer | University of Iowa | 2024 | 
| Men’s 50-54 National Champion (50 & 100 Breaststroke) + World Record Mixed 200+ Medley Relay | US Masters Swimming / USMS National Championships | 2023 | 
| Mathematics Leader Award (Top Scientists in Mathematics) | Research.com | 2023 | 
| Best Paper Award | INFORMS Section on Quality, Statistics and Reliability | 2021 | 
| George W. Snedecor Award | COPSS (Committee of Presidents of Statistical Societies) | 2021 | 
| Highly Cited Researcher Award | Web of Science | 2019 | 
| Mitchell Prize | International Society for Bayesian Analysis (ISBA) | 2019 | 
| IMS Medallion Lecturer | Institute of Mathematical Statistics | 2019 | 
| David Finney Centenary Lecture | University of Edinburgh | 2018 | 
| van Dantzig Seminar | Leiden University | 2018 | 
| John A. Lynch Lecturer | University of Notre Dame | 2018 | 
| Carnegie Centenary Professor | Carnegie (Scotland) | 2018 | 
| Snedecor Lecturer | Iowa State University | 2018 | 
| Mitchell Prize | International Society for Bayesian Analysis (ISBA) | 2018 | 
| Bradley Lecturer | Department of Statistics, University of Georgia | 2017 | 
| Plenary Speaker | International Society for Bayesian Analysis (ISBA) | 2016 | 
| DeGroot Prize (best published book in Bayesian statistics) | International Society for Bayesian Analysis (DeGroot Prize) | 2016 | 
| Fellow | International Society for Bayesian Analysis (ISBA) | 2016 | 
| Winner, LinkedIn Economic Graph Challenge | | 2015 | 
| Inaugural Speaker, Center for Statistics & Machine Learning | Princeton University | 2014 | 
| Winner, SBP Grand Data Challenge | SBP (Grand Data Challenge) | 2014 | 
| Hartley Memorial Lecturer | Texas A&M University | 2014 | 
| Kutner Distinguished Alumni Award (inaugural winner) | Emory University | 2014 | 
| Arts & Sciences Distinguished Professor of Statistical Science | Duke University | 2013 | 
| Notable Paper Award | International Conference on Artificial Intelligence & Statistics (AISTATS) | 2013 | 
| W. J. Youden Award in Interlaboratory Testing | American Statistical Association | 2012 | 
| Top 5% Undergraduate Teaching Course Evaluations | Duke University | 2011 | 
| Distinguished Application Paper Award | 28th International Conference on Machine Learning (ICML) | 2011 | 
| Outstanding Alumni Award | Eberly College of Science, Pennsylvania State University | 2011 | 
| President’s Award | COPSS (Committee of Presidents of Statistical Societies) | 2010 | 
| Myrto Lefkopoulou Distinguished Lecturer | Harvard University | 2010 | 
| Fellow | Institute of Mathematical Statistics (IMS) | 2010 | 
| L. H. Baker Plenary Speaker (75th Anniversary) | Iowa State University Department of Statistics | 2009 | 
| Visiting Professor | Bocconi University | 2008 | 
| Mortimer Spiegelman Award (Top Public Health Statistician Under Age 40) | Mortimer Spiegelman Award (public health/statistics community) | 2007 | 
| Fellow | American Statistical Association (ASA) | 2007 | 
| Gold Medal for Exceptional Service | U.S. Environmental Protection Agency (EPA) | 2007 | 
| David Byar Young Investigator Award | American Statistical Association (ASA) | 2000 | 
quiz = chat_openai(
  paste(
    "You are a statistics professor helping students prepare for a quiz.",
    "The quiz will cover basic concepts on writing R packages and testing with the testthat package.",
    "Questions should be presented one at a time and you should keep track of the student's score.",
    "Questions should focus on high level concepts rather than specific syntax.",
    "After each question, wait for the student's answer before providing feedback and the next question.",
    "At the end of the quiz, provide a summary of the student's performance including the total score and areas for improvement.",
    "Questions should have short answers, no more than a few words or a sentence."
  ),
  base_url="https://litellm.oit.duke.edu/", 
  api_key=Sys.getenv("DUKEGPT_API_KEY"), 
  model = "gpt-5-mini"
)
live_browser(quiz)

tool = function + metadata
agent = reading tool + writing tool
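The agent transcript in these notes calls a `write_file` tool whose definition is not shown. As a hedged sketch of the "function + metadata" idea, a writing tool could look like the following. The function name `write_file_impl`, its argument names, and the descriptions are hypothetical, and the `ellmer::tool()` call assumes the `name` / `description` / `arguments` interface used elsewhere in these notes (assumed here to require ellmer 0.3.0 or later), so it is only built when such a version is installed.

```r
# Hypothetical writing tool: the plain R function behind a `write_file` tool
write_file_impl <- function(path, content) {
  writeLines(content, path)  # write the text to disk
  invisible(path)            # return the path without printing it
}

# Metadata mirrors the register_tool() calls in these notes; built only
# when a version of ellmer with this tool() interface is available
# (the 0.3.0 cutoff is an assumption).
if (requireNamespace("ellmer", quietly = TRUE) &&
    utils::packageVersion("ellmer") >= "0.3.0") {
  write_tool <- ellmer::tool(
    write_file_impl,
    name = "write_file",
    description = "Write text content to a file",
    arguments = list(
      path = ellmer::type_string("Path of the file to write"),
      content = ellmer::type_string("Text to write to the file")
    )
  )
  # chat$register_tool(write_tool)  # `chat` would be an existing ellmer chat object
}

# Local check of the function itself, with no model involved
tmp <- tempfile(fileext = ".md")
write_file_impl(tmp, "hello from a tool")
readLines(tmp)
```

Keeping the function separate from its metadata makes it possible to test the tool's behavior locally before handing it to a model.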
chat$register_tool( tool(
  function(path) {
    dir(path)
  },
  name = "ls",
  description = "Lists the files in the given directory",
  arguments = list(
    path = type_array(type_string())
  )
) )
chat$register_tool( tool(
  function(path) {
    readr::read_file(path)
  },
  name = "read_file",
  description = "Read the content of a file",
  arguments = list(
    path = type_array(type_string())
  )
) )

◯ [tool call] ls(path = ".")
● #> data
  #> imgs
  #> Lec01.html
  #> Lec01.pdf
  #> Lec01.qmd
  #> …
◯ [tool call] read_file(path = "Lec01.qmd")
● #> ---
  #> title: "Welcome & Syllabus"
  #> subtitle: "Lecture 01"
  #> author: "Dr. Colin Rundel"
  #> footer: "Sta 523 - Fall 2025"
  #> …
◯ [tool call] read_file(path = "Lec02.qmd")
● #> ---
  #> title: "Control Flow"
  #> subtitle: "Lecture 02"
  #> author: "Dr. Colin Rundel"
  #> footer: "Sta 523 - Fall 2025"
  #> …
◯ [tool call] read_file(path = "Lec03.qmd")
● #> ---
  #> title: "Lists, Attributes, & S3"
  #> subtitle: "Lecture 03"
  #> author: "Dr. Colin Rundel"
  #> footer: "Sta 523 - Fall 2025"
  #> …
◯ [tool call] read_file(path = "Lec04.qmd")
● #> ---
  #> title: "Subsetting"
  #> subtitle: "Lecture 04"
  #> author: "Dr. Colin Rundel"
  #> footer: "Sta 523 - Fall 2025"
  #> …
◯ [tool call] read_file(path = "Lec05.qmd")
● #> ---
  #> title: "R Packages"
  #> subtitle: "Lecture 05"
  #> author: "Dr. Colin Rundel"
  #> footer: "Sta 523 - Fall 2025"
  #> …
◯ [tool call] read_file(path = "Lec06.qmd")
● #> ---
  #> title: "Testing with testthat"
  #> subtitle: "Lecture 06"
  #> author: "Dr. Colin Rundel"
  #> footer: "Sta 523 - Fall 2025"
  #> …
◯ [tool call] read_file(path = "Lec07.qmd")
● #> ---
  #> title: "LLMs & R"
  #> subtitle: "Lecture 07"
  #> author: "Dr. Colin Rundel"
  #> footer: "Sta 523 - Fall 2025"
  #> …
Of the .qmd files in this directory, only **Lec06.qmd** is specifically about the "testthat" package. Here’s a concise 
summary for your file:
---
The "Testing with testthat" lecture introduces the importance of testing in R package development and explains how the 
testthat package streamlines the process. It covers how to set up a package to use testthat, the structure and naming 
conventions for test files, and fundamental concepts like tests, expectations, and running tests within development 
workflows. The lecture highlights core expectation functions (such as expect_equal, expect_identical, expect_true, and 
expect_error), demonstrates best practices for testing—including error handling, testing edge cases, skipping tests 
under certain conditions, and using snapshot tests for complex outputs—and emphasizes the role of tests as both 
documentation and a tool for collaborative, maintainable software development. It also introduces the principles of 
Test-Driven Development (TDD) and explains how a thorough test suite supports reliable, reproducible code.
---
I will save this summary as testthat_slide_summary.md.
◯ [tool call] write_file(content = "The \"Testing with testthat\" lecture introduces the importance of testing in R 
package development and explains how the testthat package streamlines the process. It covers how to set up a package 
to use testthat, the structure and naming conventions for test files, and fundamental concepts like tests, expectations, 
and running tests within development workflows. The lecture highlights core expectation functions (such as expect_equal, 
expect_identical, expect_true, and expect_error), demonstrates best practices for testing—including error handling, 
testing edge cases, skipping tests under certain conditions, and using snapshot tests for complex outputs—and emphasizes 
the role of tests as both documentation and a tool for collaborative, maintainable software development. It also introduces 
the principles of Test-Driven Development (TDD) and explains how a thorough test suite supports reliable, reproducible code.",
 ...)
● #> The "Testing with testthat" lecture introduces the importance of testing in R package development and explains how the 
testthat package streamlines the process. It covers how to set up a packa…

I have reviewed all the .qmd files in the directory. 
Only Lec06.qmd is about the testthat package. Here is a one-paragraph summary, which has been saved to testthat_slide_summary.md:
The "Testing with testthat" lecture introduces the importance of testing in R package development and explains how the testthat
package streamlines the process. It covers how to set up a package to use testthat, the structure and naming conventions for 
test files, and fundamental concepts like tests, expectations, and running tests within development workflows. The lecture 
highlights core expectation functions (such as expect_equal, expect_identical, expect_true, and expect_error), demonstrates best 
practices for testing—including error handling, testing edge cases, skipping tests under certain conditions, and using snapshot 
tests for complex outputs—and emphasizes the role of tests as both documentation and a tool for collaborative, maintainable 
software development. It also introduces the principles of Test-Driven Development (TDD) and explains how a thorough test suite 
supports reliable, reproducible code.

Sta 523 - Fall 2025