Web APIs

Lecture 16

Dr. Colin Rundel

URLs

Query Strings

Provides named parameter(s) and value(s) that modify the behavior of the resulting page.

Format generally follows:

?arg1=value1&arg2=value2&arg3=value3

Some quick examples,
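To make the format concrete, here is a minimal base R sketch that splits a query string into named values. The query string is the generic one from the format above. The variable names are illustrative only, and the approach assumes that values contain no unencoded `&` or `=` characters.

```r
# Generic query string following the format above
query <- "?arg1=value1&arg2=value2&arg3=value3"

# Drop the leading "?" and split into "name=value" pairs
pairs <- strsplit(sub("^\\?", "", query), "&", fixed = TRUE)[[1]]

# Split each pair at the first "=" into a name and a value
arg_names  <- sub("=.*$", "", pairs)
arg_values <- sub("^[^=]*=", "", pairs)

# Named character vector of parameters
params <- setNames(arg_values, arg_names)
params
```

Splitting at the first `=` rather than every `=` keeps a value intact if it happens to contain an encoded or literal equals sign later in the pair.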

URL encoding

This will often be handled automatically by your web browser or other tool, but it is useful to know a bit about what is happening

Spaces will be encoded as ‘+’ or ‘%20’
Certain characters are reserved and will be replaced with the percent-encoded version within a URL

!	#	$	&	'	(	)
%21	%23	%24	%26	%27	%28	%29
*	+	,	/	:	;	=
%2A	%2B	%2C	%2F	%3A	%3B	%3D
?	@	[	]
%3F	%40	%5B	%5D

Characters that cannot be converted to the correct charset are replaced with HTML numeric character references (e.g. a Σ would be encoded as &#931;)

Examples

URLencode("http://lmgtfy.com/?q=hello world")

[1] "http://lmgtfy.com/?q=hello%20world"

URLdecode("http://lmgtfy.com/?q=hello%20world")

[1] "http://lmgtfy.com/?q=hello world"

URLencode("!#$&'()*+,/:;=?@[]")

[1] "!#$&'()*+,/:;=?@[]"

URLencode("!#$&'()*+,/:;=?@[]", reserved = TRUE)

[1] "%21%23%24%26%27%28%29%2A%2B%2C%2F%3A%3B%3D%3F%40%5B%5D"

URLencode("!#$&'()*+,/:;=?@[]", reserved = TRUE) |> 
  URLdecode()

[1] "!#$&'()*+,/:;=?@[]"

URLencode("Σ")

[1] "%CE%A3"

URLdecode("%CE%A3")

[1] "Σ"
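The examples above suggest one practical pattern: when a parameter value itself contains a reserved character, encoding only the value with `reserved = TRUE` keeps it from being read as part of the URL's structure. The sketch below reuses the lmgtfy URL from the earlier examples. The search text is an arbitrary illustration.

```r
# A query value that contains a reserved character ("&")
value <- "salt & pepper"

# Encoding the whole URL leaves "&" alone, so it would act as a parameter separator
whole <- URLencode(paste0("http://lmgtfy.com/?q=", value))

# Encoding only the value with reserved = TRUE also escapes the "&"
safe <- paste0("http://lmgtfy.com/?q=", URLencode(value, reserved = TRUE))

whole
safe
```

In the first result the server would see a parameter `q` with the value "salt " followed by a second, unnamed parameter. In the second the whole phrase stays inside `q`.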

RESTful APIs

REST

REpresentational State Transfer

Describes an architectural style for web services (not a standard)
All communication via HTTP requests and responses
Key features:
- Client-server architecture - separation of concerns between client and server
- Stateless - each request contains all information needed; no session state stored on server
- Addressable - resources identified by URLs (endpoints)
- Uniform interface - standard HTTP methods (GET, POST, PUT, DELETE)
- Cacheable - responses can be cached to improve performance
- Layered system - client cannot tell if connected directly to server or intermediary
Resources are represented in standard formats (typically JSON or XML)

GitHub API

GitHub provides a REST API that allows you to interact with most of the data available on the website.

There is extensive documentation and a huge number of endpoints to use - almost anything that can be done on the website can also be done via the API.

GitHub REST API

Demo 1 - GitHub API
Basic access

Get a user

List organization repositories
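A minimal sketch of how these two calls might look in R. The `users/{username}` and `orgs/{org}/repos` paths are GitHub REST API endpoints. The account `octocat`, the organization `tidyverse`, and the helpers `github_url()` and `fetch_json()` are illustrative choices rather than part of the original demo. The actual fetches are left commented out because they need the jsonlite package and network access.

```r
# Build endpoint URLs under the GitHub REST API root
github_url <- function(...) paste("https://api.github.com", ..., sep = "/")

user_url  <- github_url("users", "octocat")            # get a user
repos_url <- github_url("orgs", "tidyverse", "repos")  # list organization repositories

# Fetch and parse a JSON response (assumes jsonlite is installed)
fetch_json <- function(url) jsonlite::fromJSON(url, simplifyVector = FALSE)

# Uncomment to perform the requests:
# user  <- fetch_json(user_url);  user$login
# repos <- fetch_json(repos_url); length(repos)

user_url
repos_url
```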

Pagination

Many REST APIs limit the number of results returned in a single response to manage server load and improve performance. When working with large datasets, you’ll need to make multiple requests to retrieve all results.

Common pagination approaches:

Offset-based - specify starting position and number of items (?offset=20&limit=10)
Page-based - specify page number and page size (?page=2&per_page=30)
Cursor-based - use a token/cursor pointing to next set of results
Link header - server provides URLs to next/previous pages in response headers
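Offset-based and page-based pagination describe the same slice of results in two ways. As a small worked sketch, the hypothetical helper `page_to_offset()` below converts a page number and page size into the equivalent offset and limit, assuming pages are numbered from 1.

```r
# Convert page-based parameters to the equivalent offset-based query string
page_to_offset <- function(page, per_page) {
  offset <- (page - 1) * per_page  # items skipped before this page starts
  sprintf("?offset=%d&limit=%d", as.integer(offset), as.integer(per_page))
}

# Page 3 with 10 items per page starts after the first 20 items
page_to_offset(page = 3, per_page = 10)
```

This returns `?offset=20&limit=10`, the same query string used in the offset-based example above.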

GitHub API Pagination

GitHub uses page-based and link header pagination:

Query parameters:
- per_page - number of items per page (default: 30, max: 100)
- page - page number to retrieve (default: 1)
Link header: GitHub includes a Link header in responses with URLs for:
- next - next page of results
- prev - previous page
- first - first page
- last - last page
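To follow the pagination links, the Link header has to be split into its URL and `rel` parts. The base R sketch below does this for a hand-written header value. The header text is an illustrative example in the standard `<url>; rel="name"` form, not output captured from GitHub, and `parse_link_header()` is a hypothetical helper.

```r
# Illustrative Link header value (not captured from a real response)
link <- paste0(
  '<https://api.github.com/orgs/tidyverse/repos?per_page=100&page=2>; rel="next", ',
  '<https://api.github.com/orgs/tidyverse/repos?per_page=100&page=3>; rel="last"'
)

# Split a Link header into a named vector of URLs, keyed by rel
parse_link_header <- function(link) {
  # Split on commas that are followed by the "<" opening the next URL
  parts <- strsplit(link, ",\\s*(?=<)", perl = TRUE)[[1]]
  urls  <- sub("^<([^>]*)>.*$", "\\1", parts)       # text between < and >
  rels  <- sub('^.*rel="([^"]*)".*$', "\\1", parts) # value of rel="..."
  setNames(urls, rels)
}

links <- parse_link_header(link)
links[["next"]]
```

A typical loop would request the `next` URL repeatedly and stop once the header no longer contains a `next` entry.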

Background

httr2 is a package designed around the construction and handling of HTTP requests and responses. It is a rewrite of the httr package and includes the following features:

Pipeable API
Explicit request object, with support for
- rate limiting
- retries
- OAuth
- Secure secret storage
Explicit response object, with support for
- error codes / reporting
- common body encodings (e.g. JSON)

Structure of an HTTP Request

HTTP Methods / Verbs

GET - fetch a resource
POST - create a new resource
PUT - full update of a resource
PATCH - partial update of a resource
DELETE - delete a resource

Less common verbs: HEAD, TRACE, OPTIONS.
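As a rough sketch of how the verbs map onto requests, the httr2 code below builds (but does not send) three requests against the same resource. The host `https://api.example.com` and the `items/42` path are hypothetical placeholders.

```r
library(httr2)

# Hypothetical resource: item 42 on a placeholder API
item <- request("https://api.example.com") |>
  req_url_path_append("items", "42")

# With no method set and no body, httr2 sends a GET
get_req <- item

# Other verbs are set explicitly with req_method()
update_req <- item |> req_method("PATCH")
delete_req <- item |> req_method("DELETE")

update_req$method
delete_req$method
```

All three requests share one URL. Only the verb changes what the server is asked to do with the resource.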

httr2 request objects

A new request object is constructed via request(), which is then modified via the req_*() functions.

Some useful functions:

request() - initialize a request object
req_method() - set HTTP method
req_url_query() - add query parameters to URL
req_url_*() - add or modify URL
req_body_*() - set body content (various formats and sources)
req_user_agent() - set user-agent
req_dry_run() - shows the exact request that will be made
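A minimal sketch of how these functions chain together, using the GitHub organization repositories endpoint with the pagination parameters described earlier. The organization name and the user-agent string are illustrative assumptions, and nothing is sent over the network here.

```r
library(httr2)

# Build a request step by step; each req_*() call returns a modified copy
req <- request("https://api.github.com") |>
  req_url_path_append("orgs", "tidyverse", "repos") |>  # extend the URL path
  req_url_query(per_page = 100, page = 2) |>            # add query parameters
  req_user_agent("web-apis-lecture-example")            # identify the client

# The assembled URL, including the query string
req$url

# req_dry_run(req) would print the full request without sending it
```

Because each step returns a new request object, a base request can be stored once and reused as the starting point for several endpoints.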

Structure of an HTTP Response

Status Codes

1xx: Informational Messages
2xx: Successful
3xx: Redirection
4xx: Client Error
5xx: Server Error
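Since the class of a status code is determined by its first digit, integer division by 100 is enough to classify it. The helper name `status_class()` below is hypothetical, and the sketch assumes codes in the range 100 to 599.

```r
# Map a status code to its class using the leading digit
status_class <- function(code) {
  classes <- c("Informational", "Successful", "Redirection",
               "Client Error", "Server Error")
  classes[code %/% 100]
}

status_class(c(200, 301, 404, 503))
```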

httr2 response objects

Once constructed, a request is performed via req_perform(), which returns a response object (the most recent response can also be retrieved via last_response()). The content of the response is accessed via the resp_*() functions.
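The sketch below shows a few resp_*() accessors on a response built by hand with httr2's response() testing helper, so it runs without network access. The JSON body is made-up illustration data, and resp_body_json() assumes the jsonlite package is installed.

```r
library(httr2)

# Hand-built response standing in for the result of req_perform()
resp <- response(
  status_code = 200,
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw('{"login": "octocat", "public_repos": 8}')
)

resp_status(resp)        # numeric status code
resp_content_type(resp)  # media type from the Content-Type header

# Parse the JSON body into an R list
body <- resp_body_json(resp)
body$login
```

With a real request the only change is the first step: `resp <- req_perform(req)` replaces the call to response().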