The authoritative, exhaustive reference for the roxyassert annotation language. Every construct, and a representative spanning set of combinations, is written out so the language can be audited against itself — it must agree with itself everywhere.
1. Design invariants
1. **One home per concept.** The element type, its element-level modifiers (`in`, `| NA`), and a `vector`’s length live in `<>` (or attach to a bare atomic type); the whole-argument modifiers (`?` / `| NULL`, type unions) sit outside.
2. **“Bare” means different things by category.** A bare atomic type is a vector of any length (`scalar<...>` is the only way to say length 1); a bare wildcard (`any`) is any object; a bare reference type (`function` / `class<Class>`) is a single length-1 object; a bare composite is an unconstrained `list`/table (§2, §6).
3. **`NA` and `NULL` are reserved words, never type names.** This keeps `|` unambiguous.
4. **Element-level binds tighter than slot-level.** `in` / `| NA` bind to the nearest atom on their left, before any slot-level `|` union or `?`.
5. **The language is recursive.** A composite contains full annotations in its bullets, to any depth.
6. **Values are copied verbatim; the generator never coerces.** Every value the user writes (an interval bound, a set element) is an R expression emitted into the generated check exactly as written (`0`, `as.Date("2024-01-01")`, `c("BUY", "SELL")`, `ORDER_SIDE`). roxyassert performs no coercion and invents no time zone or format. Intervals are ISO/Bourbaki bracket notation around two such expressions. Element/atom type agreement is checked for inline literals and taken on trust for opaque expressions (S2).
7. **A type may only carry what its category supports (§2).** The grammar makes category-level nonsense unrepresentable: an interval on `complex`, a set on `logical`/`raw`, a fractional or expression bound on `integer` (integer bounds are literal whole numbers or `±Inf`), a `vector` of `function`, `| NA` on `raw`, a constraint on `any`. `Inf`/`-Inf` are side-specific sentinels (`-Inf` low only, `Inf` high only). A `numeric` bound, by contrast, is an opaque expression (a number such as `0`, or any R expression) checked only for being valid R. A `Date`/`POSIXct` bound must be a class-matching expression (`as.Date(...)`): a bare number is rejected as a type error (S2), and any other expression is taken on trust.
2. Type categories
Every base type sits in exactly one category, and the category alone (no per-type exceptions) decides which modifiers are legal:
| Category | Types | `in [interval]` | `in c(set)` | `\| NA` | length / `vector<>` | shape |
|---|---|---|---|---|---|---|
| Ordered atomic | `integer` `numeric` `Date` `POSIXct` | ✅ | ✅¹ | ✅ | ✅ | bare / `scalar<>` / `vector<>` |
| Enumerable atomic | `character` `factor` | ❌ | ✅² | ✅ | ✅ | bare / `scalar<>` / `vector<>` |
| Plain atomic | `complex` `logical` | ❌ | ❌ | ✅ | ✅ | bare / `scalar<>` / `vector<>` |
| Byte atomic | `raw` | ❌ | ❌ | ❌ | ✅ | bare / `scalar<>` / `vector<>` |
| Count | `count` | ✅ | ❌ | ❌ | ✅ | bare / `scalar<>` / `vector<>` |
| Wildcard | `any` | ❌ | ❌ | ❌ | ✅ | bare / `scalar<>` / `vector<>` |
| Reference | `function` `class<Class>` | ❌ | ❌ | ❌ | ❌ | bare only (length-1 by nature) |
| Composite | `list` `data.table` `data.frame` | ❌ | ❌ | ❌ | ❌ | bare, nested bullets, or `list<T>` (`list` only) |
| Promise | `promise<T>` | ❌ | ❌ | ❌ | ❌ | wraps one resolved type `T`; collapses to `T` (S5) |
¹ Sets compare with `==`/`%in%` and apply no normalisation. `integer` and whole-day `Date` sets (from `as.Date`/`Sys.Date`) are exact and fine; `numeric` sets are floating-point-fragile, and `POSIXct` (and any sub-day `Date`) sets are precision/time-zone-fragile, so they are discouraged; prefer intervals. Interval bounds and set elements for `Date`/`POSIXct` are written as ordinary R expressions of the matching class (`as.Date("2024-01-01")`, `as.POSIXct("2024-01-01 09:00", tz = "America/New_York")`) and copied verbatim, so the user (not roxyassert) chooses the constructor, format, and time zone (S2). A `Date`/`POSIXct` bound’s class and time zone are taken on trust and compared with `<`/`>` (which coerce), so a class/tz mismatch yields a silently wrong range check: keep the bound and the value in the same class/tz. `integer` bounds are literal whole numbers; `Inf`/`-Inf` are open-end sentinels (S2).

² On `character` and `factor` a set constrains the realised values (`as.character(x) %in% set`); with `| NA` the generated check is `all(as.character(x) %in% set | is.na(x))`. It does not assert a factor’s declared `levels()`; for that, use prose (out of scope, §12).

`any` asserts nothing about type: the generator emits no type check, only the requested length/nullability. It is the explicit escape hatch for a polymorphic argument, an unconstrained list/record field, or a `data.table` list-column of arbitrary cells (S3).

`complex`/`logical`/`raw` take no set (degenerate or floating-point-fragile), and `raw` has no `NA` representation. `function`/`class` are bare length-1 references; a `class<Name>` asserts the value’s class via `inherits()`, so it works for any object system (S3, S4, Reference Classes, R6, S7) and matches subclasses (`class<AbstractClock>` accepts a `RealClock`). `Name` is a single class, not a `pkg::Class` reference (name the source package in prose if it helps). A composite is refined by nested bullets (a named record / typed columns) or, for `list`, parameterised as `list<T>`, a homogeneous list whose every element is the type `T`. All of these are enforced by the grammar (§3); the residual element-type rules are static (§4).
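As an illustration of footnote 2, the sketch below reproduces the quoted set check in base R and shows that it constrains realised values, not declared levels. The helper name `check_set()` and its `allow_na` argument are hypothetical, chosen for this example only; the code roxyassert actually emits may be shaped differently.

```r
# Hypothetical helper mirroring the check quoted in footnote 2:
#   all(as.character(x) %in% set | is.na(x))   when `| NA` is present
#   all(as.character(x) %in% set)              otherwise
check_set <- function(x, set, allow_na = FALSE) {
  ok <- as.character(x) %in% set
  if (allow_na) ok <- ok | is.na(x)
  all(ok)
}

# A factor whose declared levels are wider than the annotated set.
side <- factor(c("BUY", "SELL", NA), levels = c("BUY", "SELL", "HOLD"))

check_set(side, c("BUY", "SELL"))                   # FALSE: NA rejected by default
check_set(side, c("BUY", "SELL"), allow_na = TRUE)  # TRUE: `| NA` admits the NA
levels(side)                                        # "HOLD" is declared but never checked
```

The unused `"HOLD"` level passes silently, which is the behaviour the footnote describes: only the values actually present are compared with the set.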
3. Formal context-free grammar (EBNF)
annotation ::= "(" slot ")" description? bullets?
(* everything after ) is free-text description (roxyassert
ignores it); any ":" adjacent to it is cosmetic.
bullets gated by S1 *)
slot ::= type ( "|" type )* ( ( "|" "NULL" ) | "?" )?
(* a union of >=1 types; the whole argument may be NULL,
written EITHER as a "| NULL" alternative OR a trailing "?",
never both *)
type ::= atomic | wildcard | reference | composite | derived | promise | named
named ::= ident
(* an identifier that is not a built-in type: a reference to a
type declared elsewhere with `@type` (S6). Resolved inline at
document() time; takes no in / | NA / bullets at the use site
(refine in the @type, not here). *)
atomic ::= atom
| "scalar" "<" atom ">"
| "vector" "<" atom "," length ">"
(* a bare atom is a vector of any length; vector<> requires a
length (a length-less vector<atom> is a parse error — use
bare `atom`) *)
atom ::= "integer" ( "in" ( int_interval | set ) )? ( "|" "NA" )?
| "numeric" ( "in" ( num_interval | set ) )? ( "|" "NA" )?
| temporal ( "in" ( time_interval | set ) )? ( "|" "NA" )?
| enumerable ( "in" set )? ( "|" "NA" )?
| plain ( "|" "NA" )?
| "raw"
| "count" ( "in" int_interval )?
(* count: a non-negative whole number (double OR integer);
whole-number interval bounds like integer; no set, no NA *)
temporal ::= "Date" | "POSIXct"
enumerable ::= "character" | "factor"
plain ::= "complex" | "logical"
wildcard ::= "any"
| "scalar" "<" "any" ">"
| "vector" "<" "any" "," length ">"
(* no in / set / | NA: any asserts nothing about type *)
reference ::= "function" | "class" "<" ident ">"
composite ::= "list" ( "<" type ">" )? | "data.table" | "data.frame"
(* bare `list`/table = unconstrained, or a named record /
typed columns when refined by nested bullets (S1/S3).
`list<T>` is a HOMOGENEOUS list: EVERY element satisfies the
type T — list<character>, list<scalar<numeric>>,
list<class<Engine>>, list<any>, list<data.table>. T is a `type`
(no slot-level union / `| NULL` / `?`); a `list<T>` is a leaf
and takes no bullets (S1, S3). *)
derived ::= "extends" ident ( "," ident )*
( ( "pick" | "omit" ) ident ( "," ident )* )?
(* a DERIVED composite (S7): inherit the columns of the named
base record(s), then add / override columns via nested bullets
and/or narrow with pick / omit. The composite KIND is inherited
from the base(s) and never restated (no `data.table extends`);
the whole derivation sits INSIDE the parentheses, like any
type. Resolved at document() time by splicing columns; bases
resolve through `named`, so cycle / unknown-base detection is
reused. *)
promise ::= "promise" "<" type ">"
(* a result resolving to the type T, delivered synchronously
OR as a promises::promise (S5). T is a single `type` (no
slot-level union / `| NULL` / `?` inside `<>`, as with
list<T>; a data.table T may still take field bullets, which
describe the resolved table). A union `T | promise<T>` (same
T) is the sync-or-async pattern and collapses to the single
resolved T; nested promise<promise<T>> collapses likewise.
promise<T> stands for a whole slot value, so it may NOT be a
list element or sit inside scalar<>/vector<>. Most natural on
@return, but allowed anywhere. *)
int_interval ::= low int_lo "," int_hi high
num_interval ::= low num_lo "," num_hi high
time_interval ::= low time_lo "," time_hi high
low ::= "[" (* closed *) | "]" (* open *)
high ::= "]" (* closed *) | "[" (* open *)
int_lo ::= "-"? digit+ | "-Inf"
int_hi ::= "-"? digit+ | "Inf"
num_lo ::= signed_number | "-Inf" | rexpr
num_hi ::= signed_number | "Inf" | rexpr
time_lo ::= "-Inf" | rexpr
time_hi ::= "Inf" | rexpr
(* Only the LOW bound may be -Inf and only the HIGH may be Inf;
a wrong-side sentinel ([Inf, 0]) is a parse error. A
Date/POSIXct bound is an R expression of the matching class,
emitted verbatim — as.Date("2024-01-01"),
as.POSIXct("2024-01-01 09:00", tz = "America/New_York"),
lubridate::ymd_hms("...") — so the user owns the constructor
and time zone; roxyassert never coerces. *)
set ::= name_set | call_set
name_set ::= ident (* a bare constant; a single maximal-munch token,
terminated by | , ) > or ?; may NOT be a
compound or indexing R expression *)
call_set ::= rexpr (* a call / index / any expression with brackets
or an operator — c("a", "b"), pkg::CONST *)
length ::= int (* exactly n *)
| int ".." int (* inclusive range *)
| int ".." (* at least n *)
bullets ::= bullet+ (* siblings at one indentation depth *)
bullet ::= "- " name " (" slot ")" description? bullets?
(* everything after ) is free-text description; any ":" is
cosmetic. trailing bullets gated by S1. *)
name ::= ident | "**" ident "**" (* bold optional, no semantic effect *)
(* lexical terminals *)
int ::= digit+
signed_number ::= "-"? digit+ ( "." digit+ )?
ident ::= letter ( letter | digit | "." | "_" )*
rexpr ::= (* any valid R expression; opaque, emitted verbatim *)
description ::= (* free text; the remainder of the same physical line *)
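The three `length` productions (`n`, `n..m`, `n..`) can each be read as a (min, max) pair. The following base-R sketch shows one way to do that; `parse_length()` and `length_ok()` are hypothetical names used only for this illustration and are not part of roxyassert.

```r
# Hypothetical: turn a `length` token into a list(min, max).
parse_length <- function(spec) {
  if (grepl("^[0-9]+$", spec)) {                       # int: exactly n
    n <- as.integer(spec)
    return(list(min = n, max = n))
  }
  if (grepl("^[0-9]+\\.\\.[0-9]+$", spec)) {           # int ".." int: inclusive range
    parts <- as.integer(strsplit(spec, "..", fixed = TRUE)[[1]])
    return(list(min = parts[1], max = parts[2]))
  }
  if (grepl("^[0-9]+\\.\\.$", spec)) {                 # int "..": at least n
    n <- as.integer(sub("..", "", spec, fixed = TRUE))
    return(list(min = n, max = Inf))
  }
  stop("not a valid length: ", spec)                   # e.g. "1...5" or "1.5"
}

# Hypothetical: does the length of x satisfy the spec?
length_ok <- function(x, spec) {
  len <- parse_length(spec)
  length(x) >= len$min && length(x) <= len$max
}

length_ok(1:10, "10")          # TRUE
length_ok(character(), "0..")  # TRUE
length_ok(1:11, "1..10")       # FALSE
```

Anchoring each pattern with `^` and `$` is what rejects a run of three dots or a lone decimal point in a length position.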
Tokenizing rules (so the grammar is deterministic in practice):

1. **One scan order.** An `rexpr` region (a `call_set`, or a `Date`/`POSIXct` bound) is consumed first, by a string-aware, bracket-balanced R scan that skips string/character literals and balances `(`, `[`, `{`. Only after a bound/set is fully consumed does the next structural token count. Consequently any `,`, `|`, `>`, `<`, `[`, `]` inside a balanced `rexpr` is invisible to the annotation tokenizer; only top-level structural tokens are seen.
2. **Interval scan (single pass).** A `[` or `]` immediately after `in` opens an interval. Each bound is then scanned by the string-aware bracket-balanced `rexpr` scan (rule 1), the two bounds separated by exactly one top-level comma; the interval is closed by the first top-level `]` or `[` that appears after the high bound has been fully balanced. So a bound may itself end in `]` (e.g. `df[["t"]]`): the balancer consumes it first, and only the bracket after it closes the interval. Commas/brackets inside a balanced `rexpr` bound are invisible; an open interval `]0, Inf[` never unbalances anything.
3. **Bound classification (ordered choice).** A bound is matched by ordered choice: a bare `-Inf`/`Inf` is always lexed as the sentinel token (before `rexpr`), then a `signed_number` (maximal munch of `-?digit+(.digit+)?` with nothing trailing before the comma/close), then any other `rexpr`. The sentinel is side-gated (`-Inf` only in the low slot, `Inf` only in the high), so a wrong-side bare sentinel (`[Inf, 0]`, `]a, -Inf]`) has no production and is a genuine parse error; because a bare `±Inf` always lexes as the sentinel it cannot fall through to `rexpr`. (An `rexpr` that merely evaluates to `±Inf` is opaque and, if it makes the interval degenerate, is caught by S4.) A bound that lexes as a sentinel or `signed_number` is a literal (subject to S2/S4); anything else is a trusted `rexpr` (S2).
4. **Set dispatch after `in`.** If the next char is `[` or `]`, parse an interval. Otherwise scan an R expression: a lone maximal-munch `ident` with nothing trailing before `|`, `,`, `)`, `>`, or `?` is a `name_set`; anything containing `(`, `[`, or an operator is a `call_set` (bracket-balanced, string-aware). A bare `name_set` may not be compound or indexed: `FOO | NA` is the set `FOO` plus an atom-level `| NA`, and `foo[1:2]` must be written as the `call_set` it is. The terminating `|` is then resolved by §5 (`NA` → atom, `NULL`/type → slot).
5. **The `vector` length comma.** Inside `vector<...>` the element `atom`, with its `in (...)` and `| NA`, is consumed first; the atom/length comma is the first top-level comma that remains. `scalar<...>` has no length production, so any top-level comma inside `scalar<...>` is a parse error (a comma inside its interval or set is fine); that single rule is why `scalar<T, n>` is rejected.
6. `>` closes the innermost open generic (depth-aware); `vector<...>` and `class<...>` each close one level. `class` must be followed by `<Name>` (an `ident`); a bare `class` is a parse error.
7. `..` is exactly two dots (maximal munch), valid only in a `length`; a run of three or more dots (`...`) is a lexical error. A lone `.` is a decimal point, valid only inside a numeric bound; `..` in a bound position and a lone `.` in a length position are lexical errors.
8. **Everything after `)` is free-text description.** roxyassert parses only the `(slot)` token; the rest of the line is description it ignores, so any `:` adjacent to it (`(type): desc`, `(type) desc:`) is cosmetic; the canonical style omits it. The `name` of a bullet is the single token before the first `(` at the bullet’s top level; the `slot` is the balanced-parenthesis group opened by that `(`, found with the same string-aware scan as `annotation`. Names contain no spaces. Child `bullets` begin only on a subsequent line at strictly greater indentation whose first non-space characters are `-`; a `-` inside description text (same line) is literal.
9. **Reference and wildcard shapes.** `function`/`class<Class>` are bare length-1 references (never `scalar<>`/`vector<>`, never `in` / `| NA` / length); `any` takes no `in` / set / `| NA` but does accept `scalar<>`/`vector<>` for a length check.
10. **Union `| NA` (one-token look-ahead past the `|`).** Each union member is parsed as a full `atom`, which eagerly consumes its own optional `| NA` before the slot’s union loop resumes; the disambiguation is the token after the `|` (`NA` → element, reserved; `NULL`/type name → slot). So in `(numeric | character | NA)` the `| NA` belongs to `character`. `| NULL` is not positional: the grammar fixes it to the slot tail, so a mid-union `(numeric | NULL | character)` is a parse error.
11. **`extends` name lists.** A slot opening with the keyword `extends` is a `derived` composite: a comma-separated list of base `ident`s, then an optional `pick`/`omit` keyword with its own comma-separated `ident` list of column names. Both lists are plain identifiers (no brackets / `rexpr`); each ends at the first token that is not `ident` or `","`. Nested `bullets` after the `)` add or override columns exactly as for a bare composite (S1).
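The string-aware, bracket-balanced scan behind the first tokenizing rule and the `vector` length comma can be sketched in a few lines of base R. `find_top_level()` is a hypothetical helper written for this illustration, not roxyassert's tokenizer. It deliberately covers only balanced content such as a `call_set`; the reversed delimiters of an open interval (`]0, 1[`) are handled by the separate interval scan and are not modelled here.

```r
# Hypothetical: positions of `targets` characters that sit outside every
# string literal and outside every balanced ( [ { group.
find_top_level <- function(text, targets = ",") {
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  depth <- 0L
  quote <- ""          # the open quote character, or "" when outside a string
  hits  <- integer()
  i <- 1L
  while (i <= length(chars)) {
    ch <- chars[i]
    if (nzchar(quote)) {
      if (ch == "\\") {
        i <- i + 1L                      # skip the escaped character
      } else if (ch == quote) {
        quote <- ""                      # string literal ends
      }
    } else if (ch %in% c("\"", "'", "`")) {
      quote <- ch                        # string literal starts
    } else if (ch %in% c("(", "[", "{")) {
      depth <- depth + 1L
    } else if (ch %in% c(")", "]", "}")) {
      depth <- depth - 1L
    } else if (depth == 0L && ch %in% targets) {
      hits <- c(hits, i)
    }
    i <- i + 1L
  }
  hits
}

# The inside of vector<character in c("a,b", "c)d"), 2..>
inner <- 'character in c("a,b", "c)d"), 2..'
comma <- find_top_level(inner)
trimws(substring(inner, 1, comma - 1L))  # the element atom, set included
trimws(substring(inner, comma + 1L))     # the length: "2.."
```

The commas and the `)` inside the string literals are skipped, so the only hit is the atom/length comma.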
4. Static rules (beyond the context-free grammar)
These context-sensitive constraints are checked by the generator; they are specified rules, not stylistic suggestions.
- **S1 (composite nesting).** Nested `bullets` may follow an `annotation` or a `bullet` only when its slot has exactly one non-NULL alternative and that alternative is a bare `list`/`data.table`/`data.frame` (no `<T>` parameter). Any other slot, including a `list<T>`, which is a leaf, takes no children; `(data.table | data.frame)` with bullets is rejected (which field-set would they describe?).
- **S2 (bound/set element type and lowering).** Every interval bound and set element is an R expression emitted verbatim; roxyassert never coerces. For an inline literal the generator checks it matches the atom’s type: a number for `numeric`; for `integer`, an interval bound is a bare whole number (`[1, 5]`, per `int_lo`/`int_hi`) while a set element carries the `L` suffix (`c(1L, 2L, 3L)`; a bare `c(1, 2, 3)` for an `integer` set is rejected, with no coercion); a character literal for both `character` and `factor` sets (a `factor` set is given by its character labels, matched via `as.character(x)`, footnote 2). For `Date`/`POSIXct`, “type” means the S3 class (`inherits(x, "Date")`), not `typeof`; the bound is a class-matching expression the user supplies (`as.Date(...)`, `as.POSIXct(..., tz = ...)`, `lubridate::ymd_hms(...)`), so the user owns constructor/format/tz, and a class/tz mismatch compares silently wrong. `Inf`/`-Inf` are open-end sentinels: `-Inf` only as the low bound, `Inf` only as the high. A closed bracket at a sentinel means “no bound that side” (the comparison is omitted), not a value. On a `numeric`, an open bracket at a sentinel instead excludes that infinity, which is a finiteness constraint: `]0, Inf[` is “finite and > 0”, `]-Inf, Inf[` is “any finite double” (lowered with `upper = Inf, upper_inclusive = FALSE` / `lower = -Inf, lower_inclusive = FALSE`). Finiteness is `numeric`-only: `integer`/`count` cannot be `Inf` and a `Date`/`POSIXct` `Inf` bound would be a type mismatch, so for those a sentinel is omitted regardless of bracket. With `| NA` an interval is lowered NA-aware (`all((x in range) | is.na(x))`, equivalently checked on `x[!is.na(x)]`), parallel to the set form in footnote 2; the default (no `| NA`) rejects NA. An opaque `name_set` or any other `rexpr` is taken on trust; prefer an inline literal where it matters.
- **S3 (named composite fields, homogeneous lists, and list-columns).** A bulleted `list`/`data.table`/`data.frame` asserts the presence of the named fields/columns listed (`all(<names> %in% names(x))`, plus the per-column type for tables). A `list<T>` instead asserts every element satisfies `T`, applied element-wise (`assert_list_of` for a flat element type; an element-wise loop of `T`’s full check when `T` carries length/range/`| NA`). `list<any>` imposes no per-element check: it lowers to a bare `assert_list(x)`, so `list` and `list<any>` are the same runtime check. A column declared with an atomic type rejects a list-cell, which is the intended way to catch an accidental list-column; to declare one deliberately, type it `list<T>` (each cell a `T`, e.g. `list<character>`) or `list<any>`. Positional/unnamed records are out of scope (§12). Field checks apply only when the value is non-NULL: a `| NULL` / `?` value short-circuits them.
- **S4 (non-empty interval).** For literal numeric/integer bounds, the low bound must be ≤ the high bound (and strictly < when either end is open), so an empty or reversed interval (`]1, 1[`, `[5, 1]`), whose check can never pass, is rejected at generation time. (A wrong-side sentinel such as `[Inf, 0]` is caught earlier, by the grammar.) `Date`/`POSIXct`/`rexpr` bounds are opaque and not range-checked.
- **S5 (promise is the resolved type, async-agnostic).** A `promise<T>` (and the union `T | promise<T>`, same `T`) lowers to the checks for the resolved type `T` only. roxyassert never emits `promises::then()` or `is.promise()`, because the same generated validator must serve a function whether it works with `T` directly or a promise of `T`. The caller wires the async by applying the generated `function(value) value` validator however they need: it is a `then()` callback, e.g. `promises::then(impl, assert_return_fn)` for an always-async result, or branching on a known `async` flag for a sync-or-async one. `promise<T>` is most natural on `@return` but is allowed in any slot position (roxyassert does not police it: a function that takes a promise input is valid; the caller decides how to apply the generated check to the resolved value). It stands for a whole slot value, so it may not be a `list<>` element or sit inside `scalar<>`/`vector<>` (a list of promises cannot be validated synchronously per element). The `T | promise<T>` union must resolve to a single `T`: every alternative must be the same type, with the same base, shape and refinement values (a promise unioned with a different type is rejected); nested `promise<promise<T>>` is peeled to `T`. Refinements (sets, bounds) are compared by parsing them, so cosmetic spacing differs harmlessly: `numeric in c(1, 2)` and `numeric in c(1,2)` match, while `[0, 1]` and `]0, 1[` do not. The emitted check still uses the verbatim text you wrote (§1.6); only the equality test parses.
- **S6 (named types, `@type`).** A `@type Name (type)` block declares a reusable named type; a bare `Name` (anything that is not a built-in type) in an annotation is a reference, resolved inline at `document()` time, so the generated checks are exactly those of the named type’s definition (no runtime cost). A `@type` defines a single type: no slot-level `|` union, `?`, or `| NULL` in the definition (those go at the use site). A reference is usable bare, nullable, in a union, and inside `list<…>`/`promise<…>`, but not inside `scalar<>`/`vector<>` (define a scalar/vector alias directly), and it takes no use-site refinement (`in` / `| NA` / bullets): the shape lives only in the definition. A `@type` may reference another (resolved transitively); a cycle, an unknown name, a duplicate `@type`, and a name that shadows a built-in are all errors. Types are package-local.
- **S7 (record derivation, `extends`).** A `derived` type splices the resolved columns of its base(s) with this type’s own field bullets and any `pick`/`omit`, producing a plain composite that lowers exactly like a hand-written record (no runtime cost). The composite kind is inherited: every base must resolve to a bare composite (S3) and all bases must share one kind (mixing `list` with `data.table` is an error); the derived type never restates the kind. Inherited columns come first, in base order, then this type’s added columns in source order. A bullet whose name matches an inherited column overrides it in place (its position is kept); this is a trusted full replacement, not checked for being a narrowing (roxyassert has no subtype lattice, so the author owns compatibility). A bullet with a new name appends. A column defined by more than one base is an error unless the derived type redeclares it (the override resolves the tie). `pick` keeps only the listed inherited columns, `omit` drops them; the two are mutually exclusive and every listed name must exist in a base. The collision check runs after `pick`/`omit`, so dropping a shared column with `pick`/`omit` also resolves the conflict; conversely, redeclaring a column that `pick`/`omit` removed is an error. Listing a base twice, declaring a column twice, or restating the kind (`data.table extends …`) are errors, and the keywords `extends`/`pick`/`omit` may not be `@type` names. Bases resolve through `named` (S6), so an unknown base and a cycle reuse S6’s detection. Works in a `@type` definition and inline in a `@param`/`@return`. Column renaming and generic / parameterized types are out of scope (§12).
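The column-splicing order in S7 is easier to follow with a small model. The base-R sketch below represents each record as a named list mapping column name to annotation text; `derive_columns()` and that representation are assumptions made for this illustration, not roxyassert internals. It covers only ordering, override-in-place, the collision check and `pick`/`omit`; kind checking, duplicate bases and cycle detection are left out, and keeping the first occurrence of a column shared by two bases is this sketch's own reading of "its position is kept".

```r
# Hypothetical model of S7: bases is a list of named lists (column -> type text),
# own holds this type's field bullets, pick/omit are character vectors or NULL.
derive_columns <- function(bases, own = list(), pick = NULL, omit = NULL) {
  stopifnot(is.null(pick) || is.null(omit))          # mutually exclusive
  base_names <- unlist(lapply(bases, names))
  if (!all(c(pick, omit) %in% base_names)) {
    stop("every pick/omit name must exist in a base")
  }

  # Narrow each base first, so the collision check runs after pick/omit.
  keep <- function(cols) {
    if (!is.null(pick)) cols <- cols[names(cols) %in% pick]
    if (!is.null(omit)) cols <- cols[!names(cols) %in% omit]
    cols
  }
  inherited <- do.call(c, lapply(unname(bases), keep))
  if (is.null(inherited)) inherited <- list()

  removed <- setdiff(base_names, names(inherited))
  if (any(names(own) %in% removed)) {
    stop("cannot redeclare a column that pick/omit removed")
  }

  # A column defined by several bases is an error unless redeclared here.
  shared <- unique(names(inherited)[duplicated(names(inherited))])
  if (length(setdiff(shared, names(own))) > 0L) {
    stop("column defined by more than one base")
  }
  inherited <- inherited[!duplicated(names(inherited))]

  # A matching name overrides in place; a new name appends.
  for (nm in names(own)) inherited[[nm]] <- own[[nm]]
  inherited
}

Trade <- list(symbol = "character", price = "numeric in ]0, Inf[")
Audit <- list(at = "POSIXct", symbol = "factor")

derived <- derive_columns(
  list(Trade, Audit),
  own = list(symbol = "scalar<character>", note = "character")
)
names(derived)  # "symbol" "price" "at" "note"
```

Without the `symbol` override the same call stops with a collision error, while `omit = "symbol"` resolves the conflict by dropping the shared column.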
5. Binding & precedence (why | is never ambiguous)

A `|` is resolved by one-token look-ahead past it, that is, by the token that follows:

| What follows `\|` | Reads as | Level |
|---|---|---|
| `NA` | elements may be missing | element (part of an atom) |
| `NULL` | the whole argument may be `NULL` | slot (tail only) |
| a type (atomic / wildcard / reference / composite) | a union alternative | slot |

`NA`/`NULL` are reserved and can never start a type, so this peek is unambiguous. Only `| NA` is positional (it attaches to the atom on its left); `| NULL` is fixed by the grammar to the slot tail and never appears mid-union. `| NA` is valid only after an atom (never after a closed `scalar<>`/`vector<>`, never on `raw`, `any`, reference, or composite), and binds to the immediately preceding atom (the nearest type to its left), at most one per atom. Thus:

- `(scalar<numeric> | NA)` is invalid; write `(scalar<numeric | NA>)`.
- in `(numeric | character | NA)` the `| NA` binds to `character` only: a numeric vector, or a character vector whose elements may be `NA`.
- in `(numeric in [0, 1] | NA)` the `in` and `| NA` both attach to `numeric`.
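To make the last case concrete, here is a rough base-R model of how an element-level interval, with or without `| NA`, might lower under S2. The parameter names `lower`, `upper`, `lower_inclusive` and `upper_inclusive` follow the lowering quoted in S2; the function name `check_interval()` and the `allow_na` flag are hypothetical, and the generated code itself may look different.

```r
# Hypothetical model of the S2 lowering for a numeric interval.
check_interval <- function(x, lower = -Inf, upper = Inf,
                           lower_inclusive = TRUE, upper_inclusive = TRUE,
                           allow_na = FALSE) {
  above <- if (lower_inclusive) x >= lower else x > lower
  below <- if (upper_inclusive) x <= upper else x < upper
  ok <- above & below
  if (allow_na) {
    ok <- ok | is.na(x)        # NA-aware: all((x in range) | is.na(x))
  } else {
    ok[is.na(ok)] <- FALSE     # default: an NA element fails the check
  }
  all(ok)
}

# (numeric in [0, 1] | NA): both modifiers attach to numeric
check_interval(c(0, NA, 1), lower = 0, upper = 1, allow_na = TRUE)  # TRUE
check_interval(c(0, NA, 1), lower = 0, upper = 1)                   # FALSE

# ]0, Inf[ : an open bracket at the sentinel excludes Inf (finite and > 0)
check_interval(c(0.5, Inf), lower = 0, lower_inclusive = FALSE,
               upper = Inf, upper_inclusive = FALSE)                # FALSE

# ]0, Inf] : a closed bracket at the sentinel means no upper bound
check_interval(c(0.5, Inf), lower = 0, lower_inclusive = FALSE)     # TRUE
```

Comparing against `Inf` with `<=` is always true for non-missing doubles, so leaving `upper = Inf, upper_inclusive = TRUE` behaves the same as omitting the comparison.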
6. Base types
| Type | Category | R meaning |
|---|---|---|
| `integer` | ordered atomic | integer vector |
| `numeric` | ordered atomic | double vector |
| `Date` | ordered atomic | `Date` vector |
| `POSIXct` | ordered atomic | date-time vector |
| `character` | enumerable atomic | character vector |
| `factor` | enumerable atomic | factor |
| `complex` | plain atomic | complex vector |
| `logical` | plain atomic | `TRUE`/`FALSE` vector |
| `raw` | byte atomic | raw vector (no `NA`) |
| `count` | count | non-negative whole number(s), `20` or `20L` (`assert_scalar_count` / `assert_count`); no `NA`, no set |
| `any` | wildcard | any R object; no type check (length/nullability only) |
| `function` | reference | a function/closure (length 1) |
| `class<Class>` | reference | an object whose class is `Class`; any object system (S3/S4/RC/R6/S7), subclasses match (length 1) |
| `list` | composite | a list: bare = unconstrained; + bullets = named record (S3); `list<T>` = homogeneous (every element `T`) |
| `data.table` | composite | a data.table (typed/list-columns when refined, S3) |
| `data.frame` | composite | a data.frame (typed/list-columns when refined, S3) |
| `promise<T>` | promise | a result resolving to `T`, sync or async (S5) |

A `list`/`data.table`/`data.frame` with nested bullets is a fixed named-field structure; without bullets it is an unconstrained list/table. `list<T>` is the third form: a homogeneous list (or list-column) whose every element is `T` (e.g. `list<character>`, `list<class<Engine>>`, `list<any>`).
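The difference between a bulleted record and a homogeneous `list<T>` can be shown with two small base-R predicates. Both `has_fields()` and `is_list_of()` are hypothetical stand-ins written for this example; they only mirror the shape of the checks described in S3 (`all(<names> %in% names(x))` for a record, an element-wise check for `list<T>`), not the assertion functions roxyassert really calls.

```r
# Hypothetical: a bulleted list asserts that the listed fields are present.
has_fields <- function(x, fields) {
  is.list(x) && all(fields %in% names(x))
}

# Hypothetical: list<T> asserts that every element passes T's check.
is_list_of <- function(x, check) {
  is.list(x) && all(vapply(x, check, logical(1)))
}

# (list) with bullets slippage_bps and retries
cfg <- list(slippage_bps = 2.5, retries = 3L)
has_fields(cfg, c("slippage_bps", "retries"))      # TRUE
has_fields(list(retries = 3L), c("slippage_bps"))  # FALSE

# (list<character>): every element a character vector
is_list_of(list("a", c("b", "c")), is.character)   # TRUE
is_list_of(list("a", 1), is.character)             # FALSE

# (list<scalar<numeric>>): T carries a length, so the per-element check is richer
one_number <- function(e) is.double(e) && length(e) == 1L
is_list_of(list(1.5, 2), one_number)               # TRUE
is_list_of(list(1.5, c(2, 3)), one_number)         # FALSE
```

Passing the element check as a function keeps the list check itself identical for a flat `T` and for a `T` that carries length or range constraints.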
7. Every construct, with examples
# --- bare atomic = vector (any length) ---
(character) # character vector, length >= 1
(numeric) # double vector
(integer) # integer vector
(logical) # logical vector
(complex) # complex vector
(raw) # raw vector
(Date) # Date vector
(POSIXct) # date-time vector
(factor) # factor of any length
# --- wildcard: any asserts nothing about type ---
(any) # any R object, unchecked
(any?) # anything, or NULL
(scalar<any>) # a single object of any type
(vector<any, 3>) # length 3, elements of any type(s)
(vector<any, 1..>) # at least one element, any type(s)
# --- scalar (length 1) ---
(scalar<character>)
(scalar<numeric>)
(scalar<complex>)
(scalar<raw>) # a single byte
(scalar<logical | NA>) # tri-state flag: TRUE / FALSE / NA
# --- count: a non-negative whole number (accepts 20 or 20L) ---
(count) # a vector of counts
(scalar<count>) # one count, 0, 1, 2, ...
(scalar<count in [1, Inf[>) # a positive count (>= 1)
(scalar<count in [1, Inf[>?) # a positive count, or NULL
(vector<count, 3>) # three counts
# --- reference types: bare, length-1 by nature ---
(function) # a single function/closure
(function?) # a function, or NULL
(class<Engine>) # a single object of class Engine (any system)
(class<Engine> | NULL) # an Engine, or NULL
# --- homogeneous lists: list<T> = every element is T ---
(list<character>) # a list, every element a character vector
(list<scalar<numeric>>) # a list of single numbers
(list<class<Engine>>) # a list of Engine instances
(list<function>) # a list of callbacks
(list<any>) # a list of anything (= bare list at runtime)
(list<data.table>) # a list of data.tables
# --- promise returns: resolved type, sync or async ---
(promise<data.table>) # a promise resolving to a data.table
(promise<scalar<numeric>>) # a promise resolving to one number
(data.table | promise<data.table>) # a data.table OR a promise of one (sync-or-async)
# --- vector with explicit length (every atom type) ---
(vector<numeric, 10>) # exactly 10
(vector<numeric, 1..10>) # 1 to 10 inclusive
(vector<integer, 2..>) # at least 2
(vector<character, 0..>) # any length, including 0
(vector<logical, 3>) # three flags
(vector<raw, 32>) # a 32-byte hash
(vector<factor in c("a", "b"), 2..>) # set + open length
(vector<numeric in [0, 1], 1..>) # interval + open-ended length
(vector<Date in [as.Date("2024-01-01"), as.Date("2024-12-31")], 1..7>) # rexpr-comma vs length-comma
# --- intervals (ordered atomics only; -Inf is the low sentinel, Inf the high) ---
(scalar<numeric in [0, 1]>) # 0 <= x <= 1
(scalar<numeric in ]0, 1[>) # 0 < x < 1
(scalar<numeric in ]0, 1]>) # 0 < x <= 1
(scalar<numeric in [-1.5, 2.5]>) # fractional, signed bounds
(scalar<numeric in ]0, Inf[>) # x > 0 and finite (open bracket excludes Inf)
(scalar<numeric in ]0, Inf]>) # x > 0, Inf allowed (closed sentinel: no upper bound)
(scalar<numeric in ]-Inf, 0]>) # x <= 0 and finite (open bracket excludes -Inf)
(scalar<numeric in ]-Inf, Inf[>) # any finite double
(scalar<integer in [1, Inf[>) # x >= 1 (integer can't be Inf, so sentinel omitted)
(scalar<integer in ]-Inf, 0]>) # x <= 0 (-Inf is the low sentinel)
(numeric in [0, 1]) # every element in [0, 1]
(scalar<Date in [as.Date("2024-01-01"), as.Date("2026-12-31")]>) # bounds are R exprs, verbatim
(scalar<Date in [as.Date("2024-01-01"), Inf[>) # on or after a date (Inf high sentinel)
(scalar<Date in ]-Inf, as.Date("2024-12-31")]>) # on or before a date (-Inf low sentinel)
(scalar<POSIXct in [as.POSIXct("2024-01-01 00:00", tz = "America/New_York"), Inf[>) # user owns the tz
(scalar<POSIXct in [lubridate::ymd_hms("2024-01-01 00:00:00"), lubridate::ymd_hms("2025-01-01 00:00:00")[>) # any constructor
# --- sets / enums (ordered + enumerable atomics) ---
(scalar<character in c("BUY", "SELL")>) # inline set, scalar
(character in c("BUY", "SELL")) # vector, every element in the set
(scalar<character in ORDER_SIDE>) # set from a bare constant name
(integer in c(1L, 2L, 3L)) # exact integer enum (bare vector)
(scalar<integer in c(1L, 2L, 3L)>) # exact integer enum (scalar)
(factor in c("low", "med", "high")) # constrains realised values (footnote 2)
(Date in c(as.Date("2024-01-01"), as.Date("2024-06-30"))) # set elements are R exprs, verbatim
(numeric in c(0.25, 0.5, 1.0)) # discouraged: floating-point ==
# --- NA permission (atomics except raw; default: NA not allowed) ---
(numeric | NA) # numeric vector, NAs allowed
(scalar<numeric | NA>) # one numeric or NA
(scalar<integer | NA>) # one integer or NA
(scalar<POSIXct | NA>) # one timestamp or NA
(complex | NA) # complex vector, NAs allowed
(vector<numeric | NA, 10>) # 10 numerics, NAs allowed
(numeric in [0, 1] | NA) # constrained + NA-allowed
(factor in c("low", "med", "high") | NA) # missing-category factor
# --- nullable slot (whole argument) ---
(scalar<numeric>?) # one numeric, or NULL
(scalar<numeric> | NULL) # identical to the above (use one, not both)
(character?) # character vector, or NULL
# --- type unions (slot level) ---
(numeric | character) # a numeric vector OR a character vector
(numeric | character | NA) # numeric, OR character with NAs allowed
(class<Reader> | class<Writer>) # either class
(data.table | NULL) # a data.table or NULL
# --- everything at once ---
(vector<numeric in ]0, 1] | NA, 1..100>?)
# NULL, OR a numeric vector of length 1..100 whose elements lie in (0,1] and may be NA

8. Demo 1: every inline form on one function
#' Place a batch of orders.
#'
#' @param symbol (scalar<character>) the `BASE/QUOTE` pair.
#' @param sides (character in c("BUY", "SELL")) one side per order.
#' @param quantities (vector<numeric in ]0, Inf[, 1..500>) positive sizes, up to 500.
#' @param limits (vector<numeric in ]0, Inf[ | NA, 1..500>) limit prices; NA = market.
#' @param leverage (scalar<integer in [1, 125]>?) leverage, or NULL for spot.
#' @param tags (vector<character, 0..>) free-form labels (possibly none).
#' @param tag (scalar<character> | NULL) optional client tag.
#' @param venue (scalar<character in VENUES>) a known venue id.
#' @param tier (scalar<factor in c("retail", "vip")>) account tier.
#' @param dry_run (scalar<logical | NA>) simulate only; NA = use account default.
#' @param not_before (scalar<POSIXct>?) earliest send time, or NULL.
#' @param on_fill (function?) optional fill callback.
#' @param userdata (any?) opaque payload passed back to `on_fill`, or NULL.
#' @param config (list) execution config (a bulleted composite param, S1):
#' - slippage_bps (scalar<numeric in [0, Inf[>) allowed slippage.
#' - retries (scalar<integer in [0, 5]>) max retries.
#' @return (data.table) the acknowledgements (see Demo 2 for nested returns).
#' @export
place_batch <- function(symbol, sides, quantities, limits, leverage = NULL,
                        tags = character(), tag = NULL, venue, tier, dry_run,
                        not_before = NULL, on_fill = NULL, userdata = NULL,
                        config = list()) {
  assert_args_place_batch(symbol, sides, quantities, limits, leverage,
                          tags, tag, venue, tier, dry_run, not_before, on_fill,
                          userdata, config)
  result <- ...
  return(assert_return_place_batch(result))
}

9. Demo 2 (the kitchen sink): a deeply nested composite return
#' Run a full report.
#'
#' @param symbols (character) one or more `BASE/QUOTE` pairs.
#' @param top_n (scalar<integer in [1, Inf[>) rows to keep per section.
#' @return (list) the report:
#' - **status** (scalar<character in c("ok", "partial", "failed")>) overall outcome.
#' - generated_at (scalar<POSIXct>) when the report was produced.
#' - window (scalar<Date in [as.Date("2000-01-01"), as.Date("2100-01-01")]>) as-of date.
#' - sections (list) one entry per requested view:
#'   - matches (data.table) ranked matches:
#'     - symbol (character) the pair.
#'     - score (numeric in [0, 1]) normalised rank score.
#'     - drawdown (numeric in ]-Inf, 0]) worst observed drawdown.
#'     - side (factor in c("BUY", "SELL")) order side.
#'     - flags (character | NA) label(s), NA where none apply.
#'     - extras (list<numeric>) a list-column; each cell a numeric vector.
#'   - rejected (data.table | NULL) rows dropped, or NULL if none:
#'     - symbol (character) the pair.
#'     - reason (scalar<character in c("liquidity", "filter", "error")>) why.
#'   - cursor (scalar<character>?) next-page cursor, or NULL at the end.
#' - audit (data.frame) a flat audit log:
#'   - at (POSIXct) event time.
#'   - level (factor in c("info", "warn", "error")) severity.
#'   - message (character) the message.
#'   - context (list<any>) an arbitrary list-column (no per-cell check).
#' - diagnostics (list) run diagnostics:
#'   - warnings (vector<character, 0..>) messages (possibly none).
#'   - retries (scalar<integer in [0, Inf[>) retry count.
#'   - timings (list) millisecond timings:
#'     - parse_ms (scalar<numeric in [0, Inf[>) parse time.
#'     - run_ms (scalar<numeric in [0, Inf[>) run time.
#'     - per_source (data.table) a row per source:
#'       - source (character) source id.
#'       - ms (numeric in [0, Inf[) time for that source.
#' @export
report <- function(symbols, top_n) {
  assert_args_report(symbols, top_n)
  result <- ...
  return(assert_return_report(result))
}

10. Demo 3: an abstract R6 class enforcing every return kind
#' @title AbstractStore
#' @description A store contract; subclasses implement `.impl_*`. Each public
#' method validates its inputs and its return from the documented types.
AbstractStore <- R6::R6Class(
  "AbstractStore",
  public = list(
    #' @description Fetch records by key.
    #' @param keys (character) keys to fetch.
    #' @param limit (scalar<integer in [1, Inf[>?) optional max rows.
    #' @return (data.table) the records:
    #' - key (character) the key.
    #' - value (numeric | NA) the value, NA if unset.
    #' - updated_at (POSIXct) last write time.
    get = function(keys, limit = NULL) {
      assert_args_AbstractStore__get(keys, limit)
      return(assert_return_AbstractStore__get(private$.impl_get(keys, limit)))
    },
    #' @description Count records.
    #' @return (scalar<integer in [0, Inf[>) the count.
    count = function() {
      return(assert_return_AbstractStore__count(private$.impl_count()))
    },
    #' @description Write one record; returns self for chaining.
    #' @param key (scalar<character>) the key.
    #' @param value (scalar<numeric> | NULL) the value, or NULL to clear it.
    #' @return (class<AbstractStore>) self.
    put = function(key, value) {
      assert_args_AbstractStore__put(key, value)
      private$.impl_put(key, value)
      return(invisible(assert_return_AbstractStore__put(self)))
    },
    #' @description Fetch one record, or NULL if absent.
    #' @param key (scalar<character>) the key.
    #' @return (list | NULL) the record, or NULL:
    #' - key (scalar<character>) the key.
    #' - value (scalar<numeric | NA>) the value.
    find = function(key) {
      assert_args_AbstractStore__find(key)
      return(assert_return_AbstractStore__find(private$.impl_find(key)))
    }
  ),
  private = list(
    .impl_get = function(keys, limit) stop("not implemented"),
    .impl_count = function() stop("not implemented"),
    .impl_put = function(key, value) stop("not implemented"),
    .impl_find = function(key) stop("not implemented")
  )
)

11. Demo 4: corner cases and tricky combinations
# union mixing a constrained vector, NA permission, and NULL
(numeric in [0, 1] | NA | character?)
# read: NULL, OR (a numeric vector in [0,1], NAs allowed) OR (a character vector)
# (the `?` is the slot tail, INSIDE the parens — never `(...)?`)
# scalar union, nullable
(scalar<integer in [1, 6]> | scalar<character in c("d6")>?)
# length-pinned, NA-allowed, range-constrained, nullable
(vector<numeric in [-1, 1] | NA, 3>?)
# set from a bare constant, as a vector
(character in CURRENCIES)
# reference / wildcard, nullable / unioned
(function?) # a function, or NULL
(class<Engine> | NULL) # an Engine, or NULL
(class<Reader> | class<Writer>) # either class
(any) # truly polymorphic argument, unchecked
# a data.table with list-columns (type the column list<T>)
(data.table)
# - id (character) identifier.
# - tags (list<character>) a list-column; each cell a character vector.
# - blob (list<any>) a list-column of arbitrary cells (no per-cell check).
# nested list whose only composite field is a nullable nested table
(list)
# - page (scalar<integer in [1, Inf[>) page number.
# - rows (data.table | NULL) the page, or NULL when empty:
#   - id (character) identifier.
#   - amount (numeric in ]0, Inf[ | NA) positive amount, NA if pending.
# --- intentionally INVALID ---
# (scalar<numeric, 1>) # scalar takes no length
# (scalar<numeric> | NA) # | NA must be inside <>: use (scalar<numeric | NA>)
# (numeric | NULL?) # pick one nullability marker, not both
# (numeric)? # '?' goes inside the parens: write (numeric?)
# (complex in [0, 1]) # interval on a non-ordered type
# (character in [0, 1]) # interval on a non-ordered type (use a set)
# (integer in [0.5, 2.5]) # fractional bounds on integer
# (integer in c(1, 2, 3)) # S2: integer set needs the L suffix: c(1L, 2L, 3L)
# (character in c(1, 2)) # S2: character set elements must be string literals
# (Date in [0, 1]) # S2: a Date bound must be of class Date, not a bare number
# (numeric in [Inf, 0]) # Inf is the HIGH sentinel only; -Inf the low
# (numeric in ]-Inf, Inf[) # both-sentinel: imposes no bound (drop the `in [..]`)
# (numeric in ]1, 1[) # S4: empty / reversed interval (never satisfiable)
# (logical in c(TRUE, FALSE)) # logical takes no set (degenerate)
# (complex in c(0+0i, 1+0i)) # complex takes no set
# (raw in OPCODES) # raw takes no set
# (raw | NA) # raw has no NA representation
# (any in c(1, 2)) # any takes no constraint
# (any | NA) # any takes no | NA
# (scalar<function>) # function is bare: write (function)
# (vector<function, 3>) # function is a length-1 reference (scope choice, §12)
# (class | NULL) # class must name a class: class<Name>
# (class<lubridate::Duration>) # class<> names ONE class, not pkg::Class
# (R6<Engine>) # R6<> was removed; write class<Engine>
# (data.table | NA) # | NA is element-level; not valid on a composite
# (data.table | data.frame): ... # S1: bullets need a single bare composite
# (vector<numeric>) # vector<> requires a length — use bare (numeric)
# (list<numeric>): ... # S1: list<T> is a leaf and takes no bullets
# (promise) # promise must name its resolved type: promise<T>
# (numeric | promise<character>) # S5: a promise union must resolve to one type
# (list<promise<numeric>>) # promise<T> is a whole-slot value, not a list element
# (scalar<promise<numeric>>) # scalar<>/vector<> wrap an atom or `any`, not a promise

12. Non-goals (deliberately inexpressible)
Out of scope for the current grammar; express these in prose or a
hand-written check. (Homogeneous/element-typed collections, lists of
callbacks, and columns of model objects are no longer listed here: they
are now expressible as list<T>, e.g. list<scalar<numeric>>,
list<function>, list<class<Model>>.)
- Composite cardinality: “a data.table with 1..N rows”, “a list<T> of exactly 3 elements”. Length applies only to atomic/any vector<>; a list<T> is unbounded in length.
- A vector of reference types: vector<function> / vector<class<...>> are rejected (references are length-1). A collection of them is fine and expressible: list<function>, list<class<Engine>>.
- Positional / unnamed records: list bullets describe named fields; a positional (unnamed) record is out of scope. (A homogeneous unnamed list is list<T>.)
- A factor’s declared levels(): a set checks realised values, not the level schema (footnote 2).
- A list of unresolved promises: list<promise<T>> is rejected because each element would be an unresolved promise, which can’t be validated synchronously, and roxyassert never emits the per-element then() wiring (S5). Await the promises (e.g. promises::promise_all) and annotate the result as promise<list<T>>. (If there’s real demand, a future version could relax this.)
- Verifying that class<Name> names a real class: roxyassert emits assert_class(x, "Name") blindly, so a typo (class<Duraton>) generates without complaint and fails only at runtime; there is no document()-time check that the class exists.
- Refining a named type at the use site: a @type reference (S6) expands to its definition as-is; (Price in [0, 1]) on a named Price is not allowed. Put the refinement in the @type, define another @type, or write the refined type inline. Cross-package named types are also out of scope, because @type is package-local.
- Renaming a derived column: a derived type (S7, extends) can add, override, pick and omit columns, but cannot rename an inherited column (there is no old -> new operator). Use omit + re-add under the new name. A renamed column is, for assertion purposes, a different record.
- Generic / parameterized types: a @type template over a type variable (@type Paged<T> instantiated as Paged<Order>) is out of scope. The <…> syntax only applies a known container (list<T>, promise<T>, class<T>) to a concrete type; it does not define a type parameter.
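
Composite cardinality is the non-goal most likely to need a hand-written check, so a minimal base-R sketch of one follows. The helper names check_row_count and check_list_length are hypothetical and not part of roxyassert, and a plain data.frame stands in for a data.table so the sketch runs without extra packages.

```r
# Hypothetical helpers, NOT part of roxyassert: hand-written cardinality
# checks for cases the grammar deliberately leaves out.

# Fail unless `x` is a data.frame (a data.table is one too) whose row count
# lies in min_rows..max_rows; return `x` invisibly so the call can be chained.
check_row_count <- function(x, min_rows = 1L, max_rows = Inf) {
  if (!is.data.frame(x)) {
    stop("expected a data.frame, got <", class(x)[1], ">", call. = FALSE)
  }
  n <- nrow(x)
  if (n < min_rows || n > max_rows) {
    stop("expected ", min_rows, "..", max_rows, " rows, got ", n, call. = FALSE)
  }
  invisible(x)
}

# Fail unless `x` is a list of exactly `n` elements.
check_list_length <- function(x, n) {
  if (!is.list(x) || length(x) != n) {
    stop("expected a list of exactly ", n, " elements", call. = FALSE)
  }
  invisible(x)
}

orders <- data.frame(id = c("a", "b"), amount = c(1.5, 2))
check_row_count(orders, min_rows = 1L, max_rows = 10L)  # passes silently
check_list_length(list(1, 2, 3), n = 3L)                # passes silently
```

One way to use such a helper is to call it right after the generated assert_args_* call, so the annotation still covers column types and the hand-written check covers only the row or element count.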

13. @noassert: document a type without enforcing it
roxyassert couples documentation and enforcement: a
(type) on a @param both renders in the help
page and generates a check. @noassert
decouples them — the type is still shown, but no check is generated —
for a parameter that a hand-written guard already validates (so the
generated check would be redundant, or would pre-empt the guard’s better
error message).
- @noassert <names> exempts the named parameters (comma- or space-separated); their (type) still renders, but no assert_* is emitted for them.
- A bare @noassert makes the whole function (or R6 method) documented-only.
- Naming a parameter that is not documented is an error.
- Exempted parameters are still parsed and validated (a malformed type is still caught); only their code generation is skipped.
- Works for plain functions and R6 methods.
#' @param symbol (scalar<character>) a normalised BASE/QUOTE pair.
#' @noassert symbol
#' ...
ticker = function(symbol) {
  assert_normalised_symbol(symbol)  # the guard enforces; symbol's type is doc-only
  ...
}
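
To illustrate why a guard's own error message can be worth keeping, here is a minimal sketch of what a hand-written assert_normalised_symbol might look like. The reference only names this guard, so the body is an assumption for illustration; in particular, the accepted BASE/QUOTE pattern (upper-case letters and digits around a single slash) is made up.

```r
# Hypothetical guard body: the reference names assert_normalised_symbol()
# but does not define it. The accepted pattern below is an assumption.
assert_normalised_symbol <- function(symbol) {
  # && short-circuits, so the later tests only run on a length-1 character.
  ok <- is.character(symbol) &&
    length(symbol) == 1L &&
    !is.na(symbol) &&
    grepl("^[A-Z0-9]+/[A-Z0-9]+$", symbol)
  if (!ok) {
    stop(
      "`symbol` must be a single normalised BASE/QUOTE pair such as \"BTC/USDT\"",
      call. = FALSE
    )
  }
  invisible(symbol)
}

assert_normalised_symbol("BTC/USDT")  # passes silently
```

A generated scalar<character> check would accept "btc-usdt" because it only looks at type and length, whereas a guard like this rejects it with a domain-specific message. That is the situation @noassert is meant for.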