Type-Erasing Awaitables
Overview
The any_* wrappers type-erase stream and source concepts so that algorithms can operate on heterogeneous concrete types through a uniform interface. Each wrapper preallocates storage for the type-erased awaitable at construction time, achieving zero steady-state allocation.
The vtable layout depends on how many async operations the wrapper exposes and whether those operations share an await-return type.
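The preallocation idea above can be sketched as follows. This is a minimal illustration, not the library's implementation: awaitable_storage, fake_awaitable and reuses_storage are hypothetical names, and the sketch assumes the awaitable needs no more than default alignment (an over-aligned awaitable would need an aligned allocation instead).

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical helper: allocates one buffer when the wrapper is built and
// reuses it for every operation afterwards.
class awaitable_storage
{
    void* p_;
    std::size_t size_;

public:
    awaitable_storage(std::size_t size, std::size_t align)
        : p_(nullptr)
        , size_(size)
    {
        // Assumption: default-aligned storage is enough for the awaitable.
        assert(align <= alignof(std::max_align_t));
        (void)align;
        p_ = ::operator new(size);   // the only heap allocation
    }

    awaitable_storage(awaitable_storage const&) = delete;
    awaitable_storage& operator=(awaitable_storage const&) = delete;

    ~awaitable_storage() { ::operator delete(p_); }

    // Placement-construct a T in the preallocated buffer.
    template<class T, class... Args>
    T* emplace(Args&&... args)
    {
        assert(sizeof(T) <= size_);
        return ::new(p_) T(std::forward<Args>(args)...);
    }

    void* data() const noexcept { return p_; }
};

// Stand-in for the concrete awaitable produced by one operation.
struct fake_awaitable
{
    std::size_t n;
    explicit fake_awaitable(std::size_t n_) : n(n_) {}
};

// Runs `ops` construct/destroy cycles and reports whether every awaitable
// landed at the same preallocated address.
inline bool reuses_storage(awaitable_storage& s, int ops)
{
    for(int i = 0; i < ops; ++i)
    {
        fake_awaitable* a =
            s.emplace<fake_awaitable>(static_cast<std::size_t>(i));
        bool same = static_cast<void*>(a) == s.data();
        a->~fake_awaitable();
        if(!same)
            return false;
    }
    return true;
}
```

Sizing the buffer from awaitable_size and awaitable_align at construction is what lets each later operation get by with placement new alone.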
Single-Operation: Flat Vtable
When a wrapper exposes exactly one async operation (e.g. any_read_stream with read_some, or any_write_stream with write_some), all function pointers live in a single flat vtable:
// Flat vtable -- 64 bytes, one cache line
struct vtable
{
    void (*construct_awaitable)(...);                      // 8
    bool (*await_ready)(void*);                            // 8
    std::coroutine_handle<> (*await_suspend)(void*, ...);  // 8
    io_result<size_t> (*await_resume)(void*);              // 8
    void (*destroy_awaitable)(void*);                      // 8
    size_t awaitable_size;                                 // 8
    size_t awaitable_align;                                // 8
    void (*destroy)(void*);                                // 8
};
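As a rough illustration of how such a table could be filled in for one concrete stream type, the sketch below generates the function pointers from a template. It rests on simplifying assumptions: result stands in for io_result<size_t>, await_suspend is left out because only the immediate-completion path is exercised, and flat_vtable, vtable_for, mem_stream and read_via_vtable are hypothetical names rather than library API.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Stand-in for io_result<size_t>.
struct result { int ec; std::size_t n; };

// Simplified flat vtable (no await_suspend in this sketch).
struct flat_vtable
{
    void (*construct_awaitable)(void* stream, void* storage, std::size_t n);
    bool (*await_ready)(void*);
    result (*await_resume)(void*);
    void (*destroy_awaitable)(void*);
    std::size_t awaitable_size;
    std::size_t awaitable_align;
};

// One static table per concrete stream type S.
template<class S>
struct vtable_for
{
    using awaitable = typename S::read_awaitable;

    static void construct(void* stream, void* storage, std::size_t n)
    {
        ::new(storage) awaitable(static_cast<S*>(stream)->read_some(n));
    }
    static bool ready(void* p)
    {
        return static_cast<awaitable*>(p)->await_ready();
    }
    static result resume(void* p)
    {
        return static_cast<awaitable*>(p)->await_resume();
    }
    static void destroy(void* p)
    {
        static_cast<awaitable*>(p)->~awaitable();
    }

    static constexpr flat_vtable value{
        &construct, &ready, &resume, &destroy,
        sizeof(awaitable), alignof(awaitable)};
};

// Hypothetical concrete stream whose reads always complete immediately.
struct mem_stream
{
    std::size_t available;

    struct read_awaitable
    {
        std::size_t n;
        bool await_ready() const noexcept { return true; }
        result await_resume() const noexcept { return {0, n}; }
    };

    read_awaitable read_some(std::size_t want)
    {
        return read_awaitable{want < available ? want : available};
    }
};

// Drives one operation through the table using caller-provided storage.
inline result read_via_vtable(
    flat_vtable const& vt, void* stream, std::size_t n)
{
    alignas(std::max_align_t) unsigned char storage[64];
    assert(vt.awaitable_size <= sizeof(storage));
    vt.construct_awaitable(stream, storage, n);
    result r{};
    if(vt.await_ready(storage))      // only the immediate path is modeled
        r = vt.await_resume(storage);
    vt.destroy_awaitable(storage);
    return r;
}
```

Calling read_via_vtable(vtable_for<mem_stream>::value, &s, 10) on a stream with 3 bytes available yields a result with n equal to 3, without the caller knowing the awaitable's type.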
The inner awaitable can be constructed in either await_ready or await_suspend, depending on whether the outer awaitable has a short-circuit path.
Construct in await_ready (any_read_stream)
When there is no outer short-circuit, constructing in await_ready lets immediate completions skip await_suspend entirely:
bool await_ready() {
    vt_->construct_awaitable(stream_, cached_awaitable_, buffers_);
    awaitable_active_ = true;
    return vt_->await_ready(cached_awaitable_); // true → no suspend
}
std::coroutine_handle<> await_suspend(std::coroutine_handle<> h, io_env const* env) {
    return vt_->await_suspend(cached_awaitable_, h, env);
}
io_result<size_t> await_resume() {
    auto r = vt_->await_resume(cached_awaitable_);
    vt_->destroy_awaitable(cached_awaitable_);
    awaitable_active_ = false;
    return r;
}
Construct in await_suspend (any_write_stream)
When the outer awaitable has a short-circuit (empty buffers), construction is deferred to await_suspend so the inner awaitable is never created on the fast path:
bool await_ready() const noexcept {
    return buffers_.empty(); // short-circuit, no construct
}
std::coroutine_handle<> await_suspend(std::coroutine_handle<> h, io_env const* env) {
    vt_->construct_awaitable(stream_, cached_awaitable_, buffers_);
    awaitable_active_ = true;
    if(vt_->await_ready(cached_awaitable_))
        return h; // immediate → resume caller
    return vt_->await_suspend(cached_awaitable_, h, env);
}
io_result<size_t> await_resume() {
    if(!awaitable_active_)
        return {{}, 0}; // short-circuited
    auto r = vt_->await_resume(cached_awaitable_);
    vt_->destroy_awaitable(cached_awaitable_);
    awaitable_active_ = false;
    return r;
}
Both variants touch the same two cache lines on the hot path: the wrapper object and the flat vtable.
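The trade-off between the two construction points can be made concrete by counting calls. The sketch below is a simplified model under stated assumptions: it uses no coroutine machinery (await_suspend takes no handle), drive only mimics the order in which co_await invokes the three hooks, and counters, inner, eager_outer and deferred_outer are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical counters for comparing the two strategies.
struct counters
{
    int constructed = 0;    // inner awaitables created
    int suspend_calls = 0;  // times the outer await_suspend ran
};

// Stand-in inner awaitable that always completes immediately.
struct inner
{
    std::size_t n;
    bool await_ready() const noexcept { return true; }
    std::size_t await_resume() const noexcept { return n; }
};

// Read-style: no outer short-circuit, so construct in await_ready.
struct eager_outer
{
    counters& c;
    std::size_t len;
    inner in{0};

    bool await_ready()
    {
        in = inner{len};
        ++c.constructed;
        return in.await_ready();  // true: await_suspend is skipped
    }
    void await_suspend() { ++c.suspend_calls; }
    std::size_t await_resume() { return in.await_resume(); }
};

// Write-style: empty input short-circuits, so construct in await_suspend.
struct deferred_outer
{
    counters& c;
    std::size_t len;
    inner in{0};
    bool active = false;

    bool await_ready() const noexcept { return len == 0; }
    void await_suspend()
    {
        ++c.suspend_calls;
        in = inner{len};
        ++c.constructed;
        active = true;
    }
    std::size_t await_resume()
    {
        if(!active)
            return 0;             // short-circuited, inner never built
        active = false;
        return in.await_resume();
    }
};

// Mimics the hook order of co_await: ready, then suspend, then resume.
template<class Awaitable>
std::size_t drive(Awaitable& a)
{
    if(!a.await_ready())
        a.await_suspend();
    return a.await_resume();
}
```

In this model the eager variant completes an immediate read with one construction and no suspend call, while the deferred variant pays for a suspend call on non-empty input but constructs nothing at all when the input is empty.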
Multi-Operation: Per-Construct awaitable_ops
When a wrapper exposes more than one async operation, it cannot embed a single set of per-awaitable function pointers in its vtable, because each operation produces a distinct concrete awaitable type. Instead the vtable holds one construct_*_awaitable pointer per operation, and each construct call returns a pointer to a static constexpr ops struct describing the awaitable it just created.
Two sub-cases arise, distinguished by whether the operations share an await-return type:
- Same await-return type (any_read_source: read_some and read both await-return io_result<std::size_t>). One ops struct (awaitable_ops) serves both operations; both construct_*_awaitable pointers return the same layout.
- Different await-return types (any_buffer_source: pull await-returns io_result<std::span<const_buffer>> while its synthesized read_some/read await-return io_result<std::size_t>; any_buffer_sink and any_write_sink similarly mix io_result<>, io_result<std::size_t>). These need more than one ops struct (e.g. awaitable_ops plus read_awaitable_ops / write_awaitable_ops / eof_awaitable_ops), and each construct returns the ops matching its result type.
// Per-awaitable dispatch -- one struct per await-return type
struct awaitable_ops
{
    bool (*await_ready)(void*);
    std::coroutine_handle<> (*await_suspend)(void*, ...);
    io_result<size_t> (*await_resume)(void*);
    void (*destroy)(void*);
};
// Vtable -- one construct_* pointer per operation
struct vtable
{
    awaitable_ops const* (*construct_read_some_awaitable)(...);
    awaitable_ops const* (*construct_read_awaitable)(...);
    size_t awaitable_size;
    size_t awaitable_align;
    void (*destroy)(void*);
};
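The reason the await-return type decides how many ops structs are needed is that it appears in the signature of await_resume. A small sketch, assuming hypothetical stand-ins (size_result for io_result<std::size_t>, void_result for io_result<>, and an ops_table template that is not part of the library):

```cpp
#include <cstddef>
#include <type_traits>

struct size_result { int ec; std::size_t n; };  // stand-in for io_result<std::size_t>
struct void_result { int ec; };                 // stand-in for io_result<>

// The result type R is baked into the await_resume pointer, so each
// distinct R yields a distinct table type.
template<class R>
struct ops_table
{
    bool (*await_ready)(void*);
    R (*await_resume)(void*);
    void (*destroy)(void*);
};

// True when two operations with results R1 and R2 can use one ops struct.
template<class R1, class R2>
constexpr bool can_share_ops()
{
    return std::is_same<ops_table<R1>, ops_table<R2>>::value;
}

// Two operations returning the same result type share a layout...
static_assert(can_share_ops<size_result, size_result>(),
    "same result type: one ops struct");
// ...while differing result types force a second ops struct.
static_assert(!can_share_ops<size_result, void_result>(),
    "different result types: separate ops structs");
```

Both pointer tables have the same size here; what differs is the function-pointer type, which is why the construct call has to hand back the matching struct.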
The inner awaitable is constructed in await_suspend. Outer await_ready handles short-circuits (e.g. empty buffers) before the inner type is ever created:
bool await_ready() const noexcept {
    return buffers_.empty(); // short-circuit
}
std::coroutine_handle<> await_suspend(std::coroutine_handle<> h, io_env const* env) {
    active_ops_ = vt_->construct_read_some_awaitable(source_, cached_awaitable_, buffers_);
    if(active_ops_->await_ready(cached_awaitable_))
        return h; // immediate → resume caller
    return active_ops_->await_suspend(cached_awaitable_, h, env);
}
io_result<size_t> await_resume() {
    if(!active_ops_)
        return {{}, 0}; // short-circuited
    auto r = active_ops_->await_resume(cached_awaitable_);
    active_ops_->destroy(cached_awaitable_);
    active_ops_ = nullptr;
    return r;
}
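The per-construct dispatch can be sketched end to end without coroutines. The code below is an illustration under assumptions, not the library's code: await_suspend is omitted because only immediate completion is modeled, results are plain std::size_t, and ops_for, chunk_source, source_vtable, make_read_some, make_read and run are hypothetical names.

```cpp
#include <cstddef>
#include <new>

// Simplified ops table (no await_suspend in this sketch).
struct awaitable_ops
{
    bool (*await_ready)(void*);
    std::size_t (*await_resume)(void*);
    void (*destroy)(void*);
};

// One static ops table per concrete awaitable type A.
template<class A>
struct ops_for
{
    static bool ready(void* p) { return static_cast<A*>(p)->await_ready(); }
    static std::size_t resume(void* p) { return static_cast<A*>(p)->await_resume(); }
    static void destroy(void* p) { static_cast<A*>(p)->~A(); }
    static constexpr awaitable_ops value{&ready, &resume, &destroy};
};

// Hypothetical source with two operations and two awaitable types.
struct chunk_source
{
    std::size_t chunk;  // most bytes a single read_some delivers

    struct read_some_awaitable
    {
        std::size_t n;
        bool await_ready() const noexcept { return true; }
        std::size_t await_resume() const noexcept { return n; }
    };
    struct read_awaitable
    {
        std::size_t want;
        bool await_ready() const noexcept { return true; }
        // Pretends the whole request was filled.
        std::size_t await_resume() const noexcept { return want; }
    };
};

// Vtable: one construct pointer per operation, each returning its ops.
struct source_vtable
{
    awaitable_ops const* (*construct_read_some_awaitable)(void*, void*, std::size_t);
    awaitable_ops const* (*construct_read_awaitable)(void*, void*, std::size_t);
};

inline awaitable_ops const* make_read_some(void* src, void* storage, std::size_t n)
{
    using A = chunk_source::read_some_awaitable;
    auto* s = static_cast<chunk_source*>(src);
    ::new(storage) A{n < s->chunk ? n : s->chunk};
    return &ops_for<A>::value;
}

inline awaitable_ops const* make_read(void*, void* storage, std::size_t n)
{
    using A = chunk_source::read_awaitable;
    ::new(storage) A{n};
    return &ops_for<A>::value;
}

constexpr source_vtable chunk_source_vtable{&make_read_some, &make_read};

using construct_fn = awaitable_ops const* (*)(void*, void*, std::size_t);

// Constructs through the vtable, then dispatches through the returned ops.
inline std::size_t run(construct_fn construct, void* src, std::size_t n)
{
    alignas(std::max_align_t) unsigned char storage[32];
    awaitable_ops const* ops = construct(src, storage, n);
    std::size_t r = ops->await_ready(storage) ? ops->await_resume(storage) : 0;
    ops->destroy(storage);
    return r;
}
```

With a chunk_source whose chunk is 4, run on construct_read_some_awaitable with n = 10 returns 4, and run on construct_read_awaitable returns 10; the caller never names either awaitable type.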
Cache Line Analysis
Immediate completion path — inner await_ready returns true:
Flat (any_read_stream, any_write_stream): 2 cache lines
  LINE 1  object   stream_, vt_, cached_awaitable_, ...
  LINE 2  vtable   construct → await_ready → await_resume → destroy
                   (contiguous, sequential access, prefetch-friendly)

Per-construct ops (any_read_source, any_buffer_source,
any_buffer_sink, any_write_sink): 3 cache lines
  LINE 1  object         source_, vt_, cached_awaitable_, active_ops_, ...
  LINE 2  vtable         construct_*_awaitable pointers
  LINE 3  awaitable_ops  await_ready → await_suspend → await_resume → destroy
                         (separate .rodata address, defeats spatial prefetch)
The flat layout keeps all per-awaitable function pointers adjacent to construct_awaitable in a single 64-byte structure. The per-construct layout places vtable and the returned awaitable_ops at unrelated addresses in .rodata, adding one cache miss on the hot path.
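The line counts above can be reproduced with a small helper that counts how many distinct 64-byte lines a set of memory ranges covers. This is an illustrative sketch: range and lines_touched are hypothetical names, the 64-byte line size is an assumption about the target, and the addresses in the usage note are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// A contiguous region read on the hot path (address and byte count).
struct range
{
    std::uintptr_t addr;
    std::size_t size;
};

// Counts the distinct cache lines covered by all ranges.
inline std::size_t lines_touched(
    std::vector<range> const& ranges, std::size_t line = 64)
{
    std::set<std::uintptr_t> lines;
    for(range const& r : ranges)
    {
        if(r.size == 0)
            continue;
        std::uintptr_t first = r.addr / line;
        std::uintptr_t last = (r.addr + r.size - 1) / line;
        for(std::uintptr_t l = first; l <= last; ++l)
            lines.insert(l);
    }
    return lines.size();
}
```

With an object at 0x1000 and a line-aligned 64-byte flat vtable at 0x2000, lines_touched reports 2. Adding a 32-byte ops struct at an unrelated address such as 0x3040 next to a 40-byte vtable reports 3. The same helper also shows that a 64-byte vtable starting at 0x2020 straddles two lines, so the single-line claim for the flat layout depends on the table being 64-byte aligned.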
When to Use Which
| Layout | Wrappers |
|---|---|
| Flat vtable (one operation, ops embedded in vtable) | any_read_stream, any_write_stream |
| Per-construct ops, single ops struct (multiple operations, all sharing one await-return type) | any_read_source |
| Per-construct ops, multiple ops structs (operations differ in await-return type) | any_buffer_source, any_buffer_sink, any_write_sink |
Why the Flat Layout Cannot Scale
With multiple operations, each construct call produces a different concrete awaitable type. The per-awaitable function pointers (await_ready, await_suspend, await_resume, destroy) must match the type that was constructed. Returning the correct ops pointer from each construct call solves this. Embedding the four function pointers directly in the vtable, as the flat layout does, would require one full set per operation — workable for one operation, unwieldy for four. When the operations share an await-return type (any_read_source) a single ops struct suffices; when they differ (any_buffer_source, any_buffer_sink, any_write_sink) the wrapper carries one ops struct per result type.
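The scaling argument can be checked with simple arithmetic. The sketch below assumes 8-byte pointers and size_t, counts five pointers per operation for a flat layout (construct, await_ready, await_suspend, await_resume, destroy_awaitable) plus three shared fields (awaitable_size, awaitable_align, destroy), and extrapolates the flat layout to several operations purely for comparison; the function names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t field_bytes = 8;    // assumption: 64-bit target
constexpr std::size_t shared_fields = 3;  // awaitable_size, awaitable_align, destroy

// Flat layout extended to `ops` operations: 5 pointers per operation.
constexpr std::size_t flat_bytes(std::size_t ops)
{
    return (ops * 5 + shared_fields) * field_bytes;
}

// Per-construct layout: 1 construct pointer per operation.
constexpr std::size_t per_construct_bytes(std::size_t ops)
{
    return (ops + shared_fields) * field_bytes;
}

// Number of 64-byte cache lines needed for `bytes`.
constexpr std::size_t cache_lines(std::size_t bytes)
{
    return (bytes + 63) / 64;
}

static_assert(flat_bytes(1) == 64, "one operation fits one cache line");
static_assert(flat_bytes(4) == 184, "four operations need 23 fields");
static_assert(cache_lines(flat_bytes(4)) == 3, "flat table spans three lines");
static_assert(per_construct_bytes(2) == 40, "matches the two-operation vtable");
static_assert(per_construct_bytes(4) == 56, "still within one cache line");
```

Under these assumptions a four-operation flat table would grow to 184 bytes across three cache lines, while the per-construct vtable stays at 56 bytes and each 32-byte ops struct is shared by every wrapper instance that produces that awaitable type.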