Type-Erasing Awaitables

Overview

The any_* wrappers type-erase stream and source concepts so that algorithms can operate on heterogeneous concrete types through a uniform interface. Each wrapper preallocates storage for the type-erased awaitable at construction time, achieving zero steady-state allocation.
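The preallocation idea can be sketched as follows. This is a minimal illustration, not the library's implementation: awaitable_storage and reuses_same_address are hypothetical names, and the sketch assumes the size and alignment come from the vtable's awaitable_size and awaitable_align fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper (not a library type): one buffer, allocated once,
// reused for every awaitable the wrapper later constructs in place.
class awaitable_storage
{
    void* p_;
    std::align_val_t align_;

public:
    // size/align are assumed to come from the vtable's awaitable_size and
    // awaitable_align fields; align must be a power of two.
    awaitable_storage(std::size_t size, std::size_t align)
        : p_(::operator new(size, std::align_val_t(align)))
        , align_(std::align_val_t(align))
    {
    }

    awaitable_storage(awaitable_storage const&) = delete;
    awaitable_storage& operator=(awaitable_storage const&) = delete;

    ~awaitable_storage() { ::operator delete(p_, align_); }

    void* get() const noexcept { return p_; }
};

// Usage sketch: two successive "operations" construct their state at the
// same preallocated address, so neither one allocates.
inline bool reuses_same_address()
{
    awaitable_storage s(sizeof(long), alignof(long));
    long* a = ::new(s.get()) long(1);
    void* first = a;                     // long is trivially destructible
    long* b = ::new(s.get()) long(2);
    return first == static_cast<void*>(b) && *b == 2;
}
```

The only allocation happens in the constructor; each later operation is a placement-new into the same buffer followed by an explicit destroy.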

The vtable layout depends on how many async operations the wrapper exposes and whether those operations share an await-return type.

Single-Operation: Flat Vtable

When a wrapper exposes exactly one async operation (e.g. any_read_stream with read_some, or any_write_stream with write_some), all function pointers live in a single flat vtable:

// Flat vtable -- 64 bytes (one cache line) with 8-byte pointers and size_t
struct vtable
{
    void (*construct_awaitable)(...);                       // 8
    bool (*await_ready)(void*);                             // 8
    std::coroutine_handle<> (*await_suspend)(void*, ...);   // 8
    io_result<size_t> (*await_resume)(void*);               // 8
    void (*destroy_awaitable)(void*);                       // 8
    size_t awaitable_size;                                  // 8
    size_t awaitable_align;                                 // 8
    void (*destroy)(void*);                                 // 8
};
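The size claim can be checked at compile time. The sketch below mirrors the member order of the flat vtable but is not the library's definition: result_sketch, handle_sketch, flat_vtable_sketch, example_vtable, and starts_on_line_boundary are hypothetical names, the parameter lists elided in the listing above are replaced by placeholders (they do not affect the size), and a 64-byte cache line is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for library types; only the member count and sizes matter here.
struct result_sketch { int ec; std::size_t n; };   // for io_result<size_t>
using handle_sketch = void*;                       // for std::coroutine_handle<>

// Same member order as the flat vtable; parameter lists are placeholders.
struct flat_vtable_sketch
{
    void (*construct_awaitable)(void*, void*, void const*);
    bool (*await_ready)(void*);
    handle_sketch (*await_suspend)(void*, handle_sketch, void const*);
    result_sketch (*await_resume)(void*);
    void (*destroy_awaitable)(void*);
    std::size_t awaitable_size;
    std::size_t awaitable_align;
    void (*destroy)(void*);
};

constexpr std::size_t cache_line = 64;   // assumed line size

// Six function pointers plus two size_t, with no padding between them.
static_assert(sizeof(flat_vtable_sketch) ==
    6 * sizeof(void (*)()) + 2 * sizeof(std::size_t));
static_assert(sizeof(flat_vtable_sketch) <= cache_line);

// Size alone does not pin the table to one line; over-aligning the
// static instance does.
alignas(cache_line) constexpr flat_vtable_sketch example_vtable{};

// True when p is the first byte of a cache line.
inline bool starts_on_line_boundary(void const* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % cache_line == 0;
}
```

Whether a real vtable instance is over-aligned this way is not stated in this section; without it, a 64-byte table may straddle two lines.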

The inner awaitable can be constructed in either await_ready or await_suspend, depending on whether the outer awaitable has a short-circuit path.

Construct in await_ready (any_read_stream)

When there is no outer short-circuit, constructing in await_ready lets immediate completions skip await_suspend entirely:

bool await_ready() {
    vt_->construct_awaitable(stream_, cached_awaitable_, buffers_);
    awaitable_active_ = true;
    return vt_->await_ready(cached_awaitable_);   // true → no suspend
}

std::coroutine_handle<> await_suspend(std::coroutine_handle<> h, io_env const* env) {
    return vt_->await_suspend(cached_awaitable_, h, env);
}

io_result<size_t> await_resume() {
    auto r = vt_->await_resume(cached_awaitable_);
    vt_->destroy_awaitable(cached_awaitable_);
    awaitable_active_ = false;
    return r;
}

Construct in await_suspend (any_write_stream)

When the outer awaitable has a short-circuit (empty buffers), construction is deferred to await_suspend so the inner awaitable is never created on the fast path:

bool await_ready() const noexcept {
    return buffers_.empty();             // short-circuit, no construct
}

std::coroutine_handle<> await_suspend(std::coroutine_handle<> h, io_env const* env) {
    vt_->construct_awaitable(stream_, cached_awaitable_, buffers_);
    awaitable_active_ = true;
    if(vt_->await_ready(cached_awaitable_))
        return h;                        // immediate → resume caller
    return vt_->await_suspend(cached_awaitable_, h, env);
}

io_result<size_t> await_resume() {
    if(!awaitable_active_)
        return {{}, 0};                  // short-circuited
    auto r = vt_->await_resume(cached_awaitable_);
    vt_->destroy_awaitable(cached_awaitable_);
    awaitable_active_ = false;
    return r;
}

Both variants touch the same two cache lines on the hot path: the wrapper object and the flat vtable.

Multi-Operation: Per-Construct awaitable_ops

When a wrapper exposes more than one async operation, it cannot embed a single set of per-awaitable function pointers in its vtable, because each operation produces a distinct concrete awaitable type. Instead the vtable holds one construct_*_awaitable pointer per operation, and each construct call returns a pointer to a static constexpr ops struct describing the awaitable it just created.

Two sub-cases arise, distinguished by whether the operations share an await-return type:

  • Same await-return type (any_read_source: read_some and read both await-return io_result<std::size_t>). One ops struct (awaitable_ops) serves both operations; both construct_*_awaitable pointers return the same layout.

  • Different await-return types (any_buffer_source: pull await-returns io_result<std::span<const_buffer>> while its synthesized read_some/read await-return io_result<std::size_t>; any_buffer_sink and any_write_sink similarly mix io_result<> and io_result<std::size_t>). These need more than one ops struct (e.g. awaitable_ops plus read_awaitable_ops/write_awaitable_ops/eof_awaitable_ops), and each construct call returns the ops struct matching its result type.

// Per-awaitable dispatch -- one struct per await-return type
struct awaitable_ops
{
    bool (*await_ready)(void*);
    std::coroutine_handle<> (*await_suspend)(void*, ...);
    io_result<size_t> (*await_resume)(void*);
    void (*destroy)(void*);
};

// Vtable -- one construct_* pointer per operation
struct vtable
{
    awaitable_ops const* (*construct_read_some_awaitable)(...);
    awaitable_ops const* (*construct_read_awaitable)(...);
    size_t awaitable_size;
    size_t awaitable_align;
    void (*destroy)(void*);
};
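How a construct call can hand back the ops table for the type it just created is sketched below. All names ending in _sketch, plus ops_for, construct, vt, and run, are hypothetical; await_suspend is omitted so the example runs without a coroutine, and the two toy awaitables always complete immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Reduced ops table: await_suspend left out to keep the sketch coroutine-free.
struct awaitable_ops_sketch
{
    bool (*await_ready)(void*);
    std::size_t (*await_resume)(void*);
    void (*destroy)(void*);
};

// One static constexpr ops table per concrete awaitable type A.
template<class A>
struct ops_for
{
    static bool await_ready(void* p) { return static_cast<A*>(p)->await_ready(); }
    static std::size_t await_resume(void* p) { return static_cast<A*>(p)->await_resume(); }
    static void destroy(void* p) { static_cast<A*>(p)->~A(); }
    static constexpr awaitable_ops_sketch value{&await_ready, &await_resume, &destroy};
};

// Two distinct awaitable types sharing one await-return type.
struct read_some_awaitable_sketch
{
    std::size_t n;
    bool await_ready() const { return true; }
    std::size_t await_resume() const { return n; }
};
struct read_awaitable_sketch
{
    std::size_t n;
    bool await_ready() const { return true; }
    std::size_t await_resume() const { return n * 2; }
};

// Construct A in caller-provided storage and return the matching ops table.
template<class A>
awaitable_ops_sketch const* construct(void* storage, std::size_t n)
{
    ::new(storage) A{n};
    return &ops_for<A>::value;
}

struct vtable_sketch
{
    awaitable_ops_sketch const* (*construct_read_some_awaitable)(void*, std::size_t);
    awaitable_ops_sketch const* (*construct_read_awaitable)(void*, std::size_t);
};

inline constexpr vtable_sketch vt{
    &construct<read_some_awaitable_sketch>,
    &construct<read_awaitable_sketch>};

// Drive one operation through whichever ops table the construct call returned.
inline std::size_t run(
    awaitable_ops_sketch const* (*construct_fn)(void*, std::size_t), std::size_t n)
{
    alignas(std::max_align_t) unsigned char storage[64];
    awaitable_ops_sketch const* ops = construct_fn(storage, n);
    std::size_t r = 0;
    if(ops->await_ready(storage))
        r = ops->await_resume(storage);
    ops->destroy(storage);
    return r;
}
```

The caller never names the concrete awaitable type: the pointer returned by the construct call is the only link between the storage and the functions that know how to use it.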

The inner awaitable is constructed in await_suspend. Outer await_ready handles short-circuits (e.g. empty buffers) before the inner type is ever created:

bool await_ready() const noexcept {
    return buffers_.empty();             // short-circuit
}

std::coroutine_handle<> await_suspend(std::coroutine_handle<> h, io_env const* env) {
    active_ops_ = vt_->construct_read_some_awaitable(source_, cached_awaitable_, buffers_);
    if(active_ops_->await_ready(cached_awaitable_))
        return h;                        // immediate → resume caller
    return active_ops_->await_suspend(cached_awaitable_, h, env);
}

io_result<size_t> await_resume() {
    if(!active_ops_)
        return {{}, 0};                  // short-circuited
    auto r = active_ops_->await_resume(cached_awaitable_);
    active_ops_->destroy(cached_awaitable_);
    active_ops_ = nullptr;
    return r;
}

Cache Line Analysis

Immediate completion path — inner await_ready returns true:

Flat (any_read_stream, any_write_stream): 2 cache lines
  LINE 1  object        stream_, vt_, cached_awaitable_, ...
  LINE 2  vtable        construct → await_ready → await_resume → destroy
                         (contiguous, sequential access, prefetch-friendly)

Per-construct ops (any_read_source, any_buffer_source,
                   any_buffer_sink, any_write_sink):  3 cache lines
  LINE 1  object        source_, vt_, cached_awaitable_, active_ops_, ...
  LINE 2  vtable        construct_*_awaitable pointers
  LINE 3  awaitable_ops await_ready → await_resume → destroy
                         (separate .rodata address, defeats spatial prefetch)

The flat layout keeps all per-awaitable function pointers adjacent to construct_awaitable in a single 64-byte structure. The per-construct layout places the vtable and the returned awaitable_ops at unrelated addresses in .rodata, so the hot path touches one more cache line and can incur one more cache miss.
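The line counts above follow from simple address arithmetic, sketched here. The helper lines_spanned, the addresses, and the object sizes are all hypothetical values chosen for illustration, and a 64-byte cache line is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t cache_line = 64;   // assumed line size

// Number of distinct cache lines covered by the byte range [addr, addr + size).
constexpr std::size_t lines_spanned(std::uintptr_t addr, std::size_t size)
{
    if(size == 0)
        return 0;
    std::uintptr_t const first = addr / cache_line;
    std::uintptr_t const last = (addr + size - 1) / cache_line;
    return static_cast<std::size_t>(last - first + 1);
}

// A 64-byte table occupies one line only when it starts on a line boundary.
static_assert(lines_spanned(0x1000, 64) == 1);
static_assert(lines_spanned(0x1020, 64) == 2);

// Hypothetical addresses: wrapper object, vtable, and a separate ops struct.
constexpr std::uintptr_t object_addr = 0x7000;
constexpr std::uintptr_t vtable_addr = 0x4000;
constexpr std::uintptr_t ops_addr    = 0x4400;

// Flat: object + vtable. Per-construct: object + vtable + ops struct.
constexpr std::size_t flat_lines =
    lines_spanned(object_addr, 48) + lines_spanned(vtable_addr, 64);
constexpr std::size_t per_construct_lines =
    lines_spanned(object_addr, 48) + lines_spanned(vtable_addr, 40) +
    lines_spanned(ops_addr, 32);

static_assert(flat_lines == 2);
static_assert(per_construct_lines == 3);
```

The second static_assert is a reminder that the two-line figure for the flat layout holds only if the vtable does not straddle a line boundary.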

When to Use Which

Layout
  Wrapper            Operations

Flat vtable (one operation, ops embedded in vtable)
  any_read_stream    read_some
  any_write_stream   write_some

Per-construct ops, single ops struct (multiple operations, all sharing one await-return type)
  any_read_source    read_some, read → io_result<size_t>

Per-construct ops, multiple ops structs (operations differ in await-return type)
  any_buffer_source  pull → io_result<span>; synthesized read_some/read → io_result<size_t>
  any_buffer_sink    commit/commit_eof → io_result<>; write_some/write → io_result<size_t>
  any_write_sink     write_some/write → io_result<size_t>; write_eof() → io_result<>

Why the Flat Layout Cannot Scale

With multiple operations, each construct call produces a different concrete awaitable type, and the per-awaitable function pointers (await_ready, await_suspend, await_resume, destroy) must match the type that was constructed. Embedding those four pointers directly in the vtable, as the flat layout does, would require one full set per operation: workable for one operation, unwieldy for four. Returning the matching ops pointer from each construct call avoids this. When the operations share an await-return type (any_read_source) a single ops struct suffices; when they differ (any_buffer_source, any_buffer_sink, any_write_sink) the wrapper carries one ops struct per result type.
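The growth can be made concrete with a rough size comparison. Both structs below are hypothetical: flat_n_sketch imagines the flat layout extended to N operations (one construct pointer plus the four per-awaitable pointers for each), while per_construct_sketch keeps one construct pointer per operation. A 64-byte cache line is assumed.

```cpp
#include <cassert>
#include <cstddef>

using fn = void (*)();   // placeholder for any function pointer

// Hypothetical flat layout for N operations: each operation embeds
// construct + await_ready + await_suspend + await_resume + destroy_awaitable.
template<std::size_t N>
struct flat_n_sketch
{
    fn per_op[N][5];
    std::size_t awaitable_size;
    std::size_t awaitable_align;
    fn destroy;
};

// Per-construct layout: one construct pointer per operation; the
// per-awaitable pointers live in separate ops structs.
template<std::size_t N>
struct per_construct_sketch
{
    fn construct[N];
    std::size_t awaitable_size;
    std::size_t awaitable_align;
    fn destroy;
};

constexpr std::size_t cache_line = 64;   // assumed line size

// Minimum number of cache lines needed to hold `bytes` contiguous bytes.
constexpr std::size_t lines(std::size_t bytes)
{
    return (bytes + cache_line - 1) / cache_line;
}

// With one operation the flat sketch is the 8-member table shown earlier.
static_assert(sizeof(flat_n_sketch<1>) ==
    6 * sizeof(fn) + 2 * sizeof(std::size_t));

// With four operations the flat sketch is strictly larger than the
// per-construct vtable.
static_assert(sizeof(flat_n_sketch<4>) > sizeof(per_construct_sketch<4>));
```

With 8-byte pointers and size_t, flat_n_sketch<4> is 184 bytes (three lines) while per_construct_sketch<4> is 56 bytes (one line). The per-construct layout still pays for its ops structs elsewhere, but only the one belonging to the active operation is touched.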