
RDMA: Memory Window

LastUpdate: 2026-05-19, Author: HAO022

This blog post is part of a series of foundational technical research articles on AI Infrastructure.

Concept

Basic Concept: A Memory Window (MW) is an RDMA resource allocated by the application that lets a remote node access a range of local memory. Because it exists only to serve remote access, an MW has an R_KEY but no L_KEY. Each MW is bound to an already registered Memory Region (MR) and, compared to an MR, offers finer-grained control over remote access permissions. An MW can be roughly understood as a subset of an MR: a single MR can have multiple MWs carved out of it, each with its own permission set. The relationship is illustrated in the following diagram:

MR/MW Permission Relationship: When binding a Memory Window, a Consumer can request any combination of remote access rights for the Window. However, if the associated Region does not have local write access enabled and the Consumer requests remote write or remote atomic access for the Window, the Channel Interface must return an error either at bind time or access time.
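The rule above can be sketched as a small standalone check. This is an illustration only, not the logic of any particular driver: the flag names and the function mw_access_allowed are hypothetical, although the bit values mirror enum ibv_access_flags in libibverbs. The sketch also assumes the region must have been registered with bind permission (IBV_ACCESS_MW_BIND in libibverbs) before any window can be bound to it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names; the values mirror enum ibv_access_flags. */
enum {
    ACC_LOCAL_WRITE   = 1u << 0,
    ACC_REMOTE_WRITE  = 1u << 1,
    ACC_REMOTE_READ   = 1u << 2,
    ACC_REMOTE_ATOMIC = 1u << 3,
    ACC_MW_BIND       = 1u << 4,
};

/* Returns true if a window requesting mw_access may be bound to a
 * region that was registered with mr_access. */
static bool mw_access_allowed(uint32_t mr_access, uint32_t mw_access)
{
    /* Assumption: the region must allow window binding at all. */
    if (!(mr_access & ACC_MW_BIND))
        return false;

    /* Remote write or remote atomic on the window requires
     * local write on the underlying region. */
    if ((mw_access & (ACC_REMOTE_WRITE | ACC_REMOTE_ATOMIC)) &&
        !(mr_access & ACC_LOCAL_WRITE))
        return false;

    return true;
}
```

For instance, a region registered with only bind permission could back a read-only window, but a request for remote write on that window would be rejected by this check.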

Use Cases:

  1. To grant and revoke remote access rights to a registered region dynamically, with less of a performance penalty than deregistering and re-registering the region.
  2. To grant different remote access rights within the same registered memory region to different remote agents.

Implementation

User-space Implementation: libibverbs API:

struct ibv_mw *ibv_alloc_mw(struct ibv_pd *pd, enum ibv_mw_type type);

struct ibv_mw {
        uint32_t                rkey;
        enum ibv_mw_type        type;
        ...
};

Parameters and Return Values:

  • ibv_mw_type: For IBV_MW_TYPE_1, the MW can only be bound to an MR via the ibv_bind_mw method. For IBV_MW_TYPE_2, binding occurs via ibv_post_send.
  • rkey: The Remote Key generated by the HCA and returned in the ibv_mw structure. On the kernel side, allocation is straightforward: the driver assembles the parameters and dispatches the command to the hardware.

Kernel Implementation: mlx5_ib_alloc_mw constructs the MKey Context and submits it to the HCA.

1. MKey Context construction:
   - `mkc.free`
   - `mkc.pd`
   - `mkc.umr_en`
   - `mkc.en_rinval`
2. `mlx5_ib_create_mkey` submits the request and returns the `rkey`.

Memory Window Binding, 1:

// Bind a memory window to a region
int ibv_bind_mw(struct ibv_qp *qp, struct ibv_mw *mw, struct ibv_mw_bind *mw_bind);

struct ibv_mw_bind {
        uint64_t                     wr_id;           /* User defined WR ID */
        unsigned int                 send_flags;      /* Use ibv_send_flags */
        struct ibv_mw_bind_info      bind_info;       /* MW bind information */
};

struct ibv_mw_bind_info {
        struct ibv_mr                *mr;             /* The MR to bind the MW to */
        uint64_t                     addr;            /* The address the MW should start at */
        uint64_t                     length;          /* The length (in bytes) the MW should span */
        unsigned int                 mw_access_flags; /* Access flags to the MW. Use ibv_access_flags */
};

mw_access_flags:

  • IBV_ACCESS_REMOTE_WRITE
  • IBV_ACCESS_REMOTE_READ
  • IBV_ACCESS_REMOTE_ATOMIC
  • IBV_ACCESS_ZERO_BASED

The mlx5 provider implements ibv_bind_mw as mlx5_bind_mw, which wraps the bind in a send work request (simplified excerpt):

int mlx5_bind_mw(struct ibv_qp *qp, struct ibv_mw *mw, struct ibv_mw_bind *mw_bind)
{
        ...
        // Initialize the WR
        struct ibv_send_wr wr;
        wr.opcode = IBV_WR_BIND_MW;
        wr.send_flags = mw_bind->send_flags;
        wr.bind_mw.bind_info = mw_bind->bind_info;
        wr.bind_mw.mw = mw;
        wr.bind_mw.rkey = mw->rkey;

        // Submit the bind request through the post-send path
        _mlx5_post_send(qp, &wr, ...);
        ...
}

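Memory Window Binding, 2: A more flexible approach is to build the work request yourself, setting the opcode to IBV_WR_BIND_MW and filling in the bind_mw fields, and then submit it via ibv_post_send. This is the binding path for IBV_MW_TYPE_2 windows.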
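When the bind is posted through ibv_post_send, the work request carries the rkey the window should have after the bind. libibverbs provides ibv_inc_rkey for deriving it, which advances the low 8 bits of the current rkey and leaves the upper 24 bits untouched. The standalone sketch below reproduces that arithmetic so it can be read without the library; the function name next_rkey is hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for ibv_inc_rkey: advance the low 8 bits
 * (the key tag) and keep the upper 24 bits (the index) unchanged. */
static uint32_t next_rkey(uint32_t rkey)
{
    const uint32_t tag_mask = 0xffu;
    uint32_t new_tag = (rkey + 1u) & tag_mask;   /* wraps from 0xff to 0x00 */

    return (rkey & ~tag_mask) | new_tag;
}
```

Because only the tag changes, a remote peer still holding the previous rkey no longer matches the window after a rebind, which is one way stale access gets cut off.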

Observations

  • Must the MW addr fall within the MR’s range? Yes. An MW must be bound within an already registered MR; it cannot exist independently, nor can it extend beyond the MR’s boundaries. The MR is the fundamental unit of memory registration and address translation (virtual address → physical address), and the RDMA NIC relies on it to validate and access memory. The MW performs no memory registration of its own; it only narrows the access rights and address range on top of the MR. This is what makes MWs suited to granting and revoking remote access dynamically: frequent MR registration and deregistration incurs significant overhead, whereas MW bind operations are relatively lightweight.
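The containment requirement can be written as a short, overflow-safe check. This is a minimal sketch of the condition only; the function name and parameters are hypothetical, and real hardware or drivers may enforce the rule at bind time or at access time in their own way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if [mw_addr, mw_addr + mw_len) lies entirely inside
 * [mr_addr, mr_addr + mr_len). Written to avoid 64-bit overflow. */
static bool mw_range_within_mr(uint64_t mr_addr, uint64_t mr_len,
                               uint64_t mw_addr, uint64_t mw_len)
{
    if (mw_addr < mr_addr)
        return false;                 /* starts before the region */

    uint64_t offset = mw_addr - mr_addr;
    if (offset > mr_len)
        return false;                 /* starts past the region's end */

    return mw_len <= mr_len - offset; /* must end inside the region */
}
```

A window covering the second half of a 4096-byte region passes this check, while the same window extended by one byte does not.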

