Commit a6dc643c6931 for kernel

commit a6dc643c69311677c574a0f17a3f4d66a5f3744b
Author: Christian Brauner <brauner@kernel.org>
Date:   Thu Apr 23 11:56:09 2026 +0200

    eventpoll: fix ep_remove struct eventpoll / struct file UAF

    ep_remove() (via ep_remove_file()) cleared file->f_ep under
    file->f_lock but then kept using @file inside the critical section
    (is_file_epoll(), hlist_del_rcu() through the head, spin_unlock).
    A concurrent __fput() taking the eventpoll_release() fastpath in
    that window observed the transient NULL, skipped
    eventpoll_release_file() and ran to f_op->release / file_free().

    For the epoll-watches-epoll case, f_op->release is
    ep_eventpoll_release() -> ep_clear_and_put() -> ep_free(), which
    kfree()s the watched struct eventpoll. Its embedded ->refs
    hlist_head is exactly where epi->fllink.pprev points, so the
    subsequent hlist_del_rcu()'s "*pprev = next" scribbles into freed
    kmalloc-192 memory.
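
For reference, the hlist semantics that make this store dangerous can be sketched in user space. This is a simplified model in the spirit of include/linux/list.h, not the kernel code itself; it shows that a node's pprev points straight into the object holding the list head, so deletion writes into that object:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimal user-space model of the kernel hlist (illustrative sketch).
 * The key property: after hlist_add_head(), n->pprev aims into the
 * head's owner, so hlist_del()'s "*pprev = next" stores into whatever
 * object embeds the head -- the freed eventpoll in the bug above.
 */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* "*pprev = next" is the store that lands in the freed memory */
static void hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}
```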

    In addition, struct file is SLAB_TYPESAFE_BY_RCU, so the slot
    backing @file could be recycled by alloc_empty_file() --
    reinitializing f_lock and f_ep -- while ep_remove() is still
    nominally inside that lock. The upshot is an attacker-controllable
    kmem_cache_free() against the wrong slab cache.
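
The type-stable-reuse hazard can be modelled with a toy one-slot cache: freed objects keep their type and can be handed out again at once, so a stale pointer silently observes the reinitialized fields. All names here are made up for illustration; real SLAB_TYPESAFE_BY_RCU caches are far more involved:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of a type-stable cache (illustrative only). Freeing does not
 * invalidate the memory; the next allocation may reuse the same slot and
 * reinitialize it, which is what alloc_empty_file() does to f_lock/f_ep
 * in the scenario above.
 */
struct file_like {
	void *f_ep;	/* stands in for file->f_ep */
	int in_use;
};

static struct file_like slot;		/* single-slot "slab" */

static struct file_like *cache_alloc(void)
{
	if (slot.in_use)
		return NULL;
	slot.in_use = 1;
	slot.f_ep = NULL;		/* reinit, alloc_empty_file()-style */
	return &slot;
}

static void cache_free(struct file_like *f)
{
	f->in_use = 0;			/* memory stays valid and typed */
}
```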

    Pin @file via epi_fget() at the top of ep_remove() and gate the
    critical section on the pin succeeding. With the pin held, @file
    cannot reach refcount zero, which holds __fput() off and
    transitively keeps the watched struct eventpoll alive across the
    hlist_del_rcu() and the f_lock use, closing both UAFs.
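
The "pin or bail" pattern behind the fix can be sketched in user space with C11 atomics. This mirrors atomic_long_inc_not_zero() semantics (a reference is taken only if the count has not already hit zero); the names are illustrative, not the actual epi_fget() implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * User-space sketch of conditional pinning (illustrative names).
 * Succeeding proves teardown has not begun; failing means the final
 * put already happened and the caller must not touch the object.
 */
struct pinnable { atomic_long count; };

static bool pin_get_not_zero(struct pinnable *p)
{
	long c = atomic_load(&p->count);

	do {
		if (c == 0)
			return false;	/* final put done: caller bails */
	} while (!atomic_compare_exchange_weak(&p->count, &c, c + 1));
	return true;			/* pinned: cannot reach zero under us */
}

static void pin_put(struct pinnable *p)
{
	atomic_fetch_sub(&p->count, 1);
}
```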

    If the pin fails, @file has already reached refcount zero and its
    __fput() is in flight. Because we bailed before clearing f_ep,
    that path takes the eventpoll_release() slow path into
    eventpoll_release_file() and blocks on ep->mtx until the waiter
    side's ep_clear_and_put() drops it. The bailed epi's share of
    ep->refcount stays intact, so the trailing ep_refcount_dec_and_test()
    in ep_clear_and_put() cannot free the eventpoll out from under
    eventpoll_release_file(); the orphaned epi is then cleaned up
    there.

    A successful pin also proves we are not racing
    eventpoll_release_file() on this epi, so drop the now-redundant
    re-check of epi->dying under f_lock. The cheap lockless
    READ_ONCE(epi->dying) fast-path bailout stays.

    Fixes: 58c9b016e128 ("epoll: use refcount to reduce ep_mutex contention")
    Reported-by: Jaeyoung Chung <jjy600901@snu.ac.kr>
    Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-6-2470f9eec0f5@kernel.org
    Signed-off-by: Christian Brauner <brauner@kernel.org>

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 5ee4398a6cb8..0f785c0a1544 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -912,22 +912,26 @@ static bool ep_remove_epi(struct eventpoll *ep, struct epitem *epi)
  */
 static void ep_remove(struct eventpoll *ep, struct epitem *epi)
 {
-	struct file *file = epi->ffd.file;
+	struct file *file __free(fput) = NULL;

 	lockdep_assert_irqs_enabled();
 	lockdep_assert_held(&ep->mtx);

 	ep_unregister_pollwait(ep, epi);

-	/* sync with eventpoll_release_file() */
+	/* cheap sync with eventpoll_release_file() */
 	if (unlikely(READ_ONCE(epi->dying)))
 		return;

-	spin_lock(&file->f_lock);
-	if (epi->dying) {
-		spin_unlock(&file->f_lock);
+	/*
+	 * If we manage to grab a reference it means we're not in
+	 * eventpoll_release_file() and aren't going to be.
+	 */
+	file = epi_fget(epi);
+	if (!file)
 		return;
-	}
+
+	spin_lock(&file->f_lock);
 	ep_remove_file(ep, epi, file);

 	if (ep_remove_epi(ep, epi))