Commit 07422c948f4b for kernel
commit 07422c948f4bdf15567a129a0983f7c12e57ba8e
Author: Christian Brauner <brauner@kernel.org>
Date: Thu Apr 23 11:56:13 2026 +0200
eventpoll: drop vestigial epi->dying flag
With ep_remove() now pinning @file via epi_fget() across the
f_ep clear and hlist_del_rcu(), the dying flag no longer
orchestrates anything: it was set in eventpoll_release_file()
(which only runs from __fput(), i.e. after @file's refcount has
reached zero) and read in __ep_remove() / ep_remove() as a cheap
bail before attempting the same synchronization epi_fget() now
provides unconditionally.
The implication is simple: epi->dying == true always coincides
with file_ref_get(&file->f_ref) == false, because __fput() is
reachable only once the refcount hits zero and the refcount is
monotone in that state. The READ_ONCE(epi->dying) in ep_remove()
therefore selects exactly the same callers that epi_fget() would
reject, just one atomic cheaper. That's not worth a struct
field, a second coordination mechanism, and the comments on
both.
Refresh the eventpoll_release_file() comment to describe what
actually makes the path race-free now (the pin in ep_remove()).
No functional change: the correctness argument is unchanged,
only the mechanism is now a single one instead of two.
Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-10-2470f9eec0f5@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index eeaadb000eee..a3090b446af1 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -148,13 +148,6 @@ struct epitem {
/* The file descriptor information this item refers to */
struct epoll_filefd ffd;
- /*
- * Protected by file->f_lock, true for to-be-released epitem already
- * removed from the "struct file" items list; together with
- * eventpoll->refcount orchestrates "struct eventpoll" disposal
- */
- bool dying;
-
/* List containing poll wait queues */
struct eppoll_entry *pwqlist;
@@ -220,10 +213,7 @@ struct eventpoll {
struct hlist_head refs;
u8 loop_check_depth;
- /*
- * usage count, used together with epitem->dying to
- * orchestrate the disposal of this struct
- */
+ /* usage count, orchestrates "struct eventpoll" disposal */
refcount_t refcount;
/* used to defer freeing past ep_get_upwards_depth_proc() RCU walk */
@@ -918,13 +908,10 @@ static void ep_remove(struct eventpoll *ep, struct epitem *epi)
ep_unregister_pollwait(ep, epi);
- /* cheap sync with eventpoll_release_file() */
- if (unlikely(READ_ONCE(epi->dying)))
- return;
-
/*
* If we manage to grab a reference it means we're not in
- * eventpoll_release_file() and aren't going to be.
+ * eventpoll_release_file() and aren't going to be: once @file's
+ * refcount has reached zero, file_ref_get() cannot bring it back.
*/
file = epi_fget(epi);
if (!file)
@@ -1126,15 +1113,15 @@ void eventpoll_release_file(struct file *file)
struct epitem *epi;
/*
- * Use the 'dying' flag to prevent a concurrent ep_clear_and_put() from
- * touching the epitems list before eventpoll_release_file() can access
- * the ep->mtx.
+ * A concurrent ep_remove() cannot outrace us: it pins @file via
+ * epi_fget(), which fails once __fput() has dropped the refcount
+ * to zero -- the path we're on. So any racing ep_remove() bails
+ * and leaves the epi for us to clean up here.
*/
again:
spin_lock(&file->f_lock);
if (file->f_ep && file->f_ep->first) {
epi = hlist_entry(file->f_ep->first, struct epitem, fllink);
- WRITE_ONCE(epi->dying, true);
spin_unlock(&file->f_lock);
/*