Dev news

Commit 02baaa67d9af for kernel

commit 02baaa67d9afc2e56c6e1ac6a1fb1f1dd2be366f
Merge: 8449d3252c26 1dd6c84f1c54
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Dec 3 13:25:39 2025 -0800

    Merge tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

    Pull sched_ext updates from Tejun Heo:

     - Improve recovery from misbehaving BPF schedulers.

       When a scheduler puts many tasks with varying affinity restrictions
       on a shared DSQ, CPUs scanning through tasks they cannot run can
       overwhelm the system, causing lockups.

       Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
       and hooks into the hardlockup detector to attempt recovery.

       Add scx_cpu0 example scheduler to demonstrate this scenario.

     - Add lockless peek operation for DSQs to reduce lock contention for
       schedulers that need to query queue state during load balancing.

     - Allow scx_bpf_reenqueue_local() to be called from anywhere in
       preparation for deprecating cpu_acquire/release() callbacks in favor
       of generic BPF hooks.

     - Prepare for hierarchical scheduler support: add
       scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
       make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
       structs for future aux__prog parameter.

     - Implement cgroup_set_idle() callback to notify BPF schedulers when a
       cgroup's idle state changes.

     - Fix migration tasks being incorrectly downgraded from
       stop_sched_class to rt_sched_class across sched_ext enable/disable.
       Applied late as the fix is low risk and the bug subtle but needs
       stable backporting.

     - Various fixes and cleanups including cgroup exit ordering,
       SCX_KICK_WAIT reliability, and backward compatibility improvements.

    * tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
      sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
      sched_ext: tools: Removing duplicate targets during non-cross compilation
      sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
      sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
      sched_ext: Update comments replacing breather with aborting mechanism
      sched_ext: Implement load balancer for bypass mode
      sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
      sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
      sched_ext: Add scx_cpu0 example scheduler
      sched_ext: Hook up hardlockup detector
      sched_ext: Make handle_lockup() propagate scx_verror() result
      sched_ext: Refactor lockup handlers into handle_lockup()
      sched_ext: Make scx_exit() and scx_vexit() return bool
      sched_ext: Exit dispatch and move operations immediately when aborting
      sched_ext: Simplify breather mechanism with scx_aborting flag
      sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
      sched_ext: Refactor do_enqueue_task() local and global DSQ paths
      sched_ext: Use shorter slice in bypass mode
      sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
      sched_ext: Minor cleanups to scx_task_iter
      ...
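
The first item of the pull message above (CPUs scanning a shared DSQ full of tasks they cannot run) can be illustrated with a toy userspace model. This is only a sketch under stated assumptions, not kernel code: `toy_task`, `shared_pick()`, `percpu_place()` and `NR_CPUS` are hypothetical names invented for illustration, and placing each task on its least-loaded allowed CPU is a simplified stand-in for the per-CPU DSQs and load balancer that the merge describes.

```c
/* Toy userspace model, NOT kernel code. All names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NR_CPUS 4

struct toy_task {
	unsigned int allowed;	/* bitmask of CPUs this task may run on */
	bool queued;		/* still sitting on the shared queue? */
};

/*
 * Shared-queue pick: walk the queue from the head and take the first
 * queued task that @cpu is allowed to run. Returns how many queued but
 * ineligible entries were inspected and skipped on the way. *@picked is
 * the index of the task taken, or -1 if none was eligible.
 */
static size_t shared_pick(struct toy_task *q, size_t n, int cpu, int *picked)
{
	size_t skipped = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);
	*picked = -1;
	for (size_t i = 0; i < n; i++) {
		if (!q[i].queued)
			continue;
		if (q[i].allowed & (1U << cpu)) {
			q[i].queued = false;
			*picked = (int)i;
			break;
		}
		skipped++;
	}
	return skipped;
}

/*
 * Per-CPU placement: count every task in the array toward the currently
 * shortest queue among the CPUs it is allowed on (assumed policy, for
 * illustration only). A CPU then only ever sees tasks it can run, so a
 * pick never has to skip anything.
 */
static void percpu_place(const struct toy_task *t, size_t n,
			 size_t depth[NR_CPUS])
{
	for (int c = 0; c < NR_CPUS; c++)
		depth[c] = 0;
	for (size_t i = 0; i < n; i++) {
		int best = -1;

		for (int c = 0; c < NR_CPUS; c++) {
			if (!(t[i].allowed & (1U << c)))
				continue;
			if (best < 0 || depth[c] < depth[best])
				best = c;
		}
		if (best >= 0)
			depth[best]++;
	}
}

#ifdef TOY_DSQ_DEMO
int main(void)
{
	struct toy_task q[8];
	size_t depth[NR_CPUS];
	int picked;

	/* Seven tasks pinned to CPU 0, one task pinned to CPU 3 at the tail. */
	for (size_t i = 0; i < 8; i++)
		q[i] = (struct toy_task){ .allowed = 1U << 0, .queued = true };
	q[7].allowed = 1U << 3;

	percpu_place(q, 8, depth);
	printf("shared queue: CPU 3 skipped %zu entries\n",
	       shared_pick(q, 8, 3, &picked));
	for (int c = 0; c < NR_CPUS; c++)
		printf("per-CPU queue %d depth: %zu\n", c, depth[c]);
	return 0;
}
#endif
```

Building with `-DTOY_DSQ_DEMO` enables the small driver. In the shared-queue case the skip count grows with the number of tasks a CPU cannot run, and in the kernel that scan happens under the DSQ lock on every contending CPU. The per-CPU layout in the sketch removes the scan, at the cost of needing something to rebalance queue depths.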

diff --cc kernel/sched/ext.c
index 6827689a0966,b563b8c3fd24..05f5a49e9649
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@@ -473,10 -526,9 +526,9 @@@ struct scx_task_iter
   */
  static void scx_task_iter_start(struct scx_task_iter *iter)
  {
- 	BUILD_BUG_ON(__SCX_DSQ_ITER_ALL_FLAGS &
- 		     ((1U << __SCX_DSQ_LNODE_PRIV_SHIFT) - 1));
+ 	memset(iter, 0, sizeof(*iter));

 -	spin_lock_irq(&scx_tasks_lock);
 +	raw_spin_lock_irq(&scx_tasks_lock);

  	iter->cursor = (struct sched_ext_entity){ .flags = SCX_TASK_CURSOR };
  	list_add(&iter->cursor.tasks_node, &scx_tasks);
@@@ -2342,8 -2436,18 +2436,17 @@@ do_pick_task_scx(struct rq *rq, struct
  	rq_unpin_lock(rq, rf);
  	balance_one(rq, prev);
  	rq_repin_lock(rq, rf);
 -
  	maybe_queue_balance_callback(rq);
- 	if (rq_modified_above(rq, &ext_sched_class))
+
+ 	/*
+ 	 * If any higher-priority sched class enqueued a runnable task on
+ 	 * this rq during balance_one(), abort and return RETRY_TASK, so
+ 	 * that the scheduler loop can restart.
+ 	 *
+ 	 * If @force_scx is true, always try to pick a SCHED_EXT task,
+ 	 * regardless of any higher-priority sched classes activity.
+ 	 */
+ 	if (!force_scx && rq_modified_above(rq, &ext_sched_class))
  		return RETRY_TASK;

  	keep_prev = rq->scx.flags & SCX_RQ_BAL_KEEP;