From 5.15 onwards there seems to be an incompatibility between the tickless and non-preemptive options. If I select either one by itself the kernel seems to be stable; if I select both, I get RCU expedited CPU stalls. This is not easy to sort out, because each of these options by itself triggers a dozen or more other selections, so the problem cannot easily be isolated to a specific bit of code. For now I'm going to go with tickless and voluntary preemption. This seems to suffice for stopping the RCU expedited CPU stalls, and it isn't really harming efficiency, since any job that voluntarily gives up a CPU isn't that high priority anyway.
As a consequence, I am scheduling a kernel upgrade for next Friday, Sept 30th, starting at 11pm. I will be installing the new kernels on all the machines sooner than that, just not rebooting them, so if any machine spontaneously reboots before then it will boot into the new kernel.
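For anyone who wants to check which combination a given kernel was built with, here is a minimal sketch in Python. It assumes that "tickless" corresponds to the Kconfig symbols CONFIG_NO_HZ_IDLE or CONFIG_NO_HZ_FULL, and that "non-preemptive" and "voluntary preemption" correspond to CONFIG_PREEMPT_NONE and CONFIG_PREEMPT_VOLUNTARY; that mapping is an assumption, not something stated above. The function names and the labels "suspect", "chosen" and "other" are made up for illustration.

```python
import re

# Assumed mapping from the options discussed above to Kconfig symbols.
TICKLESS = {"CONFIG_NO_HZ_IDLE", "CONFIG_NO_HZ_FULL"}
NON_PREEMPTIVE = "CONFIG_PREEMPT_NONE"
VOLUNTARY = "CONFIG_PREEMPT_VOLUNTARY"


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to 'y' in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        # Lines such as "# CONFIG_FOO is not set" do not match and are skipped.
        match = re.match(r"^(CONFIG_\w+)=y\s*$", line)
        if match:
            enabled.add(match.group(1))
    return enabled


def classify(config_text):
    """Label a config as 'suspect', 'chosen', or 'other' (hypothetical labels)."""
    opts = enabled_options(config_text)
    tickless = bool(opts & TICKLESS)
    if tickless and NON_PREEMPTIVE in opts:
        return "suspect"  # the pairing that produced the stalls described above
    if tickless and VOLUNTARY in opts:
        return "chosen"   # tickless plus voluntary preemption
    return "other"


# Made-up sample config fragment, not taken from any real machine.
SAMPLE = """\
CONFIG_NO_HZ_IDLE=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
"""

if __name__ == "__main__":
    print(classify(SAMPLE))
```

On many distributions the running kernel's config is available as a file under /boot named after the kernel release, and its contents can be passed to classify() as a string. The script only reports which options are enabled; it does not detect the stalls themselves.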