diff mbox series

sched/tracing: fix __trace_sched_switch_state (5.18-rc7+)

Message ID 20220519150257.26136-1-bruce.ashfield@gmail.com
State New
Headers show
Series sched/tracing: fix __trace_sched_switch_state (5.18-rc7+) | expand

Commit Message

Bruce Ashfield May 19, 2022, 3:02 p.m. UTC
From: Bruce Ashfield <bruce.ashfield at gmail.com>

The commit [fix: sched/tracing: Don't re-read p->state when emitting
sched_switch event (v5.18)] was correct, but the kernel changed their
mind with the following commit:

   commit 9c2136be0878c88c53dea26943ce40bb03ad8d8d
   Author: Delyan Kratunov <delyank at fb.com>
   Date:   Wed May 11 18:28:36 2022 +0000

       sched/tracing: Append prev_state to tp args instead

       Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
       sched_switch event, 2022-01-20) added a new prev_state argument to the
       sched_switch tracepoint, before the prev task_struct pointer.

       This reordering of arguments broke BPF programs that use the raw
       tracepoint (e.g. tp_btf programs). The type of the second argument has
       changed and existing programs that assume a task_struct* argument
       (e.g. for bpf_task_storage access) will now fail to verify.

       If we instead append the new argument to the end, all existing programs
       would continue to work and can conditionally extract the prev_state
       argument on supported kernel versions.

       Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
       Signed-off-by: Delyan Kratunov <delyank at fb.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz at infradead.org>
       Acked-by: Steven Rostedt (Google) <rostedt at goodmis.org>
       Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel at fb.com

By reordering the parameters (again) we can get back up and building.

Signed-off-by: Bruce Ashfield <bruce.ashfield at gmail.com>
---

Hi all,

This is more than likely NOT a correct fix, but while working on the
yocto -dev reference kernel, I ran into this build failure against
5.18-rc7.

I didn't see any sign of another fix on the mailing list, so I wanted
to send this in case anyone else runs into the failure and to check
to see if there's a better fix in progress.

Cheers,

Bruce


 include/instrumentation/events/sched.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Comments

Mathieu Desnoyers May 19, 2022, 3:31 p.m. UTC | #1
----- On May 19, 2022, at 11:02 AM, Bruce Ashfield via lttng-dev lttng-dev at lists.lttng.org wrote:

> From: Bruce Ashfield <bruce.ashfield at gmail.com>
> 
> The commit [fix: sched/tracing: Don't re-read p->state when emitting
> sched_switch event (v5.18)] was correct, but the kernel changed their
> mind with the following commit:
> 
>   commit 9c2136be0878c88c53dea26943ce40bb03ad8d8d
>   Author: Delyan Kratunov <delyank at fb.com>
>   Date:   Wed May 11 18:28:36 2022 +0000
> 
>       sched/tracing: Append prev_state to tp args instead
> 
>       Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
>       sched_switch event, 2022-01-20) added a new prev_state argument to the
>       sched_switch tracepoint, before the prev task_struct pointer.
> 
>       This reordering of arguments broke BPF programs that use the raw
>       tracepoint (e.g. tp_btf programs). The type of the second argument has
>       changed and existing programs that assume a task_struct* argument
>       (e.g. for bpf_task_storage access) will now fail to verify.
> 
>       If we instead append the new argument to the end, all existing programs
>       would continue to work and can conditionally extract the prev_state
>       argument on supported kernel versions.
> 
>       Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
>       sched_switch event, 2022-01-20)
>       Signed-off-by: Delyan Kratunov <delyank at fb.com>
>       Signed-off-by: Peter Zijlstra (Intel) <peterz at infradead.org>
>       Acked-by: Steven Rostedt (Google) <rostedt at goodmis.org>
>       Link:
>       https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel at fb.com
> 
> By reordering the parameters (again) we can get back up and building.
> 
> Signed-off-by: Bruce Ashfield <bruce.ashfield at gmail.com>
> ---
> 
> Hi all,
> 
> This is more than likely NOT a correct fix, but while working on the
> yocto -dev reference kernel, I ran into this build failure against
> 5.18-rc7.
> 
> I didn't see any sign of another fix on the mailing list, so I wanted
> to send this in case anyone else runs into the failure and to check
> to see if there's a better fix in progress.

Hi Bruce,

Please see: https://review.lttng.org/c/lttng-modules/+/8045

I was planning to merge it today, unless you have objections.

I notice that in your own patch, you modify the arguments passed to
__trace_sched_switch_state(), althrough I don't see them changing in
the upstream kernel. Am I missing something ?

Thanks,

Mathieu

> 
> Cheers,
> 
> Bruce
> 
> 
> include/instrumentation/events/sched.h | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/include/instrumentation/events/sched.h
> b/include/instrumentation/events/sched.h
> index 339bec9..f9e9c38 100644
> --- a/include/instrumentation/events/sched.h
> +++ b/include/instrumentation/events/sched.h
> @@ -23,8 +23,9 @@
> #if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(5,18,0))
> 
> static inline long __trace_sched_switch_state(bool preempt,
> -		unsigned int prev_state,
> -		struct task_struct *p)
> +		struct task_struct *p,
> +		struct task_struct *n,
> +		unsigned int prev_state )
> {
>         unsigned int state;
> 
> @@ -356,20 +357,20 @@ LTTNG_TRACEPOINT_EVENT_INSTANCE(sched_wakeup_template,
> sched_wakeup_new,
> LTTNG_TRACEPOINT_EVENT(sched_switch,
> 
> 	TP_PROTO(bool preempt,
> -		unsigned int prev_state,
> 		struct task_struct *prev,
> -		struct task_struct *next),
> +		struct task_struct *next,
> +		unsigned int prev_state),
> 
> -	TP_ARGS(preempt, prev_state, prev, next),
> +	TP_ARGS(preempt, prev, next, prev_state),
> 
> 	TP_FIELDS(
> 		ctf_array_text(char, prev_comm,	prev->comm, TASK_COMM_LEN)
> 		ctf_integer(pid_t, prev_tid, prev->pid)
> 		ctf_integer(int, prev_prio, prev->prio - MAX_RT_PRIO)
> #ifdef CONFIG_LTTNG_EXPERIMENTAL_BITWISE_ENUM
> -		ctf_enum(task_state, long, prev_state, __trace_sched_switch_state(preempt,
> prev_state, prev))
> +		ctf_enum(task_state, long, prev_state, __trace_sched_switch_state(preempt,
> prev, next, prev_state))
> #else
> -		ctf_integer(long, prev_state, __trace_sched_switch_state(preempt, prev_state,
> prev))
> +		ctf_integer(long, prev_state, __trace_sched_switch_state(preempt, prev, next,
> prev_state))
> #endif
> 		ctf_array_text(char, next_comm, next->comm, TASK_COMM_LEN)
> 		ctf_integer(pid_t, next_tid, next->pid)
> --
> 2.19.1
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Bruce Ashfield May 19, 2022, 4:01 p.m. UTC | #2
On Thu, May 19, 2022 at 11:31 AM Mathieu Desnoyers
<mathieu.desnoyers at efficios.com> wrote:
>
>
> ----- On May 19, 2022, at 11:02 AM, Bruce Ashfield via lttng-dev lttng-dev at lists.lttng.org wrote:
>
> > From: Bruce Ashfield <bruce.ashfield at gmail.com>
> >
> > The commit [fix: sched/tracing: Don't re-read p->state when emitting
> > sched_switch event (v5.18)] was correct, but the kernel changed their
> > mind with the following commit:
> >
> >   commit 9c2136be0878c88c53dea26943ce40bb03ad8d8d
> >   Author: Delyan Kratunov <delyank at fb.com>
> >   Date:   Wed May 11 18:28:36 2022 +0000
> >
> >       sched/tracing: Append prev_state to tp args instead
> >
> >       Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
> >       sched_switch event, 2022-01-20) added a new prev_state argument to the
> >       sched_switch tracepoint, before the prev task_struct pointer.
> >
> >       This reordering of arguments broke BPF programs that use the raw
> >       tracepoint (e.g. tp_btf programs). The type of the second argument has
> >       changed and existing programs that assume a task_struct* argument
> >       (e.g. for bpf_task_storage access) will now fail to verify.
> >
> >       If we instead append the new argument to the end, all existing programs
> >       would continue to work and can conditionally extract the prev_state
> >       argument on supported kernel versions.
> >
> >       Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
> >       sched_switch event, 2022-01-20)
> >       Signed-off-by: Delyan Kratunov <delyank at fb.com>
> >       Signed-off-by: Peter Zijlstra (Intel) <peterz at infradead.org>
> >       Acked-by: Steven Rostedt (Google) <rostedt at goodmis.org>
> >       Link:
> >       https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel at fb.com
> >
> > By reordering the parameters (again) we can get back up and building.
> >
> > Signed-off-by: Bruce Ashfield <bruce.ashfield at gmail.com>
> > ---
> >
> > Hi all,
> >
> > This is more than likely NOT a correct fix, but while working on the
> > yocto -dev reference kernel, I ran into this build failure against
> > 5.18-rc7.
> >
> > I didn't see any sign of another fix on the mailing list, so I wanted
> > to send this in case anyone else runs into the failure and to check
> > to see if there's a better fix in progress.
>
> Hi Bruce,
>
> Please see: https://review.lttng.org/c/lttng-modules/+/8045
>
> I was planning to merge it today, unless you have objections.
>
> I notice that in your own patch, you modify the arguments passed to
> __trace_sched_switch_state(), althrough I don't see them changing in
> the upstream kernel. Am I missing something ?

No objects here. That is obviously better than mine :) I swear I saw
the order of the arguments changing when I looked at the kernel, and
had a secondary compile error in those calls .. but I very easily could
have been mistaken.

Bruce

>
> Thanks,
>
> Mathieu
>
> >
> > Cheers,
> >
> > Bruce
> >
> >
> > include/instrumentation/events/sched.h | 15 ++++++++-------
> > 1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/instrumentation/events/sched.h
> > b/include/instrumentation/events/sched.h
> > index 339bec9..f9e9c38 100644
> > --- a/include/instrumentation/events/sched.h
> > +++ b/include/instrumentation/events/sched.h
> > @@ -23,8 +23,9 @@
> > #if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(5,18,0))
> >
> > static inline long __trace_sched_switch_state(bool preempt,
> > -             unsigned int prev_state,
> > -             struct task_struct *p)
> > +             struct task_struct *p,
> > +             struct task_struct *n,
> > +             unsigned int prev_state )
> > {
> >         unsigned int state;
> >
> > @@ -356,20 +357,20 @@ LTTNG_TRACEPOINT_EVENT_INSTANCE(sched_wakeup_template,
> > sched_wakeup_new,
> > LTTNG_TRACEPOINT_EVENT(sched_switch,
> >
> >       TP_PROTO(bool preempt,
> > -             unsigned int prev_state,
> >               struct task_struct *prev,
> > -             struct task_struct *next),
> > +             struct task_struct *next,
> > +             unsigned int prev_state),
> >
> > -     TP_ARGS(preempt, prev_state, prev, next),
> > +     TP_ARGS(preempt, prev, next, prev_state),
> >
> >       TP_FIELDS(
> >               ctf_array_text(char, prev_comm, prev->comm, TASK_COMM_LEN)
> >               ctf_integer(pid_t, prev_tid, prev->pid)
> >               ctf_integer(int, prev_prio, prev->prio - MAX_RT_PRIO)
> > #ifdef CONFIG_LTTNG_EXPERIMENTAL_BITWISE_ENUM
> > -             ctf_enum(task_state, long, prev_state, __trace_sched_switch_state(preempt,
> > prev_state, prev))
> > +             ctf_enum(task_state, long, prev_state, __trace_sched_switch_state(preempt,
> > prev, next, prev_state))
> > #else
> > -             ctf_integer(long, prev_state, __trace_sched_switch_state(preempt, prev_state,
> > prev))
> > +             ctf_integer(long, prev_state, __trace_sched_switch_state(preempt, prev, next,
> > prev_state))
> > #endif
> >               ctf_array_text(char, next_comm, next->comm, TASK_COMM_LEN)
> >               ctf_integer(pid_t, next_tid, next->pid)
> > --
> > 2.19.1
> >
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org
> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
diff mbox series

Patch

diff --git a/include/instrumentation/events/sched.h b/include/instrumentation/events/sched.h
index 339bec9..f9e9c38 100644
--- a/include/instrumentation/events/sched.h
+++ b/include/instrumentation/events/sched.h
@@ -23,8 +23,9 @@ 
 #if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(5,18,0))
 
 static inline long __trace_sched_switch_state(bool preempt,
-		unsigned int prev_state,
-		struct task_struct *p)
+		struct task_struct *p,
+		struct task_struct *n,
+		unsigned int prev_state )
 {
         unsigned int state;
 
@@ -356,20 +357,20 @@  LTTNG_TRACEPOINT_EVENT_INSTANCE(sched_wakeup_template, sched_wakeup_new,
 LTTNG_TRACEPOINT_EVENT(sched_switch,
 
 	TP_PROTO(bool preempt,
-		unsigned int prev_state,
 		struct task_struct *prev,
-		struct task_struct *next),
+		struct task_struct *next,
+		unsigned int prev_state),
 
-	TP_ARGS(preempt, prev_state, prev, next),
+	TP_ARGS(preempt, prev, next, prev_state),
 
 	TP_FIELDS(
 		ctf_array_text(char, prev_comm,	prev->comm, TASK_COMM_LEN)
 		ctf_integer(pid_t, prev_tid, prev->pid)
 		ctf_integer(int, prev_prio, prev->prio - MAX_RT_PRIO)
 #ifdef CONFIG_LTTNG_EXPERIMENTAL_BITWISE_ENUM
-		ctf_enum(task_state, long, prev_state, __trace_sched_switch_state(preempt, prev_state, prev))
+		ctf_enum(task_state, long, prev_state, __trace_sched_switch_state(preempt, prev, next, prev_state))
 #else
-		ctf_integer(long, prev_state, __trace_sched_switch_state(preempt, prev_state, prev))
+		ctf_integer(long, prev_state, __trace_sched_switch_state(preempt, prev, next, prev_state))
 #endif
 		ctf_array_text(char, next_comm, next->comm, TASK_COMM_LEN)
 		ctf_integer(pid_t, next_tid, next->pid)