diff mbox series

[lttng-tools] Fix: test code assumes that child process is schedule to run before parent

Message ID 20210331185624.406960-1-wallinux@gmail.com
State New
Headers show
Series [lttng-tools] Fix: test code assumes that child process is schedule to run before parent | expand

Commit Message

Anders Wallin March 31, 2021, 6:56 p.m. UTC
the following tests fails on arm64
- test_event_vpid_tracker ust 0 "${EVENT_NAME}"
- test_event_vpid_track_untrack ust 0 "${EVENT_NAME}"
- test_event_pid_tracker ust 0 "${EVENT_NAME}"
- test_event_pid_track_untrack ust 0 "${EVENT_NAME}"

Signed-off-by: Anders Wallin <wallinux at gmail.com>
---
 tests/regression/tools/tracker/test_event_tracker | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jonathan Rajotte-Julien March 31, 2021, 7:25 p.m. UTC | #1
> #
> # SPDX-License-Identifier: GPL-2.0-only
> 
> -TEST_DESC="LTTng - Event traker test"
> +TEST_DESC="LTTng - Event tracker test"
> 
> CURDIR=$(dirname "$0")/
> TESTDIR="$CURDIR/../../.."
> @@ -42,6 +42,8 @@ function prepare_ust_app
> 
> 	$TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a "$AFTER_FIRST_PATH" -b
> 	"$BEFORE_LAST_PATH" &
> 	CHILD_PID=$!
> +	# voluntary context switch to start $TESTAPP_BIN
> +	sleep 0.1

A wait on the $AFTER_FIRST_PATH file would be probably more deterministic than a sleep here.

  while [ ! -f "${AFTER_FIRST_PATH}" ]; do
          sleep 0.1
  done

I would also expect something similar for the `prepare_kernel_app` function considering the same race is most probably present and simply not triggered by a chance of luck.
Seems like gen-kernel-test-events does not expose the same sync capabilities here, please use gen-ust-events as an example of how it is done.

Let us know how your testing goes.

Thanks
Anders Wallin March 31, 2021, 9:09 p.m. UTC | #2
Hi Julian,

Neither mine "sleep 0.1" or your version with "while [! -f ............"
are race condition free.
I suggest that we add an option to gen-ust-events to wait before the first
event is generated.
gen_kernel_test_events already have this functionality to wait before the
first event.

Something like this
static struct option long_options[] =
{
/* These options set a flag. */
{"iter", required_argument, 0, 'i'},
{"wait", required_argument, 0, 'w'},
{"sync-after-first-event", required_argument, 0, 'a'},
{"sync-before-last-event", required_argument, 0, 'b'},
{"sync-before-last-event-touch", required_argument, 0, 'c'},
{"sync-before-exit", required_argument, 0, 'd'},
{"sync-before-exit-touch", required_argument, 0, 'e'},
*+ {"sync-before-first-event", required_argument, 0, 'f'},*

{0, 0, 0, 0}
};
....

I will create one or more patches for this tomorrow

Anders Wallin


On Wed, Mar 31, 2021 at 9:25 PM Jonathan Rajotte-Julien <
jonathan.rajotte-julien at efficios.com> wrote:

> > #
> > # SPDX-License-Identifier: GPL-2.0-only
> >
> > -TEST_DESC="LTTng - Event traker test"
> > +TEST_DESC="LTTng - Event tracker test"
> >
> > CURDIR=$(dirname "$0")/
> > TESTDIR="$CURDIR/../../.."
> > @@ -42,6 +42,8 @@ function prepare_ust_app
> >
> >       $TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a "$AFTER_FIRST_PATH" -b
> >       "$BEFORE_LAST_PATH" &
> >       CHILD_PID=$!
> > +     # voluntary context switch to start $TESTAPP_BIN
> > +     sleep 0.1
>
> A wait on the $AFTER_FIRST_PATH file would be probably more deterministic
> than a sleep here.
>
>   while [ ! -f "${AFTER_FIRST_PATH}" ]; do
>           sleep 0.1
>   done
>
> I would also expect something similar for the `prepare_kernel_app`
> function considering the same race is most probably present and simply not
> triggered by a chance of luck.
> Seems like gen-kernel-test-events does not expose the same sync
> capabilities here, please use gen-ust-events as an example of how it is
> done.
>
> Let us know how your testing goes.
>
> Thanks
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20210331/e03d9f30/attachment.htm>
Jonathan Rajotte-Julien March 31, 2021, 9:33 p.m. UTC | #3
Hi,

On Wed, Mar 31, 2021 at 11:09:42PM +0200, Anders Wallin wrote:
> Hi Julian,

You can use Jonathan. ;)

> 
> Neither mine "sleep 0.1" or your version with "while [! -f ............"
> are race condition free.

I might be missing something here but as far as I understand the race you are
trying to mitigate is that the test script execute/continue before the `backgrounded`
command (background test app) had time to execute, right?

If so at least waiting for the app to create a file is necessary. Now
gen_kernel_test_events does not have this functionality. The PATH_WAIT_FILE is
used to control when the testapp can continue. Hence the script still cannot
know if the app have been scheduled.

Now based on the test case you might need more synchronization for the test
cases.

Note that in the ust cases, the trace_ust_app uses `touch "$BEFORE_LAST_PATH"`
that effectively unblock the app and allows it to perform the last tracepoint
hit and the we `wait` on the background process.

Note: some tests case are a bit clever and uses "trace_"$domain"_app" instead of
calling trace_ust_app directly.

For these tests case it seems that we are only expecting at least a single event
matching the event name under test. Here the last tracepoint hit should satisfy
this criteria.

Am I missing a race?

Cheers


> I suggest that we add an option to gen-ust-events to wait before the first
> event is generated.
> gen_kernel_test_events already have this functionality to wait before the
> first event.
> 
> Something like this
> static struct option long_options[] =
> {
> /* These options set a flag. */
> {"iter", required_argument, 0, 'i'},
> {"wait", required_argument, 0, 'w'},
> {"sync-after-first-event", required_argument, 0, 'a'},
> {"sync-before-last-event", required_argument, 0, 'b'},
> {"sync-before-last-event-touch", required_argument, 0, 'c'},
> {"sync-before-exit", required_argument, 0, 'd'},
> {"sync-before-exit-touch", required_argument, 0, 'e'},
> *+ {"sync-before-first-event", required_argument, 0, 'f'},*
> 
> {0, 0, 0, 0}
> };
> ....
> 
> I will create one or more patches for this tomorrow
> 
> Anders Wallin
> 
> 
> On Wed, Mar 31, 2021 at 9:25 PM Jonathan Rajotte-Julien <
> jonathan.rajotte-julien at efficios.com> wrote:
> 
> > > #
> > > # SPDX-License-Identifier: GPL-2.0-only
> > >
> > > -TEST_DESC="LTTng - Event traker test"
> > > +TEST_DESC="LTTng - Event tracker test"
> > >
> > > CURDIR=$(dirname "$0")/
> > > TESTDIR="$CURDIR/../../.."
> > > @@ -42,6 +42,8 @@ function prepare_ust_app
> > >
> > >       $TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a "$AFTER_FIRST_PATH" -b
> > >       "$BEFORE_LAST_PATH" &
> > >       CHILD_PID=$!
> > > +     # voluntary context switch to start $TESTAPP_BIN
> > > +     sleep 0.1
> >
> > A wait on the $AFTER_FIRST_PATH file would be probably more deterministic
> > than a sleep here.
> >
> >   while [ ! -f "${AFTER_FIRST_PATH}" ]; do
> >           sleep 0.1
> >   done
> >
> > I would also expect something similar for the `prepare_kernel_app`
> > function considering the same race is most probably present and simply not
> > triggered by a chance of luck.
> > Seems like gen-kernel-test-events does not expose the same sync
> > capabilities here, please use gen-ust-events as an example of how it is
> > done.
> >
> > Let us know how your testing goes.
> >
> > Thanks
> >
Anders Wallin April 1, 2021, 1:21 a.m. UTC | #4
Hi Jonathan :)

It's very unlikely that the race could occur, BUT it can happen.

OK run
1. test_event_tracker starts gen-ust-events
2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
3. gen-ust-event create first event and create AFTER_FIRST_PATH
4. gen-ust-event continue and create 99 events
5. gen-ust-event wait for  BEFORE_LAST_PATH
6. test_event_track start collecting events (lttng start .....)
7. test_event_tracker calls "lttng track ... -pid "Faulty PID"
8. test_event_tracker touches BEFORE_LAST_PATH
9. gen-ust-events creates the last event
10. test_event finishes and since it tracks the faulty pid the result will
be 0 events == OK

Faulty run
1. test_event_tracker starts gen-ust-events
2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
3. gen-ust-event create first event and create AFTER_FIRST_PATH
4. gen-ust-event gets rescheduled before it has created 99 events, e.g
after 9 events
5. test_event_track start collecting events (lttng start .....)
6. gen-ust-event is rescheduled and starts generating the remaining events.
90 events
7. lttng collects these 90 events since we have not setup "tracking" of PID
yet
8. test_event_tracker calls "lttng track ... -pid "Faulty PID"
9. test_event_tracker touches BEFORE_LAST_PATH
10. gen-ust-events creates the last event
11. test_event finishes and since it tracks the faulty pid the result
should  be 0 events, but we got 90 == FAULTY

We can solve this by;
A: using NR_ITER=2
or
B: by adding a flag to gen-ust-events to wait before sending the first event
1. test_event_tracker starts gen-ust-events
2. test_event_tracker waits for gen-ust-events waits for BEFORE_FIRST_PATH
3. test_event_track start collecting events (lttng start .....)
4. test_event_tracker calls "lttng track ... -pid "Faulty PID"
5. test_event_track creates BEFORE_FIRST_PATH
6. gen_ust_event creates 100 events
7. test_event_tracker waits for gen_event_ust to end
8. test_event finishes and since it tracked the faulty pid the result will
be 0 events == OK

This is in principle how gen-kernel-test-events works (but with different
arguments)
I would suggest to use B since that will be bulletproof

Anders Wallin


On Wed, Mar 31, 2021 at 11:33 PM Jonathan Rajotte-Julien <
jonathan.rajotte-julien at efficios.com> wrote:

> Hi,
>
> On Wed, Mar 31, 2021 at 11:09:42PM +0200, Anders Wallin wrote:
> > Hi Julian,
>
> You can use Jonathan. ;)
>
> >
> > Neither mine "sleep 0.1" or your version with "while [! -f ............"
> > are race condition free.
>
> I might be missing something here but as far as I understand the race you
> are
> trying to mitigate is that the test script execute/continue before the
> `backgrounded`
> command (background test app) had time to execute, right?
>
> If so at least waiting for the app to create a file is necessary. Now
> gen_kernel_test_events does not have this functionality. The
> PATH_WAIT_FILE is
> used to control when the testapp can continue. Hence the script still
> cannot
> know if the app have been scheduled.
>
> Now based on the test case you might need more synchronization for the test
> cases.
>
> Note that in the ust cases, the trace_ust_app uses `touch
> "$BEFORE_LAST_PATH"`
> that effectively unblock the app and allows it to perform the last
> tracepoint
> hit and the we `wait` on the background process.
>
> Note: some tests case are a bit clever and uses "trace_"$domain"_app"
> instead of
> calling trace_ust_app directly.
>
> For these tests case it seems that we are only expecting at least a single
> event
> matching the event name under test. Here the last tracepoint hit should
> satisfy
> this criteria.
>
> Am I missing a race?
>
> Cheers
>
>
> > I suggest that we add an option to gen-ust-events to wait before the
> first
> > event is generated.
> > gen_kernel_test_events already have this functionality to wait before the
> > first event.
> >
> > Something like this
> > static struct option long_options[] =
> > {
> > /* These options set a flag. */
> > {"iter", required_argument, 0, 'i'},
> > {"wait", required_argument, 0, 'w'},
> > {"sync-after-first-event", required_argument, 0, 'a'},
> > {"sync-before-last-event", required_argument, 0, 'b'},
> > {"sync-before-last-event-touch", required_argument, 0, 'c'},
> > {"sync-before-exit", required_argument, 0, 'd'},
> > {"sync-before-exit-touch", required_argument, 0, 'e'},
> > *+ {"sync-before-first-event", required_argument, 0, 'f'},*
> >
> > {0, 0, 0, 0}
> > };
> > ....
> >
> > I will create one or more patches for this tomorrow
> >
> > Anders Wallin
> >
> >
> > On Wed, Mar 31, 2021 at 9:25 PM Jonathan Rajotte-Julien <
> > jonathan.rajotte-julien at efficios.com> wrote:
> >
> > > > #
> > > > # SPDX-License-Identifier: GPL-2.0-only
> > > >
> > > > -TEST_DESC="LTTng - Event traker test"
> > > > +TEST_DESC="LTTng - Event tracker test"
> > > >
> > > > CURDIR=$(dirname "$0")/
> > > > TESTDIR="$CURDIR/../../.."
> > > > @@ -42,6 +42,8 @@ function prepare_ust_app
> > > >
> > > >       $TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a
> "$AFTER_FIRST_PATH" -b
> > > >       "$BEFORE_LAST_PATH" &
> > > >       CHILD_PID=$!
> > > > +     # voluntary context switch to start $TESTAPP_BIN
> > > > +     sleep 0.1
> > >
> > > A wait on the $AFTER_FIRST_PATH file would be probably more
> deterministic
> > > than a sleep here.
> > >
> > >   while [ ! -f "${AFTER_FIRST_PATH}" ]; do
> > >           sleep 0.1
> > >   done
> > >
> > > I would also expect something similar for the `prepare_kernel_app`
> > > function considering the same race is most probably present and simply
> not
> > > triggered by a chance of luck.
> > > Seems like gen-kernel-test-events does not expose the same sync
> > > capabilities here, please use gen-ust-events as an example of how it is
> > > done.
> > >
> > > Let us know how your testing goes.
> > >
> > > Thanks
> > >
>
> --
> Jonathan Rajotte-Julien
> EfficiOS
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20210401/1191c340/attachment-0001.htm>
Jonathan Rajotte-Julien April 1, 2021, 1:45 p.m. UTC | #5
On Thu, Apr 01, 2021 at 03:21:10AM +0200, Anders Wallin wrote:
> Hi Jonathan :)
> 
> It's very unlikely that the race could occur, BUT it can happen.
> 
> OK run
> 1. test_event_tracker starts gen-ust-events
> 2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
> 3. gen-ust-event create first event and create AFTER_FIRST_PATH
> 4. gen-ust-event continue and create 99 events
> 5. gen-ust-event wait for  BEFORE_LAST_PATH
> 6. test_event_track start collecting events (lttng start .....)

> 7. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> 8. test_event_tracker touches BEFORE_LAST_PATH
> 9. gen-ust-events creates the last event
> 10. test_event finishes and since it tracks the faulty pid the result will
> be 0 events == OK

The sequence of event for the `test_event_tracker` function is (as of master):

create
enable event
start
track

launch app
wait for app return
stop
destroy


The sequence you are describing here is:

lauch app
start
track
...

I'm pretty sure we are not talking about the same things. Please specify the
test case and all functions involved and make sure to use the proper name for
each of them.

I suspect you are talking about test_event_pid_tracker. Still let's make sure of
it. If it is, I do agree that it seems to have a window where we can gather
event for.

You might want to look if there is a real reason for this sequence instead of
mimicking test_event_tracker

Current code:
 enable_"$domain"_lttng_event_ok $SESSION_NAME "$wildcard" "$channel"
 prepare_"$domain"_app
 start_lttng_tracing_ok

 if [ "$expect_event" -eq 1 ]; then
         lttng_track_"$domain"_ok "--vpid ${CHILD_PID}"
 else
         lttng_track_"$domain"_ok "--vpid $((CHILD_PID+1))"
 fi
 trace_"$domain"_app
 stop_lttng_tracing_ok
 destroy_lttng_session_ok $SESSION_NAME


We might simply want to move the track command before the start considering have
all the information to do so.

 enable_"$domain"_lttng_event_ok $SESSION_NAME "$wildcard" "$channel"

 prepare_"$domain"_app

 if [ "$expect_event" -eq 1 ]; then
         lttng_track_"$domain"_ok "--vpid ${CHILD_PID}"
 else
         lttng_track_"$domain"_ok "--vpid $((CHILD_PID+1))"
 fi

 start_lttng_tracing_ok
 trace_"$domain"_app

 stop_lttng_tracing_ok
 destroy_lttng_session_ok $SESSION_NAME

After testing this, seems like the validate_trace_empty does not handle the
case were simply stream allocation did not take place since no application was
'valid' at the moment of the trace start.

Well we got a good one here. We will wait for your updated patch and go from
there.

Cheers

> 
> Faulty run
> 1. test_event_tracker starts gen-ust-events
> 2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
> 3. gen-ust-event create first event and create AFTER_FIRST_PATH
> 4. gen-ust-event gets rescheduled before it has created 99 events, e.g
> after 9 events
> 5. test_event_track start collecting events (lttng start .....)
> 6. gen-ust-event is rescheduled and starts generating the remaining events.
> 90 events
> 7. lttng collects these 90 events since we have not setup "tracking" of PID
> yet
> 8. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> 9. test_event_tracker touches BEFORE_LAST_PATH
> 10. gen-ust-events creates the last event
> 11. test_event finishes and since it tracks the faulty pid the result
> should  be 0 events, but we got 90 == FAULTY
> 
> We can solve this by;
> A: using NR_ITER=2
> or
> B: by adding a flag to gen-ust-events to wait before sending the first event
> 1. test_event_tracker starts gen-ust-events
> 2. test_event_tracker waits for gen-ust-events waits for BEFORE_FIRST_PATH
> 3. test_event_track start collecting events (lttng start .....)
> 4. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> 5. test_event_track creates BEFORE_FIRST_PATH
> 6. gen_ust_event creates 100 events
> 7. test_event_tracker waits for gen_event_ust to end
> 8. test_event finishes and since it tracked the faulty pid the result will
> be 0 events == OK
> 
> This is in principle how gen-kernel-test-events works (but with different
> arguments)
> I would suggest to use B since that will be bulletproof
> 
> Anders Wallin
> 
> 
> On Wed, Mar 31, 2021 at 11:33 PM Jonathan Rajotte-Julien <
> jonathan.rajotte-julien at efficios.com> wrote:
> 
> > Hi,
> >
> > On Wed, Mar 31, 2021 at 11:09:42PM +0200, Anders Wallin wrote:
> > > Hi Julian,
> >
> > You can use Jonathan. ;)
> >
> > >
> > > Neither mine "sleep 0.1" or your version with "while [! -f ............"
> > > are race condition free.
> >
> > I might be missing something here but as far as I understand the race you
> > are
> > trying to mitigate is that the test script execute/continue before the
> > `backgrounded`
> > command (background test app) had time to execute, right?
> >
> > If so at least waiting for the app to create a file is necessary. Now
> > gen_kernel_test_events does not have this functionality. The
> > PATH_WAIT_FILE is
> > used to control when the testapp can continue. Hence the script still
> > cannot
> > know if the app have been scheduled.
> >
> > Now based on the test case you might need more synchronization for the test
> > cases.
> >
> > Note that in the ust cases, the trace_ust_app uses `touch
> > "$BEFORE_LAST_PATH"`
> > that effectively unblock the app and allows it to perform the last
> > tracepoint
> > hit and the we `wait` on the background process.
> >
> > Note: some tests case are a bit clever and uses "trace_"$domain"_app"
> > instead of
> > calling trace_ust_app directly.
> >
> > For these tests case it seems that we are only expecting at least a single
> > event
> > matching the event name under test. Here the last tracepoint hit should
> > satisfy
> > this criteria.
> >
> > Am I missing a race?
> >
> > Cheers
> >
> >
> > > I suggest that we add an option to gen-ust-events to wait before the
> > first
> > > event is generated.
> > > gen_kernel_test_events already have this functionality to wait before the
> > > first event.
> > >
> > > Something like this
> > > static struct option long_options[] =
> > > {
> > > /* These options set a flag. */
> > > {"iter", required_argument, 0, 'i'},
> > > {"wait", required_argument, 0, 'w'},
> > > {"sync-after-first-event", required_argument, 0, 'a'},
> > > {"sync-before-last-event", required_argument, 0, 'b'},
> > > {"sync-before-last-event-touch", required_argument, 0, 'c'},
> > > {"sync-before-exit", required_argument, 0, 'd'},
> > > {"sync-before-exit-touch", required_argument, 0, 'e'},
> > > *+ {"sync-before-first-event", required_argument, 0, 'f'},*
> > >
> > > {0, 0, 0, 0}
> > > };
> > > ....
> > >
> > > I will create one or more patches for this tomorrow
> > >
> > > Anders Wallin
> > >
> > >
> > > On Wed, Mar 31, 2021 at 9:25 PM Jonathan Rajotte-Julien <
> > > jonathan.rajotte-julien at efficios.com> wrote:
> > >
> > > > > #
> > > > > # SPDX-License-Identifier: GPL-2.0-only
> > > > >
> > > > > -TEST_DESC="LTTng - Event traker test"
> > > > > +TEST_DESC="LTTng - Event tracker test"
> > > > >
> > > > > CURDIR=$(dirname "$0")/
> > > > > TESTDIR="$CURDIR/../../.."
> > > > > @@ -42,6 +42,8 @@ function prepare_ust_app
> > > > >
> > > > >       $TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a
> > "$AFTER_FIRST_PATH" -b
> > > > >       "$BEFORE_LAST_PATH" &
> > > > >       CHILD_PID=$!
> > > > > +     # voluntary context switch to start $TESTAPP_BIN
> > > > > +     sleep 0.1
> > > >
> > > > A wait on the $AFTER_FIRST_PATH file would be probably more
> > deterministic
> > > > than a sleep here.
> > > >
> > > >   while [ ! -f "${AFTER_FIRST_PATH}" ]; do
> > > >           sleep 0.1
> > > >   done
> > > >
> > > > I would also expect something similar for the `prepare_kernel_app`
> > > > function considering the same race is most probably present and simply
> > not
> > > > triggered by a chance of luck.
> > > > Seems like gen-kernel-test-events does not expose the same sync
> > > > capabilities here, please use gen-ust-events as an example of how it is
> > > > done.
> > > >
> > > > Let us know how your testing goes.
> > > >
> > > > Thanks
> > > >
> >
> > --
> > Jonathan Rajotte-Julien
> > EfficiOS
> >
Anders Wallin April 1, 2021, 3:02 p.m. UTC | #6
Hi,

I think you understood what I meant. The issue I have is for this 4 test
cases in ./regression/tools/tracker/test_event_tracker
- test_event_vpid_tracker ust 0 "${EVENT_NAME}"
- test_event_vpid_track_untrack ust 0 "${EVENT_NAME}"
- test_event_pid_tracker ust 0 "${EVENT_NAME}"
- test_event_pid_track_untrack ust 0 "${EVENT_NAME}"

I have 2 patches ready, but I want to run some more tests first before
posting them

Anders Wallin


On Thu, Apr 1, 2021 at 3:45 PM Jonathan Rajotte-Julien <
jonathan.rajotte-julien at efficios.com> wrote:

> On Thu, Apr 01, 2021 at 03:21:10AM +0200, Anders Wallin wrote:
> > Hi Jonathan :)
> >
> > It's very unlikely that the race could occur, BUT it can happen.
> >
> > OK run
> > 1. test_event_tracker starts gen-ust-events
> > 2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
> > 3. gen-ust-event create first event and create AFTER_FIRST_PATH
> > 4. gen-ust-event continue and create 99 events
> > 5. gen-ust-event wait for  BEFORE_LAST_PATH
> > 6. test_event_track start collecting events (lttng start .....)
>
> > 7. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> > 8. test_event_tracker touches BEFORE_LAST_PATH
> > 9. gen-ust-events creates the last event
> > 10. test_event finishes and since it tracks the faulty pid the result
> will
> > be 0 events == OK
>
> The sequence of event for the `test_event_tracker` function is (as of
> master):
>
> create
> enable event
> start
> track
>
> launch app
> wait for app return
> stop
> destroy
>
>
> The sequence you are describing here is:
>
> lauch app
> start
> track
> ...
>
> I'm pretty sure we are not talking about the same things. Please specify
> the
> test case and all functions involved and make sure to use the proper name
> for
> each of them.
>
> I suspect you are talking about test_event_pid_tracker. Still let's make
> sure of
> it. If it is, I do agree that it seems to have a window where we can gather
> event for.
>
> You might want to look if there is a real reason for this sequence instead
> of
> mimicking test_event_tracker
>
> Current code:
>  enable_"$domain"_lttng_event_ok $SESSION_NAME "$wildcard" "$channel"
>  prepare_"$domain"_app
>  start_lttng_tracing_ok
>
>  if [ "$expect_event" -eq 1 ]; then
>          lttng_track_"$domain"_ok "--vpid ${CHILD_PID}"
>  else
>          lttng_track_"$domain"_ok "--vpid $((CHILD_PID+1))"
>  fi
>  trace_"$domain"_app
>  stop_lttng_tracing_ok
>  destroy_lttng_session_ok $SESSION_NAME
>
>
> We might simply want to move the track command before the start
> considering have
> all the information to do so.
>
>  enable_"$domain"_lttng_event_ok $SESSION_NAME "$wildcard" "$channel"
>
>  prepare_"$domain"_app
>
>  if [ "$expect_event" -eq 1 ]; then
>          lttng_track_"$domain"_ok "--vpid ${CHILD_PID}"
>  else
>          lttng_track_"$domain"_ok "--vpid $((CHILD_PID+1))"
>  fi
>
>  start_lttng_tracing_ok
>  trace_"$domain"_app
>
>  stop_lttng_tracing_ok
>  destroy_lttng_session_ok $SESSION_NAME
>
> After testing this, seems like the validate_trace_empty does not handle the
> case were simply stream allocation did not take place since no application
> was
> 'valid' at the moment of the trace start.
>
> Well we got a good one here. We will wait for your updated patch and go
> from
> there.
>
> Cheers
>
> >
> > Faulty run
> > 1. test_event_tracker starts gen-ust-events
> > 2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
> > 3. gen-ust-event create first event and create AFTER_FIRST_PATH
> > 4. gen-ust-event gets rescheduled before it has created 99 events, e.g
> > after 9 events
> > 5. test_event_track start collecting events (lttng start .....)
> > 6. gen-ust-event is rescheduled and starts generating the remaining
> events.
> > 90 events
> > 7. lttng collects these 90 events since we have not setup "tracking" of
> PID
> > yet
> > 8. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> > 9. test_event_tracker touches BEFORE_LAST_PATH
> > 10. gen-ust-events creates the last event
> > 11. test_event finishes and since it tracks the faulty pid the result
> > should  be 0 events, but we got 90 == FAULTY
> >
> > We can solve this by;
> > A: using NR_ITER=2
> > or
> > B: by adding a flag to gen-ust-events to wait before sending the first
> event
> > 1. test_event_tracker starts gen-ust-events
> > 2. test_event_tracker waits for gen-ust-events waits for
> BEFORE_FIRST_PATH
> > 3. test_event_track start collecting events (lttng start .....)
> > 4. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> > 5. test_event_track creates BEFORE_FIRST_PATH
> > 6. gen_ust_event creates 100 events
> > 7. test_event_tracker waits for gen_event_ust to end
> > 8. test_event finishes and since it tracked the faulty pid the result
> will
> > be 0 events == OK
> >
> > This is in principle how gen-kernel-test-events works (but with different
> > arguments)
> > I would suggest to use B since that will be bulletproof
> >
> > Anders Wallin
> >
> >
> > On Wed, Mar 31, 2021 at 11:33 PM Jonathan Rajotte-Julien <
> > jonathan.rajotte-julien at efficios.com> wrote:
> >
> > > Hi,
> > >
> > > On Wed, Mar 31, 2021 at 11:09:42PM +0200, Anders Wallin wrote:
> > > > Hi Julian,
> > >
> > > You can use Jonathan. ;)
> > >
> > > >
> > > > Neither mine "sleep 0.1" or your version with "while [! -f
> ............"
> > > > are race condition free.
> > >
> > > I might be missing something here but as far as I understand the race
> you
> > > are
> > > trying to mitigate is that the test script execute/continue before the
> > > `backgrounded`
> > > command (background test app) had time to execute, right?
> > >
> > > If so at least waiting for the app to create a file is necessary. Now
> > > gen_kernel_test_events does not have this functionality. The
> > > PATH_WAIT_FILE is
> > > used to control when the testapp can continue. Hence the script still
> > > cannot
> > > know if the app have been scheduled.
> > >
> > > Now based on the test case you might need more synchronization for the
> test
> > > cases.
> > >
> > > Note that in the ust cases, the trace_ust_app uses `touch
> > > "$BEFORE_LAST_PATH"`
> > > that effectively unblock the app and allows it to perform the last
> > > tracepoint
> > > hit and the we `wait` on the background process.
> > >
> > > Note: some tests case are a bit clever and uses "trace_"$domain"_app"
> > > instead of
> > > calling trace_ust_app directly.
> > >
> > > For these tests case it seems that we are only expecting at least a
> single
> > > event
> > > matching the event name under test. Here the last tracepoint hit should
> > > satisfy
> > > this criteria.
> > >
> > > Am I missing a race?
> > >
> > > Cheers
> > >
> > >
> > > > I suggest that we add an option to gen-ust-events to wait before the
> > > first
> > > > event is generated.
> > > > gen_kernel_test_events already have this functionality to wait
> before the
> > > > first event.
> > > >
> > > > Something like this
> > > > static struct option long_options[] =
> > > > {
> > > > /* These options set a flag. */
> > > > {"iter", required_argument, 0, 'i'},
> > > > {"wait", required_argument, 0, 'w'},
> > > > {"sync-after-first-event", required_argument, 0, 'a'},
> > > > {"sync-before-last-event", required_argument, 0, 'b'},
> > > > {"sync-before-last-event-touch", required_argument, 0, 'c'},
> > > > {"sync-before-exit", required_argument, 0, 'd'},
> > > > {"sync-before-exit-touch", required_argument, 0, 'e'},
> > > > *+ {"sync-before-first-event", required_argument, 0, 'f'},*
> > > >
> > > > {0, 0, 0, 0}
> > > > };
> > > > ....
> > > >
> > > > I will create one or more patches for this tomorrow
> > > >
> > > > Anders Wallin
> > > >
> > > >
> > > > On Wed, Mar 31, 2021 at 9:25 PM Jonathan Rajotte-Julien <
> > > > jonathan.rajotte-julien at efficios.com> wrote:
> > > >
> > > > > > #
> > > > > > # SPDX-License-Identifier: GPL-2.0-only
> > > > > >
> > > > > > -TEST_DESC="LTTng - Event traker test"
> > > > > > +TEST_DESC="LTTng - Event tracker test"
> > > > > >
> > > > > > CURDIR=$(dirname "$0")/
> > > > > > TESTDIR="$CURDIR/../../.."
> > > > > > @@ -42,6 +42,8 @@ function prepare_ust_app
> > > > > >
> > > > > >       $TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a
> > > "$AFTER_FIRST_PATH" -b
> > > > > >       "$BEFORE_LAST_PATH" &
> > > > > >       CHILD_PID=$!
> > > > > > +     # voluntary context switch to start $TESTAPP_BIN
> > > > > > +     sleep 0.1
> > > > >
> > > > > A wait on the $AFTER_FIRST_PATH file would be probably more
> > > deterministic
> > > > > than a sleep here.
> > > > >
> > > > >   while [ ! -f "${AFTER_FIRST_PATH}" ]; do
> > > > >           sleep 0.1
> > > > >   done
> > > > >
> > > > > I would also expect something similar for the `prepare_kernel_app`
> > > > > function considering the same race is most probably present and
> simply
> > > not
> > > > > triggered by a chance of luck.
> > > > > Seems like gen-kernel-test-events does not expose the same sync
> > > > > capabilities here, please use gen-ust-events as an example of how
> it is
> > > > > done.
> > > > >
> > > > > Let us know how your testing goes.
> > > > >
> > > > > Thanks
> > > > >
> > >
> > > --
> > > Jonathan Rajotte-Julien
> > > EfficiOS
> > >
>
> --
> Jonathan Rajotte-Julien
> EfficiOS
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20210401/72dbb60d/attachment-0001.htm>
Mathieu Desnoyers April 1, 2021, 4:19 p.m. UTC | #7
----- On Mar 31, 2021, at 2:56 PM, lttng-dev lttng-dev at lists.lttng.org wrote:

> the following tests fails on arm64
> - test_event_vpid_tracker ust 0 "${EVENT_NAME}"
> - test_event_vpid_track_untrack ust 0 "${EVENT_NAME}"
> - test_event_pid_tracker ust 0 "${EVENT_NAME}"
> - test_event_pid_track_untrack ust 0 "${EVENT_NAME}"
> 
> Signed-off-by: Anders Wallin <wallinux at gmail.com>
> ---
> tests/regression/tools/tracker/test_event_tracker | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/regression/tools/tracker/test_event_tracker
> b/tests/regression/tools/tracker/test_event_tracker
> index 711690af..649c7e61 100755
> --- a/tests/regression/tools/tracker/test_event_tracker
> +++ b/tests/regression/tools/tracker/test_event_tracker
> @@ -5,7 +5,7 @@
> #
> # SPDX-License-Identifier: GPL-2.0-only
> 
> -TEST_DESC="LTTng - Event traker test"
> +TEST_DESC="LTTng - Event tracker test"
> 
> CURDIR=$(dirname "$0")/
> TESTDIR="$CURDIR/../../.."
> @@ -42,6 +42,8 @@ function prepare_ust_app
> 
> 	$TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a "$AFTER_FIRST_PATH" -b
> 	"$BEFORE_LAST_PATH" &
> 	CHILD_PID=$!
> +	# voluntary context switch to start $TESTAPP_BIN
> +	sleep 0.1

No, we have been bitten again and again by test issues hidden by sleeps in the
test code. Using sleeps for synchronization is flaky.

I don't know if we documented it, but we as maintainers are strongly against
anything that looks like a delay-based approach to fixing a race in the tests.
This typically just bury the race under the carpet and it shows up only in
specific conditions on the CI workers.

We need to add proper rendez-vous based synchronization to the test if some
is missing.

Adding J?r?mie in CC.

Thanks,

Mathieu


> }
> 
> function trace_ust_app
> --
> 2.31.1
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Mathieu Desnoyers April 1, 2021, 4:21 p.m. UTC | #8
----- On Apr 1, 2021, at 12:19 PM, lttng-dev lttng-dev at lists.lttng.org wrote:

> ----- On Mar 31, 2021, at 2:56 PM, lttng-dev lttng-dev at lists.lttng.org wrote:
> 
>> the following tests fails on arm64
>> - test_event_vpid_tracker ust 0 "${EVENT_NAME}"
>> - test_event_vpid_track_untrack ust 0 "${EVENT_NAME}"
>> - test_event_pid_tracker ust 0 "${EVENT_NAME}"
>> - test_event_pid_track_untrack ust 0 "${EVENT_NAME}"
>> 
>> Signed-off-by: Anders Wallin <wallinux at gmail.com>
>> ---
>> tests/regression/tools/tracker/test_event_tracker | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/tests/regression/tools/tracker/test_event_tracker
>> b/tests/regression/tools/tracker/test_event_tracker
>> index 711690af..649c7e61 100755
>> --- a/tests/regression/tools/tracker/test_event_tracker
>> +++ b/tests/regression/tools/tracker/test_event_tracker
>> @@ -5,7 +5,7 @@
>> #
>> # SPDX-License-Identifier: GPL-2.0-only
>> 
>> -TEST_DESC="LTTng - Event traker test"
>> +TEST_DESC="LTTng - Event tracker test"
>> 
>> CURDIR=$(dirname "$0")/
>> TESTDIR="$CURDIR/../../.."
>> @@ -42,6 +42,8 @@ function prepare_ust_app
>> 
>> 	$TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a "$AFTER_FIRST_PATH" -b
>> 	"$BEFORE_LAST_PATH" &
>> 	CHILD_PID=$!
>> +	# voluntary context switch to start $TESTAPP_BIN
>> +	sleep 0.1
> 
> No, we have been bitten again and again by test issues hidden by sleeps in the
> test code. Using sleeps for synchronization is flaky.
> 
> I don't know if we documented it, but we as maintainers are strongly against
> anything that looks like a delay-based approach to fixing a race in the tests.
> This typically just bury the race under the carpet and it shows up only in
> specific conditions on the CI workers.
> 
> We need to add proper rendez-vous based synchronization to the test if some
> is missing.
> 
> Adding J?r?mie in CC.

And then I read on the rest of the email thread... so it's being taken care of, good! :)

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
> 
>> }
>> 
>> function trace_ust_app
>> --
>> 2.31.1
>> 
>> _______________________________________________
>> lttng-dev mailing list
>> lttng-dev at lists.lttng.org
>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Anders Wallin April 1, 2021, 4:33 p.m. UTC | #9
I agree, new version on the way w/o sleep!
Anders Wallin


On Thu, Apr 1, 2021 at 6:19 PM Mathieu Desnoyers <
mathieu.desnoyers at efficios.com> wrote:

> ----- On Mar 31, 2021, at 2:56 PM, lttng-dev lttng-dev at lists.lttng.org
> wrote:
>
> > the following tests fails on arm64
> > - test_event_vpid_tracker ust 0 "${EVENT_NAME}"
> > - test_event_vpid_track_untrack ust 0 "${EVENT_NAME}"
> > - test_event_pid_tracker ust 0 "${EVENT_NAME}"
> > - test_event_pid_track_untrack ust 0 "${EVENT_NAME}"
> >
> > Signed-off-by: Anders Wallin <wallinux at gmail.com>
> > ---
> > tests/regression/tools/tracker/test_event_tracker | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/regression/tools/tracker/test_event_tracker
> > b/tests/regression/tools/tracker/test_event_tracker
> > index 711690af..649c7e61 100755
> > --- a/tests/regression/tools/tracker/test_event_tracker
> > +++ b/tests/regression/tools/tracker/test_event_tracker
> > @@ -5,7 +5,7 @@
> > #
> > # SPDX-License-Identifier: GPL-2.0-only
> >
> > -TEST_DESC="LTTng - Event traker test"
> > +TEST_DESC="LTTng - Event tracker test"
> >
> > CURDIR=$(dirname "$0")/
> > TESTDIR="$CURDIR/../../.."
> > @@ -42,6 +42,8 @@ function prepare_ust_app
> >
> >       $TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a "$AFTER_FIRST_PATH" -b
> >       "$BEFORE_LAST_PATH" &
> >       CHILD_PID=$!
> > +     # voluntary context switch to start $TESTAPP_BIN
> > +     sleep 0.1
>
> No, we have been bitten again and again by test issues hidden by sleeps in
> the
> test code. Using sleeps for synchronization is flaky.
>
> I don't know if we documented it, but we as maintainers are strongly
> against
> anything that looks like a delay-based approach to fixing a race in the
> tests.
> This typically just bury the race under the carpet and it shows up only in
> specific conditions on the CI workers.
>
> We need to add proper rendez-vous based synchronization to the test if some
> is missing.
>
> Adding J?r?mie in CC.
>
> Thanks,
>
> Mathieu
>
>
> > }
> >
> > function trace_ust_app
> > --
> > 2.31.1
> >
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org
> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20210401/c1e2907d/attachment.htm>
diff mbox series

Patch

diff --git a/tests/regression/tools/tracker/test_event_tracker b/tests/regression/tools/tracker/test_event_tracker
index 711690af..649c7e61 100755
--- a/tests/regression/tools/tracker/test_event_tracker
+++ b/tests/regression/tools/tracker/test_event_tracker
@@ -5,7 +5,7 @@ 
 #
 # SPDX-License-Identifier: GPL-2.0-only
 
-TEST_DESC="LTTng - Event traker test"
+TEST_DESC="LTTng - Event tracker test"
 
 CURDIR=$(dirname "$0")/
 TESTDIR="$CURDIR/../../.."
@@ -42,6 +42,8 @@  function prepare_ust_app
 
 	$TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a "$AFTER_FIRST_PATH" -b "$BEFORE_LAST_PATH" &
 	CHILD_PID=$!
+	# voluntary context switch to start $TESTAPP_BIN
+	sleep 0.1
 }
 
 function trace_ust_app