[lttng-tools] Fix: cleanup stream on snapshot failure

Message ID 20220530141021.267219-1-marcel.hamer@windriver.com
State New

Commit Message

Marcel Hamer May 30, 2022, 2:10 p.m. UTC
When the creation of a channel snapshot fails, the stream should be
cleaned up properly. If the stream is not closed and cleaned up on
failure, the next snapshot triggers the following assert:

	assert(!stream->trace_chunk);

inside the snapshot_channel function, because stream->trace_chunk was
not reset to NULL. The reset to NULL happens inside the
consumer_stream_close function.

Fixes #1352

Signed-off-by: Marcel Hamer <marcel.hamer at windriver.com>
---
 src/common/ust-consumer/ust-consumer.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
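
The failure mode described in the commit message can be modelled with a
small stand-alone sketch. Everything named toy_* below is hypothetical and
only imitates the shape of snapshot_channel() and consumer_stream_close();
the real lttng-tools structures, locking and reference counting are left
out. The fail_output and close_on_error parameters are assumptions added
so that both error paths can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real lttng-tools types are far richer. */
struct toy_chunk {
	int id;
};

struct toy_stream {
	struct toy_chunk *trace_chunk;
};

/* Models the one effect of consumer_stream_close() the fix relies on. */
void toy_stream_close(struct toy_stream *stream)
{
	stream->trace_chunk = NULL;
}

/*
 * Models one pass of snapshot_channel() over a single stream.
 * fail_output simulates a failing output-creation step.
 * close_on_error selects the patched (true) or original (false) error path.
 */
int toy_snapshot(struct toy_stream *stream, struct toy_chunk *chunk,
		bool fail_output, bool close_on_error)
{
	int ret = 0;

	/* Invariant: no chunk may be left over from a previous snapshot. */
	assert(!stream->trace_chunk);
	stream->trace_chunk = chunk;

	if (fail_output) {
		ret = -1;
		if (close_on_error) {
			goto error_close_stream;
		}
		goto error_unlock;
	}

	/* Success path: the stream is closed once the snapshot is done. */
	toy_stream_close(stream);
	return 0;

error_close_stream:
	toy_stream_close(stream);
error_unlock:
	return ret;
}
```

With close_on_error set to false, a failed call leaves trace_chunk
non-NULL, so a second call on the same stream would abort on the assert.
With close_on_error set to true, the failed call resets the pointer and a
later call can proceed.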

Comments

Jonathan Rajotte-Julien May 30, 2022, 3:27 p.m. UTC | #1
Hi Marcel,

Thanks for sending this patch.

Looks sensible to me. Still, do you have a reproducer for it? I went back to bug 1352 and, even with https://bugs.lttng.org/attachments/546, was unable to force the assert failure.

Cheers

Marcel Hamer May 31, 2022, 11:28 a.m. UTC | #2
Hello Jonathan,

On Mon, May 30, 2022 at 11:27:55AM -0400, Jonathan Rajotte-Julien wrote:
> Hi Marcel,
> 
> Thanks for sending this patch.
> 
> Looks sensible to me. Still, do you have a reproducer for it? I went back to bug 1352 and, even with https://bugs.lttng.org/attachments/546, was unable to force the assert failure.

I can only reproduce it when running lttng-consumerd in a debugger
environment, in my case gdb. My reproduction scenario is:

1. Set a breakpoint on snapshot_channel() inside
   src/common/ust-consumer/ust-consumer.c
2. When the breakpoint hits, remove the complete lttng directory
   containing the session data.
3. Continue the lttng-consumerd process from gdb.
4. In that case you see a negative return value -1 from
   consumer_stream_create_output_files() inside snapshot_channel().
5. Take another snapshot and you will see lttng-consumerd crash because
   of the assert(!stream->trace_chunk); inside snapshot_channel(). This
   last action does not require any breakpoint intervention.

The scenario seems to be very timing-sensitive to reproduce. I do not
have a clear command sequence that triggers the same error.

The proposed patch prevents lttng-consumerd from crashing in step 5.
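
Steps 2 to 4 can be illustrated outside of lttng-tools with a minimal
sketch. It assumes that the -1 seen in step 4 comes from trying to create
a file under a directory that no longer exists; the thread does not
confirm the exact failure path inside
consumer_stream_create_output_files(). The function name, the temporary
directory template and the file name stream_0 are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Creates a temporary directory, removes it again, then tries to create
 * a file inside it. Returns the errno set by open(), 0 if open()
 * unexpectedly succeeded, or -1 if the setup itself failed.
 */
int create_file_in_removed_dir(void)
{
	char dir[] = "/tmp/snapshot-demo-XXXXXX";
	char path[64];
	int fd;

	if (!mkdtemp(dir)) {
		return -1;
	}

	/* Stands in for step 2: the output directory disappears. */
	if (rmdir(dir) < 0) {
		return -1;
	}

	snprintf(path, sizeof(path), "%s/stream_0", dir);

	/* Stands in for step 4: creating the output file now fails. */
	fd = open(path, O_CREAT | O_WRONLY, 0600);
	if (fd >= 0) {
		close(fd);
		unlink(path);
		return 0;
	}

	return errno;
}
```

On a POSIX system the open() call fails with ENOENT because a component
of the path prefix is missing, which is the kind of error that would
propagate as a negative return value to the caller.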

Kind regards,

Marcel

Jonathan Rajotte-Julien May 31, 2022, 1:11 p.m. UTC | #3
Hi Marcel,

This is exactly the kind of reproducer we are looking for.

Thanks for providing it. I'll try it out and check if we need anything more in that patch.

Cheers

Patch

diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c
index f176ca40a..f43216829 100644
--- a/src/common/ust-consumer/ust-consumer.c
+++ b/src/common/ust-consumer/ust-consumer.c
@@ -1147,13 +1147,13 @@  static int snapshot_channel(struct lttng_consumer_channel *channel,
 		if (use_relayd) {
 			ret = consumer_send_relayd_stream(stream, path);
 			if (ret < 0) {
-				goto error_unlock;
+				goto error_close_stream;
 			}
 		} else {
 			ret = consumer_stream_create_output_files(stream,
 					false);
 			if (ret < 0) {
-				goto error_unlock;
+				goto error_close_stream;
 			}
 			DBG("UST consumer snapshot stream (%" PRIu64 ")",
 					stream->key);
@@ -1170,19 +1170,19 @@  static int snapshot_channel(struct lttng_consumer_channel *channel,
 		ret = lttng_ustconsumer_take_snapshot(stream);
 		if (ret < 0) {
 			ERR("Taking UST snapshot");
-			goto error_unlock;
+			goto error_close_stream;
 		}
 
 		ret = lttng_ustconsumer_get_produced_snapshot(stream, &produced_pos);
 		if (ret < 0) {
 			ERR("Produced UST snapshot position");
-			goto error_unlock;
+			goto error_close_stream;
 		}
 
 		ret = lttng_ustconsumer_get_consumed_snapshot(stream, &consumed_pos);
 		if (ret < 0) {
 			ERR("Consumerd UST snapshot position");
-			goto error_unlock;
+			goto error_close_stream;
 		}
 
 		/*