[lttng-tools] Fix: consumer-stream: use-after-free of metadata bucket

Message ID 20220302092730.GA26479@axis.com
State New

Commit Message

Vincent Whitchurch March 2, 2022, 9:27 a.m. UTC
On Tue, Mar 01, 2022 at 06:19:23PM +0100, Jérémie Galarneau wrote:
> Thanks a lot for reporting the problem. If I understand the ASAN
> report correctly, the stream itself will also be double free'd, so
> I don't think this is the complete fix.

Yeah, it looked odd that consumer_stream_destroy() is called recursively
on the same stream but AFAICS the code's been like this for a while so I
assumed it was on purpose, and only the metadata bucket stuff was
relatively new.  ASAN doesn't detect any double frees of the stream
itself, but I guess calling call_rcu(..., free_stream_rcu) twice on the
same stream is not expected behaviour and could lead to other problems.
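
The re-entrancy described above can be illustrated with a toy model. This is
only a sketch under stated assumptions: every toy_* name is hypothetical and
none of these are the lttng-tools structures, toy_defer_free() merely counts
requests in place of call_rcu(..., free_stream_rcu), and the break_cycle
switch shows one generic way to stop the re-entry, not necessarily the fix
that was merged.

```c
#include <stddef.h>

/* Toy stand-ins; these are NOT the lttng-tools structures. */
struct toy_channel;

struct toy_stream {
	struct toy_channel *chan;	/* channel this stream belongs to */
	int free_requests;		/* number of deferred frees queued */
};

struct toy_channel {
	struct toy_stream *metadata_stream;	/* back-pointer to the stream */
};

/* Stand-in for call_rcu(..., free_stream_rcu): only counts the requests. */
static void toy_defer_free(struct toy_stream *stream)
{
	stream->free_requests++;
}

static void toy_stream_destroy(struct toy_stream *stream, int break_cycle);

/* Tearing down the channel also destroys the metadata stream it points to. */
static void toy_channel_destroy(struct toy_channel *chan, int break_cycle)
{
	if (chan->metadata_stream) {
		struct toy_stream *stream = chan->metadata_stream;

		chan->metadata_stream = NULL;
		toy_stream_destroy(stream, break_cycle);
	}
}

static void toy_stream_destroy(struct toy_stream *stream, int break_cycle)
{
	struct toy_channel *chan = stream->chan;

	if (chan) {
		stream->chan = NULL;
		if (break_cycle) {
			/* Drop the back-pointer first so the channel cannot re-enter. */
			chan->metadata_stream = NULL;
		}
		toy_channel_destroy(chan, break_cycle);
	}
	toy_defer_free(stream);
}

/* Returns how many deferred frees a single destroy call queues. */
static int toy_count_free_requests(int break_cycle)
{
	struct toy_stream stream = { 0 };
	struct toy_channel chan = { 0 };

	stream.chan = &chan;
	chan.metadata_stream = &stream;
	toy_stream_destroy(&stream, break_cycle);
	return stream.free_requests;
}
```

In this model toy_count_free_requests(0) queues two deferred frees for the
same object, while toy_count_free_requests(1) queues exactly one.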

> There definitely seems to be a problem with regards to the ownership
> of the metadata channel vs stream. Let me look into it.

Great, thank you!

> I see that you fall into a case where the metadata setup fails,
> can you share more info about how this can be reproduced?

In the core dump I received (on v2.12.4), consumer_stream_destroy() was
called from the error label in setup_metadata and ret was set to
LTTCOMM_CONSUMERD_ERROR_METADATA.  So consumer_send_relayd_stream() had
returned an error.  I only had the core dump and no other logs, so I did
not know which of the paths inside consumer_send_relayd_stream() had
failed, but since I was primarily interested in fixing the crash itself
I simply forced this code path to be taken:


With the above patch, I could easily reproduce the use-after-free using
the following steps on the latest release v2.13.4, and it was clear that
this use-after-free was the cause of the original core dump on the older
release too.
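
As an alternative to hard-coding the failure, the same error path could be
forced through a run-time switch. The following is a minimal sketch, assuming
a hypothetical environment variable TOY_FORCE_RELAYD_FAILURE and hypothetical
toy_* helpers; lttng-tools defines none of these, and toy_send_stream() only
stands in for the real send operation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical switch; lttng-tools does not define this variable. */
#define TOY_FAIL_ENV "TOY_FORCE_RELAYD_FAILURE"

/* Returns 1 when the switch value requests a forced failure, else 0. */
static int toy_should_fail(const char *value)
{
	return value != NULL && strcmp(value, "1") == 0;
}

/* Stand-in for the real send operation; always succeeds in this sketch. */
static int toy_send_stream(void)
{
	return 0;
}

/* Returns -1 when the switch is set to "1", otherwise the real result. */
static int toy_send_stream_maybe_fail(void)
{
	if (toy_should_fail(getenv(TOY_FAIL_ENV))) {
		return -1;
	}
	return toy_send_stream();
}
```

With such a wrapper, the error label would be reached only when the daemon is
started with the variable set to 1, so the same binary could exercise both the
normal and the failing path.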

Build with ASAN:

 lttng-tools$ LDFLAGS=-fsanitize=address CFLAGS=-fsanitize=address ./configure

Shell #1:

 lttng-ust$ tests/compile/api0/hello/hello 10000

Shell #2:

 lttng-tools$ ASAN_OPTIONS=detect_odr_violation=0 ./src/bin/lttng-sessiond/lttng-sessiond

Shell #3:

 lttng-tools$ export ASAN_OPTIONS=detect_odr_violation=0
 lttng-tools$ ./src/bin/lttng/lttng create --live && ./src/bin/lttng/lttng enable-event --userspace 1 && ./src/bin/lttng/lttng start && sleep 1 && ./src/bin/lttng/lttng stop

The ASAN splat should be seen in shell #2.  Note that you may have to
run the command in shell #3 a couple of times since
LTTNG_CONSUMER_SETUP_METADATA doesn't seem to be sent every time.

Comments

Jérémie Galarneau March 7, 2022, 5:37 p.m. UTC | #1
Hi Vincent,

I had a chance to look into this and came up with the following fix:
https://review.lttng.org/c/lttng-tools/+/7478/4

Would you have a chance to try it on your end before I merge it?

Thanks for the great bug report!
Jérémie

Vincent Whitchurch March 8, 2022, 8:10 a.m. UTC | #2
On Mon, Mar 07, 2022 at 06:37:49PM +0100, Jérémie Galarneau wrote:
> I had a chance to look into this and came up with the following fix:
> https://review.lttng.org/c/lttng-tools/+/7478/4
> 
> Would you have a chance to try it on your end before I merge it?

I've tested the patch stack in patch set #5 and it does fix the problem
for me too.  Please feel free to add this if you like:

 Tested-by: Vincent Whitchurch <vincent.whitchurch at axis.com>

(By the way, I noticed that patch(1) gets confused by the reproduction
 patch which is part of the commit message.  This probably shouldn't
 matter much though, I only noticed it when I tried to revert the patch
 with git show | patch -p1 -R.)

> Thanks for the great bug report!

Thank you for the fix!
Patch

diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c
index fa1c71299..97ed59632 100644
--- a/src/common/ust-consumer/ust-consumer.c
+++ b/src/common/ust-consumer/ust-consumer.c
@@ -908,8 +908,7 @@  static int setup_metadata(struct lttng_consumer_local_data *ctx, uint64_t key)
 
 	/* Send metadata stream to relayd if needed. */
 	if (metadata->metadata_stream->net_seq_idx != (uint64_t) -1ULL) {
-		ret = consumer_send_relayd_stream(metadata->metadata_stream,
-				metadata->pathname);
+		ret = -1;
 		if (ret < 0) {
 			ret = LTTCOMM_CONSUMERD_ERROR_METADATA;
 			goto error;