[lttng-tools,v2] Fix: check validity of a stream before invoking ust flush command

Message ID 20190828203603.5316-1-jonathan.rajotte-julien@efficios.com
State Accepted, archived
Series
  • [lttng-tools,v2] Fix: check validity of a stream before invoking ust flush command

Commit Message

Jonathan Rajotte Aug. 28, 2019, 8:36 p.m. UTC
At the time ustctl_flush_buffer is called, the ustream object might have
already been freed on the lttng-ust side.

This can happen following lttng_consumer_cleanup_relayd and a concurrent
consumer flush command (lttng stop).

The chain of events goes as follows.

An error on communication with lttng-relayd occurs.
lttng_consumer_cleanup_relayd flags the streams for deletion
(CONSUMER_ENDPOINT_INACTIVE). validate_endpoint_status_data_stream calls
consumer_del_stream.

At the same time, the hash table of streams is iterated over in the
flush_channel function following a stop command. The loop reaches a given
stream, and the current thread is unscheduled before taking the stream
lock.

In the initial thread, the same stream is the current element of the
cds_lfht_for_each_entry iteration in validate_endpoint_status_data_stream.

consumer_del_stream is called on it. The stream lock is acquired, and
destroy_close_stream is called. lttng_ustconsumer_del_stream is eventually
called and at this point the ustream is freed.

Going back to the iteration in flush_channel: the current stream is still
valid from the point of view of the iteration, so ustctl_flush_buffer is
called on a freed ustream object.

This can lead to undefined behaviour since there is no validation on the
lttng-ust side; the underlying memory of the ustream object is garbage at
this point.

To prevent such a scenario, we check for the presence of the node in the
hash table via cds_lfht_is_node_deleted while holding the stream lock.
This is valid because the stream destruction removes the node from
the hash table and frees the ustream object with the stream lock held.

This duplicates similar "validation" checks of the stream object. [1][2]

[1] src/common/consumer/consumer.c:consumer_close_channel_streams
[2] src/common/ust-consumer/ust-consumer.c:close_metadata
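The destruction/flush protocol described above can be sketched as follows. This is a minimal simulation, not the actual lttng-tools code: `struct stream` and its `deleted` flag are hypothetical stand-ins for the consumer stream and cds_lfht_is_node_deleted(), and free() stands in for the release of the ustream in lttng_ustconsumer_del_stream.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Illustrative sketch only: these names are hypothetical stand-ins,
 * not the lttng-tools types.
 */
struct stream {
	pthread_mutex_t lock;
	bool deleted;    /* stands in for cds_lfht_is_node_deleted() */
	void *ustream;   /* freed during teardown */
	bool quiescent;
};

/* Teardown side: unlink the node and free the ustream, lock held. */
static void stream_teardown(struct stream *s)
{
	pthread_mutex_lock(&s->lock);
	s->deleted = true;      /* node removed from the hash table */
	free(s->ustream);
	s->ustream = NULL;
	pthread_mutex_unlock(&s->lock);
}

/* Flush side: re-validate the stream after acquiring its lock. */
static bool stream_flush(struct stream *s)
{
	bool flushed = false;

	pthread_mutex_lock(&s->lock);
	if (!s->deleted && !s->quiescent) {
		/* Safe: teardown cannot run while we hold the lock. */
		s->quiescent = true;
		flushed = true;
	}
	pthread_mutex_unlock(&s->lock);
	return flushed;
}
```

Because both sides take the same stream lock, the flush path can never observe a freed ustream: either it sees deleted == false and teardown has to wait for the lock, or it sees deleted == true and skips the flush entirely.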

This issue can be reproduced by the following scenario:

    Modify flush_channel to sleep (e.g. 10 s) before acquiring the lock on
    a stream.

    Modify lttng-ust ustctl_destroy_stream to set the
    ring_buffer_clock_read callback to NULL.
      Note: an assert on !cds_lfht_is_node_deleted in flush_channel
      after acquiring the lock can provide the same information. We
      modify the callback to simulate the original backtrace from our
      customer.

    lttng-relayd
    lttng-sessiond
    lttng create --live
    lttng enable-event -u -a
    lttng start
    Start some applications to generate data.
    lttng stop
      The stop command forces a flush of the channel/streams.
    pkill -9 lttng-relayd
    Expect an assert or a segfault.

The original customer backtrace:

  0  lib_ring_buffer_try_switch_slow (handle=<optimized out>, tsc=<synthetic pointer>, offsets=0x3fffa9b76c80, chan=0x3fff98006e90, buf=<optimized out>,
     mode=<optimized out>) at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/ring_buffer_frontend.c:1834
  1  lib_ring_buffer_switch_slow (buf=0x3fff98016b40, mode=<optimized out>, handle=0x3fff98017670)
     at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/ring_buffer_frontend.c:1952
  2  0x00003fffac680940 in ustctl_flush_buffer (stream=<optimized out>, producer_active=<optimized out>)
     at /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1568
  3  0x0000000010031bc8 in flush_channel (chan_key=<optimized out>) at ust-consumer.c:772
  4  lttng_ustconsumer_recv_cmd (ctx=<optimized out>, sock=<optimized out>, consumer_sockpoll=<optimized out>) at ust-consumer.c:1651
  5  0x000000001000de50 in lttng_consumer_recv_cmd (ctx=<optimized out>, sock=<optimized out>, consumer_sockpoll=<optimized out>) at consumer.c:2011
  6  0x0000000010014208 in consumer_thread_sessiond_poll (data=0x10079430) at consumer.c:3192
  7  0x00003fffac608b30 in start_thread (arg=0x3fffa9b7bdb0) at pthread_create.c:462
  8  0x00003fffac530d0c in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
---
 src/common/ust-consumer/ust-consumer.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Jérémie Galarneau Sept. 5, 2019, 9:29 p.m. UTC | #1
Merged in master, stable-2.11, stable-2.10, and stable-2.9.

Thanks!
Jérémie


Patch

diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c
index 94b761cb8..f2e62d0ee 100644
--- a/src/common/ust-consumer/ust-consumer.c
+++ b/src/common/ust-consumer/ust-consumer.c
@@ -764,10 +764,19 @@  static int flush_channel(uint64_t chan_key)
 		health_code_update();
 
 		pthread_mutex_lock(&stream->lock);
+
+		/*
+		 * Protect against concurrent teardown of a stream.
+		 */
+		if (cds_lfht_is_node_deleted(&stream->node.node)) {
+			goto next;
+		}
+
 		if (!stream->quiescent) {
 			ustctl_flush_buffer(stream->ustream, 0);
 			stream->quiescent = true;
 		}
+next:
 		pthread_mutex_unlock(&stream->lock);
 	}
 error: