lttng-consumerd crash when lttng-relayd is killed

Message ID CAF2baFde4LrR1N7O_tfs-6tXrVOFf1fR3mi-H3gJHtynJ+eZJw@mail.gmail.com
State New
Delegated to: Jérémie Galarneau

Commit Message

Anders Wallin March 16, 2017, 10:32 a.m. UTC
Hi,

We have found a crash in lttng-consumerd when a live session is running
and lttng-relayd is killed. The crash is from lttng-tools version 2.8.6,
but the same code, and so the same problem, is present in newer versions.
The backtrace below is from a slightly modified version of 2.8.6, so the
line numbers may not match exactly, but the issue has been reproduced on
unpatched 2.8.6 on an ARM target. The problem was introduced by commit
8dbd7d838dc2276e5a25057c76c2e219e1d2661b

(gdb)  bt
#0  lttng_index_file_write (index_file=0x0, element=element@entry=0xb3dfea40)
at index.c:132
#1  0x00024798 in consumer_stream_write_index (stream=stream@entry=0xb3f090b0,
element=element@entry=0xb3dfea40) at consumer-stream.c:375
#2  0x00021fb0 in send_empty_index (stream_id=<optimized out>,
ts=<optimized out>, stream=0xb3f090b0) at consumer-timer.c:125
#3  consumer_flush_ust_index (stream=0xb3f090b0) at consumer-timer.c:246
#4  0x0002323c in check_ust_stream (stream=0xb3f090b0) at
consumer-timer.c:297
#5  live_timer (ctx=<optimized out>, sig=<optimized out>, si=0xb3dfebd0,
uc=0x0) at consumer-timer.c:333
#6  consumer_timer_thread (data=0x0) at consumer-timer.c:591
#7  0xb6f000dc in start_thread (arg=0xb3dff340) at pthread_create.c:339
#8  0xb6e89130 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:89 from
/proj/cpptemp/plf_tools/licop-rcs/CXP9031275_4-R9C22/sysroot/lib/libc.so.6
#9  0xb6e89130 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:89 from
/proj/cpptemp/plf_tools/licop-rcs/CXP9031275_4-R9C22/sysroot/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)


The problem happens here:

src/common/consumer/consumer-stream.c
---------------------------------------------------------
int consumer_stream_write_index(struct lttng_consumer_stream *stream,
		struct ctf_packet_index *element)
{
	int ret;
	struct consumer_relayd_sock_pair *relayd;

	assert(stream);
	assert(element);

	rcu_read_lock();
	relayd = consumer_find_relayd(stream->net_seq_idx);
>> relayd has been shut down, so relayd is NULL here
	if (relayd) {
		pthread_mutex_lock(&relayd->ctrl_sock_mutex);
		ret = relayd_send_index(&relayd->control_sock, element,
			stream->relayd_stream_id, stream->next_net_seq_num - 1);
		pthread_mutex_unlock(&relayd->ctrl_sock_mutex);
	} else {
		if (lttng_index_file_write(stream->index_file, element)) {
>> We get in here, but stream->index_file is set to NULL in consumer_stream_close()
			ret = -1;
		} else {
			ret = 0;
		}
	}
	if (ret < 0) {
		goto error;
	}

src/common/consumer/consumer-stream.c
---------------------------------------------------------
void consumer_stream_close(struct lttng_consumer_stream *stream)
{
	int ret;
	struct consumer_relayd_sock_pair *relayd;

	assert(stream);

	switch (consumer_data.type) {
	case LTTNG_CONSUMER_KERNEL:
	.....
	case LTTNG_CONSUMER32_UST:
	case LTTNG_CONSUMER64_UST:
	{
		...
		if (stream->index_file) {
			lttng_index_file_put(stream->index_file);
			stream->index_file = NULL;
>> stream->index_file is set to NULL here
		}
	.....
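
The sequence described above can be modelled in a few lines. This is only a sketch of the state that leads to frame #0 of the backtrace, not lttng-tools code: `model_stream`, `model_index_file`, `model_stream_close`, `model_relayd_killed` and `model_timer_hits_null_index` are hypothetical names, and the `relayd_reachable` flag is an assumed stand-in for `consumer_find_relayd()` returning non-NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real structs have many more fields. */
struct model_index_file {
	int fd;
};

struct model_stream {
	struct model_index_file *index_file;	/* local index file, may be NULL */
	int relayd_reachable;	/* models consumer_find_relayd() != NULL */
};

/* Models the relevant part of consumer_stream_close(). */
static void model_stream_close(struct model_stream *stream)
{
	assert(stream);
	if (stream->index_file) {
		free(stream->index_file);
		stream->index_file = NULL;
	}
}

/* Models lttng-relayd being killed: the lookup no longer finds it. */
static void model_relayd_killed(struct model_stream *stream)
{
	assert(stream);
	stream->relayd_reachable = 0;
}

/*
 * Models the branch selection in consumer_stream_write_index() when it is
 * called from the live timer. Returns 1 when the local-file branch would
 * be entered with a NULL index_file, 0 otherwise.
 */
static int model_timer_hits_null_index(const struct model_stream *stream)
{
	assert(stream);
	return !stream->relayd_reachable && stream->index_file == NULL;
}
```

In this model neither event alone is a problem: the NULL pointer only reaches the write path once the stream has dropped its index file and the relayd lookup has started failing, while the live timer is still firing for that stream.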

The following patch fixes the crash, but it is a "band aid" patch: the
underlying flow is still broken, and the patch only stops the NULL
dereference. I was not able to find the correct way to fix the flow, but
maybe the band-aid patch should be included anyway.

From fc896fe08e30435b9d3c78fa4551b2dc5042fb03 Mon Sep 17 00:00:00 2001
From: Anders Wallin <wallinux at gmail.com>
Date: Thu, 16 Mar 2017 11:15:23 +0100
Subject: [PATCH lttng-tools] Fix: crash in lttng-consumerd when lttng-relayd
 is killed

Fixes this crash:
0  lttng_index_file_write (index_file=0x0, element=element@entry=0xb3dfea40)
at index.c:132
1  0x00024798 in consumer_stream_write_index (stream=stream@entry=0xb3f090b0,
element=element@entry=0xb3dfea40) at consumer-stream.c:375
2  0x00021fb0 in send_empty_index (stream_id=<optimized out>,
ts=<optimized out>, stream=0xb3f090b0) at consumer-timer.c:125
3  consumer_flush_ust_index (stream=0xb3f090b0) at consumer-timer.c:246
4  0x0002323c in check_ust_stream (stream=0xb3f090b0) at
consumer-timer.c:297
5  live_timer (ctx=<optimized out>, sig=<optimized out>, si=0xb3dfebd0,
uc=0x0) at consumer-timer.c:333
6  consumer_timer_thread (data=0x0) at consumer-timer.c:591

Signed-off-by: Anders Wallin <wallinux at gmail.com>
---
 src/common/index/index.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)


Patch

diff --git a/src/common/index/index.c b/src/common/index/index.c
index b481badb..008d877b 100644
--- a/src/common/index/index.c
+++ b/src/common/index/index.c
@@ -129,11 +129,19 @@ int lttng_index_file_write(const struct lttng_index_file *index_file,
 		const struct ctf_packet_index *element)
 {
 	ssize_t ret;
-	int fd = index_file->fd;
-	size_t len = index_file->element_len;
+	int fd;
+	size_t len;
 
 	assert(element);
 
+	if (index_file == NULL) {
+		PERROR("index file is NULL");
+		goto error;
+	}
+
+	fd = index_file->fd;
+	len = index_file->element_len;
+
 	if (fd < 0) {
 		goto error;
 	}
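
For trying the guard outside the lttng-tools tree, the shape of the patched function can be reduced to a standalone sketch. Everything here is an assumption made for illustration: `sketch_index_file` and `sketch_index_file_write` are hypothetical names, the struct carries only the two fields the patch touches, and a plain `write()` stands in for whatever the real function does after the checks. The sketch returns an error without logging, since no errno is set when the pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, reduced stand-in for struct lttng_index_file. */
struct sketch_index_file {
	int fd;
	size_t element_len;
};

/*
 * Mirrors the order of checks in the patch: test the pointer before
 * reading any of its fields. Returns 0 on success, -1 on error.
 */
static int sketch_index_file_write(const struct sketch_index_file *index_file,
		const void *element)
{
	ssize_t ret;

	assert(element);

	if (index_file == NULL) {
		/* Nothing to write to; fail instead of dereferencing NULL. */
		return -1;
	}
	if (index_file->fd < 0) {
		return -1;
	}

	ret = write(index_file->fd, element, index_file->element_len);
	if (ret < 0 || (size_t) ret != index_file->element_len) {
		return -1;
	}
	return 0;
}
```

Calling `sketch_index_file_write(NULL, element)` returns -1 where the unpatched ordering would read `index_file->fd` through a NULL pointer. As with the patch itself, this only turns the crash into an error return; it does not address why the timer still reaches the write path after the stream has been closed.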