lttng-consumerd crash on aarch64 due to x86 arch specific optimization
diff mbox series

Message ID CAF2baFfZ+sxAGfxGY_or7=JJMxDngZTtfrzruUC5=RnZFD4tvA@mail.gmail.com
State New
Delegated to: Jérémie Galarneau
Headers show
Series
  • lttng-consumerd crash on aarch64 due to x86 arch specific optimization
Related show

Commit Message

Anders Wallin Nov. 24, 2017, 8:23 a.m. UTC
Hi,

architectures that has memory alignment restrictions may/will fail with the
optimization done in 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3.
Please revert the patch or make it X86 specific.

Regards

Anders Wallin

------------------------------------------------------------------------------------------------------------
commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date:   Sun Sep 25 12:31:11 2016 -0400

    Performance: implement lttng_inline_memcpy

    Because all length parameters received for serializing data coming from
    applications go through a callback, they are never constant, and it
    hurts performance to perform a call to memcpy each time.

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>

  * inputs, but rely on an inline memcpy for length statically unknown.
@@ -603,7 +625,7 @@ do {
            \
        if (__builtin_constant_p(len))                          \
                memcpy(dest, src, __len);                       \
        else                                                    \
-               inline_memcpy(dest, src, __len);                \
+               lttng_inline_memcpy(dest, src, __len);          \
 } while (0)

 /*

----------------------------------------------------------------------------------------------------
Here is one example of a crash, where 0xf3b7eeb2 is not 8 byte aligned;

  (gdb) bt full
#0  lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at
/usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610
No locals.
#1  lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0,
config=0xf737c560 <client_config>) at
/usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100
        __len = 8
        handle = 0xf3b2e0c0
        backend_pages = <optimized out>
        chanb = 0xf3b2e2e0
        offset = <optimized out>
#2  lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at
/usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267
No locals.
#3  0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized
out>, metadata_str=0xf3b7eeb2 "", len=<optimized out>) at
/usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183
        ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0,
data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size =
8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, ctx_len =
80, ip = 0x0, priv2 = 0x0, padding2 = '\000' <repeats 11 times>,
backend_pages = 0xf690c000}
        chan = 0xf3b2e4d8
        str = 0xf3b7eeb2 ""
        reserve_len = 8
        ret = <optimized out>
        __func__ = '\000' <repeats 34 times>
        __PRETTY_FUNCTION__ = '\000' <repeats 34 times>
---Type <return> to continue, or q <return> to quit---
#4  0x000344cc in commit_one_metadata_packet (stream=stream at entry=0xf3b2e560)
at ust-consumer.c:2206
        write_len = <optimized out>
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "commit_one_metadata_packet"
#5  0x00036538 in lttng_ustconsumer_read_subbuffer
(stream=stream at entry=0xf3b2e560,
ctx=ctx at entry=0x25e6e8) at ust-consumer.c:2452
        len = 4096
        subbuf_size = 4093
        padding = <optimized out>
        err = -11
        write_index = 1
        ret = <optimized out>
        ustream = <optimized out>
        index = {offset = 0, packet_size = 575697416355872, content_size =
17564043391468256584, timestamp_begin = 17564043425827782792, timestamp_end
= 34359738496, events_discarded = 313327092840, stream_id = 4089446416,
stream_instance_id = 17689097291346376608, packet_seq_num = 8589934593}
        __PRETTY_FUNCTION__ = "lttng_ustconsumer_read_subbuffer"
        __func__ = "lttng_ustconsumer_read_subbuffer"
.....
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20171124/7d430857/attachment.html>

Comments

Mathieu Desnoyers Nov. 24, 2017, 7:19 p.m. UTC | #1
----- On Nov 24, 2017, at 3:23 AM, Anders Wallin <wallinux at gmail.com> wrote: 

> Hi,
> architectures that has memory alignment restrictions may/will fail with the
> optimization done in 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3.
> Please revert the patch or make it X86 specific.

Hi Anders, 

This was added in the development cycle of lttng-ust 2.9. We could perhaps 
add a test on the pointer alignment for architectures that care about it, and 
fallback to memcpy in those cases. 

The revert approach would have been justified if this commit had been backported 
as a "fix" to a stable branch, which is not the case here. We should work on 
finding an acceptable solution that takes care of dealing with unaligned pointers 
on architectures that care about the difference. 

Thanks, 

Mathieu 

> Regards

> Anders Wallin
> ------------------------------------------------------------------------------------------------------------
> commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3
> Author: Mathieu Desnoyers < [ mailto:mathieu.desnoyers at efficios.com |
> mathieu.desnoyers at efficios.com ] >
> Date: Sun Sep 25 12:31:11 2016 -0400

> Performance: implement lttng_inline_memcpy
> Because all length parameters received for serializing data coming from
> applications go through a callback, they are never constant, and it
> hurts performance to perform a call to memcpy each time.
> Signed-off-by: Mathieu Desnoyers < [ mailto:mathieu.desnoyers at efficios.com |
> mathieu.desnoyers at efficios.com ] >

> diff --git a/libringbuffer/backend_internal.h b/libringbuffer/backend_internal.h
> index 90088b89..e597cf4d 100644
> --- a/libringbuffer/backend_internal.h
> +++ b/libringbuffer/backend_internal.h
> @@ -592,6 +592,28 @@ int update_read_sb_index(const struct
> lttng_ust_lib_ring_buffer_config *config,
> #define inline_memcpy(dest, src, n) memcpy(dest, src, n)
> #endif
> +static inline __attribute__((always_inline))
> +void lttng_inline_memcpy(void *dest, const void *src,
> + unsigned long len)
> +{
> + switch (len) {
> + case 1:
> + *(uint8_t *) dest = *(const uint8_t *) src;
> + break;
> + case 2:
> + *(uint16_t *) dest = *(const uint16_t *) src;
> + break;
> + case 4:
> + *(uint32_t *) dest = *(const uint32_t *) src;
> + break;
> + case 8:
> + *(uint64_t *) dest = *(const uint64_t *) src;
> + break;
> + default:
> + inline_memcpy(dest, src, len);
> + }
> +}
> +
> /*
> * Use the architecture-specific memcpy implementation for constant-sized
> * inputs, but rely on an inline memcpy for length statically unknown.
> @@ -603,7 +625,7 @@ do { \
> if (__builtin_constant_p(len)) \
> memcpy(dest, src, __len); \
> else \
> - inline_memcpy(dest, src, __len); \
> + lttng_inline_memcpy(dest, src, __len); \
> } while (0)
> /*

> ----------------------------------------------------------------------------------------------------
> Here is one example of a crash, where 0xf3b7eeb2 is not 8 byte aligned;

> (gdb) bt full
> #0 lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at
> /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610
> No locals.
> #1 lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0,
> config=0xf737c560 <client_config>) at
> /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100
> __len = 8
> handle = 0xf3b2e0c0
> backend_pages = <optimized out>
> chanb = 0xf3b2e2e0
> offset = <optimized out>
> #2 lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at
> /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267
> No locals.
> #3 0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized out>,
> metadata_str=0xf3b7eeb2 "", len=<optimized out>) at
> /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183
> ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0, data_size = 8,
> largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size = 8, buf_offset =
> 163877, pre_offset = 163877, tsc = 0, rflags = 0, ctx_len = 80, ip = 0x0, priv2
> = 0x0, padding2 = '\000' <repeats 11 times>, backend_pages = 0xf690c000}
> chan = 0xf3b2e4d8
> str = 0xf3b7eeb2 ""
> reserve_len = 8
> ret = <optimized out>
> __func__ = '\000' <repeats 34 times>
> __PRETTY_FUNCTION__ = '\000' <repeats 34 times>
> ---Type <return> to continue, or q <return> to quit---
> #4 0x000344cc in commit_one_metadata_packet (stream=stream at entry=0xf3b2e560) at
> ust-consumer.c:2206
> write_len = <optimized out>
> ret = <optimized out>
> __PRETTY_FUNCTION__ = "commit_one_metadata_packet"
> #5 0x00036538 in lttng_ustconsumer_read_subbuffer
> (stream=stream at entry=0xf3b2e560, ctx=ctx at entry=0x25e6e8) at ust-consumer.c:2452
> len = 4096
> subbuf_size = 4093
> padding = <optimized out>
> err = -11
> write_index = 1
> ret = <optimized out>
> ustream = <optimized out>
> index = {offset = 0, packet_size = 575697416355872, content_size =
> 17564043391468256584, timestamp_begin = 17564043425827782792, timestamp_end =
> 34359738496, events_discarded = 313327092840, stream_id = 4089446416,
> stream_instance_id = 17689097291346376608, packet_seq_num = 8589934593}
> __PRETTY_FUNCTION__ = "lttng_ustconsumer_read_subbuffer"
> __func__ = "lttng_ustconsumer_read_subbuffer"
> .....

> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Anders Wallin Dec. 28, 2017, 2:13 p.m. UTC | #2
Hi Mathieu,

I finally got some time to dig into this issue. The crash only happens when
metadata is written AND the size of the metadata will end up in a write
that is 8,4,2 or 1 bytes long AND
that the source or destination is not aligned correctly according to HW
limitation. I have not found any simple way to keep the performance
enhancement code that is run most of the time.
Maybe the metadata writes should have it's own write function instead.

Here is an example of a crash (code is from lttng-ust 2.9.1 and lttng-tools
2.9.6) where the size is 8 bytes and the src address is unaligned at
0xf3b7eeb2;

#0  lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at
/usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610
No locals.
#1  lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0,
config=0xf737c560 <client_config>) at
/usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100
        __len = 8
        handle = 0xf3b2e0c0
        backend_pages = <optimized out>
        chanb = 0xf3b2e2e0
        offset = <optimized out>

#2  lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at
/usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267
No locals.

#3  0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized
out>, metadata_str=0xf3b7eeb2 "", len=<optimized out>) at
/usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183
        ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0,
data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size =
8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, ctx_len =
80, ip = 0x0, priv2 = 0x0, padding2 = '\000' <repeats 11 times>,
backend_pages = 0xf690c000}
        chan = 0xf3b2e4d8
        str = 0xf3b7eeb2 ""
        reserve_len = 8
        ret = <optimized out>
        __func__ = '\000' <repeats 34 times>
        __PRETTY_FUNCTION__ = '\000' <repeats 34 times>
---Type <return> to continue, or q <return> to quit---

#4  0x000344cc in commit_one_metadata_packet (stream=stream at entry=0xf3b2e560)
at ust-consumer.c:2206
        write_len = <optimized out>
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "commit_one_metadata_packet"

#5  0x00036538 in lttng_ustconsumer_read_subbuffer
(stream=stream at entry=0xf3b2e560,
ctx=ctx at entry=0x25e6e8) at ust-consumer.c:2452
        len = 4096
        subbuf_size = 4093
        padding = <optimized out>
        err = -11
        write_index = 1
        ret = <optimized out>
        ustream = <optimized out>
        index = {offset = 0, packet_size = 575697416355872, content_size =
17564043391468256584, timestamp_begin = 17564043425827782792, timestamp_end
= 34359738496,

Regards
Anders

fre 24 nov. 2017 kl 20:18 skrev Mathieu Desnoyers <
mathieu.desnoyers at efficios.com>:

> ----- On Nov 24, 2017, at 3:23 AM, Anders Wallin <wallinux at gmail.com>
> wrote:
>
> Hi,
> architectures that has memory alignment restrictions may/will fail with
> the
> optimization done in 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3.
> Please revert the patch or make it X86 specific.
>
>
> Hi Anders,
>
> This was added in the development cycle of lttng-ust 2.9. We could perhaps
> add a test on the pointer alignment for architectures that care about it,
> and
> fallback to memcpy in those cases.
>
> The revert approach would have been justified if this commit had been
> backported
> as a "fix" to a stable branch, which is not the case here. We should work
> on
> finding an acceptable solution that takes care of dealing with unaligned
> pointers
> on architectures that care about the difference.
>
> Thanks,
>
> Mathieu
>
>
>
> Regards
>
> Anders Wallin
>
> ------------------------------------------------------------------------------------------------------------
> commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3
> Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> Date:   Sun Sep 25 12:31:11 2016 -0400
>
>     Performance: implement lttng_inline_memcpy
>     Because all length parameters received for serializing data coming from
>     applications go through a callback, they are never constant, and it
>     hurts performance to perform a call to memcpy each time.
>     Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>
> diff --git a/libringbuffer/backend_internal.h
> b/libringbuffer/backend_internal.h
> index 90088b89..e597cf4d 100644
> --- a/libringbuffer/backend_internal.h
> +++ b/libringbuffer/backend_internal.h
> @@ -592,6 +592,28 @@ int update_read_sb_index(const struct
> lttng_ust_lib_ring_buffer_config *config,
>  #define inline_memcpy(dest, src, n)    memcpy(dest, src, n)
>  #endif
> +static inline __attribute__((always_inline))
> +void lttng_inline_memcpy(void *dest, const void *src,
> +               unsigned long len)
> +{
> +       switch (len) {
> +       case 1:
> +               *(uint8_t *) dest = *(const uint8_t *) src;
> +               break;
> +       case 2:
> +               *(uint16_t *) dest = *(const uint16_t *) src;
> +               break;
> +       case 4:
> +               *(uint32_t *) dest = *(const uint32_t *) src;
> +               break;
> +       case 8:
> +               *(uint64_t *) dest = *(const uint64_t *) src;
> +               break;
> +       default:
> +               inline_memcpy(dest, src, len);
> +       }
> +}
> +
>  /*
>   * Use the architecture-specific memcpy implementation for constant-sized
>   * inputs, but rely on an inline memcpy for length statically unknown.
> @@ -603,7 +625,7 @@ do {
>               \
>         if (__builtin_constant_p(len))                          \
>                 memcpy(dest, src, __len);                       \
>         else                                                    \
> -               inline_memcpy(dest, src, __len);                \
> +               lttng_inline_memcpy(dest, src, __len);          \
>  } while (0)
>  /*
>
>
> ----------------------------------------------------------------------------------------------------
> Here is one example of a crash, where 0xf3b7eeb2 is not 8 byte aligned;
>
>   (gdb) bt full
> #0  lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at
> /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610
> No locals.
> #1  lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0,
> config=0xf737c560 <client_config>) at
> /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100
>         __len = 8
>         handle = 0xf3b2e0c0
>         backend_pages = <optimized out>
>         chanb = 0xf3b2e2e0
>         offset = <optimized out>
> #2  lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at
> /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267
> No locals.
> #3  0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized
> out>, metadata_str=0xf3b7eeb2 "", len=<optimized out>) at
> /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183
>         ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0,
> data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size =
> 8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, ctx_len =
> 80, ip = 0x0, priv2 = 0x0, padding2 = '\000' <repeats 11 times>,
> backend_pages = 0xf690c000}
>         chan = 0xf3b2e4d8
>         str = 0xf3b7eeb2 ""
>         reserve_len = 8
>         ret = <optimized out>
>         __func__ = '\000' <repeats 34 times>
>         __PRETTY_FUNCTION__ = '\000' <repeats 34 times>
> ---Type <return> to continue, or q <return> to quit---
> #4  0x000344cc in commit_one_metadata_packet (stream=stream at entry=0xf3b2e560)
> at ust-consumer.c:2206
>         write_len = <optimized out>
>         ret = <optimized out>
>         __PRETTY_FUNCTION__ = "commit_one_metadata_packet"
> #5  0x00036538 in lttng_ustconsumer_read_subbuffer (stream=stream at entry=0xf3b2e560,
> ctx=ctx at entry=0x25e6e8) at ust-consumer.c:2452
>         len = 4096
>         subbuf_size = 4093
>         padding = <optimized out>
>         err = -11
>         write_index = 1
>         ret = <optimized out>
>         ustream = <optimized out>
>         index = {offset = 0, packet_size = 575697416355872, content_size =
> 17564043391468256584, timestamp_begin = 17564043425827782792, timestamp_end
> = 34359738496, events_discarded = 313327092840, stream_id = 4089446416,
> stream_instance_id = 17689097291346376608, packet_seq_num = 8589934593}
>         __PRETTY_FUNCTION__ = "lttng_ustconsumer_read_subbuffer"
>         __func__ = "lttng_ustconsumer_read_subbuffer"
> .....
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20171228/fa5e6655/attachment.html>

Patch
diff mbox series

diff --git a/libringbuffer/backend_internal.h
b/libringbuffer/backend_internal.h
index 90088b89..e597cf4d 100644
--- a/libringbuffer/backend_internal.h
+++ b/libringbuffer/backend_internal.h
@@ -592,6 +592,28 @@  int update_read_sb_index(const struct
lttng_ust_lib_ring_buffer_config *config,
 #define inline_memcpy(dest, src, n)    memcpy(dest, src, n)
 #endif

+static inline __attribute__((always_inline))
+void lttng_inline_memcpy(void *dest, const void *src,
+               unsigned long len)
+{
+       switch (len) {
+       case 1:
+               *(uint8_t *) dest = *(const uint8_t *) src;
+               break;
+       case 2:
+               *(uint16_t *) dest = *(const uint16_t *) src;
+               break;
+       case 4:
+               *(uint32_t *) dest = *(const uint32_t *) src;
+               break;
+       case 8:
+               *(uint64_t *) dest = *(const uint64_t *) src;
+               break;
+       default:
+               inline_memcpy(dest, src, len);
+       }
+}
+
 /*
  * Use the architecture-specific memcpy implementation for constant-sized