Message ID | CAF2baFfZ+sxAGfxGY_or7=JJMxDngZTtfrzruUC5=RnZFD4tvA@mail.gmail.com |
---|---|
State | New |
Delegated to: | Jérémie Galarneau |
Headers |
From: wallinux at gmail.com (Anders Wallin) Date: Fri, 24 Nov 2017 08:23:17 +0000 Subject: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization Message-ID: <CAF2baFfZ+sxAGfxGY_or7=JJMxDngZTtfrzruUC5=RnZFD4tvA@mail.gmail.com> |
Series |
lttng-consumerd crash on aarch64 due to x86 arch specific optimization
|
|
Commit Message
Anders Wallin
Nov. 24, 2017, 8:23 a.m. UTC
Hi,
architectures that has memory alignment restrictions may/will fail with the
optimization done in 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3.
Please revert the patch or make it X86 specific.
Regards
Anders Wallin
------------------------------------------------------------------------------------------------------------
commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: Sun Sep 25 12:31:11 2016 -0400
Performance: implement lttng_inline_memcpy
Because all length parameters received for serializing data coming from
applications go through a callback, they are never constant, and it
hurts performance to perform a call to memcpy each time.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
* inputs, but rely on an inline memcpy for length statically unknown.
@@ -603,7 +625,7 @@ do {
\
if (__builtin_constant_p(len)) \
memcpy(dest, src, __len); \
else \
- inline_memcpy(dest, src, __len); \
+ lttng_inline_memcpy(dest, src, __len); \
} while (0)
/*
----------------------------------------------------------------------------------------------------
Here is one example of a crash, where 0xf3b7eeb2 is not 8 byte aligned;
(gdb) bt full
#0 lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at
/usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610
No locals.
#1 lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0,
config=0xf737c560 <client_config>) at
/usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100
__len = 8
handle = 0xf3b2e0c0
backend_pages = <optimized out>
chanb = 0xf3b2e2e0
offset = <optimized out>
#2 lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at
/usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267
No locals.
#3 0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized
out>, metadata_str=0xf3b7eeb2 "", len=<optimized out>) at
/usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183
ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0,
data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size =
8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, ctx_len =
80, ip = 0x0, priv2 = 0x0, padding2 = '\000' <repeats 11 times>,
backend_pages = 0xf690c000}
chan = 0xf3b2e4d8
str = 0xf3b7eeb2 ""
reserve_len = 8
ret = <optimized out>
__func__ = '\000' <repeats 34 times>
__PRETTY_FUNCTION__ = '\000' <repeats 34 times>
---Type <return> to continue, or q <return> to quit---
#4 0x000344cc in commit_one_metadata_packet (stream=stream at entry=0xf3b2e560)
at ust-consumer.c:2206
write_len = <optimized out>
ret = <optimized out>
__PRETTY_FUNCTION__ = "commit_one_metadata_packet"
#5 0x00036538 in lttng_ustconsumer_read_subbuffer
(stream=stream at entry=0xf3b2e560,
ctx=ctx at entry=0x25e6e8) at ust-consumer.c:2452
len = 4096
subbuf_size = 4093
padding = <optimized out>
err = -11
write_index = 1
ret = <optimized out>
ustream = <optimized out>
index = {offset = 0, packet_size = 575697416355872, content_size =
17564043391468256584, timestamp_begin = 17564043425827782792, timestamp_end
= 34359738496, events_discarded = 313327092840, stream_id = 4089446416,
stream_instance_id = 17689097291346376608, packet_seq_num = 8589934593}
__PRETTY_FUNCTION__ = "lttng_ustconsumer_read_subbuffer"
__func__ = "lttng_ustconsumer_read_subbuffer"
.....
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20171124/7d430857/attachment.html>
Comments
----- On Nov 24, 2017, at 3:23 AM, Anders Wallin <wallinux at gmail.com> wrote: > Hi, > architectures that has memory alignment restrictions may/will fail with the > optimization done in 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3. > Please revert the patch or make it X86 specific. Hi Anders, This was added in the development cycle of lttng-ust 2.9. We could perhaps add a test on the pointer alignment for architectures that care about it, and fallback to memcpy in those cases. The revert approach would have been justified if this commit had been backported as a "fix" to a stable branch, which is not the case here. We should work on finding an acceptable solution that takes care of dealing with unaligned pointers on architectures that care about the difference. Thanks, Mathieu > Regards > Anders Wallin > ------------------------------------------------------------------------------------------------------------ > commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3 > Author: Mathieu Desnoyers < [ mailto:mathieu.desnoyers at efficios.com | > mathieu.desnoyers at efficios.com ] > > Date: Sun Sep 25 12:31:11 2016 -0400 > Performance: implement lttng_inline_memcpy > Because all length parameters received for serializing data coming from > applications go through a callback, they are never constant, and it > hurts performance to perform a call to memcpy each time. > Signed-off-by: Mathieu Desnoyers < [ mailto:mathieu.desnoyers at efficios.com | > mathieu.desnoyers at efficios.com ] > > diff --git a/libringbuffer/backend_internal.h b/libringbuffer/backend_internal.h > index 90088b89..e597cf4d 100644 > --- a/libringbuffer/backend_internal.h > +++ b/libringbuffer/backend_internal.h > @@ -592,6 +592,28 @@ int update_read_sb_index(const struct > lttng_ust_lib_ring_buffer_config *config, > #define inline_memcpy(dest, src, n) memcpy(dest, src, n) > #endif > +static inline __attribute__((always_inline)) > +void lttng_inline_memcpy(void *dest, const void *src, > + unsigned long len) > +{ > + switch (len) { > + case 1: > + *(uint8_t *) dest = *(const uint8_t *) src; > + break; > + case 2: > + *(uint16_t *) dest = *(const uint16_t *) src; > + break; > + case 4: > + *(uint32_t *) dest = *(const uint32_t *) src; > + break; > + case 8: > + *(uint64_t *) dest = *(const uint64_t *) src; > + break; > + default: > + inline_memcpy(dest, src, len); > + } > +} > + > /* > * Use the architecture-specific memcpy implementation for constant-sized > * inputs, but rely on an inline memcpy for length statically unknown. > @@ -603,7 +625,7 @@ do { \ > if (__builtin_constant_p(len)) \ > memcpy(dest, src, __len); \ > else \ > - inline_memcpy(dest, src, __len); \ > + lttng_inline_memcpy(dest, src, __len); \ > } while (0) > /* > ---------------------------------------------------------------------------------------------------- > Here is one example of a crash, where 0xf3b7eeb2 is not 8 byte aligned; > (gdb) bt full > #0 lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610 > No locals. > #1 lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0, > config=0xf737c560 <client_config>) at > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100 > __len = 8 > handle = 0xf3b2e0c0 > backend_pages = <optimized out> > chanb = 0xf3b2e2e0 > offset = <optimized out> > #2 lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267 > No locals. > #3 0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized out>, > metadata_str=0xf3b7eeb2 "", len=<optimized out>) at > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183 > ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0, data_size = 8, > largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size = 8, buf_offset = > 163877, pre_offset = 163877, tsc = 0, rflags = 0, ctx_len = 80, ip = 0x0, priv2 > = 0x0, padding2 = '\000' <repeats 11 times>, backend_pages = 0xf690c000} > chan = 0xf3b2e4d8 > str = 0xf3b7eeb2 "" > reserve_len = 8 > ret = <optimized out> > __func__ = '\000' <repeats 34 times> > __PRETTY_FUNCTION__ = '\000' <repeats 34 times> > ---Type <return> to continue, or q <return> to quit--- > #4 0x000344cc in commit_one_metadata_packet (stream=stream at entry=0xf3b2e560) at > ust-consumer.c:2206 > write_len = <optimized out> > ret = <optimized out> > __PRETTY_FUNCTION__ = "commit_one_metadata_packet" > #5 0x00036538 in lttng_ustconsumer_read_subbuffer > (stream=stream at entry=0xf3b2e560, ctx=ctx at entry=0x25e6e8) at ust-consumer.c:2452 > len = 4096 > subbuf_size = 4093 > padding = <optimized out> > err = -11 > write_index = 1 > ret = <optimized out> > ustream = <optimized out> > index = {offset = 0, packet_size = 575697416355872, content_size = > 17564043391468256584, timestamp_begin = 17564043425827782792, timestamp_end = > 34359738496, events_discarded = 313327092840, stream_id = 4089446416, > stream_instance_id = 17689097291346376608, packet_seq_num = 8589934593} > __PRETTY_FUNCTION__ = "lttng_ustconsumer_read_subbuffer" > __func__ = "lttng_ustconsumer_read_subbuffer" > ..... > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Hi Mathieu, I finally got some time to dig into this issue. The crash only happens when metadata is written AND the size of the metadata will end up in a write that is 8,4,2 or 1 bytes long AND that the source or destination is not aligned correctly according to HW limitation. I have not found any simple way to keep the performance enhancement code that is run most of the time. Maybe the metadata writes should have it's own write function instead. Here is an example of a crash (code is from lttng-ust 2.9.1 and lttng-tools 2.9.6) where the size is 8 bytes and the src address is unaligned at 0xf3b7eeb2; #0 lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610 No locals. #1 lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0, config=0xf737c560 <client_config>) at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100 __len = 8 handle = 0xf3b2e0c0 backend_pages = <optimized out> chanb = 0xf3b2e2e0 offset = <optimized out> #2 lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267 No locals. #3 0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized out>, metadata_str=0xf3b7eeb2 "", len=<optimized out>) at /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183 ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0, data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size = 8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, ctx_len = 80, ip = 0x0, priv2 = 0x0, padding2 = '\000' <repeats 11 times>, backend_pages = 0xf690c000} chan = 0xf3b2e4d8 str = 0xf3b7eeb2 "" reserve_len = 8 ret = <optimized out> __func__ = '\000' <repeats 34 times> __PRETTY_FUNCTION__ = '\000' <repeats 34 times> ---Type <return> to continue, or q <return> to quit--- #4 0x000344cc in commit_one_metadata_packet (stream=stream at entry=0xf3b2e560) at ust-consumer.c:2206 write_len = <optimized out> ret = <optimized out> __PRETTY_FUNCTION__ = "commit_one_metadata_packet" #5 0x00036538 in lttng_ustconsumer_read_subbuffer (stream=stream at entry=0xf3b2e560, ctx=ctx at entry=0x25e6e8) at ust-consumer.c:2452 len = 4096 subbuf_size = 4093 padding = <optimized out> err = -11 write_index = 1 ret = <optimized out> ustream = <optimized out> index = {offset = 0, packet_size = 575697416355872, content_size = 17564043391468256584, timestamp_begin = 17564043425827782792, timestamp_end = 34359738496, Regards Anders fre 24 nov. 2017 kl 20:18 skrev Mathieu Desnoyers < mathieu.desnoyers at efficios.com>: > ----- On Nov 24, 2017, at 3:23 AM, Anders Wallin <wallinux at gmail.com> > wrote: > > Hi, > architectures that has memory alignment restrictions may/will fail with > the > optimization done in 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3. > Please revert the patch or make it X86 specific. > > > Hi Anders, > > This was added in the development cycle of lttng-ust 2.9. We could perhaps > add a test on the pointer alignment for architectures that care about it, > and > fallback to memcpy in those cases. > > The revert approach would have been justified if this commit had been > backported > as a "fix" to a stable branch, which is not the case here. We should work > on > finding an acceptable solution that takes care of dealing with unaligned > pointers > on architectures that care about the difference. > > Thanks, > > Mathieu > > > > Regards > > Anders Wallin > > ------------------------------------------------------------------------------------------------------------ > commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3 > Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com> > Date: Sun Sep 25 12:31:11 2016 -0400 > > Performance: implement lttng_inline_memcpy > Because all length parameters received for serializing data coming from > applications go through a callback, they are never constant, and it > hurts performance to perform a call to memcpy each time. > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com> > > diff --git a/libringbuffer/backend_internal.h > b/libringbuffer/backend_internal.h > index 90088b89..e597cf4d 100644 > --- a/libringbuffer/backend_internal.h > +++ b/libringbuffer/backend_internal.h > @@ -592,6 +592,28 @@ int update_read_sb_index(const struct > lttng_ust_lib_ring_buffer_config *config, > #define inline_memcpy(dest, src, n) memcpy(dest, src, n) > #endif > +static inline __attribute__((always_inline)) > +void lttng_inline_memcpy(void *dest, const void *src, > + unsigned long len) > +{ > + switch (len) { > + case 1: > + *(uint8_t *) dest = *(const uint8_t *) src; > + break; > + case 2: > + *(uint16_t *) dest = *(const uint16_t *) src; > + break; > + case 4: > + *(uint32_t *) dest = *(const uint32_t *) src; > + break; > + case 8: > + *(uint64_t *) dest = *(const uint64_t *) src; > + break; > + default: > + inline_memcpy(dest, src, len); > + } > +} > + > /* > * Use the architecture-specific memcpy implementation for constant-sized > * inputs, but rely on an inline memcpy for length statically unknown. > @@ -603,7 +625,7 @@ do { > \ > if (__builtin_constant_p(len)) \ > memcpy(dest, src, __len); \ > else \ > - inline_memcpy(dest, src, __len); \ > + lttng_inline_memcpy(dest, src, __len); \ > } while (0) > /* > > > ---------------------------------------------------------------------------------------------------- > Here is one example of a crash, where 0xf3b7eeb2 is not 8 byte aligned; > > (gdb) bt full > #0 lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610 > No locals. > #1 lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0, > config=0xf737c560 <client_config>) at > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100 > __len = 8 > handle = 0xf3b2e0c0 > backend_pages = <optimized out> > chanb = 0xf3b2e2e0 > offset = <optimized out> > #2 lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267 > No locals. > #3 0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized > out>, metadata_str=0xf3b7eeb2 "", len=<optimized out>) at > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183 > ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0, > data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size = > 8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, ctx_len = > 80, ip = 0x0, priv2 = 0x0, padding2 = '\000' <repeats 11 times>, > backend_pages = 0xf690c000} > chan = 0xf3b2e4d8 > str = 0xf3b7eeb2 "" > reserve_len = 8 > ret = <optimized out> > __func__ = '\000' <repeats 34 times> > __PRETTY_FUNCTION__ = '\000' <repeats 34 times> > ---Type <return> to continue, or q <return> to quit--- > #4 0x000344cc in commit_one_metadata_packet (stream=stream at entry=0xf3b2e560) > at ust-consumer.c:2206 > write_len = <optimized out> > ret = <optimized out> > __PRETTY_FUNCTION__ = "commit_one_metadata_packet" > #5 0x00036538 in lttng_ustconsumer_read_subbuffer (stream=stream at entry=0xf3b2e560, > ctx=ctx at entry=0x25e6e8) at ust-consumer.c:2452 > len = 4096 > subbuf_size = 4093 > padding = <optimized out> > err = -11 > write_index = 1 > ret = <optimized out> > ustream = <optimized out> > index = {offset = 0, packet_size = 575697416355872, content_size = > 17564043391468256584, timestamp_begin = 17564043425827782792, timestamp_end > = 34359738496, events_discarded = 313327092840, stream_id = 4089446416, > stream_instance_id = 17689097291346376608, packet_seq_num = 8589934593} > __PRETTY_FUNCTION__ = "lttng_ustconsumer_read_subbuffer" > __func__ = "lttng_ustconsumer_read_subbuffer" > ..... > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20171228/fa5e6655/attachment.html>
Hi Anders, Sorry for the long delay on this one, can you have a look at the following fix ? https://review.lttng.org/c/lttng-ust/+/9319 Fix: aarch64: do not perform unaligned stores If it passes your testing, I'll merge this into lttng-ust. Thanks, Mathieu On 2017-12-28 09:13, Anders Wallin wrote: > Hi Mathieu, > > I finally got some time to dig into this issue. The crash only happens > when metadata is written AND the size of the metadata will end up in a > write that is 8,4,2 or 1 bytes long AND > that the source or destination is not aligned correctly according to HW > limitation. I have not found any simple way to keep the performance > enhancement code that is run most of the time. > Maybe the metadata writes should have it's own write function instead. > > Here is an example of a crash (code is from lttng-ust 2.9.1 and > lttng-tools 2.9.6) where the size is 8 bytes and the src address is > unaligned at 0xf3b7eeb2; > > #0? lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610 > No locals. > #1? lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0, > config=0xf737c560 <client_config>) at > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100 > ? ? ? ? __len = 8 > ? ? ? ? handle = 0xf3b2e0c0 > ? ? ? ? backend_pages = <optimized out> > ? ? ? ? chanb = 0xf3b2e2e0 > ? ? ? ? offset = <optimized out> > > #2? lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267 > No locals. > > #3? 0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized > out>, metadata_str=0xf3b7eeb2 "", len=<optimized out>) at > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183 > ? ? ? ? ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0, > data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size > = 8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, > ctx_len = 80, ip = 0x0, priv2 = 0x0, padding2 = '\000' <repeats 11 > times>, backend_pages = 0xf690c000} > ? ? ? ? chan = 0xf3b2e4d8 > ? ? ? ? str = 0xf3b7eeb2 "" > ? ? ? ? reserve_len = 8 > ? ? ? ? ret = <optimized out> > ? ? ? ? __func__ = '\000' <repeats 34 times> > ? ? ? ? __PRETTY_FUNCTION__ = '\000' <repeats 34 times> > ---Type <return> to continue, or q <return> to quit--- > > #4? 0x000344cc in commit_one_metadata_packet > (stream=stream at entry=0xf3b2e560) at ust-consumer.c:2206 > ? ? ? ? write_len = <optimized out> > ? ? ? ? ret = <optimized out> > ? ? ? ? __PRETTY_FUNCTION__ = "commit_one_metadata_packet" > > #5? 0x00036538 in lttng_ustconsumer_read_subbuffer > (stream=stream at entry=0xf3b2e560, ctx=ctx at entry=0x25e6e8) at > ust-consumer.c:2452 > ? ? ? ? len = 4096 > ? ? ? ? subbuf_size = 4093 > ? ? ? ? padding = <optimized out> > ? ? ? ? err = -11 > ? ? ? ? write_index = 1 > ? ? ? ? ret = <optimized out> > ? ? ? ? ustream = <optimized out> > ? ? ? ? index = {offset = 0, packet_size = 575697416355872, > content_size = 17564043391468256584, timestamp_begin = > 17564043425827782792, timestamp_end = 34359738496, > Regards > Anders > > fre 24 nov. 2017 kl 20:18 skrev Mathieu Desnoyers > <mathieu.desnoyers at efficios.com <mailto:mathieu.desnoyers at efficios.com>>: > > ----- On Nov 24, 2017, at 3:23 AM, Anders Wallin <wallinux at gmail.com > <mailto:wallinux at gmail.com>> wrote: > > Hi, > architectures that has memory alignment restrictions may/will > fail with the > optimization done in?51b8f2fa2b972e62117caa946dd3e3565b6ca4a3. > Please revert the patch or make it X86 specific. > > > Hi Anders, > > This was added in the development cycle of lttng-ust 2.9. We could > perhaps > add a test on the pointer alignment for architectures that care > about it, and > fallback to memcpy in those cases. > > The revert approach would have been justified if this commit had > been backported > as a "fix" to a stable branch, which is not the case here. We should > work on > finding an acceptable solution that takes care of dealing with > unaligned pointers > on architectures that care about the difference. > > Thanks, > > Mathieu > > > > Regards > > Anders Wallin > ------------------------------------------------------------------------------------------------------------ > commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3 > Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com > <mailto:mathieu.desnoyers at efficios.com>> > Date:? ?Sun Sep 25 12:31:11 2016 -0400 > > ? ? Performance: implement lttng_inline_memcpy > ? ? Because all length parameters received for serializing data > coming from > ? ? applications go through a callback, they are never > constant, and it > ? ? hurts performance to perform a call to memcpy each time. > ? ? Signed-off-by: Mathieu Desnoyers > <mathieu.desnoyers at efficios.com > <mailto:mathieu.desnoyers at efficios.com>> > > diff --git a/libringbuffer/backend_internal.h > b/libringbuffer/backend_internal.h > index 90088b89..e597cf4d 100644 > --- a/libringbuffer/backend_internal.h > +++ b/libringbuffer/backend_internal.h > @@ -592,6 +592,28 @@ int update_read_sb_index(const struct > lttng_ust_lib_ring_buffer_config *config, > ?#define inline_memcpy(dest, src, n)? ? memcpy(dest, src, n) > ?#endif > +static inline __attribute__((always_inline)) > +void lttng_inline_memcpy(void *dest, const void *src, > +? ? ? ? ? ? ? ?unsigned long len) > +{ > +? ? ? ?switch (len) { > +? ? ? ?case 1: > +? ? ? ? ? ? ? ?*(uint8_t *) dest = *(const uint8_t *) src; > +? ? ? ? ? ? ? ?break; > +? ? ? ?case 2: > +? ? ? ? ? ? ? ?*(uint16_t *) dest = *(const uint16_t *) src; > +? ? ? ? ? ? ? ?break; > +? ? ? ?case 4: > +? ? ? ? ? ? ? ?*(uint32_t *) dest = *(const uint32_t *) src; > +? ? ? ? ? ? ? ?break; > +? ? ? ?case 8: > +? ? ? ? ? ? ? ?*(uint64_t *) dest = *(const uint64_t *) src; > +? ? ? ? ? ? ? ?break; > +? ? ? ?default: > +? ? ? ? ? ? ? ?inline_memcpy(dest, src, len); > +? ? ? ?} > +} > + > ?/* > ? * Use the architecture-specific memcpy implementation for > constant-sized > ? * inputs, but rely on an inline memcpy for length statically > unknown. > @@ -603,7 +625,7 @@ do { > ? ? ? ? ? ? ? ? ? ? ? ? \ > ? ? ? ? if (__builtin_constant_p(len))? ? ? ? ? ? ? ? ? ? ? ? ? \ > ? ? ? ? ? ? ? ? memcpy(dest, src, __len);? ? ? ? ? ? ? ? ? ? ? ?\ > ? ? ? ? else? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? \ > -? ? ? ? ? ? ? ?inline_memcpy(dest, src, __len);? ? ? ? ? ? ? ? \ > +? ? ? ? ? ? ? ?lttng_inline_memcpy(dest, src, __len);? ? ? ? ? \ > ?} while (0) > ?/* > > ---------------------------------------------------------------------------------------------------- > Here is one example of a crash, where 0xf3b7eeb2 is not 8 byte > aligned; > > ? (gdb) bt full > #0? lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized > out>) at > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610 > No locals. > #1? lib_ring_buffer_write (len=8, src=0xf3b7eeb2, > ctx=0xf57c47d0, config=0xf737c560 <client_config>) at > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100 > ? ? ? ? __len = 8 > ? ? ? ? handle = 0xf3b2e0c0 > ? ? ? ? backend_pages = <optimized out> > ? ? ? ? chanb = 0xf3b2e2e0 > ? ? ? ? offset = <optimized out> > #2? lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267 > No locals. > #3? 0xf7337ef8 in ustctl_write_one_packet_to_channel > (channel=<optimized out>, metadata_str=0xf3b7eeb2 "", > len=<optimized out>) at > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183 > ? ? ? ? ctx = {chan = 0xf3b2e290, priv = 0x0, handle = > 0xf3b2e0c0, data_size = 8, largest_align = 1, cpu = -1, buf = > 0xf6909000, slot_size = 8, buf_offset = 163877, pre_offset = > 163877, tsc = 0, rflags = 0, ctx_len = 80, ip = 0x0, priv2 = > 0x0, padding2 = '\000' <repeats 11 times>, backend_pages = > 0xf690c000} > ? ? ? ? chan = 0xf3b2e4d8 > ? ? ? ? str = 0xf3b7eeb2 "" > ? ? ? ? reserve_len = 8 > ? ? ? ? ret = <optimized out> > ? ? ? ? __func__ = '\000' <repeats 34 times> > ? ? ? ? __PRETTY_FUNCTION__ = '\000' <repeats 34 times> > ---Type <return> to continue, or q <return> to quit--- > #4? 0x000344cc in commit_one_metadata_packet > (stream=stream at entry=0xf3b2e560) at ust-consumer.c:2206 > ? ? ? ? write_len = <optimized out> > ? ? ? ? ret = <optimized out> > ? ? ? ? __PRETTY_FUNCTION__ = "commit_one_metadata_packet" > #5? 0x00036538 in lttng_ustconsumer_read_subbuffer > (stream=stream at entry=0xf3b2e560, ctx=ctx at entry=0x25e6e8) at > ust-consumer.c:2452 > ? ? ? ? len = 4096 > ? ? ? ? subbuf_size = 4093 > ? ? ? ? padding = <optimized out> > ? ? ? ? err = -11 > ? ? ? ? write_index = 1 > ? ? ? ? ret = <optimized out> > ? ? ? ? ustream = <optimized out> > ? ? ? ? index = {offset = 0, packet_size = 575697416355872, > content_size = 17564043391468256584, timestamp_begin = > 17564043425827782792, timestamp_end = 34359738496, > events_discarded = 313327092840, stream_id = 4089446416, > stream_instance_id = 17689097291346376608, packet_seq_num = > 8589934593} > ? ? ? ? __PRETTY_FUNCTION__ = "lttng_ustconsumer_read_subbuffer" > ? ? ? ? __func__ = "lttng_ustconsumer_read_subbuffer" > ..... > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org <mailto:lttng-dev at lists.lttng.org> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > <https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev> > > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com <http://www.efficios.com> >
Hi Matthieu, I've retired and no longer have access to any arch64 target to test it on. Regards Anders Den ons 25 jan. 2023 13:25Mathieu Desnoyers <mathieu.desnoyers at efficios.com> skrev: > Hi Anders, > > Sorry for the long delay on this one, can you have a look at the following > fix ? > > https://review.lttng.org/c/lttng-ust/+/9319 Fix: aarch64: do not perform > unaligned stores > > If it passes your testing, I'll merge this into lttng-ust. > > Thanks, > > Mathieu > > On 2017-12-28 09:13, Anders Wallin wrote: > > Hi Mathieu, > > > > I finally got some time to dig into this issue. The crash only happens > > when metadata is written AND the size of the metadata will end up in a > > write that is 8,4,2 or 1 bytes long AND > > that the source or destination is not aligned correctly according to HW > > limitation. I have not found any simple way to keep the performance > > enhancement code that is run most of the time. > > Maybe the metadata writes should have it's own write function instead. > > > > Here is an example of a crash (code is from lttng-ust 2.9.1 and > > lttng-tools 2.9.6) where the size is 8 bytes and the src address is > > unaligned at 0xf3b7eeb2; > > > > #0 lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized out>) at > > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610 > > No locals. > > #1 lib_ring_buffer_write (len=8, src=0xf3b7eeb2, ctx=0xf57c47d0, > > config=0xf737c560 <client_config>) at > > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100 > > __len = 8 > > handle = 0xf3b2e0c0 > > backend_pages = <optimized out> > > chanb = 0xf3b2e2e0 > > offset = <optimized out> > > > > #2 lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at > > > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267 > > No locals. > > > > #3 0xf7337ef8 in ustctl_write_one_packet_to_channel (channel=<optimized > > out>, metadata_str=0xf3b7eeb2 "", len=<optimized out>) at > > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183 > > ctx = {chan = 0xf3b2e290, priv = 0x0, handle = 0xf3b2e0c0, > > data_size = 8, largest_align = 1, cpu = -1, buf = 0xf6909000, slot_size > > = 8, buf_offset = 163877, pre_offset = 163877, tsc = 0, rflags = 0, > > ctx_len = 80, ip = 0x0, priv2 = 0x0, padding2 = '\000' <repeats 11 > > times>, backend_pages = 0xf690c000} > > chan = 0xf3b2e4d8 > > str = 0xf3b7eeb2 "" > > reserve_len = 8 > > ret = <optimized out> > > __func__ = '\000' <repeats 34 times> > > __PRETTY_FUNCTION__ = '\000' <repeats 34 times> > > ---Type <return> to continue, or q <return> to quit--- > > > > #4 0x000344cc in commit_one_metadata_packet > > (stream=stream at entry=0xf3b2e560) at ust-consumer.c:2206 > > write_len = <optimized out> > > ret = <optimized out> > > __PRETTY_FUNCTION__ = "commit_one_metadata_packet" > > > > #5 0x00036538 in lttng_ustconsumer_read_subbuffer > > (stream=stream at entry=0xf3b2e560, ctx=ctx at entry=0x25e6e8) at > > ust-consumer.c:2452 > > len = 4096 > > subbuf_size = 4093 > > padding = <optimized out> > > err = -11 > > write_index = 1 > > ret = <optimized out> > > ustream = <optimized out> > > index = {offset = 0, packet_size = 575697416355872, > > content_size = 17564043391468256584, timestamp_begin = > > 17564043425827782792, timestamp_end = 34359738496, > > Regards > > Anders > > > > fre 24 nov. 2017 kl 20:18 skrev Mathieu Desnoyers > > <mathieu.desnoyers at efficios.com <mailto:mathieu.desnoyers at efficios.com > >>: > > > > ----- On Nov 24, 2017, at 3:23 AM, Anders Wallin <wallinux at gmail.com > > <mailto:wallinux at gmail.com>> wrote: > > > > Hi, > > architectures that has memory alignment restrictions may/will > > fail with the > > optimization done in 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3. > > Please revert the patch or make it X86 specific. > > > > > > Hi Anders, > > > > This was added in the development cycle of lttng-ust 2.9. We could > > perhaps > > add a test on the pointer alignment for architectures that care > > about it, and > > fallback to memcpy in those cases. > > > > The revert approach would have been justified if this commit had > > been backported > > as a "fix" to a stable branch, which is not the case here. We should > > work on > > finding an acceptable solution that takes care of dealing with > > unaligned pointers > > on architectures that care about the difference. > > > > Thanks, > > > > Mathieu > > > > > > > > Regards > > > > Anders Wallin > > > ------------------------------------------------------------------------------------------------------------ > > commit 51b8f2fa2b972e62117caa946dd3e3565b6ca4a3 > > Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com > > <mailto:mathieu.desnoyers at efficios.com>> > > Date: Sun Sep 25 12:31:11 2016 -0400 > > > > Performance: implement lttng_inline_memcpy > > Because all length parameters received for serializing data > > coming from > > applications go through a callback, they are never > > constant, and it > > hurts performance to perform a call to memcpy each time. > > Signed-off-by: Mathieu Desnoyers > > <mathieu.desnoyers at efficios.com > > <mailto:mathieu.desnoyers at efficios.com>> > > > > diff --git a/libringbuffer/backend_internal.h > > b/libringbuffer/backend_internal.h > > index 90088b89..e597cf4d 100644 > > --- a/libringbuffer/backend_internal.h > > +++ b/libringbuffer/backend_internal.h > > @@ -592,6 +592,28 @@ int update_read_sb_index(const struct > > lttng_ust_lib_ring_buffer_config *config, > > #define inline_memcpy(dest, src, n) memcpy(dest, src, n) > > #endif > > +static inline __attribute__((always_inline)) > > +void lttng_inline_memcpy(void *dest, const void *src, > > + unsigned long len) > > +{ > > + switch (len) { > > + case 1: > > + *(uint8_t *) dest = *(const uint8_t *) src; > > + break; > > + case 2: > > + *(uint16_t *) dest = *(const uint16_t *) src; > > + break; > > + case 4: > > + *(uint32_t *) dest = *(const uint32_t *) src; > > + break; > > + case 8: > > + *(uint64_t *) dest = *(const uint64_t *) src; > > + break; > > + default: > > + inline_memcpy(dest, src, len); > > + } > > +} > > + > > /* > > * Use the architecture-specific memcpy implementation for > > constant-sized > > * inputs, but rely on an inline memcpy for length statically > > unknown. > > @@ -603,7 +625,7 @@ do { > > \ > > if (__builtin_constant_p(len)) > \ > > memcpy(dest, src, __len); > \ > > else > \ > > - inline_memcpy(dest, src, __len); \ > > + lttng_inline_memcpy(dest, src, __len); \ > > } while (0) > > /* > > > > > ---------------------------------------------------------------------------------------------------- > > Here is one example of a crash, where 0xf3b7eeb2 is not 8 byte > > aligned; > > > > (gdb) bt full > > #0 lttng_inline_memcpy (len=8, src=0xf3b7eeb2, dest=<optimized > > out>) at > > > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend_internal.h:610 > > No locals. > > #1 lib_ring_buffer_write (len=8, src=0xf3b7eeb2, > > ctx=0xf57c47d0, config=0xf737c560 <client_config>) at > > /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/backend.h:100 > > __len = 8 > > handle = 0xf3b2e0c0 > > backend_pages = <optimized out> > > chanb = 0xf3b2e2e0 > > offset = <optimized out> > > #2 lttng_event_write (ctx=0xf57c47d0, src=0xf3b7eeb2, len=8) at > > > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust/lttng-ring-buffer-metadata-client.h:267 > > No locals. > > #3 0xf7337ef8 in ustctl_write_one_packet_to_channel > > (channel=<optimized out>, metadata_str=0xf3b7eeb2 "", > > len=<optimized out>) at > > /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1183 > > ctx = {chan = 0xf3b2e290, priv = 0x0, handle = > > 0xf3b2e0c0, data_size = 8, largest_align = 1, cpu = -1, buf = > > 0xf6909000, slot_size = 8, buf_offset = 163877, pre_offset = > > 163877, tsc = 0, rflags = 0, ctx_len = 80, ip = 0x0, priv2 = > > 0x0, padding2 = '\000' <repeats 11 times>, backend_pages = > > 0xf690c000} > > chan = 0xf3b2e4d8 > > str = 0xf3b7eeb2 "" > > reserve_len = 8 > > ret = <optimized out> > > __func__ = '\000' <repeats 34 times> > > __PRETTY_FUNCTION__ = '\000' <repeats 34 times> > > ---Type <return> to continue, or q <return> to quit--- > > #4 0x000344cc in commit_one_metadata_packet > > (stream=stream at entry=0xf3b2e560) at ust-consumer.c:2206 > > write_len = <optimized out> > > ret = <optimized out> > > __PRETTY_FUNCTION__ = "commit_one_metadata_packet" > > #5 0x00036538 in lttng_ustconsumer_read_subbuffer > > (stream=stream at entry=0xf3b2e560, ctx=ctx at entry=0x25e6e8) at > > ust-consumer.c:2452 > > len = 4096 > > subbuf_size = 4093 > > padding = <optimized out> > > err = -11 > > write_index = 1 > > ret = <optimized out> > > ustream = <optimized out> > > index = {offset = 0, packet_size = 575697416355872, > > content_size = 17564043391468256584, timestamp_begin = > > 17564043425827782792, timestamp_end = 34359738496, > > events_discarded = 313327092840, stream_id = 4089446416, > > stream_instance_id = 17689097291346376608, packet_seq_num = > > 8589934593} > > __PRETTY_FUNCTION__ = "lttng_ustconsumer_read_subbuffer" > > __func__ = "lttng_ustconsumer_read_subbuffer" > > ..... > > > > _______________________________________________ > > lttng-dev mailing list > > lttng-dev at lists.lttng.org <mailto:lttng-dev at lists.lttng.org> > > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > <https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev> > > > > > > -- > > Mathieu Desnoyers > > EfficiOS Inc. > > http://www.efficios.com <http://www.efficios.com> > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20230126/c65dc569/attachment-0001.htm>
On 2023-01-26 14:32, Anders Wallin wrote: > Hi Matthieu, > > I've retired and no longer have access to any arch64? target to test it on. > Thanks for your reply Anders, I've talked to Henrik and P?r today and they are already testing it out. Enjoy your retirement :) Best regards, Mathieu
diff --git a/libringbuffer/backend_internal.h b/libringbuffer/backend_internal.h index 90088b89..e597cf4d 100644 --- a/libringbuffer/backend_internal.h +++ b/libringbuffer/backend_internal.h @@ -592,6 +592,28 @@ int update_read_sb_index(const struct lttng_ust_lib_ring_buffer_config *config, #define inline_memcpy(dest, src, n) memcpy(dest, src, n) #endif +static inline __attribute__((always_inline)) +void lttng_inline_memcpy(void *dest, const void *src, + unsigned long len) +{ + switch (len) { + case 1: + *(uint8_t *) dest = *(const uint8_t *) src; + break; + case 2: + *(uint16_t *) dest = *(const uint16_t *) src; + break; + case 4: + *(uint32_t *) dest = *(const uint32_t *) src; + break; + case 8: + *(uint64_t *) dest = *(const uint64_t *) src; + break; + default: + inline_memcpy(dest, src, len); + } +} + /* * Use the architecture-specific memcpy implementation for constant-sized