Message ID | 401d796b-8f3c-453f-82f3-bf79e01a25d5.zhenyu.ren@aliyun.com |
---|---|
State | New |
Series | Re: Re: Re: Re: Re: Re: shm leak in traced application? |
Hi Zhenyu,

This is exactly why Jonathan and I asked you to file a bug report on the bug tracker and follow the bug reporting guidelines (https://lttng.org/community/#bug-reporting-guidelines). This saves time for everyone.

Thanks,

Mathieu

----- On Mar 9, 2022, at 11:24 PM, zhenyu.ren <zhenyu.ren at aliyun.com> wrote:

> Oh, I see. I have an old ust (2.7), so I have no FD_CLOEXEC in
> ustcomm_recv_fds_unix_sock().
> Thanks very much!!!
> zhenyu.ren

>> ------------------------------------------------------------------
>> From: zhenyu.ren via lttng-dev <lttng-dev at lists.lttng.org>
>> Sent: Thursday, March 10, 2022, 11:24
>> To: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>> Cc: lttng-dev <lttng-dev at lists.lttng.org>
>> Subject: [lttng-dev] Re: Re: Re: Re: Re: shm leak in traced application?
>>
>> > When this happens, is the process holding a single (or very few) shm file
>> > references, or references to many shm files?
>>
>> It is holding references to "all" of the shm files, not a single one or a
>> few of them.
>> In fact, yesterday I tried to fix it as follows, and it seems to work.
>>
>> --- a/lttng-ust/libringbuffer/shm.c
>> +++ b/lttng-ust/libringbuffer/shm.c
>> @@ -32,7 +32,6 @@
>>  #include <lttng/align.h>
>>  #include <limits.h>
>>  #include <helper.h>
>> -
>>  /*
>>   * Ensure we have the required amount of space available by writing 0
>>   * into the entire buffer. Not doing so can trigger SIGBUS when going
>> @@ -122,6 +121,12 @@ struct shm_object *_shm_object_table_alloc_shm(struct shm_object_table *table,
>>  	/* create shm */
>>  	shmfd = stream_fd;
>> +	if (shmfd >= 0) {
>> +		ret = fcntl(shmfd, F_SETFD, FD_CLOEXEC);
>> +		if (ret < 0) {
>> +			PERROR("fcntl shmfd FD_CLOEXEC");
>> +		}
>> +	}
>>  	ret = zero_file(shmfd, memory_map_size);
>>  	if (ret) {
>>  		PERROR("zero_file");
>> @@ -272,15 +277,22 @@ struct shm_object *shm_object_table_append_shm(struct shm_object_table *table,
>>  	obj->shm_fd = shm_fd;
>>  	obj->shm_fd_ownership = 1;
>> +	if (shm_fd >= 0) {
>> +		ret = fcntl(shm_fd, F_SETFD, FD_CLOEXEC);
>> +		if (ret < 0) {
>> +			PERROR("fcntl shmfd FD_CLOEXEC");
>> +			//goto error_fcntl;
>> +		}
>> +	}
>>  	ret = fcntl(obj->wait_fd[1], F_SETFD, FD_CLOEXEC);
>>  	if (ret < 0) {
>>
>> As the diff shows, wait_fd[1] is set FD_CLOEXEC by fcntl(), but shm_fd is
>> not. Why does your patch handle wait_fd but not shm_fd? As far as I know,
>> wait_fd is just a pipe and does not seem related to the shm resource.
>>
>> ------------------------------------------------------------------
>> From: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>> Sent: Thursday, March 10, 2022, 00:46
>> To: zhenyu.ren <zhenyu.ren at aliyun.com>
>> Cc: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>; lttng-dev
>> <lttng-dev at lists.lttng.org>
>> Subject: Re: Re: [lttng-dev] Re: Re: Re: shm leak in traced application?
>>
>> When this happens, is the process holding a single (or very few) shm file
>> references, or references to many shm files?
>>
>> I wonder if you end up in a scenario where an application very frequently
>> performs exec(), and therefore sometimes the exec() will happen in the
>> window between the unix socket file descriptor reception and the call to
>> fcntl FD_CLOEXEC.
>>
>> Thanks,
>>
>> Mathieu
>>
>> ----- On Mar 8, 2022, at 8:29 PM, zhenyu.ren <zhenyu.ren at aliyun.com> wrote:
>>
>> Thanks a lot for the reply. I did not reply in the bug tracker since I have
>> not yet found a reliable way to reproduce the leak.
>>
>> ------------------------------------------------------------------
>> From: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>> Sent: Tuesday, March 8, 2022, 23:26
>> To: zhenyu.ren <zhenyu.ren at aliyun.com>
>> Cc: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>; lttng-dev
>> <lttng-dev at lists.lttng.org>
>> Subject: Re: [lttng-dev] Re: Re: Re: shm leak in traced application?
>>
>> ----- On Mar 8, 2022, at 12:18 AM, lttng-dev lttng-dev at lists.lttng.org wrote:
>>
>> > Hi,
>> > In shm_object_table_append_shm()/alloc_shm(), why not call fcntl()
>> > FD_CLOEXEC on the shm fds? I guess this omission leads to the shm fd leak.
>>
>> Those file descriptors are created when received by
>> ustcomm_recv_fds_unix_sock, and immediately after creation they are set as
>> FD_CLOEXEC.
>>
>> We should continue this discussion in the bug tracker as suggested by
>> Jonathan. It would greatly help if you can provide a small reproducer.
>>
>> Thanks,
>>
>> Mathieu
>>
>> > Thanks
>> > zhenyu.ren
>>
>> >> ------------------------------------------------------------------
>> >> From: Jonathan Rajotte-Julien <jonathan.rajotte-julien at efficios.com>
>> >> Sent: Friday, February 25, 2022, 22:31
>> >> To: zhenyu.ren <zhenyu.ren at aliyun.com>
>> >> Cc: lttng-dev <lttng-dev at lists.lttng.org>
>> >> Subject: Re: [lttng-dev] Re: Re: shm leak in traced application?
>> >>
>> >> Hi zhenyu.ren,
>> >>
>> >> Please open a bug on our bug tracker and provide a reproducer against the
>> >> latest stable version (2.13.x).
>> >>
>> >> https://bugs.lttng.org/
>> >>
>> >> Please follow the guidelines:
>> >> https://bugs.lttng.org/#Bug-reporting-guidelines
>> >>
>> >> Cheers
>> >>
>> >> On Fri, Feb 25, 2022 at 12:47:34PM +0800, zhenyu.ren via lttng-dev wrote:
>> >> > Hi, lttng-dev team
>> >> >
>> >> > When lttng-sessiond exits, the ust applications should call
>> >> > lttng_ust_objd_table_owner_cleanup() and clean up all shm resources
>> >> > (unmap and close). However, I do find that the ust applications keep
>> >> > "all" of the shm fds ("/dev/shm/ust-shm-consumer-81132 (deleted)") open
>> >> > and do NOT free the shm.
>> >> >
>> >> > If we run lttng-sessiond again, the ust applications get a new piece of
>> >> > shm and a new list of shm fds, so shm usage doubles. Then, if we kill
>> >> > lttng-sessiond, what most likely happens is that the ust applications
>> >> > close the new list of shm fds and free the new shm resource, but still
>> >> > keep the old shm. In other words, we cannot free this piece of shm
>> >> > unless we kill the ust applications!!!
>> >> >
>> >> > So is it possible that the ust applications failed to call
>> >> > lttng_ust_objd_table_owner_cleanup()? Have you ever seen this problem?
>> >> > Do you have any advice on freeing the shm without killing the ust
>> >> > applications? (I tried to dig into kernel shm_open and /dev/shm, but
>> >> > did not find any ideas.)
>> >> >
>> >> > Thanks in advance
>> >> > zhenyu.ren
>> >> >
>> >> > ------------------------------------------------------------------
>> >> > From: zhenyu.ren via lttng-dev <lttng-dev at lists.lttng.org>
>> >> > Sent: Wednesday, February 23, 2022, 23:09
>> >> > To: lttng-dev <lttng-dev at lists.lttng.org>
>> >> > Subject: [lttng-dev] Re: shm leak in traced application?
>> >> >
>> >> > >"I found these items also exist in a traced application which is a
>> >> > >long-time running daemon"
>> >> >
>> >> > Even if lttng-sessiond has been killed!!
>> >> >
>> >> > Thanks
>> >> > zhenyu.ren
>> >> >
>> >> > ------------------------------------------------------------------
>> >> > From: zhenyu.ren via lttng-dev <lttng-dev at lists.lttng.org>
>> >> > Sent: Wednesday, February 23, 2022, 22:44
>> >> > To: lttng-dev <lttng-dev at lists.lttng.org>
>> >> > Subject: [lttng-dev] shm leak in traced application?
>> >> >
>> >> > Hi,
>> >> >
>> >> > There are many items such as "/dev/shm/ust-shm-consumer-81132 (deleted)"
>> >> > in the lttng-sessiond fd space. I know they are the result of
>> >> > shm_open() and shm_unlink() in create_posix_shm().
>> >> >
>> >> > However, today I found that these items also exist in a traced
>> >> > application which is a long-running daemon. The most important thing I
>> >> > found is that there seems to be no reliable way to release the shared
>> >> > memory.
>> >> >
>> >> > I tried killing lttng-sessiond, but that does not always release the
>> >> > shared memory. Sometimes I need to kill the traced application to free
>> >> > the shared memory... But it is not a good idea to kill these
>> >> > applications.
>> >> >
>> >> > My questions are:
>> >> > 1. Is there any way to release the shared memory without killing any
>> >> > traced application?
>> >> > 2. Is it normal that many items such as
>> >> > "/dev/shm/ust-shm-consumer-81132 (deleted)" exist in the traced
>> >> > application?
>> >> >
>> >> > Thanks
>> >> > zhenyu.ren
>> >> > _______________________________________________
>> >> > lttng-dev mailing list
>> >> > lttng-dev at lists.lttng.org
>> >> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>> >>
>> >> --
>> >> Jonathan Rajotte-Julien
>> >> EfficiOS
>>
>> > _______________________________________________
>> > lttng-dev mailing list
>> > lttng-dev at lists.lttng.org
>> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
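
For readers following the thread, here is a minimal sketch of marking a descriptor close-on-exec with fcntl(), as discussed above. The helper names set_cloexec() and has_cloexec() are hypothetical and are not part of lttng-ust; unlike the patch in the thread, this version reads the current descriptor flags first and ORs in FD_CLOEXEC rather than overwriting them.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an lttng-ust function): mark fd close-on-exec
 * while preserving any other descriptor flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl()).
 */
static int set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: report whether FD_CLOEXEC is set on fd.
 * Returns 1 if set, 0 if not set, -1 on error.
 */
static int has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A caller could check a received shm descriptor with has_cloexec() when debugging a suspected leak, and call set_cloexec() on it if the flag is missing. Setting the flag after the fact still leaves a short window in which a concurrent exec() can inherit the descriptor.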
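
The thread mentions a window between receiving a descriptor over the unix socket and the later fcntl() call. One Linux-specific way to avoid such a window is to pass MSG_CMSG_CLOEXEC to recvmsg(), so the kernel sets close-on-exec when it installs the descriptor. The sketch below is only an illustration under that assumption: send_one_fd() and recv_one_fd_cloexec() are hypothetical names, and nothing here is claimed to match what ustcomm_recv_fds_unix_sock() does in any lttng-ust version.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: send one descriptor over a connected unix socket. */
static int send_one_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: receive one descriptor. MSG_CMSG_CLOEXEC asks the
 * kernel to set close-on-exec on the new descriptor as it is installed,
 * so no separate fcntl() call is needed. Returns the fd, or -1 on error.
 */
static int recv_one_fd_cloexec(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1)
		return -1;
	if (msg.msg_flags & MSG_CTRUNC)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

With a socketpair() and any open descriptor, sending through send_one_fd() and receiving through recv_one_fd_cloexec() should yield a new descriptor whose F_GETFD flags already include FD_CLOEXEC.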