diff mbox series

Re: Re: Re: Re: Re: Re: Re: shm leak in traced application?

Message ID 09fecb83-300d-4941-9316-fd3b71b9b807.zhenyu.ren@aliyun.com
State New
Headers show
Series Re: Re: Re: Re: Re: Re: Re: shm leak in traced application? | expand

Commit Message

zhenyu.ren March 11, 2022, 2:08 a.m. UTC
Hi, Mathieu and Jonathan

    I am sorry about that; I should have provided the UST version in the first place. In fact, we chose LTTng to provide our tracing feature a long time ago, so we are stuck on a very old version, 2.7. There were some occasions when we reported issues against that version but got similar answers, i.e. to upgrade to the latest software (I know you no longer maintain the old versions). It is very difficult for us to upgrade UST to a new version since it is linked into so many production applications. I think the 2.7 UST is robust enough and only needs a few small fixes, like this time: I need a single patch to ustcomm_recv_fds_unix_sock(). Again, I am very sorry that you had to spend so much time thinking about our case. LTTng is the best tracing toolset in the world.

Thanks
zhenyu.ren
------------------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: March 10, 2022 (Thu) 22:41
To: zhenyu.ren <zhenyu.ren at aliyun.com>
Cc: lttng-dev <lttng-dev at lists.lttng.org>
Subject: Re: [lttng-dev] Re: Re: Re: Re: shm leak in traced application?

Hi Zhenyu,

This is exactly why Jonathan and I asked you to fill a bug report on the bug tracker
and follow the bug reporting guidelines (https://lttng.org/community/#bug-reporting-guidelines).

This saves time for everyone.

Thanks,

Mathieu

----- On Mar 9, 2022, at 11:24 PM, zhenyu.ren <zhenyu.ren at aliyun.com> wrote:

Oh, I see. I have an old UST (2.7), so there is no FD_CLOEXEC handling in ustcomm_recv_fds_unix_sock().

Thanks very much!!!
zhenyu.ren
------------------------------------------------------------------
From: zhenyu.ren via lttng-dev <lttng-dev at lists.lttng.org>
Date: March 10, 2022 (Thu) 11:24
To: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Cc: lttng-dev <lttng-dev at lists.lttng.org>
Subject: [lttng-dev] Re: Re: Re: Re: shm leak in traced application?

>When this happens, is the process holding a single (or very few) shm file references, or references to many shm files ?

It is holding "all" of the shm files' references, not a single one or a few.

In fact, yesterday I tried to fix it as follows, and it seems to work.

Cc: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>; lttng-dev <lttng-dev at lists.lttng.org>
Subject: Re: [lttng-dev] Re: Re: Re: shm leak in traced application?

When this happens, is the process holding a single (or very few) shm file references, or references to many
shm files ?

I wonder if you end up in a scenario where an application very frequently performs exec(), and therefore
sometimes the exec() happens in the window between the unix socket file descriptor reception and the
call to fcntl() FD_CLOEXEC.

Thanks,

Mathieu

----- On Mar 8, 2022, at 8:29 PM, zhenyu.ren <zhenyu.ren at aliyun.com> wrote:
Thanks a lot for the reply. I did not file it in the bug tracker since I have not found a reliable way to reproduce the leak.
------------------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: March 8, 2022 (Tue) 23:26
To: zhenyu.ren <zhenyu.ren at aliyun.com>
Cc: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>; lttng-dev <lttng-dev at lists.lttng.org>
Subject: Re: [lttng-dev] Re: Re: Re: shm leak in traced application?



----- On Mar 8, 2022, at 12:18 AM, lttng-dev <lttng-dev at lists.lttng.org> wrote:

> Hi,
> In shm_object_table_append_shm()/alloc_shm(), why not call fcntl() to set FD_CLOEXEC
> on the shm fds? I guess this omission leads to the shm fd leak.

Those file descriptors are created when received by ustcomm_recv_fds_unix_sock, and
immediately after creation they are set as FD_CLOEXEC.

We should continue this discussion in the bug tracker as suggested by Jonathan.
It would greatly help if you can provide a small reproducer.

Thanks,

Mathieu


> Thanks
> zhenyu.ren

>> ------------------------------------------------------------------
>> From: Jonathan Rajotte-Julien <jonathan.rajotte-julien at efficios.com>
>> Date: February 25, 2022 (Fri) 22:31
>> To: zhenyu.ren <zhenyu.ren at aliyun.com>
>> Cc: lttng-dev <lttng-dev at lists.lttng.org>
>> Subject: Re: [lttng-dev] Re: Re: shm leak in traced application?

>> Hi zhenyu.ren,

>> Please open a bug on our bug tracker and provide a reproducer against the latest
>> stable version (2.13.x).

>> https://bugs.lttng.org/

>> Please follow the guidelines: https://bugs.lttng.org/#Bug-reporting-guidelines

>> Cheers

>> On Fri, Feb 25, 2022 at 12:47:34PM +0800, zhenyu.ren via lttng-dev wrote:
>> > Hi, lttng-dev team
>>> When lttng-sessiond exits, the ust applications should call
>>> lttng_ust_objd_table_owner_cleanup() and clean up all shm resources (unmap and
>>> close). However, I find that the ust applications keep "all" of the shm fds
>>> ("/dev/shm/ust-shm-consumer-81132 (deleted)") open and do NOT free the shm.
>>> If we run lttng-sessiond again, the ust applications get a new piece of shm and
>>> a new list of shm fds, so shm usage doubles. Then if we kill lttng-sessiond,
>>> what most likely happens is that the ust applications close the new list of shm
>>> fds and free the new shm resources while keeping the old shm. In other words,
>>> we cannot free this piece of shm unless we kill the ust applications!
>>> So is it possible that the ust applications failed to call
>>> lttng_ust_objd_table_owner_cleanup()? Have you ever seen this problem? Do you
>>> have any advice for freeing the shm without killing the ust applications (I
>>> tried to dig into kernel shm_open and /dev/shm, but did not find any ideas)?

>> > Thanks in advance
>> > zhenyu.ren



>> > ------------------------------------------------------------------
>> > From: zhenyu.ren via lttng-dev <lttng-dev at lists.lttng.org>
>> > Date: February 23, 2022 (Wed) 23:09
>> > To: lttng-dev <lttng-dev at lists.lttng.org>
>> > Subject: [lttng-dev] Re: shm leak in traced application?

>>> > "I found these items also exist in a traced application which is a
>>> > long-running daemon"
>>> Even if lttng-sessiond has been killed!!

>> > Thanks
>> > zhenyu.ren
>> > ------------------------------------------------------------------
>> > From: zhenyu.ren via lttng-dev <lttng-dev at lists.lttng.org>
>> > Date: February 23, 2022 (Wed) 22:44
>> > To: lttng-dev <lttng-dev at lists.lttng.org>
>> > Subject: [lttng-dev] shm leak in traced application?

>> > Hi,
>>> There are many items such as "/dev/shm/ust-shm-consumer-81132 (deleted)" in
>>> the lttng-sessiond fd space. I know this is the result of shm_open() and
>>> shm_unlink() in create_posix_shm().
>>> However, today I found these items also exist in a traced application which is
>>> a long-running daemon. The most important thing I found is that there seems to
>>> be no reliable way to release the shared memory.
>>> I tried killing lttng-sessiond, but that does not always release the shared
>>> memory. Sometimes I need to kill the traced application to free it... but it
>>> is not a good idea to kill these applications.
>>> My questions are:
>>> 1. Is there any way to release the shared memory without killing any traced
>>> application?
>>> 2. Is it normal that many items such as "/dev/shm/ust-shm-consumer-81132
>>> (deleted)" exist in the traced application?

>> > Thanks
>> > zhenyu.ren



>> > _______________________________________________
>> > lttng-dev mailing list
>> > lttng-dev at lists.lttng.org
>> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

>> --
>> Jonathan Rajotte-Julien
>> EfficiOS

Patch

--- a/lttng-ust/libringbuffer/shm.c
+++ b/lttng-ust/libringbuffer/shm.c
@@ -32,7 +32,6 @@ 
 #include <lttng/align.h>
 #include <limits.h>
 #include <helper.h>
-
 /*
  * Ensure we have the required amount of space available by writing 0
  * into the entire buffer. Not doing so can trigger SIGBUS when going
@@ -122,6 +121,12 @@  struct shm_object *_shm_object_table_alloc_shm(struct shm_object_table *table,
        /* create shm */

        shmfd = stream_fd;
+	if (shmfd >= 0) {
+		ret = fcntl(shmfd, F_SETFD, FD_CLOEXEC);
+		if (ret < 0) {
+			PERROR("fcntl shmfd FD_CLOEXEC");
+		}
+	}
        ret = zero_file(shmfd, memory_map_size);
        if (ret) {
                PERROR("zero_file");
@@ -272,15 +277,22 @@  struct shm_object *shm_object_table_append_shm(struct shm_object_table *table,
        obj->shm_fd = shm_fd;
        obj->shm_fd_ownership = 1;

+	if (shm_fd >= 0) {
+		ret = fcntl(shm_fd, F_SETFD, FD_CLOEXEC);
+		if (ret < 0) {
+			PERROR("fcntl shm_fd FD_CLOEXEC");
+			/* goto error_fcntl; */
+		}
+	}
        ret = fcntl(obj->wait_fd[1], F_SETFD, FD_CLOEXEC);
        if (ret < 0) {

    As it shows, wait_fd[1] has been set FD_CLOEXEC by fcntl(), but not shm_fd. Why does your patch handle wait_fd but not shm_fd? As far as I know, wait_fd is just a pipe and does not seem related to the shm resource.





------------------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: March 10, 2022 (Thu) 00:46
To: zhenyu.ren <zhenyu.ren at aliyun.com>