Message ID | 20171006182247.6874-2-jonathan.rajotte-julien@efficios.com |
---|---|
State | Accepted, archived |
Delegated to: | Jérémie Galarneau |
Headers | show |
Series | [lttng-tools,v3,1/2] lttng-relayd: use tcp keep-alive mechanism to detect dead-peer | expand |
Phil, any comments on this patch? Jérémie On 6 October 2017 at 14:22, Jonathan Rajotte <jonathan.rajotte-julien at efficios.com> wrote: > Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com> > --- > doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > > diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt > index 218526a1..e37ce845 100644 > --- a/doc/man/lttng-relayd.8.txt > +++ b/doc/man/lttng-relayd.8.txt > @@ -162,6 +162,25 @@ ENVIRONMENT VARIABLES > `LTTNG_RELAYD_HEALTH`:: > Path to relay daemon health's socket. > > +`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`:: > + Set to 1 to enable the use of tcp keep-alive allowing the detection of dead > + peers. > + > +`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`:: > + See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11. > + A value of -1 lets the operating system manage this parameter (default). > + > +`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`:: > + See tcp(7) tcp_keepalive_probes. > + A value of -1 lets the operating system manage this parameter (default). > + > +NOTE: Not supported on Solaris. > + > +`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`:: > + See tcp(7) tcp_keepalive_intvl. > + A value of -1 lets the operating system manage this parameter (default). > + > +NOTE: Not supported on Solaris. > > FILES > ----- > -- > 2.11.0 >
On 6 October 2017 at 14:22, Jonathan Rajotte <jonathan.rajotte-julien at efficios.com> wrote: > Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com> > --- > doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > > diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt > index 218526a1..e37ce845 100644 > --- a/doc/man/lttng-relayd.8.txt > +++ b/doc/man/lttng-relayd.8.txt > @@ -162,6 +162,25 @@ ENVIRONMENT VARIABLES > `LTTNG_RELAYD_HEALTH`:: > Path to relay daemon health's socket. > > +`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`:: > + Set to 1 to enable the use of tcp keep-alive allowing the detection of dead > + peers. I think we should add an explanation of why this option matters. I would enable the keep-alive mechanism by default considering the fairly severe problems users can experience without this mechanism in place (fd exhaustion and what amounts to a memory leak) when connections are dropped. Given this, it would make sense to change the variable to 'LTTNG_RELAYD_TCP_KEEP_ALIVE' which the user can set to 0/1 to override the default policy (default to 'on'). > + > +`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`:: > + See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11. > + A value of -1 lets the operating system manage this parameter (default). Pointing the user to tcp(7) is okay, but a sentence to explain what this setting does would be helpful. "The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes." I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME. Please add the time unit expected in this variable. Also, what would 0 mean here? While I agree with -1 == OS default, I'm wondering if we should be more aggressive. The Linux default of two hours seems like an awfully long time... Let's keep the platform defaults for now and I'll do some research to see what would be a reasonable value here if we decide to shorten the interval. > + > +`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`:: > + See tcp(7) tcp_keepalive_probes. > + A value of -1 lets the operating system manage this parameter (default). Same comment about the expected time unit, meaning and valid/expected values. I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT. > + > +NOTE: Not supported on Solaris. Is this option supported by Cygwin? > + > +`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`:: > + See tcp(7) tcp_keepalive_intvl. > + A value of -1 lets the operating system manage this parameter (default). Same comment about the expected time unit, meaning and valid/expected values. I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL. > + > +NOTE: Not supported on Solaris. Same comment about Cygwin. > > FILES > ----- > -- > 2.11.0 >
diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt index 218526a1..e37ce845 100644 --- a/doc/man/lttng-relayd.8.txt +++ b/doc/man/lttng-relayd.8.txt @@ -162,6 +162,25 @@ ENVIRONMENT VARIABLES `LTTNG_RELAYD_HEALTH`:: Path to relay daemon health's socket. +`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`:: + Set to 1 to enable the use of tcp keep-alive allowing the detection of dead + peers. + +`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`:: + See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11. + A value of -1 lets the operating system manage this parameter (default). + +`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`:: + See tcp(7) tcp_keepalive_probes. + A value of -1 lets the operating system manage this parameter (default). + +NOTE: Not supported on Solaris. + +`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`:: + See tcp(7) tcp_keepalive_intvl. + A value of -1 lets the operating system manage this parameter (default). + +NOTE: Not supported on Solaris. FILES -----
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com> --- doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)