diff mbox series

[lttng-tools,v3,2/2] man: document dead-peer detection for relayd

Message ID 20171006182247.6874-2-jonathan.rajotte-julien@efficios.com
State Accepted, archived
Delegated to: Jérémie Galarneau
Headers show
Series [lttng-tools,v3,1/2] lttng-relayd: use tcp keep-alive mechanism to detect dead-peer | expand

Commit Message

Jonathan Rajotte Oct. 6, 2017, 6:22 p.m. UTC
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
---
 doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Jérémie Galarneau Nov. 23, 2017, 7:03 p.m. UTC | #1
Phil, any comments on this patch?

Jérémie

On 6 October 2017 at 14:22, Jonathan Rajotte
<jonathan.rajotte-julien at efficios.com> wrote:
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> ---
>  doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt
> index 218526a1..e37ce845 100644
> --- a/doc/man/lttng-relayd.8.txt
> +++ b/doc/man/lttng-relayd.8.txt
> @@ -162,6 +162,25 @@ ENVIRONMENT VARIABLES
>  `LTTNG_RELAYD_HEALTH`::
>      Path to relay daemon health's socket.
>
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`::
> +    Set to 1 to enable the use of tcp keep-alive allowing the detection of dead
> +    peers.
> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`::
> +    See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11.
> +    A value of -1 lets the operating system manage this parameter (default).
> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`::
> +    See tcp(7) tcp_keepalive_probes.
> +    A value of -1 lets the operating system manage this parameter (default).
> +
> +NOTE: Not supported on Solaris.
> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`::
> +    See tcp(7) tcp_keepalive_intvl.
> +    A value of -1 lets the operating system manage this parameter (default).
> +
> +NOTE: Not supported on Solaris.
>
>  FILES
>  -----
> --
> 2.11.0
>
Jérémie Galarneau Dec. 4, 2017, 3:23 p.m. UTC | #2
On 6 October 2017 at 14:22, Jonathan Rajotte
<jonathan.rajotte-julien at efficios.com> wrote:
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> ---
>  doc/man/lttng-relayd.8.txt | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt
> index 218526a1..e37ce845 100644
> --- a/doc/man/lttng-relayd.8.txt
> +++ b/doc/man/lttng-relayd.8.txt
> @@ -162,6 +162,25 @@ ENVIRONMENT VARIABLES
>  `LTTNG_RELAYD_HEALTH`::
>      Path to relay daemon health's socket.
>
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`::
> +    Set to 1 to enable the use of tcp keep-alive allowing the detection of dead
> +    peers.

I think we should add an explanation of why this option matters.

I would enable the keep-alive mechanism by default considering the fairly
severe problems users can experience without this mechanism in place
(fd exhaustion
and what amounts to a memory leak) when connections are dropped.

Given this, it would make sense to change the variable to
'LTTNG_RELAYD_TCP_KEEP_ALIVE' which the user can set to 0/1 to override the
default policy (default to 'on').

> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`::
> +    See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11.
> +    A value of -1 lets the operating system manage this parameter (default).

Pointing the user to tcp(7) is okay, but a sentence to explain what this
setting does would be helpful.

"The number of seconds a connection needs to be idle before TCP begins
sending out keep-alive probes."

I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_IDLE_TIME.

Please add the time unit expected in this variable. Also, what would 0
mean here?

While I agree with -1 == OS default, I'm wondering if we should be
more aggressive.

The Linux default of two hours seems like an awfully long time... Let's keep
the platform defaults for now and I'll do some research to see what would be a
reasonable value here if we decide to shorten the interval.

> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`::
> +    See tcp(7) tcp_keepalive_probes.
> +    A value of -1 lets the operating system manage this parameter (default).

Same comment about the expected time unit, meaning and valid/expected values.

I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_MAX_PROBE_COUNT.

> +
> +NOTE: Not supported on Solaris.

Is this option supported by Cygwin?


> +
> +`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`::
> +    See tcp(7) tcp_keepalive_intvl.
> +    A value of -1 lets the operating system manage this parameter (default).

Same comment about the expected time unit, meaning and valid/expected values.

I propose renaming the variable to LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBE_INTERVAL.

> +
> +NOTE: Not supported on Solaris.

Same comment about Cygwin.


>
>  FILES
>  -----
> --
> 2.11.0
>
diff mbox series

Patch

diff --git a/doc/man/lttng-relayd.8.txt b/doc/man/lttng-relayd.8.txt
index 218526a1..e37ce845 100644
--- a/doc/man/lttng-relayd.8.txt
+++ b/doc/man/lttng-relayd.8.txt
@@ -162,6 +162,25 @@  ENVIRONMENT VARIABLES
 `LTTNG_RELAYD_HEALTH`::
     Path to relay daemon health's socket.
 
+`LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE`::
+    Set to 1 to enable the use of tcp keep-alive allowing the detection of dead
+    peers.
+
+`LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME`::
+    See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on Solaris 11.
+    A value of -1 lets the operating system manage this parameter (default).
+
+`LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES`::
+    See tcp(7) tcp_keepalive_probes.
+    A value of -1 lets the operating system manage this parameter (default).
+
+NOTE: Not supported on Solaris.
+
+`LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL`::
+    See tcp(7) tcp_keepalive_intvl.
+    A value of -1 lets the operating system manage this parameter (default).
+
+NOTE: Not supported on Solaris.
 
 FILES
 -----