diff mbox series

call_rcu: Fix race between rcu_barrier() and call_rcu_data_free()

Message ID 20201022223021.GA8535@paulmck-ThinkPad-P72
State New
Headers show
Series call_rcu: Fix race between rcu_barrier() and call_rcu_data_free() | expand

Commit Message

Paul E. McKenney Oct. 22, 2020, 10:30 p.m. UTC
The current code can lose RCU callbacks at shutdown time, which can
result in hangs.  This lossage can happen as follows:

o       A thread invokes call_rcu_data_free(), which executes up through
        the wake_call_rcu_thread().  At this point, the call_rcu_data
        structure has been drained of callbacks, but is still on the
        call_rcu_data_list.  Note that this thread does not hold the
        call_rcu_mutex.

o       Another thread invokes rcu_barrier(), which traverses the
        call_rcu_data_list under the protection of call_rcu_mutex,
        a list which still includes the above newly drained structure.
        This thread therefore adds a callback to the newly drained
        call_rcu_data structure.  It then releases call_rcu_mutex and
        enters a mystifying loop that does futex stuff.

o       The first thread finishes executing call_rcu_data_free(),
        which acquires call_rcu_mutex just long enough to remove the
        newly drained call_rcu_data structure from call_rcu_data_list.
        Which causes one of the rcu_barrier() invocation's callbacks to
        be leaked.

o       The second thread's rcu_barrier() invocation never returns
        resulting in a hang.

This commit therefore changes call_rcu_data_free() to acquire
call_rcu_mutex before checking the call_rcu_data structure for callbacks.
In the case where there are no callbacks, call_rcu_mutex is held across
both the check and the removal from call_rcu_data_list, thus preventing
rcu_barrier() from adding a callback in the meantime.  In the case where
there are callbacks, call_rcu_mutex must be momentarily dropped across
the call to get_default_call_rcu_data(), which can itself acquire
call_rcu_mutex.  This momentary drop is not a problem because any
callbacks that rcu_barrier() might queue during that period of time will
be moved to the default call_rcu_data structure, and the lock will be
held across the full time including moving those callbacks and removing
the call_rcu_data structure that was passed into call_rcu_data_free()
from call_rcu_data_list.

With this fix, a several-hundred-CPU test successfully completes more
than 5,000 executions.  Without this fix, it fails within a few tens
of executions.  Although the failures happen more quickly on larger
systems, in theory this could happen on a single-CPU system, courtesy
of preemption.

Signed-off-by: Paul E. McKenney <paulmck at kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Cc: Stephen Hemminger <stephen at networkplumber.org>
Cc: Alan Stern <stern at rowland.harvard.edu>
Cc: Lai Jiangshan <jiangshanlai at gmail.com>
Cc: <lttng-dev at lists.lttng.org>
Cc: <linux-kernel at vger.kernel.org>

---

 urcu-call-rcu-impl.h |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Mathieu Desnoyers Oct. 26, 2020, 7:58 p.m. UTC | #1
----- On Oct 22, 2020, at 6:30 PM, paulmck paulmck at kernel.org wrote:

> The current code can lose RCU callbacks at shutdown time, which can
> result in hangs.  This lossage can happen as follows:
> 
> o       A thread invokes call_rcu_data_free(), which executes up through
>        the wake_call_rcu_thread().  At this point, the call_rcu_data
>        structure has been drained of callbacks, but is still on the
>        call_rcu_data_list.  Note that this thread does not hold the
>        call_rcu_mutex.
> 
> o       Another thread invokes rcu_barrier(), which traverses the
>        call_rcu_data_list under the protection of call_rcu_mutex,
>        a list which still includes the above newly drained structure.
>        This thread therefore adds a callback to the newly drained
>        call_rcu_data structure.  It then releases call_rcu_mutex and
>        enters a mystifying loop that does futex stuff.
> 
> o       The first thread finishes executing call_rcu_data_free(),
>        which acquires call_rcu_mutex just long enough to remove the
>        newly drained call_rcu_data structure from call_rcu_data_list.
>        Which causes one of the rcu_barrier() invocation's callbacks to
>        be leaked.
> 
> o       The second thread's rcu_barrier() invocation never returns
>        resulting in a hang.
> 
> This commit therefore changes call_rcu_data_free() to acquire
> call_rcu_mutex before checking the call_rcu_data structure for callbacks.
> In the case where there are no callbacks, call_rcu_mutex is held across
> both the check and the removal from call_rcu_data_list, thus preventing
> rcu_barrier() from adding a callback in the meantime.  In the case where
> there are callbacks, call_rcu_mutex must be momentarily dropped across
> the call to get_default_call_rcu_data(), which can itself acquire
> call_rcu_mutex.  This momentary drop is not a problem because any
> callbacks that rcu_barrier() might queue during that period of time will
> be moved to the default call_rcu_data structure, and the lock will be
> held across the full time including moving those callbacks and removing
> the call_rcu_data structure that was passed into call_rcu_data_free()
> from call_rcu_data_list.
> 
> With this fix, a several-hundred-CPU test successfully completes more
> than 5,000 executions.  Without this fix, it fails within a few tens
> of executions.  Although the failures happen more quickly on larger
> systems, in theory this could happen on a single-CPU system, courtesy
> of preemption.

I agree with this fix, will merge in liburcu master, stable-0.12, and stable-2.11.
Out of curiosity, which test is hanging ?  Is it a test which is part of the liburcu
tree or some out-of-tree test ? I wonder why we did not catch it in our CI [1].

Thanks,

Mathieu

[1] https://ci.lttng.org/view/Liburcu/

> 
> Signed-off-by: Paul E. McKenney <paulmck at kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> Cc: Stephen Hemminger <stephen at networkplumber.org>
> Cc: Alan Stern <stern at rowland.harvard.edu>
> Cc: Lai Jiangshan <jiangshanlai at gmail.com>
> Cc: <lttng-dev at lists.lttng.org>
> Cc: <linux-kernel at vger.kernel.org>
> 
> ---
> 
> urcu-call-rcu-impl.h |    7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h
> index b6ec6ba..18fd65a 100644
> --- a/src/urcu-call-rcu-impl.h
> +++ b/src/urcu-call-rcu-impl.h
> @@ -772,9 +772,13 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
> 		while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0)
> 			(void) poll(NULL, 0, 1);
> 	}
> +	call_rcu_lock(&call_rcu_mutex);
> 	if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) {
> -		/* Create default call rcu data if need be */
> +		call_rcu_unlock(&call_rcu_mutex);
> +		/* Create default call rcu data if need be. */
> +		/* CBs queued here will be handed to the default list. */
> 		(void) get_default_call_rcu_data();
> +		call_rcu_lock(&call_rcu_mutex);
> 		__cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head,
> 			&default_call_rcu_data->cbs_tail,
> 			&crdp->cbs_head, &crdp->cbs_tail);
> @@ -783,7 +787,6 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
> 		wake_call_rcu_thread(default_call_rcu_data);
> 	}
> 
> -	call_rcu_lock(&call_rcu_mutex);
> 	cds_list_del(&crdp->list);
>  	call_rcu_unlock(&call_rcu_mutex);
Paul E. McKenney Oct. 26, 2020, 8:53 p.m. UTC | #2
On Mon, Oct 26, 2020 at 03:58:11PM -0400, Mathieu Desnoyers wrote:
> ----- On Oct 22, 2020, at 6:30 PM, paulmck paulmck at kernel.org wrote:
> 
> > The current code can lose RCU callbacks at shutdown time, which can
> > result in hangs.  This lossage can happen as follows:
> > 
> > o       A thread invokes call_rcu_data_free(), which executes up through
> >        the wake_call_rcu_thread().  At this point, the call_rcu_data
> >        structure has been drained of callbacks, but is still on the
> >        call_rcu_data_list.  Note that this thread does not hold the
> >        call_rcu_mutex.
> > 
> > o       Another thread invokes rcu_barrier(), which traverses the
> >        call_rcu_data_list under the protection of call_rcu_mutex,
> >        a list which still includes the above newly drained structure.
> >        This thread therefore adds a callback to the newly drained
> >        call_rcu_data structure.  It then releases call_rcu_mutex and
> >        enters a mystifying loop that does futex stuff.
> > 
> > o       The first thread finishes executing call_rcu_data_free(),
> >        which acquires call_rcu_mutex just long enough to remove the
> >        newly drained call_rcu_data structure from call_rcu_data_list.
> >        Which causes one of the rcu_barrier() invocation's callbacks to
> >        be leaked.
> > 
> > o       The second thread's rcu_barrier() invocation never returns
> >        resulting in a hang.
> > 
> > This commit therefore changes call_rcu_data_free() to acquire
> > call_rcu_mutex before checking the call_rcu_data structure for callbacks.
> > In the case where there are no callbacks, call_rcu_mutex is held across
> > both the check and the removal from call_rcu_data_list, thus preventing
> > rcu_barrier() from adding a callback in the meantime.  In the case where
> > there are callbacks, call_rcu_mutex must be momentarily dropped across
> > the call to get_default_call_rcu_data(), which can itself acquire
> > call_rcu_mutex.  This momentary drop is not a problem because any
> > callbacks that rcu_barrier() might queue during that period of time will
> > be moved to the default call_rcu_data structure, and the lock will be
> > held across the full time including moving those callbacks and removing
> > the call_rcu_data structure that was passed into call_rcu_data_free()
> > from call_rcu_data_list.
> > 
> > With this fix, a several-hundred-CPU test successfully completes more
> > than 5,000 executions.  Without this fix, it fails within a few tens
> > of executions.  Although the failures happen more quickly on larger
> > systems, in theory this could happen on a single-CPU system, courtesy
> > of preemption.
> 
> I agree with this fix, will merge in liburcu master, stable-0.12, and stable-2.11.
> Out of curiosity, which test is hanging ?  Is it a test which is part of the liburcu
> tree or some out-of-tree test ? I wonder why we did not catch it in our CI [1].

The hung test was from perfbook [1] in the CodeSamples/datastruct/hash
directory.  A repeat-by is as follows:

# Have userspace RCU preinstalled as you wish.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git
cd CodeSamples
make pthreads
cd datastruct/hash
make
time for ((i = 0; i < 2000; i++)); do echo $i; ./hash_bkt_rcu --schroedinger --nreaders 444 --nupdaters 4 --duration 1000 --updatewait 1 --nbuckets 262144 --elems/writer 65536; done

This normally hangs within a few tens of iterations.  With this patch,
the passes more than 6,000 iterations.

I have smaller tests that produce this same hang on my 12-CPU laptop,
but with much lower probability.  Here is one example that did hang on
my laptop, and which could be placed into a similar bash loop as above:

hash_bkt_rcu --schroedinger --nreaders 10 --nupdaters 2 --duration 1000 --updatewait 1 --nbuckets 8192 --elems/writer 4096

But I don't have a good estimate of the hang probability, except a
suspicion that it is lower than would be convenient for a CI test.
Attaching to the hung process using gdb did confirm the type of hang,
however.

It might be possible to create a focused test that races rcu_barrier()
against thread exit, where threads are created and exit repeatedly,
and make a per-thread call_rcu() worker in the meantime..

Thoughts?

							Thanx, Paul

[1]	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git

> Thanks,
> 
> Mathieu
> 
> [1] https://ci.lttng.org/view/Liburcu/
> 
> > 
> > Signed-off-by: Paul E. McKenney <paulmck at kernel.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> > Cc: Stephen Hemminger <stephen at networkplumber.org>
> > Cc: Alan Stern <stern at rowland.harvard.edu>
> > Cc: Lai Jiangshan <jiangshanlai at gmail.com>
> > Cc: <lttng-dev at lists.lttng.org>
> > Cc: <linux-kernel at vger.kernel.org>
> > 
> > ---
> > 
> > urcu-call-rcu-impl.h |    7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h
> > index b6ec6ba..18fd65a 100644
> > --- a/src/urcu-call-rcu-impl.h
> > +++ b/src/urcu-call-rcu-impl.h
> > @@ -772,9 +772,13 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
> > 		while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0)
> > 			(void) poll(NULL, 0, 1);
> > 	}
> > +	call_rcu_lock(&call_rcu_mutex);
> > 	if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) {
> > -		/* Create default call rcu data if need be */
> > +		call_rcu_unlock(&call_rcu_mutex);
> > +		/* Create default call rcu data if need be. */
> > +		/* CBs queued here will be handed to the default list. */
> > 		(void) get_default_call_rcu_data();
> > +		call_rcu_lock(&call_rcu_mutex);
> > 		__cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head,
> > 			&default_call_rcu_data->cbs_tail,
> > 			&crdp->cbs_head, &crdp->cbs_tail);
> > @@ -783,7 +787,6 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
> > 		wake_call_rcu_thread(default_call_rcu_data);
> > 	}
> > 
> > -	call_rcu_lock(&call_rcu_mutex);
> > 	cds_list_del(&crdp->list);
> >  	call_rcu_unlock(&call_rcu_mutex);
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
Mathieu Desnoyers Oct. 27, 2020, 2:47 p.m. UTC | #3
----- On Oct 26, 2020, at 4:53 PM, paulmck paulmck at kernel.org wrote:

> On Mon, Oct 26, 2020 at 03:58:11PM -0400, Mathieu Desnoyers wrote:
>> ----- On Oct 22, 2020, at 6:30 PM, paulmck paulmck at kernel.org wrote:
>> 
>> > The current code can lose RCU callbacks at shutdown time, which can
>> > result in hangs.  This lossage can happen as follows:
>> > 
>> > o       A thread invokes call_rcu_data_free(), which executes up through
>> >        the wake_call_rcu_thread().  At this point, the call_rcu_data
>> >        structure has been drained of callbacks, but is still on the
>> >        call_rcu_data_list.  Note that this thread does not hold the
>> >        call_rcu_mutex.
>> > 
>> > o       Another thread invokes rcu_barrier(), which traverses the
>> >        call_rcu_data_list under the protection of call_rcu_mutex,
>> >        a list which still includes the above newly drained structure.
>> >        This thread therefore adds a callback to the newly drained
>> >        call_rcu_data structure.  It then releases call_rcu_mutex and
>> >        enters a mystifying loop that does futex stuff.
>> > 
>> > o       The first thread finishes executing call_rcu_data_free(),
>> >        which acquires call_rcu_mutex just long enough to remove the
>> >        newly drained call_rcu_data structure from call_rcu_data_list.
>> >        Which causes one of the rcu_barrier() invocation's callbacks to
>> >        be leaked.
>> > 
>> > o       The second thread's rcu_barrier() invocation never returns
>> >        resulting in a hang.
>> > 
>> > This commit therefore changes call_rcu_data_free() to acquire
>> > call_rcu_mutex before checking the call_rcu_data structure for callbacks.
>> > In the case where there are no callbacks, call_rcu_mutex is held across
>> > both the check and the removal from call_rcu_data_list, thus preventing
>> > rcu_barrier() from adding a callback in the meantime.  In the case where
>> > there are callbacks, call_rcu_mutex must be momentarily dropped across
>> > the call to get_default_call_rcu_data(), which can itself acquire
>> > call_rcu_mutex.  This momentary drop is not a problem because any
>> > callbacks that rcu_barrier() might queue during that period of time will
>> > be moved to the default call_rcu_data structure, and the lock will be
>> > held across the full time including moving those callbacks and removing
>> > the call_rcu_data structure that was passed into call_rcu_data_free()
>> > from call_rcu_data_list.
>> > 
>> > With this fix, a several-hundred-CPU test successfully completes more
>> > than 5,000 executions.  Without this fix, it fails within a few tens
>> > of executions.  Although the failures happen more quickly on larger
>> > systems, in theory this could happen on a single-CPU system, courtesy
>> > of preemption.
>> 
>> I agree with this fix, will merge in liburcu master, stable-0.12, and
>> stable-2.11.
>> Out of curiosity, which test is hanging ?  Is it a test which is part of the
>> liburcu
>> tree or some out-of-tree test ? I wonder why we did not catch it in our CI [1].
> 
> The hung test was from perfbook [1] in the CodeSamples/datastruct/hash
> directory.  A repeat-by is as follows:
> 
> # Have userspace RCU preinstalled as you wish.
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git
> cd CodeSamples
> make pthreads
> cd datastruct/hash
> make
> time for ((i = 0; i < 2000; i++)); do echo $i; ./hash_bkt_rcu --schroedinger
> --nreaders 444 --nupdaters 4 --duration 1000 --updatewait 1 --nbuckets 262144
> --elems/writer 65536; done
> 
> This normally hangs within a few tens of iterations.  With this patch,
> the passes more than 6,000 iterations.
> 
> I have smaller tests that produce this same hang on my 12-CPU laptop,
> but with much lower probability.  Here is one example that did hang on
> my laptop, and which could be placed into a similar bash loop as above:
> 
> hash_bkt_rcu --schroedinger --nreaders 10 --nupdaters 2 --duration 1000
> --updatewait 1 --nbuckets 8192 --elems/writer 4096
> 
> But I don't have a good estimate of the hang probability, except a
> suspicion that it is lower than would be convenient for a CI test.
> Attaching to the hung process using gdb did confirm the type of hang,
> however.
> 
> It might be possible to create a focused test that races rcu_barrier()
> against thread exit, where threads are created and exit repeatedly,
> and make a per-thread call_rcu() worker in the meantime..
> 
> Thoughts?

We'll try to come up with a narrowed-down test-case which reproduce the issue.
We may get some inspiration from your test-case. I'll ask Michael Jeanson
to add this to our backlog.

Thanks,

Mathieu

> 
>							Thanx, Paul
> 
> [1]	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git
> 
>> Thanks,
>> 
>> Mathieu
>> 
>> [1] https://ci.lttng.org/view/Liburcu/
>> 
>> > 
>> > Signed-off-by: Paul E. McKenney <paulmck at kernel.org>
>> > Cc: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>> > Cc: Stephen Hemminger <stephen at networkplumber.org>
>> > Cc: Alan Stern <stern at rowland.harvard.edu>
>> > Cc: Lai Jiangshan <jiangshanlai at gmail.com>
>> > Cc: <lttng-dev at lists.lttng.org>
>> > Cc: <linux-kernel at vger.kernel.org>
>> > 
>> > ---
>> > 
>> > urcu-call-rcu-impl.h |    7 +++++--
>> > 1 file changed, 5 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h
>> > index b6ec6ba..18fd65a 100644
>> > --- a/src/urcu-call-rcu-impl.h
>> > +++ b/src/urcu-call-rcu-impl.h
>> > @@ -772,9 +772,13 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
>> > 		while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0)
>> > 			(void) poll(NULL, 0, 1);
>> > 	}
>> > +	call_rcu_lock(&call_rcu_mutex);
>> > 	if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) {
>> > -		/* Create default call rcu data if need be */
>> > +		call_rcu_unlock(&call_rcu_mutex);
>> > +		/* Create default call rcu data if need be. */
>> > +		/* CBs queued here will be handed to the default list. */
>> > 		(void) get_default_call_rcu_data();
>> > +		call_rcu_lock(&call_rcu_mutex);
>> > 		__cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head,
>> > 			&default_call_rcu_data->cbs_tail,
>> > 			&crdp->cbs_head, &crdp->cbs_tail);
>> > @@ -783,7 +787,6 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
>> > 		wake_call_rcu_thread(default_call_rcu_data);
>> > 	}
>> > 
>> > -	call_rcu_lock(&call_rcu_mutex);
>> > 	cds_list_del(&crdp->list);
>> >  	call_rcu_unlock(&call_rcu_mutex);
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
> > http://www.efficios.com
diff mbox series

Patch

diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h
index b6ec6ba..18fd65a 100644
--- a/src/urcu-call-rcu-impl.h
+++ b/src/urcu-call-rcu-impl.h
@@ -772,9 +772,13 @@  void call_rcu_data_free(struct call_rcu_data *crdp)
 		while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0)
 			(void) poll(NULL, 0, 1);
 	}
+	call_rcu_lock(&call_rcu_mutex);
 	if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) {
-		/* Create default call rcu data if need be */
+		call_rcu_unlock(&call_rcu_mutex);
+		/* Create default call rcu data if need be. */
+		/* CBs queued here will be handed to the default list. */
 		(void) get_default_call_rcu_data();
+		call_rcu_lock(&call_rcu_mutex);
 		__cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head,
 			&default_call_rcu_data->cbs_tail,
 			&crdp->cbs_head, &crdp->cbs_tail);
@@ -783,7 +787,6 @@  void call_rcu_data_free(struct call_rcu_data *crdp)
 		wake_call_rcu_thread(default_call_rcu_data);
 	}
 
-	call_rcu_lock(&call_rcu_mutex);
 	cds_list_del(&crdp->list);
 	call_rcu_unlock(&call_rcu_mutex);