A few days ago, I wrote some code which has a fairly significant but not necessarily obvious bug. Can you spot it?
Let me set the scene:
We’re in an event-driven single-threaded application with several fairly loosely-coupled components. The main loop consists of waiting for an incoming event (e.g. from the network), finding the component(s) interested in this event, and notifying those components.
I’m adding a feature I’m calling KeepAlive, which gives components the option to be woken up by the main loop even if there aren’t any new events that concern them. This could be useful if you have to react to something not happening, for example. In JavaScript on the web, you use setTimeout()
for this sort of thing. The main loop will periodically call the central KeepAlive object and would then look something like
while keep_running():
event = wait_for_event(timeout=TIMEOUT)
if event is not None:
dispatch_event(event)
keepalive.notify_all()
Code language: Python (python)
while (keep_running()) {
if(auto event = wait_for_event(TIMEOUT); event.has_value()) {
dispatch_event(*event);
}
keepalive.notify_all();
}
Code language: C++ (cpp)
The KeepAlive class looks like this (this is where the bug is):
class KeepAlive:
"""
Global object to allow components to be woken by the main loop
if they request it.
"""
def __init__(self):
self._callbacks = {}
self._once_callbacks = []
def request_notification(self, func):
"""
Request that the function ``func`` be called at regular
intervals. Returns a handle with which to cancel the request.
"""
handle = KeepAlive.Handle(self)
self._callbacks[handle] = func
return handle
def request_single_notification(self, func):
"""
Request that the function ``func`` be called exactly once from
the main loop.
"""
handle = self.request_notification(func)
self._once_callbacks.append(handle)
def notify_all(self):
"""
Call all functions that have been requested using
``request_notification()`` or ``request_single_notification()``.
This function should generally be called from the main loop.
"""
for func in [*self._callbacks.values()]:
func()
# Make sure the single notification are not triggered again
for handle in self._once_callbacks:
handle.cancel()
self._once_callbacks = []
class Handle:
def __init__(self, keepalive):
self._keepalive = keepalive
def cancel(self):
if self in self._keepalive._callbacks:
del self._keepalive._callbacks[self]
Code language: Python (python)
/*!
Global object to allow components to be woken by the main loop
if they request it.
*/
class KeepAlive {
public:
using Callback = std::function<void()>;
class Handle;
KeepAlive() = default;
/*!
Request that the function 'func' be called at regular intervals
Returns a handle for the request. Destroying the handle cancels
the request (i.e., the function will no longer be called)
*/
[[nodiscard]] Handle request_notification(Callback func) {
auto index = ++m_last_index;
auto handle = Handle{this, index};
m_callbacks.emplace(index, func);
return handle;
}
/*!
Request that the function 'func' be called exactly once from the
main loop. Ensure that any captured references live long enough!
*/
void request_single_notification(Callback func) {
m_once_callbacks.push_back(request_notification(func));
}
/*!
Call all functions that have been requested using
request_notification() or request_single_notification().
This function should generally be called from the main loop.
*/
void notify_all() {
for (auto& [_, func] : m_callbacks) func();
// Make sure the single notifications are not triggered again
m_once_callbacks.clear();
}
class Handle {
public:
Handle() = default;
Handle(const Handle&) = delete;
Handle(Handle&& other) noexcept
: Handle{other.m_keepalive, other.m_index} {
other.m_keepalive = nullptr;
}
Handle& operator=(const Handle&) = delete;
Handle& operator=(Handle&& other) noexcept {
cancel();
m_keepalive = other.m_keepalive;
m_index = other.m_index;
other.m_keepalive = nullptr;
return *this;
}
void cancel() {
if (m_keepalive != nullptr) {
m_keepalive->m_callbacks.erase(m_index);
m_keepalive = nullptr;
}
}
~Handle() { cancel(); }
private:
friend class KeepAlive;
Handle(KeepAlive* keepalive, std::int64_t index)
: m_keepalive{keepalive}, m_index{index} { }
KeepAlive* m_keepalive{};
std::int64_t m_index{};
};
private:
std::int64_t m_last_index{};
std::map<std::int64_t, Callback> m_callbacks;
std::vector<Handle> m_once_callbacks;
};
Code language: C++ (cpp)
The idea is that any component would have a reference to the central KeepAlive object, and would be able to call keepalive.
(or request_
) at any point to request a callback a few moments later.
The code works, at least insofar as the following test passes. However, there is a problem with the implementation that breaks certain (other, untested) use cases quite badly.
keepalive = KeepAlive()
counter = 0
def increment():
global counter
counter += 1
# Test that a callback is called
handle = keepalive.request_notification(increment)
assert counter == 0
keepalive.notify_all()
assert counter == 1
keepalive.notify_all()
assert counter == 2
# Test that the callback is no longer called after being cancelled
handle.cancel()
keepalive.notify_all()
assert counter == 2
# Test that "once" means "once"
counter = 0
keepalive.request_single_notification(increment)
assert counter == 0
keepalive.notify_all()
assert counter == 1
keepalive.notify_all()
keepalive.notify_all()
keepalive.notify_all()
assert counter == 1
Code language: Python (python)
auto keepalive = KeepAlive{};
int counter = 0;
// test that a callback is called
auto handle = keepalive.request_notification([&]{ ++counter; });
assert(counter == 0);
keepalive.notify_all();
assert(counter == 1);
keepalive.notify_all();
assert(counter == 2);
// test that the callback is no longer called after being cancelled
handle.cancel();
keepalive.notify_all();
assert(counter == 2);
// test that "once" means "once"
counter = 0;
keepalive.request_single_notification([&]{ ++counter; });
assert(counter == 0);
keepalive.notify_all();
assert(counter == 1);
keepalive.notify_all();
keepalive.notify_all();
keepalive.notify_all();
assert(counter == 1);
Code language: C++ (cpp)
Can you find the problem with this code? And – if you have found the issue – how would you resolve it?
Hint
This is how I wanted to use the class:
A “retry” component could use the following logic (in pseudo-code):
function maybe_retry_later
if task completed
success!
else if current time is past deadline
retry!
else
# check again later
keepalive.request_single_notification(maybe_retry_later)
end if
end function
Code language: plaintext (plaintext)
The function maybe_retry_later would be called to ensure, asynchronously, that some task is eventually completed. Think about how this example would play out and see if you can see the bug now.
Big hint (a failing test case)
This test case fails, thereby demonstrating the bug:
counter = 0
def increment():
global counter
counter += 1
def increment_twice():
increment()
keepalive.request_single_notification(increment)
keepalive.request_single_notification(increment_twice)
keepalive.notify_all()
assert counter == 1
keepalive.notify_all()
assert counter == 2
keepalive.notify_all()
assert counter == 2
Code language: Python (python)
int counter = 0;
keepalive.request_single_notification([&] {
++counter;
keepalive.request_single_notification([&] { ++counter; });
});
keepalive.notify_all();
assert(counter == 1);
keepalive.notify_all();
assert(counter == 2);
keepalive.notify_all();
assert(counter == 2);
Code language: C++ (cpp)