Friday 25 February 2011

Importance of the scheduling priority in D-Bus

Context switches are expensive. When do they happen in D-Bus communication?

application1 sends a D-Bus message to application2 through dbus-daemon

application1 sends a D-Bus message to application2 through dbus-daemon

The kernel may perform a context switch:

  • During the sendmsg() system call, when application1 sends a D-Bus message to dbus-daemon
  • During the sendmsg() system call, when dbus-daemon sends a D-Bus message to application2
  • At the end of a time slice

I instrumented libdbus in order to trace events such as message parsing, dispatching messages in dbus-daemon,  sendmsg(), recvmsg(). I took inspiration from Federico’s performance scripts but without using strace because straced processes have a big overhead compared to the duration of events I want to measure. My instrumentation doesn’t save traced events in a file to avoid overhead as much as possible, but just keep them in memory and save them after the experiment is finished. This is triggered:

  1. when a process using libdbus exits. The save function is registered by atexit().
  2. when a process using libdbus receives a SIGUSR2 signal. The save function is registered by sigaction().

I collect logs generated by dbus-daemon and all D-Bus peers and feed them to plot-timeline.py to generate timelines. I estimate the overhead of logging between 3 microseconds and 5 microseconds by logging consecutively 2 events. All the experiments are run in a virtual machine with one processor.

Experiment #1: a D-Bus signal delivered to 10 recipients

I sent one D-Bus signal with dbus-send and received by 10 dbus-monitor. You can reuse this script.

In this experiment, dbus-send was context switched off during the send() system call and that system call returned after dbus-daemon received the message, sent it to the 10 recipients, and all of them received it.

We can see dbus-daemon iterating over every recipients and sending the message on each recipient’s socket.

There is 12 context switches in this experiment:

  • 1 context switch from the sender to dbus-daemon
  • 10 context switches to each recipient
  • 1 final context switch to the sender

Experiment #2: 10 D-Bus signal delivered to 10 recipients

It is the same experiment as before but the sender emits 10 D-Bus signals (as fast as it could) instead of a single one. dbus-send cannot do that, so I used dbus-ping-pong.

The first three D-Bus signal are delivered in the same way; this timeline looks like the first timeline. There is also 12 context switches per D-Bus signal emitted.

But something different happen for the 7 next D-Bus signals: dbus-daemon is context switched off during the send() on every recipient.

The scheduler tries to maintain fairness along all the processes by giving them the same amount of CPU time. But dbus-daemon has about 10 times more work to do than every single recipient because it sends 10 messages and the recipients only receive 1 message. So when dbus-daemon writes on a recipient’s socket, there is two processes in the running state and the scheduler has to take a decision between a CPU hungry process (dbus-daemon) and a recipient. The recipient is chosen and it causes a context switches.

The context switches per D-Bus signal emitted are:

  • 1 context switch from the sender to dbus-daemon
  • 1 context switch from dbus-daemon to the first recipient
  • 1 context switch from the first recipient back to dbus-daemon
  • again 2 times 9 context switches between dbus-daemon and the remaining recipients
  • 1 context switch from dbus-daemon to the sender

That’s 22 context switches in total instead of 12 for the same amount of work and the same amount of messages delivered.

The experiment was run in 0.036121 seconds.

Experiment #3: 10 D-Bus signal delivered to 10 recipients with a high priority dbus-daemon

With the command renice, I change the priority of dbus-daemon to -15.

In this case, the kernel chose the first scheduling strategy with 12 context switches per D-Bus signal emitted. The experiment was run in 0.019320 seconds. It is 1.8 times faster.

Most D-Bus messages don’t have so many recipients. But still, it is interesting to see how much the priority of dbus-daemon influences the performance.

On Debian, dbus-daemons (session and system bus) run with the default priority (nice level: 0). On the N900, they run with a higher priority (nice level: -5).

Experiment #4: details of dispatching

Let’s zoom on dbus-daemon’s dispatching for two different sizes of D-Bus signals:277 bytes and 17kB.

277 bytes D-Bus signal:

  • Total: 524 microseconds
  • Receiving: 36 microseconds (7%)
  • Dispatch: 164 microseconds (31%)
    • 79 microseconds in match rules
  • Sending: 324 microseconds (62%)

17kB D-Bus signal:

  • Total: 1026 microseconds
  • Receiving: 413 microseconds (40%)
  • Dispatch: 230 microseconds (22%)
    • 79 microseconds in match rules
  • Sending: 383 microseconds (37%)

No comments:

Post a Comment