Thursday, February 28, 2019

Cursing at ncurses

I've just spent the last hour and 45 minutes debugging a really strange problem I encountered after updating my server to Ubuntu 18.04. When I use ncurses-based programs within tmux, the text sometimes displays strangely, as if the cursor were in the wrong place. Now, it's not immediately obvious who is at fault - there are three completely different terminal emulators at play. The program I am running is ncdu (ncurses disk usage), which sends its output to tmux, which is running under mosh, and is finally rendered by xterm.

I was quickly able to determine that the problem must lie between ncdu and tmux, because switching tmux to a different window and switching back did not fix the problem. I also verified this by running tmux capture-pane -p, which showed that everything I was seeing was exactly how tmux was drawing it.

So what changed? Well, ncdu and ncurses both got upgraded. I'm still using my old custom-compiled tmux, though, so something must have changed in the way ncdu communicates. I suspected one of two things: either something else was writing to the terminal, or ncurses had been upgraded to a version that is more "clever" about the way it draws the screen. Luckily, tmux has a feature to log all terminal output to a file, so I was able to replay it. I wrote a quick Python script to slowly dump the log to the terminal, byte by byte, so I could pinpoint what was being sent when the problem occurred.
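A minimal sketch of that kind of replay script, assuming the log was captured as raw bytes (for example with tmux's pipe-pane). The function name replay, the default delay, and the command-line arguments are illustrative choices, not the original script:

```python
import sys
import time


def replay(data: bytes, out, delay: float = 0.01) -> int:
    """Write data to out one byte at a time, pausing after each byte.

    Returns the number of bytes written.
    """
    count = 0
    for i in range(len(data)):
        out.write(data[i:i + 1])
        out.flush()  # flush every byte so the terminal redraws step by step
        time.sleep(delay)
        count += 1
    return count


# Hypothetical usage: python replay.py tmux.log [delay-in-seconds]
if __name__ == "__main__" and len(sys.argv) > 1:
    pause = float(sys.argv[2]) if len(sys.argv) > 2 else 0.01
    with open(sys.argv[1], "rb") as f:
        replay(f.read(), sys.stdout.buffer, pause)
```

Running it in a terminal of the same size as the original pane, with a delay large enough to watch each redraw, makes it possible to stop at the moment the display goes wrong and look at the bytes just before it.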

What I found out was that when I pressed Page Down, ncdu was sending a command to scroll a region of the screen: \033[11S. However, tmux didn't seem to understand this instruction, and just ignored it. I verified this by looking at the tmux source itself.
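Once the suspect sequence is known, the log can also be searched for it directly. The sketch below looks for the "scroll up" control sequence (ESC [ n S, where a missing n means 1) in a captured byte log; find_scroll_up is a hypothetical helper name, and the pattern deliberately ignores every other escape sequence:

```python
import re

# ESC [ <digits> S is the "scroll up" control sequence; \033 is the octal form of ESC.
SCROLL_UP = re.compile(rb"\x1b\[(\d*)S")


def find_scroll_up(data: bytes):
    """Return (byte offset, line count) for each scroll-up sequence in data."""
    hits = []
    for match in SCROLL_UP.finditer(data):
        digits = match.group(1)
        lines = int(digits) if digits else 1  # parameter defaults to 1 when omitted
        hits.append((match.start(), lines))
    return hits
```

For the sequence from the log, find_scroll_up(b"\033[11S") reports one hit at offset 0 asking to scroll 11 lines.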

So what could I do? I could just use the normal version of tmux from the Ubuntu repository. It's a newer version that does support the scroll command, but the reason I switched back to the old version is that the newer version was crashing. I'd rather use an older, stable version than deal with crashes that take out all my sessions. So the only option is to convince ncdu not to send the scroll command. ncdu uses ncurses, which reads a database of command codes for different terminal types, so what I ended up having to do was:

  1. Download ncurses.
  2. Edit misc/terminfo.src, and remove indn=\E[%p1%dS, from the screen section.
  3. Run tic terminfo.src to convert it to the binary format read by ncurses.
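The edit in step 2 can also be scripted rather than done by hand. This is only a sketch: drop_capability is a hypothetical helper, it expects the text of a single entry rather than the whole of terminfo.src, and it assumes that a literal comma inside a capability value is always written with a backslash escape:

```python
import re


def drop_capability(entry: str, cap: str) -> str:
    """Remove the first 'cap=value,' string capability from a terminfo source entry."""
    # The value runs up to the next unescaped comma; backslash escapes such as \E are skipped whole.
    pattern = re.compile(r"[ \t]*\b" + re.escape(cap) + r"=(?:\\.|[^,\\])*,")
    return pattern.sub("", entry, count=1)


# Shortened, made-up entry for illustration; the real screen entry is much longer.
demo = "screen|demo entry,\n\tind=^J, indn=\\E[%p1%dS, is2=\\E)0,\n"
print(drop_capability(demo, "indn"))
```

The edited source still has to be compiled with tic, as in step 3, before ncurses will pick it up.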

This is the kind of problem that I enjoy solving. A subtle bug, which doesn't endanger my data, but is just annoying enough to fix, and has a root cause that is easy to understand - and not just a stupid mistake on my part.
