Hacking Up an RGB Framebuffer Driver for Wii-Linux

Hacks II – Partial Update

So we now have a kernel framebuffer driver that solves the YUY2 format issue for all other programs, which is much cleaner and easier to maintain than patching every program that requires direct framebuffer access. But what about performance? We are doing 50 or 60 full-frame conversions every second (depending on the TV signal type), even when little or nothing on the screen has changed, which is the case for a significant part of typical desktop usage. In other words, we are wasting precious CPU cycles on a low-end platform.

Obviously, converting only the parts of the screen that have changed since the previous frame will be much more economical. The kernel driver cannot keep track of such changes without keeping a copy of the previous screen in memory and making a pixel-for-pixel comparison at each vtrace, so it’s a trade-off between memory and efficiency. Whether such partial updates are really better can only be shown by benchmarking.

In the previous section, when we allocated the virtual framebuffer, we used the same size as the physical framebuffer, which is actually double the maximum frame size. We can therefore just store a copy of the current frame in the virtual framebuffer after the actual frame data. At the next vtrace, we can compare the new frame against the stored old frame in 2-pixel pairs (because the YUY2 format ties them together). Pairs that are unchanged are ignored; pairs that have changed are stored in the copy framebuffer as well as converted into the physical framebuffer. The implementation of such a partial update is quite simple:

@@ -1256,14 +1256,20 @@ static void vi_dispatch_vtrace(struct vi
 	unsigned int height = info->var.yres;
 	uint32_t *addr0 = (uint32_t *)info->screen_base;
 	uint32_t *addr1 = fb_mem;
+	uint32_t *addr2 = addr0 + width * height;

 	while (height--) {
 		int j = width;
 		while (j--) {
-			*(addr1 + j) = rgbrgb16toycbycr(*(addr0 + j));
+			uint32_t k = *(addr0 + j);
+			if ( k != *(addr2 + j)) {
+				*(addr2 + j) = k;
+				*(addr1 + j) = rgbrgb16toycbycr(k);
+			}
 		}
 		addr0 += width;
 		addr1 += width;
+		addr2 += width;
 	}

 	spin_lock_irqsave(&ctl->lock, flags);
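
The same compare-store-convert logic can be tried outside the kernel. What follows is a minimal user-space sketch, not the driver code: partial_update and demo_convert are hypothetical names, and demo_convert is only a placeholder for the driver’s rgbrgb16toycbycr(). The sketch flattens the row loop into a single pass and returns how many pairs were converted, which is one way to measure how much of a frame actually changed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter type: maps one 32-bit pair of source pixels to
 * one 32-bit YUY2 pair, like the driver's rgbrgb16toycbycr(). */
typedef uint32_t (*pair_convert_fn)(uint32_t pair);

/* Placeholder converter for experiments only; NOT a real colour conversion. */
static uint32_t demo_convert(uint32_t pair)
{
    return ~pair;
}

/*
 * Convert only the pixel pairs that differ from the saved copy.
 *   src    - current virtual frame, one uint32_t per 2-pixel pair
 *   backup - copy of the frame as it was last converted
 *   dst    - physical YUY2 framebuffer
 * Returns the number of pairs that were actually converted.
 */
static size_t partial_update(const uint32_t *src, uint32_t *backup,
                             uint32_t *dst, size_t pairs,
                             pair_convert_fn convert)
{
    size_t converted = 0;

    for (size_t i = 0; i < pairs; i++) {
        uint32_t k = src[i];

        if (k != backup[i]) {
            backup[i] = k;        /* remember what we converted */
            dst[i] = convert(k);  /* touch the physical buffer only here */
            converted++;
        }
    }
    return converted;
}
```

Calling partial_update twice on an unchanged frame should return 0 the second time, since every pair then matches its backup.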

Does partial update actually improve performance? Let’s see some benchmark figures. We use the x11perf utility and compare the results of a few selected and definitely non-representative benchmarks (full benchmarks available here). Note that we also include the ‘cube’ X driver, even though it only works in X.

x11perf test                        physical YUY2    physical YUY2       virtual RGB565   virtual RGB565
                                                     with cube driver                     with partial update
Dot                                 4060000.0/sec    2940000/sec         2070000.0/sec    2960000.0/sec
1×1 rectangle                       1450000.0/sec    1070000/sec         743000.0/sec     1110000.0/sec
10×10 rectangle                     207000.0/sec     134000.0/sec        102000.0/sec     136000.0/sec
100×100 rectangle                   4370.0/sec       2770/sec            2300.0/sec       2470.0/sec
500×500 rectangle                   179.0/sec        94.7/sec            84.4/sec         86.9/sec
Copy 10×10 from window to window    61500.0/sec      17900.0/sec         31000.0/sec      43600.0/sec
Copy 100×100 from window to window  3240.0/sec       817.0/sec           1460.0/sec       2130.0/sec
Copy 500×500 from window to window  133.0/sec        74.9/sec            52.3/sec         74.7/sec

You can see that YUY2-RGB565 conversions take a heavy toll on performance, regardless of where the conversion takes place: at the X video driver level for ‘cube’ or at the kernel framebuffer driver level for ‘virtual RGB565’. Introducing partial update improves the performance of the ‘virtual RGB565’ framebuffer driver in every benchmark listed: by roughly 30~50% for dots, small rectangles and window copies, though by only a few percent for the 100×100 and 500×500 rectangle fills. Keep in mind that this will only hold true for use cases where large areas of the display are unchanged; otherwise, for instance in full-screen video playback, there will be very little ‘partial’ updating and the extra comparison and backup steps will only decrease performance even more. Another point worth noting is that the comparison between the ‘cube’ X driver and ‘virtual RGB565 with partial update’ is rather mixed, with ‘cube’ having an advantage in a large part of the full benchmarks, most likely because, being an X video driver, ‘cube’ can benefit from X’s built-in partial update mechanisms.
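
For readers who want to check the gains themselves, the relative improvement is simply (after - before) / before. Below is a small sketch in C; gain_percent is a hypothetical helper name, and the figures to feed it are the two ‘virtual RGB565’ columns of the table above.

```c
/* Relative gain of `after` over `before`, in percent.
 * Both arguments are x11perf rates (operations per second). */
static double gain_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For example, gain_percent(2070000.0, 2960000.0) for the Dot test comes out at about 43, while gain_percent(84.4, 86.9) for the 500×500 rectangle is only about 3.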

There are other possible improvements that could be made to the driver. For instance, since we are using a virtual framebuffer anyway, there is not much point in sticking to 2-bytes-per-pixel RGB565, which is cumbersome to handle. Why not just use (A)RGB8888 (alpha not used), both to simplify the conversion arithmetic in the driver and to make pixel operations easier for other programs? This would of course use twice as much memory but could still be worth it. On the other hand, if memory conservation is of the utmost importance, one can stick to RGB565 and put the frame backup in the unused area of the physical framebuffer, saving 720K.
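
To illustrate why 8-bit channels are easier to work with, here is a rough sketch of converting two XRGB8888 pixels into one YUY2 pair. It is not taken from the driver. The assumptions: the function names are hypothetical, the coefficients are the common integer approximation of BT.601 ‘studio range’ YCbCr, chroma is computed from the average of the two pixels, and the output word is laid out as Y0, Cb, Y1, Cr from the most significant byte down (which gives that byte order in memory on a big-endian CPU such as the Wii’s).

```c
#include <stdint.h>

/* Luma for one pixel: integer BT.601 approximation, result in 16..235. */
static uint8_t luma(int r, int g, int b)
{
    return (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}

/* Chroma, result in 16..240. The +32768 term is the usual +128 offset
 * folded in before the shift, so the shifted value is never negative. */
static uint8_t chroma_b(int r, int g, int b)
{
    return (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8);
}

static uint8_t chroma_r(int r, int g, int b)
{
    return (uint8_t)((112 * r - 94 * g - 18 * b + 128 + 32768) >> 8);
}

/* Convert two adjacent XRGB8888 pixels (top byte ignored) to one YUY2 pair. */
static uint32_t xrgb_pair_to_yuy2(uint32_t p0, uint32_t p1)
{
    int r0 = (p0 >> 16) & 0xff, g0 = (p0 >> 8) & 0xff, b0 = p0 & 0xff;
    int r1 = (p1 >> 16) & 0xff, g1 = (p1 >> 8) & 0xff, b1 = p1 & 0xff;

    /* The pair shares one chroma sample, taken from the averaged colour. */
    int r = (r0 + r1) / 2, g = (g0 + g1) / 2, b = (b0 + b1) / 2;

    uint32_t y0 = luma(r0, g0, b0);
    uint32_t y1 = luma(r1, g1, b1);
    uint32_t cb = chroma_b(r, g, b);
    uint32_t cr = chroma_r(r, g, b);

    return (y0 << 24) | (cb << 16) | (y1 << 8) | cr;
}
```

With 8-bit channels each component is a plain shift and mask away; an RGB565 version would additionally have to unpack 5- and 6-bit fields and scale them before applying the same coefficients.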

YUY2 is not without its merits compared to RGB formats, though, especially for video playback, because most encoded videos use a planar YUV colour space, which is far easier to convert to YUY2 than RGB is. GeeXboX for Wii takes advantage of this fact and renders in YUY2 directly onto the Wii’s framebuffer. Ideally, the framebuffer driver should be able to work both in fake RGB mode and real YUY2 mode, switching between the two according to either a boot parameter or a /proc entry value. Even more ideally, the driver should be able to pass area(s) of the virtual RGB framebuffer without conversion to the actual YUY2 framebuffer according to /proc entry values. This would basically constitute a crude window management system. Such a ‘mosaic’ or ‘hybrid’ mode would probably improve playback of YUV video from within an RGB GUI.

Problem Solved?

Although the approach outlined here will enable ALL programs to display correct colours in Linux on Wii, it does not solve the fundamental drawback of the YUY2 format: adjacent odd and even pixels are not completely independent, because they share the same chroma (colour) values and differ only in luma (brightness) values. If you pay close attention to the display, you will notice that vertical borders between different colours are often not as sharp as they should be and look as if there are shadows being dropped. This ‘issue’ will never be solved, unless you are willing to sacrifice half the horizontal resolution and view everything stretched 2X horizontally.

Goodies for the Patient

Pre-compiled Kernels

Hotfile

These are ready-made 2.6.32.41 kernels plus the corresponding kernel modules.

You must have a working Linux for Wii installation to be able to use these. Please note that if you are using X and the ‘cube’ driver, you will have to manually switch to ‘fbdev’ driver to get correct display. If you don’t have X installed yet and want to try the new driver in X, just install xf86-video-fbdev (package name xserver-xorg-video-fbdev in Debian-based systems) along with your preferred desktop environment.

The compressed archive contains both IOS (under ‘apps’ folder) and MINI (under ‘ppcboot’ folder) versions, the latter including versions for different TV signal types. All kernels’ gcnfb drivers are given boot arguments to do overscan compensation. To disable this (to enlarge screen size), you have to hexedit the kernel boot arguments and add ‘nostalgic’ to gcnfb module options.

All kernels assume the root filesystem to be located on the second partition of the front-slot SD card (Whiite style). If your root is elsewhere, you will have to manually hexedit the kernel boot arguments and change the value of the ‘root’ parameter.

After extracting the kernel modules, remember to run ‘depmod -a’ to enable them.

BTW, the kernels all include Con Kolivas’s BFS scheduler.

Kernel Patch

This is only for those trying to compile their own kernels. If you just want to use Linux on your Wii, you can safely ignore this.

Patch


63 Responses to Hacking Up an RGB Framebuffer Driver for Wii-Linux

  1. fishears says:

    Me again :)
    Do you happen to have a copy of the BFS patch you used? Con Kolivas isn’t hosting anything pre 3.0 now. I’ve looked for one myself but they all seem to rely on different source files than the ones I have (and the ones on kernel.org – which is weird).
    Thanks if have. No worries if you don’t.

  2. fishears says:

    Not sure if you still read these but I’ve just got back into Wii Linux for writing fiction on. I’ve got your 480p kernel running with fbdev and it’s fantastic having no problems with colour and also having the screen fit my TV. I got b43 loading (add to /etc/modules) and now all that’s missing is swap on RVL-MEM2. I was using it just fine on the previous mikep5 2.6.32 kernel but now it’s gone. I’ve set up a swap file for now (which is ok but slow). Can you tell me if MEM2 is available as a device in your kernel and if so, how to get at it (it doesn’t list in /proc/devices)?
    THANK YOU

  3. DeltaLink says:

    Which driver would be best for console (TTY) use with a framebuffer (with no X) in your opinion? I’m looking for a driver with YUV colour that requires the least CPU for use with a resource heavy GUI console application. Do any of the drivers allow for resolution or bit depth reduction by default or with little modification?

    • DeltaLink says:

      I mean YUY2 colour space not YUV.

    • farter says:

      Hi. Both drivers are actually quite similar, the real difference being that the unmodified wii linux driver only hardcodes color conversion for text console, while the modified RGB driver converts everything. Neither supports bit depth or resolution change, but you can choose to use centered small area for display.

      The fastest approach IMO is to make your app directly output YUY2 images onto the kernel framebuffer. If you need resolution or bit depth reduction, do it in your app too. This is how GeeXboX for Wii outputs images.

      Alternatively, you could use devkitppc to develop Wii mode app. I’m not sure, but devkit apps seem to be able to harness some GPU capabilities.

  4. -DarkAceZ- says:

    OK, thanks! I had already extracted the modules.zip, but forgot to run depmod.
    Is there any way to make it automatically do modprobe b43 on boot? I noticed that it tries to start an internet connection, but because the hardware isn’t started it can’t do anything.