Who Gives a Blit?

by John Fehr

If you're writing an application that draws raw images into a window, you want to use the fastest blit possible. In this article, I'll describe three ways to blit images using the QNX® Neutrino® RTOS, from slowest to fastest. I'll assume you don't want any dithering, blending, or anything complex; you just want the image to be drawn in your window as quickly as possible. I'll also assume your display is in 16-bit mode. (32-bit mode is much the same, except that the image buffer is twice as big and the pixel format is different.)
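To put a number on that "twice as big" remark, here's a minimal sketch that computes the size of a tightly packed image buffer. The helper name image_bytes() is made up for this illustration and isn't part of Photon, and it assumes there's no padding at the end of each row.

```c
#include <stddef.h>

/* Bytes needed for a tightly packed image (no row padding).
   bytes_per_pixel is 2 for a 16-bit mode and 4 for a 32-bit mode. */
size_t image_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}
```

For the 256x256 images used later in this article, that works out to 131072 bytes per image in 16-bit mode and 262144 bytes in 32-bit mode.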


Method 1: Memory blit


The first way to blit is to use one of two image-drawing routines: PgDrawImage or PgDrawImagemx. The only difference is that PgDrawImagemx can detect and use a shared memory pointer instead of a normal malloced memory pointer. (More on shared memory later.)
With PgDrawImagemx, however, data isn't physically copied into the draw buffer until the draw buffer is flushed with PgFlush(). So if you call PgDrawImagemx(), change the image data, then call PgFlush(), you'll see the changed image data, not the original. So for this purpose, it's easiest to put a PgFlush() call immediately after the PgDrawImagemx() function call.

Let's make a program that creates a bunch of images that we'll blit using the methods I'll be outlining in the article. We'll start with the first:

 

 #include <stdio.h>
 #include <stdlib.h>         // malloc(), free(), atoi(), exit()
 #include <photon/PtWidget.h>
 #include <photon/PtWindow.h>
 #include <photon/PdDirect.h>

 // we'll be testing three different types of blits, defaulting to the first
 static long blittype=0;

 // this structure will contain our image data
 typedef struct
 {
  unsigned short *buffer;   // raw RGB565 data
  long width;               // width of image
  long height;              // height of image
  long pitch;               // pitch (bytes per row) of image
 } CoolImage;

 #define NUMIMAGES 48        // # images in our animation
 #define WH 256              // width & height (we'll use a square buffer)
 #define RAD (WH>>1)         // 1/2 the width/height for computing animation
 #define REPS 100            // how many times we want to blit each image

 // returns the allocated buffer
 CoolImage *AllocBuffer(long w,long h)
 {
  CoolImage *i=(CoolImage*)malloc(sizeof(*i));

  if (!i) return 0;

  // the width/height are always what we're passed in
  i->width=w;
  i->height=h;

  // our blit type 0 is a straight memory blit
  if (blittype==0)
  {
   i->pitch=w*2;
   if (i->buffer=(unsigned short*)malloc(w*h*2))
    return i;
  }

  // if we fail, free the CoolImage structure, and return 0
  free(i);
  return 0;
 }

 // this function frees the image given
 void FreeBuffer(CoolImage *i)
 {
  // for blit type 0, we just free the memory previously malloced
  if (blittype==0)
   free(i->buffer);

  // free the structure as well
  free(i);
 }

 // this function blits the given buffer using our blit type method
 void BlitBuffer(PtWidget_t *win,CoolImage *i)
 {
  // For blit type 0, we use PgDrawImagemx(). We have to make sure
  // to set the region to the window's region first.  Don't forget
  // to flush! :)
  if (blittype==0)
  {
   PhPoint_t pos={0,0};
   PhDim_t size={i->width,i->height};
   PgSetRegion(PtWidgetRid(win));
   PgDrawImagemx(i->buffer,0,&pos,&size,i->pitch,0);
   PgFlush();
  }
 }

 int main(int argc,char *argv[])
 {
  CoolImage *images[NUMIMAGES];
  int x,y;
  int i,j;
  PtWidget_t *win;
  PtArg_t args[3];
  PhDim_t dim={WH,WH};
  PhPoint_t pos={50,50};

  // if a parameter was passed, grab it as the blit type
  if (argc>1) blittype=atoi(argv[1]);

  // initialize our connection to Photon, and create/realize a window
  PtInit("/dev/photon");
  PtSetArg(&args[0],Pt_ARG_POS,&pos,0);
  PtSetArg(&args[1],Pt_ARG_DIM,&dim,0);
  win=PtCreateWidget(PtWindow,Pt_NO_PARENT,2,args);
  PtRealizeWidget(win);

  // Allocate and fill a series of NUMIMAGES images with a little
  // fading type animation.  Put your own animation in here if you like.
  for (i=0;i<NUMIMAGES;i++)
  {
   images[i]=AllocBuffer(WH,WH);
   if (!images[i])
   {
    printf("Couldn't allocate image %d, try setting NUMIMAGES to %d\n",i,i);
    exit(1);
   }
   for (y=0;y<RAD;y++)
    for (x=0;x<RAD;x++)
    {
     int val=((x<y)?x:y);
     val+=(i*RAD)/NUMIMAGES;
     val=(val*128)/RAD;
     if (val&0x40) val=0x3f-(val&0x3f);
     images[i]->buffer[(y*(images[i]->pitch>>1))+x]=
     images[i]->buffer[(((WH-1)-y)*(images[i]->pitch>>1))+x]=
     images[i]->buffer[(y*(images[i]->pitch>>1))+((WH-1)-x)]=
     images[i]->buffer[(((WH-1)-y)*(images[i]->pitch>>1))+((WH-1)-x)]=
      (val&0x3f)<<5;
    }
  }

  // blit the NUMIMAGES images REPS times.
  for (j=0;j<REPS;j++)
   for (i=0;i<NUMIMAGES;i++)
    BlitBuffer(win,images[i]);

  printf("Blitted %d frames using method %d\n",REPS*NUMIMAGES,blittype);

  // now free the images
  for (i=0;i<NUMIMAGES;i++)
   FreeBuffer(images[i]);

  // hide the window and destroy it.
  PtUnrealizeWidget(win);
  PtDestroyWidget(win);
  return 0;
 }


Assuming we call this file blit.c, we can compile and link it with the command qcc blit.c -o blit -lph. Then, we can run it by typing ./blit. Check it out... A neat little fading-green animation. It actually blits pretty fast (depending on your machine's CPU). To get an fps count, divide the number of frames blitted (4800 if you didn't change the REPS or NUMIMAGES defines) by the 'real' time value reported when you run time ./blit. On my machine, I get about 278 fps. Now let's see if we can top it!
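If you'd rather not do the division by hand, the calculation is easy to sketch in C. The function name frames_per_second() is my own invention for this example, and the elapsed time is whatever 'real' value time printed for you.

```c
/* Frames per second from a frame count and the elapsed 'real' time
   in seconds. Returns 0 if the elapsed time is not positive. */
double frames_per_second(long frames, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0) return 0.0;
    return (double)frames / elapsed_seconds;
}
```

As a made-up example, 4800 frames in 16 seconds of real time would give 300 fps.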


Method 2: Shared memory blit


One of the problems with the first method is that the blitting is done by the graphics driver, which doesn't reside in the same process space as the app. The entire image is copied into a message that's sent to the graphics driver (or several messages, depending on how big your message buffer is), which picks off the image data and blits it. But there is a way for two processes to share memory: use shared memory!
For those of you who are new to shared memory, PgShmemCreate() and PgShmemDestroy() allocate and free shared memory. Luckily, you can use the same function you did before to blit it to your window: PgDrawImagemx(). The QNX Photon® microGUI is smart enough to figure out that the pointer you're passing to PgDrawImagemx() is a shared memory pointer, so it only has to pass the shared memory reference to the image data instead of the whole image.

The best way to illustrate the difference is with some code. Add the following just before free(i) in the AllocBuffer() function:

 

  else
  if (blittype==1)
  {
   i->pitch=w*2;
   if (i->buffer=(unsigned short*)PgShmemCreate(w*h*2,NULL))
    return i;
  }

The NULL in the PgShmemCreate() call tells the Photon library to make up its own unique name for the shared memory. You can specify your own, but to me, that sounds too much like work.

You'll also need to free the memory, of course. Add the following just before free(i) in FreeBuffer():

 

  else
  if (blittype==1)
   PgShmemDestroy(i->buffer);

Since this method uses the same function call to blit, you just need to change the


  if (blittype==0)

line in the BlitBuffer() function to


  if (blittype==0 || blittype==1)

and you're done! Recompile it, and type:

time ./blit 0
time ./blit 1

Notice any difference in the times? The second one should take less than half as long as the first. A quick fps calculation on my machine shows I went from 278 fps to 629 fps! Nice improvement, eh?

If you open another terminal and type ls /dev/shmem while blit is running, you'll see all the shared memory that was allocated for the task. This directory is quite handy if you want to check for shared memory leaks. Try hitting <ctrl><c> in the terminal you're running ./blit 1 from - while blit is still running - and then type ls /dev/shmem. You'll notice that the shared memory hasn't been destroyed. You can free it up yourself simply by typing rm /dev/shmem/*. Be careful, though: this will free up all of the shared memory currently allocated, so if you're running another application with its own shared memory, you could get into trouble. In a real application, you might want to go to the trouble of naming your shared memory so you can easily see which entries in /dev/shmem are yours.
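If you do decide to name your shared memory, one possible scheme is to build each name from a prefix, your process ID, and the image index. This is only a sketch: make_shmem_name() and the "prefix-pid-index" format are my own assumptions, not anything Photon requires.

```c
#include <stdio.h>

/* Builds a name such as "blit-1234-3" from a prefix, a process ID, and an
   image index. Returns 0 on success, -1 if the name does not fit in buf. */
int make_shmem_name(char *buf, size_t size, const char *prefix,
                    long pid, int index)
{
    int n = snprintf(buf, size, "%s-%ld-%d", prefix, pid, index);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

In the real program, you'd get the process ID from getpid() and pass the resulting string to PgShmemCreate() in place of the NULL.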


Method 3: Video memory blit


OK, you've fixed the message-passing bottleneck. Now what can you do to speed this up even more? One current problem: although our images are now static, the graphics driver still has to keep copying the image from shared memory into the graphics card's video memory. So let's skip the shared memory use altogether, and just use video memory.
Introducing PdCreateOffscreenContext(), PdGetOffscreenContextPtr(), PhDCRelease(), and PgContextBlit(). How do they work? PdCreateOffscreenContext() creates an offscreen context (video RAM), PdGetOffscreenContextPtr() returns a pointer to the video memory for the given offscreen context (which you can modify), PhDCRelease() releases that offscreen context, and PgContextBlit() blits the offscreen context into another context (in this case, your window).

Let's try it out. First, add an extra variable to our CoolImage structure:

 

  PdOffscreenContext_t *ctx;

Then, add the following just before free(i) in the AllocBuffer() function:


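  else
  if (blittype==2)
  {
   if (i->ctx=PdCreateOffscreenContext(0,w,h,Pg_OSC_MEM_PAGE_ALIGN))
   {
    i->pitch=i->ctx->pitch;
    i->buffer=(unsigned short*)PdGetOffscreenContextPtr(i->ctx);
    if (i->buffer) return i;
    // no direct pointer available: release the context and fail
    PhDCRelease(i->ctx);
   }
  }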
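Note that the pitch now comes from the offscreen context instead of being computed as w*2, so a row in video memory may be longer than the visible width. That's why the animation code always indexes with pitch>>1 rather than the width. Here's a minimal sketch of that addressing; the helper name pixel_at() is hypothetical, and it assumes 16-bit pixels and an even pitch.

```c
/* Returns a pointer to pixel (x, y) in a 16-bit image whose rows are
   pitch bytes apart. pitch may be larger than width*2. */
unsigned short *pixel_at(unsigned short *buffer, long pitch, long x, long y)
{
    return buffer + y * (pitch / 2) + x;
}
```

With a pitch of 16 bytes (8 pixels per row), pixel (1,1) lands at element 9 of the buffer, regardless of how wide the visible image is.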

Now, add the following just before free(i) in the FreeBuffer() function:

 

  else
  if (blittype==2)
   PhDCRelease(i->ctx);

Finally, add your new blit function just before the closing } in the BlitBuffer() function:

 

  else
  if (blittype==2)
  {
   PhRect_t r={{0,0},{i->width,i->height}};
   PgSetRegion(PtWidgetRid(win));
   PgContextBlit(i->ctx,&r,NULL,&r);
   PgFlush();
  }

Recompile it, and run it with ./blit 2. If you get an error asking you to lower NUMIMAGES, try changing the NUMIMAGES definition to that suggested, and run it again. When it works, type:

time ./blit 1
time ./blit 2

WOW! Mine went from 629 fps to 3153 fps! If yours is anything like mine, the image in the window doesn't even look square anymore - it looks like the top of the fading box is narrower than the bottom. Have we hit lightspeed?

What's actually happening is that the blit is finishing so fast, your monitor can't keep up. By the time the monitor is finished drawing the first line of a given image, we've already blitted the next image, so it then uses the second line from the next image for the second line in our window, and so on... Or, we might have blitted two or more images in that time. The point is, we won't see our entire blitted image because each component is being blitted so fast! :)
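To get a feel for the numbers, here's a rough model of which frame is on screen when the monitor reaches a given line of the window. It's only a sketch under simplifying assumptions: the function name frame_at_scanline() is made up, the refresh rate is something you supply, the scan is treated as covering just the window's lines, and vertical blanking is ignored.

```c
/* Index of the most recently blitted frame at the moment the monitor
   scans out a given line, counting frames from the start of the refresh.
   Returns 0 if total_lines or refresh_hz is not positive. */
long frame_at_scanline(long line, long total_lines,
                       double blit_fps, double refresh_hz)
{
    if (total_lines <= 0 || refresh_hz <= 0.0) return 0;
    return (long)((blit_fps * line) / (total_lines * refresh_hz));
}
```

If you assume a 60 Hz refresh, then at my 3153 fps this model says the last line of the 256-line window shows an image around 52 frames newer than the first line does.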

Offscreen context 'gotchas'

Offscreen context does have some limitations, however. First, you're limited to your video card's memory - including what it's already using for the current display. There's no such thing as virtual video memory, so if you try to grab more than the graphics card has, it fails.

Second, offscreen buffers take on the depth of your display. If your app has 16-bit image data that it wants to blit onto a 32-bit display, you'll have to convert the image to 32 bits before you can use it in an offscreen context.

Finally, the user can very easily "dirty" your context by switching screen resolution or depth. When this happens, your offscreen context is no longer valid, you'll lose the data it contained, and the blit won't work. (No blit, Sherlock.) This means we'd have to recreate the offscreen contexts for the current screen mode.
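Going back to the depth limitation for a moment, converting 16-bit data for a 32-bit display could look something like the sketch below. The function names are my own, and I'm assuming the 32-bit pixels are laid out as 0x00RRGGBB and that both buffers are tightly packed; check the pixel format and pitch of your actual offscreen context before relying on this.

```c
#include <stddef.h>
#include <stdint.h>

/* Expands one RGB565 pixel to 0x00RRGGBB, replicating the high bits into
   the low bits so that full intensity maps to 0xFF. */
uint32_t rgb565_to_xrgb8888(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f;
    uint32_t g = (p >> 5) & 0x3f;
    uint32_t b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return (r << 16) | (g << 8) | b;
}

/* Converts count pixels from src to dst. */
void convert_565_to_8888(const uint16_t *src, uint32_t *dst, size_t count)
{
    size_t n;

    for (n = 0; n < count; n++)
        dst[n] = rgb565_to_xrgb8888(src[n]);
}
```

The green values written by the animation code, (val&0x3f)<<5, sit in the middle six bits of each 16-bit pixel, so they end up in the middle byte of the 32-bit result.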

There's also a problem if your app is running on one machine, but drawing to another machine's display: PdCreateOffscreenContext() will give us an offscreen context, but PdGetOffscreenContextPtr() will NOT return a pointer. This is because the actual video memory is on the remote machine's video card, not the local card. Instead, you should create a PhImage_t structure, put your image into that structure, and draw that PhImage_t to your offscreen context:

 

 PhImage_t image;
 PhPoint_t pos={0,0};
 PhDrawContext_t *odc;
 PdOffscreenContext_t *ctx;

 My_Create_Image_Function(&image);
 ctx=PdCreateOffscreenContext(0,image.size.w,image.size.h,Pg_OSC_MEM_PAGE_ALIGN);
 odc=PhDCSetCurrent(ctx);       // draw into the offscreen context
 PgDrawPhImage(&pos,&image,0);  // position, image, flags
 PgFlush();
 PhDCSetCurrent(odc);           // restore the previous draw context

My_Create_Image_Function() would be your function that initializes the given image with whatever you wanted to blit.

Also note that although the offscreen context blit is much faster than the other two methods, creating the offscreen context is quite a bit more expensive, so it should be done as little as possible. (By this, I mean you should create the offscreen context in the initialization part of your code if possible, rather than in a portion that is frequently executed.)


Method 4?


I hope this helps you figure out the best way to do your blits. There is one other way to get images to the screen: video overlay. Slightly faster than using an offscreen context, video overlay has in some ways more flexibility, and in others, more limitations. These could become the subject of a future article.