Feel the power of parallel computing (OpenMP)

For the past two weeks, I have been working on the UI side of our product to improve animation rendering performance. Previously, a single thread decoded the animation line by line, and decoding a whole frame took about 50 ms.

[Figure: original single-threaded decoding]

Now I have changed the rendering so that all lines are decoded in parallel, taking full advantage of modern multi-core CPUs.

[Figure: new parallel decoding]
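The post does not show the decoder itself, so here is a minimal sketch of the idea under stated assumptions: `Frame`, `decode_line`, and the pixel formula are hypothetical stand-ins for the real animation decoder. The point it illustrates is that a `#pragma omp parallel for` over lines is only safe when each line's decode writes its own row and shares no mutable state with the others.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame layout: one row of 32-bit pixels per line, row-major.
struct Frame {
    int width;
    int height;
    std::vector<std::uint32_t> pixels; // width * height entries
    Frame(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h) {}
};

// Hypothetical per-line decoder: writes only row y, so lines are independent.
// A real decoder would read this line's compressed data instead of a formula.
static void decode_line(Frame& frame, int y)
{
    std::uint32_t* row = &frame.pixels[static_cast<std::size_t>(y) * frame.width];
    for (int x = 0; x < frame.width; ++x)
        row[x] = static_cast<std::uint32_t>(y) * 65536u + static_cast<std::uint32_t>(x);
}

// Original approach: one thread decodes every line in order.
void decode_frame_serial(Frame& frame)
{
    for (int y = 0; y < frame.height; ++y)
        decode_line(frame, y);
}

// Parallel approach: OpenMP divides the lines among threads. Without OpenMP
// enabled, the pragma is ignored and this runs serially with the same result.
void decode_frame_parallel(Frame& frame)
{
    #pragma omp parallel for
    for (int y = 0; y < frame.height; ++y)
        decode_line(frame, y);
}
```

Because the rows are disjoint, `decode_frame_serial` and `decode_frame_parallel` fill identical buffers; only the wall-clock time differs.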

 

Visual Studio supports OpenMP natively, which gives me an easy way to use this powerful tool.

To enable the /openmp compiler option in the Visual Studio development environment:

  1. Open the project’s Property Pages dialog box. For details, see How to: Open Project Property Pages.
  2. Expand the Configuration Properties node.
  3. Expand the C/C++ node.
  4. Select the Language property page.
  5. Set the OpenMP Support property to Yes (/openmp).
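A quick way to confirm that the setting actually took effect is to check the `_OPENMP` macro, which the compiler defines only when OpenMP support is on (with `/openmp` in MSVC, or `-fopenmp` in GCC and Clang). The helper name `openmp_status` is my own, for illustration only.

```cpp
#include <string>

// Reports whether this translation unit was compiled with OpenMP enabled.
// _OPENMP expands to a yyyymm date identifying the supported OpenMP version.
std::string openmp_status()
{
#ifdef _OPENMP
    return "OpenMP enabled, _OPENMP = " + std::to_string(_OPENMP);
#else
    return "OpenMP disabled: #pragma omp directives are ignored";
#endif
}
```

If this reports "disabled", the `#pragma omp` lines still compile but run on a single thread, so any benchmark will show no speedup.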

After a few simple code changes, I was surprised to find that frame decoding performance improved about 9.5 times, from 8 FPS to 76 FPS!
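Converting the two reported frame rates into a speedup and per-frame times makes the gain concrete (plain arithmetic on 8 FPS and 76 FPS):

```latex
\text{speedup} = \frac{76\ \text{FPS}}{8\ \text{FPS}} = 9.5,
\qquad
t_{\text{before}} = \frac{1000\ \text{ms}}{8} = 125\ \text{ms},
\qquad
t_{\text{after}} = \frac{1000\ \text{ms}}{76} \approx 13.2\ \text{ms}
```

Expressed as a percentage, a 9.5-fold frame rate is an 850% increase over the original.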

 

Let’s run a simple test with the following code:

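#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <tchar.h>

// 0x3fffffff bytes, roughly 1 GiB of test data
#define TEST_LENGTH 0x3fffffff

// Fills the buffer with rand() using an OpenMP parallel loop and returns the
// elapsed time in milliseconds.
// Note: the C standard does not require rand() to be free of data races, so
// this test relies on the behavior of the C runtime it is built against.
double mptest()
{
    LARGE_INTEGER large_integer;
    double freq;
    __int64 c1, c2;
    QueryPerformanceFrequency(&large_integer);
    freq = (double)large_integer.QuadPart;      // counter ticks per second

    unsigned char *buffer = new unsigned char[TEST_LENGTH];
    QueryPerformanceCounter(&large_integer);
    c1 = large_integer.QuadPart;
    // Split the loop iterations across all available threads
    #pragma omp parallel for
    for (int i = 0; i < TEST_LENGTH; i++)
    {
        buffer[i] = (unsigned char)rand();
    }
    QueryPerformanceCounter(&large_integer);
    c2 = large_integer.QuadPart;
    delete[] buffer;                            // array new requires delete[]
    return (c2 - c1) * 1000.0 / freq;
}

// Same work on a single thread, for comparison.
double test()
{
    LARGE_INTEGER large_integer;
    double freq;
    __int64 c1, c2;
    QueryPerformanceFrequency(&large_integer);
    freq = (double)large_integer.QuadPart;      // counter ticks per second

    unsigned char *buffer = new unsigned char[TEST_LENGTH];
    QueryPerformanceCounter(&large_integer);
    c1 = large_integer.QuadPart;
    for (int i = 0; i < TEST_LENGTH; i++)
    {
        buffer[i] = (unsigned char)rand();
    }
    QueryPerformanceCounter(&large_integer);
    c2 = large_integer.QuadPart;
    delete[] buffer;                            // array new requires delete[]
    return (c2 - c1) * 1000.0 / freq;
}

int _tmain(int argc, _TCHAR* argv[])
{
    printf("Random generation cost with MP: %f ms\n", mptest());
    printf("Random generation cost without MP: %f ms\n", test());
    _getch();                                   // wait for a key before the console closes
    return 0;
}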

Look at the huge difference!

[Figure: test result]
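The test above is Windows-specific (QueryPerformanceCounter, _tmain, _getch) and calls rand() from several threads at once, which the C standard does not require to be race-free. As a portable sketch, assuming C++11 and an OpenMP-capable compiler (`/openmp` for MSVC, `-fopenmp` for GCC or Clang), the version below times the same kind of fill with `std::chrono::steady_clock` and replaces rand() with a stateless per-index mixer. The helper names `value_for_index` and `fill_buffer`, and the choice of a SplitMix64-style mixer, are my own assumptions rather than part of the original test.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stateless per-index generator (SplitMix64 finalizer, chosen as an assumption):
// each value depends only on i, so loop iterations share no mutable state.
static inline std::uint8_t value_for_index(std::uint64_t i)
{
    std::uint64_t z = (i + 1) * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    z ^= z >> 31;
    return static_cast<std::uint8_t>(z);
}

// Fills buf serially or with an OpenMP parallel loop and returns the elapsed
// wall-clock time in milliseconds.
double fill_buffer(std::vector<std::uint8_t>& buf, bool parallel)
{
    const long long n = static_cast<long long>(buf.size());
    std::uint8_t* p = buf.data();
    const auto start = std::chrono::steady_clock::now();
    if (parallel) {
        #pragma omp parallel for
        for (long long i = 0; i < n; ++i)
            p[i] = value_for_index(static_cast<std::uint64_t>(i));
    } else {
        for (long long i = 0; i < n; ++i)
            p[i] = value_for_index(static_cast<std::uint64_t>(i));
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `std::vector<std::uint8_t> buf(TEST_LENGTH); double ms = fill_buffer(buf, true);` times the parallel fill. Because every element depends only on its index, the serial and parallel runs produce identical buffers. Note also that `std::vector` zero-fills its storage on construction, so page faults happen before the timed region rather than inside it, as they do with the uninitialized `new[]` buffer in the original.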