GPR Introduction

General Purpose Raw (GPR) is a 12-bit raw image coding format based on the Adobe DNG® standard. Image compression is a balance of speed, file size and photo quality, and typically one can only choose two. GPR was designed to provide a better tradeoff across all three than is possible with DNG or any other raw format. The intention of GPR is not to compete with DNG, but rather to stay as close as possible to it. This preserves compatibility with applications that already understand DNG, while providing an alternative compression scheme for situations where compression ratio and encoding/decoding speed matter.

Action cameras, like those from GoPro, have limited computing resources, so the ability to compress data using the fewest CPU cycles matters. File size matters because GoPro cameras can record thousands of images very quickly using the timelapse and burst mode features. As the world shifts from desktop to mobile, people shoot and process more and more photos on smartphones, which are always limited in storage space and bandwidth. Last but not least, image quality matters because we want GPR to provide visually transparent image quality when compared to uncompressed DNG. All this combined enables customers to capture DSLR-class image quality in a GPR file that is nearly the same size as a JPEG, on a camera that is as small and rugged as a GoPro.

DNG allows storage of RAW sensor data in three main formats: uncompressed, lossless JPEG or lossy JPEG. Lossless mode typically achieves 2:1 compression, which is clearly not enough in the mobile-first age. Lossy mode uses the 8x8 DCT transform that was developed for JPEG more than 25 years ago (when photo resolutions were much smaller), achieving compression ratios around 4:1. In comparison, GPR typically achieves compression ratios between 4:1 and 10:1. This is due to the Full-Frame Wavelet Transform (FFWT). FFWT has a few nice properties compared to DCT:

The wavelet codec in GPR is not new; it is a SMPTE® standard under the name VC-5. VC-5 shares much of its technical foundation with CineForm®, an open and cross-platform intermediate codec designed for high-resolution video editing.

File Types

The following file types are discussed in this document:

Included Within This Repository

License Terms

GPR is licensed under either:

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Quick Start for Developers

Setup Source Code

Clone the project from GitHub (git clone https://github.com/gopro/gpr). You will need CMake version 3.5.1 or later to compile the source code.

Compiling Source Code

Run the following commands:

$ mkdir build
$ cd build
$ cmake ../

For Xcode, use the command line switch -G Xcode. On Windows, CMake should automatically detect the installed version of Visual Studio and generate the corresponding project files.

Mac build instructions, after running the above:

$ make
$ ./source/app/gpr_tools/gpr_tools

Linux build instructions, after running the above:

$ make 
$ ./source/app/gpr_tools/gpr_tools

Using gpr_tools

Some example commands are shown below:

Convert GPR to DNG:

$ gpr_tools -i INPUT.GPR -o OUTPUT.DNG

Extract RAW from GPR:

$ gpr_tools -i INPUT.GPR -o OUTPUT.RAW

Convert DNG to GPR:

$ gpr_tools -i INPUT.DNG -o OUTPUT.GPR

Analyze a GPR (or even DNG) and output parameters that define DNG metadata to a file:

$ gpr_tools -i INPUT.GPR -d 1 > PARAMETERS.TXT

Read RAW pixel data, along with parameters that define DNG metadata and apply to an output GPR (or DNG) file:

$ gpr_tools -i INPUT.RAW -o OUTPUT.DNG -a PARAMETERS.TXT

Read GPR file and output PPM preview:

$ gpr_tools -i INPUT.GPR -o OUTPUT.PPM

Read GPR file and output JPG preview:

$ gpr_tools -i INPUT.GPR -o OUTPUT.JPG

For a complete list of commands, please refer to data/tests/run_tests.sh
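When many files need the same conversion, the GPR to DNG command above can be driven from a small script. The following is a minimal Python sketch, assuming gpr_tools is on the PATH; the helper names gpr_to_dng_command and convert_folder are hypothetical and not part of this repository. It uses only the -i and -o switches shown above.

```python
import subprocess
from pathlib import Path


def gpr_to_dng_command(src, tool="gpr_tools"):
    """Build the argument list for converting one GPR file to DNG."""
    src = Path(src)
    # The output sits next to the input, with the extension swapped to .DNG.
    return [tool, "-i", str(src), "-o", str(src.with_suffix(".DNG"))]


def convert_folder(folder, tool="gpr_tools"):
    """Convert every *.GPR file in `folder`, stopping at the first failure."""
    for src in sorted(Path(folder).glob("*.GPR")):
        subprocess.run(gpr_to_dng_command(src, tool), check=True)


# Show the command that would be run for a single (hypothetical) file.
print(gpr_to_dng_command("shots/GOPR0001.GPR"))
```

Building the argument list separately from running it keeps the command easy to inspect before any file is touched.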

Using vc5_encoder_app

vc5_encoder_app is an optional tool that can be used to convert RAW image data to VC5 essence, as shown below:

$ vc5_encoder_app -i INPUT.RAW -o OUTPUT.VC5 -w 4000 -h 3000 -p 8000

Using vc5_decoder_app

vc5_decoder_app is an optional tool that can be used to decode VC5 essence into RAW image data, as shown below:

$ vc5_decoder_app -i INPUT.VC5 -o OUTPUT.RAW

Source code organization

Folder structure

Source code is organized inside the source folder. All library and SDK code is located in the lib folder, while all tools and applications that use lib are located in the app folder.

The lib folder is made up of the following folders:

The app folder is made up of the following folders:

Important Defines

Here are some important compile time definitions:

GPR-SDK API

GPR-SDK API is defined in header files in the following folders:

An application needs to include the above two folders in order to access the API. The API defines various functions named gpr_convert_XXX_to_XXX, which convert from one format to another. For example, GPR to DNG conversion is done by gpr_convert_gpr_to_dng. When the output file is GPR or DNG, a gpr_parameters structure has to be specified. Fields in this structure map to DNG metadata tags, and we have tried to abstract low-level DNG details into a clean and easy-to-use structure.

Compression Technology

Wavelet Transforms

The wavelet used within VC-5 is a 2D three-level 2-6 wavelet. If you look up wavelets on Wikipedia, prepare to get confused fast. Wavelet compression of images is fairly simple if you don’t get distracted by the theory. The wavelet is a one-dimensional filter that separates low frequency data from high frequency data, and the math is simple. For each pair of pixels in an image, simply add them (low frequency):

For high frequency it can be as simple as the difference of the same two pixels:

For a 2-6 wavelet this math is for the high frequency:

The math doesn’t get much more complex than that.
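The filters described above can be sketched in a few lines of Python. The low pass and simple high pass follow the text directly (sum and difference of each pixel pair). The 2-6 high pass is written here in its commonly cited form, the pair difference plus one eighth of the difference between the next and previous pair sums; that formula, the floor rounding and the border handling are assumptions of this sketch, not taken from the VC-5 specification.

```python
def split_pairs(row):
    """Sum (low) and difference (high) of each pair of pixels in a row."""
    if len(row) % 2:
        raise ValueError("row length must be even")
    low = [row[i] + row[i + 1] for i in range(0, len(row), 2)]
    high = [row[i] - row[i + 1] for i in range(0, len(row), 2)]
    return low, high


def split_2_6(row):
    """Low band plus an assumed 2-6 high band for one row.

    Interior pairs use: difference + (next_sum - previous_sum) / 8.
    Border pairs keep the plain difference (an illustrative choice).
    """
    low, diff = split_pairs(row)
    high = list(diff)
    for n in range(1, len(low) - 1):
        # Floor division stands in for the codec's real rounding rule.
        high[n] = diff[n] + (low[n + 1] - low[n - 1]) // 8
    return low, high


ramp = list(range(8))        # a smooth gradient: 0, 1, 2, ..., 7
print(split_pairs(ramp))     # the plain difference is -1 for every pair
print(split_2_6(ramp))       # interior high-pass samples become 0
```

On a smooth gradient the extra taps cancel the pair difference, which is why the 2-6 filter leaves less energy in the high band than the plain difference does.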

To wavelet compress a monochrome frame (color can be compressed as separate monochrome channels), we start with a 2D array of pixels (a.k.a. an image).

If you apply the filter horizontally and store the low frequencies (low pass) on the left and the high frequencies (high pass) on the right, you get the image below. A low pass image is basically the average, and a high pass image is like an edge-enhanced version.

You repeat the same operation vertically using the previous output as the input image.

Resulting in a 1 level 2D wavelet:

For a two-level wavelet, you repeat the same horizontal and vertical wavelet operations on the top-left quadrant to provide:

Repeating again for the third level.
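The horizontal pass, vertical pass and repetition on the top-left quadrant can be sketched as follows. To keep the example short it uses the plain sum/difference filter rather than the 2-6 filter, applies no scaling between levels, and assumes the image dimensions are divisible by 2 to the power of the level count. All function names are illustrative.

```python
def split_rows(image):
    """Horizontal pass: pair sums on the left, pair differences on the right."""
    out = []
    for row in image:
        low = [row[i] + row[i + 1] for i in range(0, len(row), 2)]
        high = [row[i] - row[i + 1] for i in range(0, len(row), 2)]
        out.append(low + high)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def wavelet_level(image):
    """One 2D level: the horizontal pass, then the same pass vertically."""
    return transpose(split_rows(transpose(split_rows(image))))


def wavelet_2d(image, levels=3):
    """Apply `levels` levels, each one re-transforming the top-left quadrant."""
    out = [list(row) for row in image]
    h, w = len(out), len(out[0])
    for _ in range(levels):
        block = wavelet_level([row[:w] for row in out[:h]])
        for y in range(h):
            out[y][:w] = block[y]
        h, w = h // 2, w // 2   # next level works on the low-low quadrant
    return out


flat = [[1] * 8 for _ in range(8)]   # a featureless 8x8 image
result = wavelet_2d(flat, levels=3)
print(result[0][0], sum(map(sum, result)))
```

For a featureless image every high-pass sample is zero, so all of the signal ends up in the single low-low sample at the top-left corner.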

Quantization

All that grey is easy to compress. The reason there is very little information in these high frequency regions is that the high frequency data of the image has been quantized. The human eye is not very good at seeing subtle changes in high frequency regions, so this is exploited by scaling the high-frequency samples before they are stored:
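A minimal sketch of that scaling step, assuming quantization is a division of each high-frequency sample by a quantizer q with truncation toward zero. The value of q and the rounding rule are illustrative assumptions; the codec's actual quantizer tables are not reproduced here.

```python
def quantize(samples, q):
    """Divide each sample by q, truncating toward zero."""
    return [(abs(s) // q) * (1 if s >= 0 else -1) for s in samples]


def dequantize(samples, q):
    """Scale quantized samples back up; the discarded remainder is lost."""
    return [s * q for s in samples]


high_band = [-5, -1, 0, 3, 17, -40]
coded = quantize(high_band, 8)
print(coded)                  # small values collapse to zero
print(dequantize(coded, 8))   # large values survive, slightly altered
```

This is the lossy step of the pipeline: small high-frequency values become zero and cannot be recovered, which is what produces the long runs of zeros exploited by the later stages.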

Entropy Coding

After the wavelet and quantization stages, you have the same number of samples as the original source. Compression is achieved because the samples are no longer evenly distributed (after wavelet and quantization). There are many more zeros and ones than higher values, so we can store all these values more efficiently, often up to 10 times more so.
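One way to see why an uneven distribution helps is to measure the Shannon entropy of the samples, which gives a lower bound on the average bits per sample an ideal entropy coder would need. The sketch below is generic information theory rather than anything specific to VC-5, and the sample lists are made up for illustration.

```python
import math
from collections import Counter


def entropy_bits_per_sample(samples):
    """Shannon entropy of the empirical sample distribution, in bits."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


even = [0, 1, 2, 3] * 4              # four values, equally likely
skewed = [0] * 14 + [1, 2]           # mostly zeros, as after quantization
print(entropy_bits_per_sample(even))
print(entropy_bits_per_sample(skewed))
```

The evenly distributed list needs 2 bits per sample, while the skewed list needs well under 1 bit per sample on average.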

Run length

The output of the quantization stage has a lot of zeros, often many in a row. Additional compression is achieved by counting runs of zeros and storing the count, e.g. “z15” for 15 zeros rather than “0,0,0,0,0,0,0,0,0,0,0,0,0,0,0”.
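A minimal sketch of zero run-length coding, using the same “z15” notation as the text. The string tokens are purely illustrative; a real bitstream would encode runs as codewords rather than strings.

```python
def run_length_zeros(samples):
    """Replace each run of zeros with a single 'zN' token."""
    out = []
    run = 0
    for s in samples:
        if s == 0:
            run += 1
            continue
        if run:
            out.append(f"z{run}")   # flush the pending run of zeros
            run = 0
        out.append(s)
    if run:
        out.append(f"z{run}")       # trailing zeros at the end of the data
    return out


print(run_length_zeros([3] + [0] * 15 + [-1, 0, 0]))
```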

Variable length coding

After all previous steps, the high frequency samples are stored with a variable length coding scheme using Huffman coding. A table maps sample values to codewords of differing bit lengths, where the most common values are expressed in few bits and rare values are expressed in more bits.
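To illustrate how such a table assigns short codewords to common values, the sketch below builds a Huffman code from the sample frequencies using the textbook algorithm. This is an assumption made for illustration: the table actually used by the codec is not reproduced here and is not necessarily derived from each image.

```python
import heapq
from collections import Counter


def huffman_code(samples):
    """Return a dict mapping each sample value to a bit string."""
    counts = Counter(samples)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)   # the two least frequent subtrees
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c0.items()}
        merged.update({s: "1" + bits for s, bits in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]


samples = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
code = huffman_code(samples)
total_bits = sum(len(code[s]) for s in samples)
print(code, total_bits)
```

With these frequencies the most common value gets a 1-bit codeword and the rarest values get 3 bits, for 28 bits in total against 32 bits with a fixed 2-bit code.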

The lack of complexity is what makes VC-5 fast. The low pass filter is just addition. The high pass filter is 6-tap, with all coefficients rational numbers, and no multiplication or division is required. Variable length coding can be implemented with a lookup table, an approach that is faster than other entropy coding techniques.

To Decode

Reverse all the steps.
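As a small illustration of reversing one step, the sketch below inverts the simple sum/difference filter: since low = a + b and high = a - b, the original pixels are (low + high) / 2 and (low - high) / 2. It covers only that simple filter, not the 2-6 filter or the other stages, and the function names are illustrative.

```python
def forward_pair(a, b):
    """Forward step: sum (low) and difference (high) of two pixels."""
    return a + b, a - b


def inverse_pair(low, high):
    """Inverse step: low + high == 2a and low - high == 2b, so this is exact."""
    return (low + high) // 2, (low - high) // 2


def inverse_row(low, high):
    """Rebuild a row of pixels from its low and high bands."""
    row = []
    for l, h in zip(low, high):
        row.extend(inverse_pair(l, h))
    return row


print(inverse_row([22, 38], [-2, 2]))
```

The wavelet step itself loses nothing. Under the assumptions above, only quantization discards information.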

Thumbnail and Preview Generation

A nice property of the wavelet codec is scalability support, i.e. various resolutions ranging from the original coded resolution down to one-sixteenth resolution are encoded and can be retrieved efficiently. Scalability means that extracting the lowest resolution is fastest and the cost of extraction increases as resolution goes up. For application scenarios where rendering a smaller resolution suffices, a decoder can very cheaply extract a lower resolution. Common examples of this use case are thumbnail previews in file browsers or rendering an image on devices with smaller resolution, e.g. mobile phones.

Scalability is more efficient than decoding the full resolution image and then performing demosaicing and downsampling. To avoid demosaicing, DNG provides a mechanism to store a separate thumbnail and preview image (often encoded as JPG). File browsers use the thumbnail, while the preview is useful for rendering a higher resolution version of the image. Since these are separately encoded images and do not exploit redundancy with each other or with the original RAW image, file sizes add up quickly.

As an example, the GoPro Hero6 Black captures a 4000x3000 RAW image in Bayer RGGB format. The red, blue and two green channels are split up and separately encoded into wavelet resolutions of 2000x1500 (or 2:1), 1000x750 (or 4:1) and 500x375 (or 8:1). The Low-Low band of the lowest resolution wavelet is a 250x188 (or 16:1) image, and it is stored uncompressed (generating a thumbnail is essentially a memory copy). RGB images at other resolutions can be obtained at successively higher complexity levels, without performing demosaicing. To illustrate this, the decoding speed at various resolutions is measured and shown using gpr_tools (in milliseconds inside square brackets).

Decode GPR to 8-bit PPM (250x188)
[    6-ms] [BEG] gpr_convert_gpr_to_rgb() gpr.cpp (line 1695)
[   16-ms] [END] gpr_convert_gpr_to_rgb() gpr.cpp (line 1738)
Decode GPR to 8-bit PPM (500x375)
[    6-ms] [BEG] gpr_convert_gpr_to_rgb() gpr.cpp (line 1695)
[   37-ms] [END] gpr_convert_gpr_to_rgb() gpr.cpp (line 1738)
Decode GPR to 8-bit PPM (1000x750)
[    5-ms] [BEG] gpr_convert_gpr_to_rgb() gpr.cpp (line 1695)
[  130-ms] [END] gpr_convert_gpr_to_rgb() gpr.cpp (line 1738)
Decode GPR to 8-bit PPM (2000x1500)
[    5-ms] [BEG] gpr_convert_gpr_to_rgb() gpr.cpp (line 1695)
[  357-ms] [END] gpr_convert_gpr_to_rgb() gpr.cpp (line 1738)

And here is the output of full GPR to DNG decoding.

Decode GPR to DNG
[    6-ms] [BEG] gpr_convert_gpr_to_dng() gpr.cpp (line 1748)
[  422-ms] [END] gpr_convert_gpr_to_dng() gpr.cpp (line 1768)

To summarize, here are speed gain factors over full resolution DNG decoding:

Resolution      250x188   500x375   1000x750   2000x1500
Speed factor    41.6x     13.4x     3.3x       1.2x
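These factors follow directly from the timings logged above: each decode time is the [END] value minus the [BEG] value, and the factor is the full GPR to DNG time divided by the decode time at that resolution. A short Python check, with the numbers copied from the logs:

```python
# (BEG, END) timestamps in milliseconds, copied from the gpr_tools output.
TIMINGS_MS = {
    "250x188": (6, 16),
    "500x375": (6, 37),
    "1000x750": (5, 130),
    "2000x1500": (5, 357),
}
FULL_DNG_MS = (6, 422)


def speed_factors(timings, full):
    """Ratio of full-resolution decode time to each reduced-resolution time."""
    full_ms = full[1] - full[0]
    return {
        res: round(full_ms / (end - beg), 1)
        for res, (beg, end) in timings.items()
    }


for res, factor in speed_factors(TIMINGS_MS, FULL_DNG_MS).items():
    print(f"{res}: {factor}x")
```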

Demosaicing has a higher complexity than GPR decoding, so the speed factors for RGB output after demosaicing will be higher. Similar speed improvements can also be seen when writing a JPG file.
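The resolution ladder quoted for the Hero6 Black example can be reproduced with a few lines of arithmetic. The sketch assumes that each Bayer channel is half the sensor size in each dimension and that odd dimensions are rounded up when halved; the rounding rule is inferred from the 250x188 figure rather than taken from the specification.

```python
def wavelet_resolutions(width, height, levels=3):
    """Per-channel size, then the size after each wavelet level."""
    w, h = width // 2, height // 2        # one Bayer channel of the sensor
    sizes = [(w, h)]
    for _ in range(levels):
        w, h = (w + 1) // 2, (h + 1) // 2  # halve, rounding odd sizes up
        sizes.append((w, h))
    return sizes


print(wavelet_resolutions(4000, 3000))
```

The last entry is the size of the uncompressed Low-Low band that serves as the thumbnail.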

GoPro and CineForm are trademarks of GoPro, Inc.
DNG, Photoshop and Lightroom are trademarks of Adobe Inc.