Comparison of SURF implementations
The SURF descriptor is a state-of-the-art image region descriptor that is invariant to scale, orientation, and illumination. By using an integral image it can be computed efficiently across different scales. In recent years it has emerged as one of the more popular and frequently used feature descriptors, but it is not a trivial algorithm to implement, and several different implementations exist. The following study compares several libraries against each other to determine their relative stability and runtime performance.
|BoofCV: SURF||10/2011||Java||Fast but less accurate. See FactoryDescribeRegionPoint.surf()|
|BoofCV: MSURF||10/2011||Java||Accurate but slower. See FactoryDescribeRegionPoint.msurf()|
|OpenCV||2.3.1 SVN r6879||C++||http://opencv.willowgarage.com/wiki/|
Benchmark Source Code:
Runtime plots (detection, description): lower is better. Stability plot: higher is better.
For the sake of those with no attention span, the summary results are posted first and a discussion of the testing methodology follows below. The top two plots show how fast each library is at detecting and describing features. Detection is when the location and scale of interest points are found inside the image using each library's implementation of the Fast Hessian detector. A feature is described by estimating its orientation and computing the SURF-64 description. The bottommost plot shows a summary of the descriptors' relative stability across a standard set of test images. The description stability metric was computed by summing all correct associations throughout the entire image data set, then dividing that number by the best result.
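The stability metric described above can be sketched in a few lines. This is a minimal illustration rather than the benchmark's actual code: the class and method names are hypothetical, and the input is assumed to be one array of correct-association counts per library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any benchmarked library.
final class RelativeStability {
    /**
     * Sums the correct associations of each library over the whole data set,
     * then divides each sum by the largest sum so the best library scores 1.0.
     */
    static Map<String, Double> normalize(Map<String, int[]> correctPerImagePair) {
        Map<String, Long> totals = new LinkedHashMap<>();
        long best = 0;
        for (Map.Entry<String, int[]> e : correctPerImagePair.entrySet()) {
            long sum = 0;
            for (int count : e.getValue()) {
                sum += count;
            }
            totals.put(e.getKey(), sum);
            best = Math.max(best, sum);
        }
        Map<String, Double> relative = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            // Guard against division by zero when no library found a correct match.
            relative.put(e.getKey(), best == 0 ? 0.0 : e.getValue() / (double) best);
        }
        return relative;
    }
}
```

With made-up counts, a library that totals half as many correct associations as the best one receives a score of 0.5.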
One reason for JavaSURF's poor stability performance is that it only implements an upright version of SURF, so images with any rotation cause it to fail. Not computing the orientation also helps JavaSURF on the description runtime benchmark because it has fewer computations to perform. JOpenSURF is a straightforward port of the OpenSURF library to Java and shows comparable stability with the expected hit on runtime performance. JOpenSURF, OpenSURF and BoofCV-M compute an enhanced version of the SURF descriptor, while the BoofCV descriptor is closer to the SURF paper with some improvements. I suspect that the descriptor computed by the reference library is also an improvement over what was presented in the SURF paper, but its source code is closed, so this theory cannot be directly verified.
OpenCV is a bit of an oddball library as far as SURF is concerned. It did not provide an interface that accepts only an interest point's location and scale for the descriptor; the orientation had to be estimated already. Comments in the code mention that parts of it are multi-threaded, while every other library is single-threaded. Unfortunately the code is also complex and no simple fix could be found. Because of these issues, its own interest points were used when testing stability instead of the precomputed ones used by every other library. Speed-wise, a special test was done for OpenCV in which features were detected and described at the same time. This took 1940 ms for 6485 features, making it about 20% slower than OpenSURF's combined time.
Runtime performance results suggest that the biggest determining factor in speed was not the language used, but the algorithms used and how well they were implemented. Before a flame war starts: no, Java is not faster than C++, and on average Java is slower. However, well written code is many times faster than poorly written code.
Tests were performed using standardized test images with known transformations between them. Because the transformation between images is known, the true associations are known as well. Stability was measured by the number of correct associations between two images in the dataset. The testing procedure for each library is summarized below:
- For each image, detect features (scale and location) using the fast Hessian detector in BoofCV.
- Save results to a file and use the same file for all libraries.
- For each image, compute a feature description (including orientation) for all found features.
- In each image sequence, associate features in the first image to the Nth image, where N > 1.
- Association is done by minimizing Euclidean error.
- Validation is done using reverse association, i.e. an association must be the optimal one going both from frame 1 to N and from N to 1.
- Compute the number of correct associations.
- An association is correct if it is within 3 pixels of the true location.
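The association and validation steps in the list above can be sketched as a brute-force mutual nearest-neighbor search. This is an illustrative sketch only: the class and method names are hypothetical, descriptors are assumed to be stored as equal-length `double[]` arrays, and squared distance is used because it yields the same ordering as Euclidean distance.

```java
// Hypothetical helper, not taken from the benchmark source.
final class MutualAssociation {
    /** Squared Euclidean distance between two descriptors of equal length. */
    static double distanceSq(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the candidate closest to the query, or -1 if there are no candidates. */
    static int nearest(double[] query, double[][] candidates) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int j = 0; j < candidates.length; j++) {
            double d = distanceSq(query, candidates[j]);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    /**
     * For each descriptor in the first image, returns the index of its match
     * in the second image, or -1 when the reverse search does not agree.
     */
    static int[] associate(double[][] first, double[][] second) {
        int[] matches = new int[first.length];
        for (int i = 0; i < first.length; i++) {
            int j = nearest(first[i], second);
            // Keep the pair only if it is also the best match going backwards.
            matches[i] = (j >= 0 && nearest(second[j], first) == i) ? j : -1;
        }
        return matches;
    }
}
```

The reverse check rejects a feature whose nearest neighbor already has a closer partner, which is why two features in the first image can never claim the same feature in the Nth image.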
Since the transformation between images is known, the true location could have been used directly. However, in reality features will not lie at the exact point, and a descriptor needs to be tolerant of this type of error. Thus this is a more accurate measure of the description's strength.
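The 3 pixel correctness test can be illustrated as follows. The text only says that the transformation is known; this sketch assumes it is given as a 3x3 homography in row-major order, and it treats a distance of exactly 3 pixels as correct. Both choices, along with the class and method names, are assumptions made for illustration.

```java
// Hypothetical helper; the homography representation is an assumption.
final class AssociationCheck {
    /** Applies a 3x3 row-major homography h to the point (x, y). */
    static double[] project(double[] h, double x, double y) {
        double w = h[6] * x + h[7] * y + h[8];
        return new double[] {
            (h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w
        };
    }

    /**
     * True if the matched point in the Nth image lies within the given
     * tolerance (in pixels) of where the source point truly maps to.
     */
    static boolean isCorrect(double[] h, double[] src, double[] matched, double tolerance) {
        double[] truth = project(h, src[0], src[1]);
        double dx = truth[0] - matched[0];
        double dy = truth[1] - matched[1];
        // Compare squared values to avoid a square root.
        return dx * dx + dy * dy <= tolerance * tolerance;
    }
}
```

Counting the associations for which `isCorrect(h, src, matched, 3.0)` holds gives the number of correct associations for one image pair.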
Configuration: All libraries were configured to describe oriented SURF-64 features as defined in the original SURF paper. JavaSURF does not support orientation estimation. OpenCV forces orientation to be estimated inside the feature detector. Thus it was decided that the lesser evil for OpenCV was to let it detect its own features. OpenCV's threshold was adjusted so that it detected about the same number of features.
How fast each library can detect features and compute descriptions was also benchmarked. Each test was performed several times, with only the best time being shown. Java libraries tended to exhibit more variability than native libraries, while all libraries showed a significant amount of variability from trial to trial.
Only image processing time essential to SURF was measured; loading images was not included. The measured time includes converting an image to integral image format, but not converting the image to gray scale, provided the library made it possible to exclude the gray scale conversion. Elapsed time was measured in the actual application using System.currentTimeMillis() in Java and clock() in C++.
- Kill all extraneous processes.
- Load feature location and size from file.
- Compute descriptors (including orientation) for each feature while recording elapsed time.
- Compute elapsed time 10 times and output best result.
- Run the whole experiment 4 times for each library and record the best time.
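On the Java side, the best-of-N timing described above might look like the sketch below. The class and method names are hypothetical, and the `Runnable` stands in for whatever describe or detect call is being measured; only `System.currentTimeMillis()` comes from the text.

```java
// Hypothetical timing helper, not the benchmark's actual harness.
final class BestOfTiming {
    /**
     * Runs the task the requested number of times and returns the
     * shortest elapsed wall-clock time in milliseconds.
     */
    static long bestMillis(Runnable task, int trials) {
        if (trials < 1) {
            throw new IllegalArgumentException("trials must be at least 1");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < trials; i++) {
            long start = System.currentTimeMillis();
            task.run();
            long elapsed = System.currentTimeMillis() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }
}
```

Calling `bestMillis(task, 10)` would correspond to the inner step of 10 timings. The outer step of running the whole experiment 4 times is a separate launch of the application each time, so it is not modeled here. Taking the minimum rather than the mean reduces the influence of JIT warm-up and background processes on the reported figure.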
- Ubuntu 10.10 64bit
- Quadcore Q6600 2.4 GHz
- Memory 8194 MB
- g++ 4.4.5
- Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Compiler and JRE Configuration
- All native libraries were compiled with -O3
- Java applications were run with no special flags
Describe Specific Setup:
- input image was boat/img1
- Fast Hessian features from BoofCV
- 6415 Total
Detect Specific Setup:
- Impossible to configure libraries to detect exact same features.
- Adjusted detection threshold to top out at around 2000 features
- Octaves: 4
- Scales: 4
- Base Size: 9
- Initial Pixel Skip: 1
Results can be found at the top of the page. OpenCV was omitted from the runtime results because it could not be configured the same way as the other libraries. A special test was performed just for OpenCV and is discussed above. It should also be noted that the pixel skip used inside OpenCV is not known.