Class RecognitionNearestNeighborInvertedFile<Point>

Type Parameters:
Point - Data type for the 'point'. Typically this is a Tuple.
All Implemented Interfaces:
VerbosePrint

public class RecognitionNearestNeighborInvertedFile<Point> extends Object implements VerbosePrint

Implementation of the "classical" Bag-Of-Words (BOW) (a.k.a. Bag-Of-Visual-Words) [1] for object/scene recognition that uses an inverted file for fast image retrieval [2].

An image is described by a set of local image features (e.g. SIFT), which yields a set of n-dimensional vectors. Each feature vector is converted into a word, and the words are used to build a histogram of words in the image. A similarity score between two images is computed from their histograms. Words are learned by applying k-means clustering to a large initial training set of image features.

This implementation is designed to be simple and flexible, allowing different algorithms in the same family to be swapped out. For example, the nearest-neighbor (NN) search can be done using a brute force approach, a kd-tree, or an approximate kd-tree.

No single paper inspired this specific implementation; it borrows ideas from several papers. Paper [1] below is one of the earlier works to discuss the concept of visual BOW.
  1. Sivic, Josef, and Andrew Zisserman. "Video Google: A text retrieval approach to object matching in videos." Computer Vision, IEEE International Conference on. Vol. 3. IEEE Computer Society, 2003.
  2. Nister, David, and Henrik Stewenius. "Scalable recognition with a vocabulary tree." 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). Vol. 2. IEEE, 2006.
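
The pipeline described above (features to words, words to a histogram, histograms compared through an inverted file) can be sketched in a few lines of plain Java. This is a minimal illustration under stated assumptions, not the code of this class: the class name BowInvertedFileSketch, its methods, the use of double[] for points, the brute-force search, and the L1 histogram distance are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy bag-of-words database with an inverted file. Every name here is hypothetical. */
final class BowInvertedFileSketch {
    final double[][] words; // word centers, e.g. the output of k-means
    final List<List<Integer>> invertedFile = new ArrayList<>(); // word index -> IDs of images containing it
    final Map<Integer, double[]> histograms = new HashMap<>(); // image ID -> normalized word histogram

    BowInvertedFileSketch(double[][] words) {
        this.words = words;
        for (int w = 0; w < words.length; w++) invertedFile.add(new ArrayList<>());
    }

    /** Brute-force nearest-neighbor search: index of the word closest to the feature. */
    int findWord(double[] feature) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int w = 0; w < words.length; w++) {
            double d = 0;
            for (int k = 0; k < feature.length; k++) {
                double diff = feature[k] - words[w][k];
                d += diff * diff;
            }
            if (d < bestDistance) { bestDistance = d; best = w; }
        }
        return best;
    }

    /** Histogram of word frequencies, normalized so that it sums to one. */
    double[] describe(List<double[]> features) {
        double[] hist = new double[words.length];
        for (double[] f : features) hist[findWord(f)] += 1.0 / features.size();
        return hist;
    }

    /** Stores the image histogram and registers the image under every word it contains. */
    void addImage(int imageID, List<double[]> features) {
        double[] hist = describe(features);
        histograms.put(imageID, hist);
        for (int w = 0; w < hist.length; w++)
            if (hist[w] > 0) invertedFile.get(w).add(imageID);
    }

    /** ID of the image with the smallest L1 histogram distance, or -1 if no image shares a word. */
    int queryBest(List<double[]> features) {
        double[] query = describe(features);
        // The inverted file limits scoring to images that share at least one word with the query
        Set<Integer> candidates = new LinkedHashSet<>();
        for (int w = 0; w < query.length; w++)
            if (query[w] > 0) candidates.addAll(invertedFile.get(w));

        int bestID = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int imageID : candidates) {
            double[] hist = histograms.get(imageID);
            double distance = 0;
            for (int w = 0; w < query.length; w++) distance += Math.abs(query[w] - hist[w]);
            if (distance < bestDistance) { bestDistance = distance; bestID = imageID; }
        }
        return bestID;
    }
}
```

The point of the inverted file in this sketch is that a query only touches images listed under the words it actually contains, so images with no words in common are never scored.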
  • Constructor Details

    • RecognitionNearestNeighborInvertedFile

      public RecognitionNearestNeighborInvertedFile()
  • Method Details

    • initialize

      public void initialize(NearestNeighbor<Point> nearestNeighbor, int numWords)
      Initializes the data structures.
      nearestNeighbor - Search used to find the words.
      numWords - Number of words in the vocabulary.
    • clearImages

      public void clearImages()
      Discards all memory of images which were added.
    • addImage

      public void addImage(int imageID, List<Point> imageFeatures)
      Adds a new image to the database.
      imageID - The image's unique ID for later reference
      imageFeatures - Feature descriptors from an image
    • query

      public boolean query(List<Point> queryImage, @Nullable BoofLambdas.FilterInt filter, int limit)
      Looks up the best BowMatch from the database. The list of all potential matches can be accessed by calling getMatches().
      queryImage - Set of feature descriptors from the query image
      filter - Optional filter which can be used to reject matches that the user doesn't want returned. Returning false rejects the match.
      limit - Maximum number of matches it will return.
      Returns:
      The best matching image with score from the database
    • setDistanceType

      public void setDistanceType(BowDistanceTypes type)
      Used to change the distance function to one of the built-in types.
    • setVerbose

      public void setVerbose(@Nullable PrintStream out, @Nullable Set<String> configuration)
      Specified by:
      setVerbose in interface VerbosePrint
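
The filter and limit parameters of query can be illustrated with a small self-contained sketch. It shows one plausible reading of the documented behavior: candidates for which the filter returns false are rejected, the rest are ordered best first, and at most limit of them are kept. The names QueryFilterSketch, Match, and select are hypothetical, java.util.function.IntPredicate is used as a stand-in for BoofLambdas.FilterInt, and the assumption that a smaller distance means a better match is made for this example only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical illustration of filter and limit semantics; not the library's implementation. */
final class QueryFilterSketch {
    /** Stand-in for a match: an image ID and a distance, where smaller is assumed to be better. */
    static final class Match {
        final int imageID;
        final double distance;

        Match(int imageID, double distance) {
            this.imageID = imageID;
            this.distance = distance;
        }
    }

    /** Keeps candidates accepted by the filter, sorted best first, and truncated to 'limit' entries. */
    static List<Match> select(List<Match> candidates, IntPredicate filter, int limit) {
        List<Match> kept = new ArrayList<>();
        for (Match m : candidates) {
            // A null filter accepts everything; a filter returning false rejects the image
            if (filter == null || filter.test(m.imageID)) kept.add(m);
        }
        kept.sort(Comparator.comparingDouble((Match m) -> m.distance));
        return kept.size() > limit ? new ArrayList<>(kept.subList(0, limit)) : kept;
    }
}
```

A caller could, for instance, pass a filter such as id -> id != queryID to keep an image from matching itself when the query image is already in the database.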