PHash Blog

perceptual hashing information

ClipSeekr™: Video Clip Recognition System

01 Jun 2019 - by starkdg

ClipSeekr is a real-time video clip recognition system
designed to detect known video sequences as they occur in a
video stream.

How It Works

ClipSeekr works by indexing fingerprints of video clips.
A 64-bit fingerprint is created for each frame of a clip
from the spatial frequency information extracted from its
discrete cosine transform. These 64-bit integers are then
stored in a reverse index. This reverse index is simply a
Redis database of key-value pairs, where each key is a frame’s
fingerprint and each value holds a clip ID and some
sequence information. Unknown streams can then be monitored
to recognize the appearance of these indexed clips. The
basic principle is simple: when the number of consecutive
frames recognized for a particular ID reaches a specified
threshold, the clip is identified together with its
timestamp in the stream. This threshold is adjustable, but
a good value for a 29.97 fps stream seems to be between 5
and 10 consecutive frames.
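The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ClipSeekr code. The fingerprint function assumes a common DCT recipe (take the 8x8 lowest-frequency coefficients of an unnormalised DCT-II and threshold them against their median), a plain dict stands in for the Redis database, and the names `frame_fingerprint`, `index_clip` and `detect` are hypothetical.

```python
import math
from statistics import median

def frame_fingerprint(gray, n=8):
    """Hash a square grayscale frame (list of rows) into n*n bits (64 for n=8)."""
    size = len(gray)
    # Cosine basis of the (unnormalised) DCT-II, for the n lowest frequencies only.
    basis = [[math.cos(math.pi * (2 * i + 1) * k / (2 * size)) for i in range(size)]
             for k in range(n)]
    coeffs = []
    for u in range(n):        # vertical frequency
        for v in range(n):    # horizontal frequency
            coeffs.append(sum(gray[y][x] * basis[u][y] * basis[v][x]
                              for y in range(size) for x in range(size)))
    cut = median(coeffs[1:])  # assumption: leave the DC term out of the median
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (c > cut)  # one bit per coefficient
    return bits

def index_clip(index, clip_id, fingerprints):
    """Map each frame fingerprint to (clip ID, frame sequence number)."""
    for seq, fp in enumerate(fingerprints):
        # Assumption: the first clip indexed keeps a shared fingerprint.
        index.setdefault(fp, (clip_id, seq))

def detect(index, stream, threshold=5):
    """Return (clip_id, start_frame) for every run of `threshold` consecutive hits."""
    hits, current, run = [], None, 0
    for frame_no, fp in enumerate(stream):
        entry = index.get(fp)
        clip_id = entry[0] if entry else None
        if clip_id is not None and clip_id == current:
            run += 1
        else:
            current, run = clip_id, (1 if clip_id is not None else 0)
        if run == threshold:  # report once, when the run first reaches the threshold
            hits.append((clip_id, frame_no - threshold + 1))
    return hits

# A synthetic 32x32 frame, just to show the fingerprint call.
frame = [[(x * y) % 256 for x in range(32)] for y in range(32)]
fp = frame_fingerprint(frame)

# Small integers stand in for 64-bit fingerprints in the matching demo.
index = {}
index_clip(index, "ad-1", [10, 11, 12, 13, 14, 15])
hits = detect(index, [1, 2, 10, 11, 12, 13, 14, 15, 3], threshold=5)
print(hits)  # [('ad-1', 2)]
```

Dividing the reported start frame by the stream's frame rate (29.97 fps, for example) gives the timestamp of the clip in the stream. Note that the lookup here is an exact match on the fingerprint, which is what a key-value index implies.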


The code can be found in the GitHub repository here:


Test Results

To evaluate this method, we streamed four hours of television
and copied the commercial spots into new files for indexing.
Altogether, there were 142 of these ad spots, 135 of which
were unique video sequences. In brief, only one spot failed
to be detected outright (a “false negative”), while five
were detected falsely (“false positives”). The rest were
successfully detected within seconds of their occurrence in the
stream. This makes for a false positive rate of roughly 3.3% and
a false negative rate of roughly 0.7% (1 of 142 spots). The
following table logs the
results more precisely. The first two columns mark the clips
and the timestamps for where they actually occur in the stream.
The next two columns indicate the clips that get recognized
along with their timestamps.

A black font represents correct detections; a red font
represents false positives; and blue is for false negatives.
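As a quick check on the error rates, the arithmetic can be written out. The post does not say which denominators were used, so this sketch assumes both rates are taken over all 142 aired spots. Under that assumption the false positive rate comes out slightly higher than the rough 3.3% quoted above.

```python
def rate(count, total):
    """Return count/total as a percentage."""
    return 100.0 * count / total

spots = 142          # ad spots in the four-hour stream
false_negatives = 1  # spots that were never detected
false_positives = 5  # detections that named the wrong spot

print(f"false negative rate: {rate(false_negatives, spots):.2f}%")  # 0.70%
print(f"false positive rate: {rate(false_positives, spots):.2f}%")  # 3.52%
```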


The only spot that failed to be detected was a McDonald’s
commercial called “Uber Eats”. The only noteworthy thing about
it is that its frames seemed exceptionally dark, with little
contrast, which perhaps left too little definition in the
fingerprints. Another
noteworthy issue is the second detection of the spot called
“Jack Daniels”. The first detection was a correct match.
The second was actually a different clip, but it shared enough
frames with the first that it was
recognized as the first. This is an inherent weakness of the
fingerprinting system, since not enough temporal
information is preserved to differentiate the two in real time.

A few notes for further study:


However, this is an extremely slow approach when dealing with 30
fps streams. Recurrent neural networks could possibly add in
some temporal information, but might be limited to video clips
of a fixed length.

Thank you for taking the time to read this post.
Comments and suggestions are welcome.
