WO2010135082A1 - Localized weak bit assignment - Google Patents
Localized weak bit assignment
- Publication number: WO2010135082A1 (PCT/US2010/033657)
- Authority: WO, WIPO (PCT)
- Prior art keywords: hash value, query, value, hash, recited
Classifications
- G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F16/00 (Information retrieval; database structures therefor; file system structures therefor):
- G06F16/90 (Details of database functions independent of the retrieved data types) > G06F16/901 > G06F16/9014: Indexing; data structures therefor; storage structures: hash tables
- G06F16/40 (Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data) > G06F16/41: Indexing; data structures therefor; storage structures
Definitions
- The present invention relates generally to localized weak bit assignment. More specifically, embodiments of the present invention relate to locally assigning weak bits to each portion of a hash value when searching for the hash value.
- Media clips or media content are segments of audio media, video media, or audio/visual (AV) media and include information that is embodied, stored, transmitted, received, processed, or otherwise used with at least one medium.
- Common media clip formats include FLV (Flash Video), Windows Media Video, RealMedia, QuickTime, MPEG, MP3, and DivX.
- Media clips may be defined with one or more images.
- Video media may be a combination of a set of temporally related frames or images at particular points in time of the video media.
- Audio media may be represented as one or more images using many different techniques known in the art. For example, audio information may be captured in a spectrogram, in which the horizontal axis can represent time, the vertical axis can represent frequency, and the amplitude of a particular frequency at a particular time can be represented in a third dimension (e.g., with thicker lines, more intense colors, or grey values).
- Images that define media content may be associated with a corresponding fingerprint ("fingerprint" is used interchangeably with, and is equivalent to, "signature").
- Fingerprints of media content may be derived (e.g., extracted or generated) from information within, or which comprises a part of, the media content.
- A media fingerprint embodies or captures an essence of the corresponding media content and may be uniquely identified therewith.
- Video fingerprints are media fingerprints that may be derived from images or frames of a video clip.
- Audio fingerprints are media fingerprints that may be derived from images with embedded audio information (e.g., spectrograms).
- A media fingerprint may refer to a low bit rate representation of the media content with which it is associated and from which it is derived.
- Most applications of content identification using media fingerprints rely on a large database of media fingerprints. Any query fingerprint extracted from query media is compared against this database to identify one or more closest matches. As the database grows in terms of hours of media, the uniqueness of fingerprint codewords should not be reduced.
- A fingerprint codeword is also known as, and referred to herein as, a hash value, a signature, or a sub-fingerprint.
- A fingerprint codeword generally represents a sequence of fingerprint bits that is used for indexing the media fingerprints (e.g., in a hash table).
- Multiple fingerprints/media files that are linked to the same hash value are referred to as collisions.
- The fewer the collisions (e.g., fingerprints/media files) for the same hash value, the less computation is required to determine which of the fingerprints/media files corresponding to that hash value is equivalent to, or the best match for, the query fingerprint.
- Thus, fingerprints that have a small average number of collisions per hash value result in shorter search durations.
- Such fingerprints scale to searching through a larger database of fingerprints than fingerprints with a higher average number of collisions.
- However, hash-table look-up based matching can easily be misled by a single bit flip in the derived signature of the query media content (e.g., a bit flip caused by modifying the original media content to obtain the query media content).
- A notion of global weak bit assignment may be used when comparing signatures of reference and query media content using a hash table based look-up. For example, when using weak bits, a subset of S signature bits is globally selected from all the signature bits in a signature derived from the query video and marked as weak. The S signature bits for a signature or hash value may be selected by globally identifying the S bits, among all the signature bits, that are most likely to flip when the media content is processed.
- Variations of the query signature may be obtained by toggling the S weak bits. For example, if each bit may take one of two values (e.g., 0 and 1), all 2^S possible variations of the query signature may be tried while performing the hash-table look-up to find the hash entry of the target matching signature in the database. The target matching signature is then used to identify the corresponding target media content in the database, from which the query media content was derived or to which the query media content is identical.
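The global scheme above can be sketched as follows. This is a minimal illustration, assuming hash values are Python integers, the positions of the S weak bits are already known, and the reference database is a plain `dict` from hash value to a media identifier. The names `query_variations` and `lookup_with_weak_bits` are hypothetical.

```python
from itertools import product


def query_variations(query: int, weak_positions: list[int]):
    """Yield all 2**S values obtained by toggling the S weak bits of `query`."""
    for choice in product((0, 1), repeat=len(weak_positions)):
        variant = query
        for pos, toggle in zip(weak_positions, choice):
            if toggle:
                variant ^= 1 << pos  # flip the weak bit at position `pos`
        yield variant


def lookup_with_weak_bits(table: dict, query: int, weak_positions: list[int]):
    """Return the entry of the first variation found in `table`, or None."""
    for variant in query_variations(query, weak_positions):
        if variant in table:
            return table[variant]
    return None


# Hypothetical reference database: hash value -> media identifier.
table = {0b1011_0110: "clip-A"}
query = 0b1011_0100  # the same value with bit 1 flipped by processing

print(table.get(query))                             # None: exact look-up misses
print(lookup_with_weak_bits(table, query, [1, 5]))  # clip-A
```

With S = 2 weak bits the look-up tries at most 4 variations; each additional weak bit doubles that number.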
- FIG. 1 depicts an example method of global weak bit assignment.
- FIG. 2 depicts an example method for localized weak bit assignment, according to an embodiment of the present invention.
- FIG. 3 depicts an example method for searching for a query hash value, according to an embodiment of the present invention.
- FIG. 4 depicts an example data structure, according to an embodiment of the present invention.
- FIG. 5 depicts a block diagram that illustrates a computer system upon which an embodiment of the present invention may be implemented.
- FIG. 6 depicts an example IC device, according to an embodiment of the present invention.
- Example embodiments described herein relate to locally assigning weak bits to each portion of a hash value when searching for the hash value.
- Numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
- Example embodiments are described herein according to the following outline:
- In an embodiment, a method includes computing a query hash value, partitioning the query hash value into at least a first portion and a second portion, and locally assigning weak bits within the first portion of the query hash value and weak bits within the second portion of the query hash value.
- The method further includes determining one or more variations of the query hash value by toggling one or more weak bits in the first portion and one or more weak bits in the second portion of the query hash value, and identifying a target hash value that is identical to a variation of the query hash value.
- Identifying the target hash value that is identical to the variation of the query hash value may include determining an index value, of an array, based on a variation of the first portion of the query hash value, wherein data is associated with the index value of the array; identifying a second portion of the target hash value in the data associated with the index value of the array; and determining that the second portion of the target hash value is identical to the second portion of the variation of the query hash value.
- Locally assigning weak bits within the first portion and the second portion of the query hash value may include selecting a first predetermined number of weak bits from the first portion and selecting a second predetermined number of weak bits from the second portion.
- Partitioning the query hash value into at least a first portion and a second portion may include partitioning the query hash value into equal sized portions or unequal sized portions. A ratio of weak bits to other bits may be similar in the first portion and the second portion.
- Prior to partitioning the query hash value, bits may be reordered based on a predetermined reordering scheme. The reordering scheme may include transferring hash bits that are expected to have a high bit error rate from the first portion of the hash value to the second portion of the hash value.
- In an embodiment, a method includes deriving consecutive hash values from a plurality of consecutive media content frames, selecting at least a portion from each of the consecutive hash values to obtain a plurality of portions extracted from consecutive hash values, and generating a new hash value based on the plurality of portions extracted from consecutive hash values.
- In an embodiment, a method includes computing a query hash value, partitioning the query hash value into at least a first portion and a second portion, identifying a first array index of an array based on the first portion of the query hash value, determining a second array index of the array based on an offset stored in data associated with the first array index, and determining that the second portion of the query hash value is stored in data associated with the second array index.
- In an embodiment, a method includes obtaining a plurality of hash values, partitioning each hash value in the plurality of hash values into at least a first portion and a second portion, and sorting each hash value in the plurality of hash values into a plurality of groups based on the first portion of that hash value. The method further includes, subsequent to sorting each hash value into the plurality of groups, storing each group of hash values.
- Storing each group of hash values may include identifying a data structure associated with the group of hash values and storing the second portion of each hash value in the data structure associated with the group.
- Each hash value in the plurality of hash values may be associated with a corresponding pointer value, and the second portion of each hash value may be stored in the data structure with the corresponding pointer value of that hash value.
- Although examples herein refer to hash values derived from media content, the techniques may be applied to any use of hash values, such as finding items in a database, detecting duplicated or similar records in a large file, or finding similar stretches in DNA sequences.
- Hash values generally represent any number or set of numbers computed using a well-defined procedure or mathematical function (which may be referred to as a hash function) that is applied to possibly larger or variable-sized data.
- A hash value corresponding to a fingerprint (or sub-fingerprint) of media content may be derived based on one or more features in the media content.
- Each numerical value computed from one or more features in the media content may be compared to a threshold value to determine the corresponding hash bit (e.g., 1 if the numerical value meets the threshold value or 0 if that numerical value doesn't meet the threshold value).
- A modification to the media content may result in modification of one or more features that are used to calculate the numerical value.
- A numerical value that is close to the threshold value may easily cross the threshold when a feature is modified.
- As a result, the corresponding hash bit for that numerical value may easily flip when the media content is modified.
- A hash bit that easily flips may be referred to as a weak bit, an unreliable bit, or a bit with a low confidence measure.
- A global assignment of weak bits involves selecting weak bits from the entire hash value. For example, if a hash value has 36 bits, the 6 weakest bits (e.g., corresponding to the six numerical values that were closest to the threshold value used to determine the corresponding hash bits) may be selected as weak bits.
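The derivation of hash bits and the global choice of weak bits can be sketched as below. The sketch assumes each hash bit comes from one numerical feature value, that "meets the threshold" means greater than or equal to it, and that a bit's confidence is the absolute distance between its feature value and the threshold. These are illustrative assumptions, and the function names are hypothetical.

```python
def hash_bits_and_confidence(features: list[float], threshold: float):
    """Map each feature value to a hash bit plus a confidence measure."""
    bits = [1 if f >= threshold else 0 for f in features]  # 1 if the value meets the threshold
    confidence = [abs(f - threshold) for f in features]    # small distance -> likely to flip
    return bits, confidence


def global_weak_bits(confidence: list[float], s: int) -> list[int]:
    """Positions of the S bits, taken from the whole hash value, closest to the threshold."""
    return sorted(range(len(confidence)), key=lambda i: confidence[i])[:s]


features = [0.9, 0.51, 0.1, 0.48, 0.7, 0.2]
bits, confidence = hash_bits_and_confidence(features, threshold=0.5)
print(bits)                             # [1, 1, 0, 0, 1, 0]
print(global_weak_bits(confidence, 2))  # [1, 3]
```

For the 36-bit example above, `global_weak_bits(confidence, 6)` would return the six weakest positions.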
- Figure 1 depicts an example method of global weak bit assignment.
- FIG. 2 depicts an example method for localized weak bit assignment.
- The hash bits in the hash value (204a) may be reordered (Step 252).
- The hash bits may be reordered according to a predetermined reordering scheme based on training data.
- For example, a training set of hash values may be derived from a training set of images, and a determination may be made that particular hash bits in the hash values frequently tend to be weak bits or have a high bit error rate (BER).
- This knowledge of where weak bits generally fall, given the hash function applied to a set of training data, may be used to determine the reordering scheme.
- Knowledge of how the hash value is partitioned into portions may also be used to reorder the weak bits.
- Thus, the reordering scheme may use both the knowledge from the training set and the knowledge of the hash value partitioning to reorder hash bits.
- For example, bit #2 and bit #7 are switched according to a reordering scheme that is applied to each hash value and was previously determined based on a training set of data.
- For consistency, the same reordering scheme that is determined from a training set and performed in Step 252 on the hash value (204a) to obtain the reordered hash value (204b) may be applied to all hash values that are stored in the database and to all hash values that are searched for in the database.
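One possible way to derive and apply such a reordering scheme is sketched below. Hash values are represented as lists of bits with index 0 holding bit #1, per-bit error rates are estimated from hypothetical (original, processed) training pairs, and the policy of placing the most reliable bits first, so that high-BER bits move toward the second portion, is an assumed instance of the scheme described above rather than necessarily the one used in the patent.

```python
def bit_error_rates(training_pairs: list[tuple[list[int], list[int]]]) -> list[float]:
    """Fraction of training pairs in which each bit position flipped."""
    n_bits = len(training_pairs[0][0])
    flips = [0] * n_bits
    for original, processed in training_pairs:
        for i, (a, b) in enumerate(zip(original, processed)):
            flips[i] += a != b
    return [f / len(training_pairs) for f in flips]


def reordering_from_ber(ber: list[float]) -> list[int]:
    """Assumed policy: lowest-BER positions first, highest-BER positions last."""
    return sorted(range(len(ber)), key=lambda i: ber[i])  # stable for ties


def reorder(bits: list, order: list[int]) -> list:
    """Apply a fixed permutation: output position j takes input position order[j]."""
    return [bits[i] for i in order]


# The FIG. 2 example swaps bit #2 and bit #7 (list indices 1 and 6).
fig2_order = [0, 6, 2, 3, 4, 5, 1, 7]
print(reorder(["#1", "#2", "#3", "#4", "#5", "#6", "#7", "#8"], fig2_order))
# ['#1', '#7', '#3', '#4', '#5', '#6', '#2', '#8']
```

Because the same `order` list is applied to every stored and every queried hash value, the permutation only needs to be computed once from the training data.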
- The hash value (204b) is partitioned into two or more portions (Step 254).
- Partitioning the hash value (204b) may include designating a portion of the hash value (204b) as part of a first portion (204c) and designating another portion of the hash value (204b) as part of a second portion (204d). Partitioning may or may not involve separately storing the different portions of the hash value, or storing them under different variables. Although the example of FIG. 2 shows the hash value partitioned into two mutually exclusive portions, one or more embodiments of the invention may involve partitioning the hash value (204b) into any number of portions. Furthermore, one or more bits may overlap across different portions. For example, the first portion may include bit #1, bit #7, bit #3, bit #4, and bit #5, and the second portion may include bit #5, bit #6, bit #2, and bit #8.
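Partitioning by lists of positions covers equal, unequal, and overlapping portions alike. A minimal sketch follows; the `partition` helper is hypothetical, and the equal 4/4 split is an assumed layout, while the overlapping example reproduces the one given above for the reordered hash value.

```python
from collections.abc import Sequence


def partition(bits: Sequence, portions: Sequence[Sequence[int]]) -> list[list]:
    """Return one list of bits per portion; each portion is a sequence of positions.

    Portions may have different sizes and may share positions (overlap).
    """
    return [[bits[p] for p in positions] for positions in portions]


reordered = ["#1", "#7", "#3", "#4", "#5", "#6", "#2", "#8"]  # hash value (204b)

# Two mutually exclusive portions of assumed equal size.
print(partition(reordered, [range(0, 4), range(4, 8)]))
# [['#1', '#7', '#3', '#4'], ['#5', '#6', '#2', '#8']]

# Overlapping portions: bit #5 belongs to both.
print(partition(reordered, [range(0, 5), range(4, 8)]))
# [['#1', '#7', '#3', '#4', '#5'], ['#5', '#6', '#2', '#8']]
```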
- The different portions (e.g., 204c and 204d) of the hash value (204b) are each locally assigned weak bits (Step 256).
- Locally assigning weak bits in a portion of a hash value may include assigning weak bits to a predetermined number of bits within the hash bits of that portion. For example, for a 64-bit hash value that is partitioned into two 32-bit portions, the 8 weakest bits within each portion may be labeled (or assigned as) weak bits. As a variant of this example, up to the 8 weakest bits may be allocated from each 32-bit portion, but not labeled as weak bits if the distances of their corresponding features from the threshold used for converting features to hash bits exceed a predetermined value.
- In the example of FIG. 2, one weak bit is to be assigned to the weakest bit in the first portion (204c), and two weak bits are to be assigned to the two weakest bits in the second portion (204d).
- The weakest bit in the first portion (204c) is determined to be bit #1.
- The two weakest bits in the second portion (204d) are determined to be bit #6 and bit #8.
- The assignment of weak bits may involve decreasing the number of weak bits in order to decrease the number of variations that have to be enumerated, since each additional weak bit doubles the number of variations.
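Local weak bit assignment, including the optional distance cutoff described above, can be sketched as follows. The sketch assumes a confidence value (distance of the feature from the threshold) is available for every bit position; the function name and the example numbers are hypothetical.

```python
from collections.abc import Sequence
from typing import Optional


def local_weak_bits(
    confidence: Sequence[float],
    portions: Sequence[Sequence[int]],
    counts: Sequence[int],
    max_distance: Optional[float] = None,
) -> list[list[int]]:
    """Pick up to counts[k] weakest positions within portion k.

    If max_distance is given, candidates whose feature lies farther than
    max_distance from the threshold are not labeled as weak.
    """
    assigned = []
    for positions, count in zip(portions, counts):
        weakest = sorted(positions, key=lambda p: confidence[p])[:count]
        if max_distance is not None:
            weakest = [p for p in weakest if confidence[p] <= max_distance]
        assigned.append(weakest)
    return assigned


# Hypothetical confidences for the 8 positions of a reordered hash value.
confidence = [0.05, 0.3, 0.4, 0.2, 0.5, 0.1, 0.6, 0.02]
portions = [range(0, 4), range(4, 8)]
print(local_weak_bits(confidence, portions, counts=[1, 2]))              # [[0], [7, 5]]
print(local_weak_bits(confidence, portions, [1, 2], max_distance=0.05))  # [[0], [7]]
```

With one weak bit in the first portion and two in the second, there are 2^(1+2) = 8 variations of the query hash value; the cutoff in the second call reduces this to 4.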
- Different types of data structures may be used for storing different portions of the hash value.
- Figure 3 depicts an example method for searching for a query hash value.
- A query hash value is received as input from a user, a program, or any other suitable source (Step 350).
- The query hash value received may be associated with any application that manages data using hash values.
- Steps 352 through 356 are similar to Steps 252 through 256 described above: reordering hash bits in the query hash value, partitioning the query hash value into at least two portions, and locally selecting a subset of weak bits from the hash bits in the first portion and from the hash bits in the second portion. Variations of each portion are then determined and may be searched for, as described below.
- An index value of an array is determined based on a variation of the first portion of the query hash value (Step 358).
- The variation of the first portion of the query hash value is obtained by toggling the locally assigned weak bits in the first portion of the query hash value.
- A determination is then made whether data is associated with the index value of the array (Step 360).
- Associated data may be stored at that index value of the array, or a pointer to the associated data may be stored at that index value of the array.
- For example, a pointer at an array index may point to a tree-based data structure that stores the second portion of each hash value whose first portion equals the index value.
- Thus, different portions of the hash value may be stored in different types of data structures, or stored implicitly based on location in an array.
- Figure 4 depicts an example data structure, in accordance with one or more embodiments.
- A data flag (402) may be used to indicate whether data (406) is associated with the index value in the array index (404).
- Alternatively, a separate array may be used to indicate whether the data array holds data at a particular index. If the index value is equal to T, then Array Index T of the array index (404) is identified. Furthermore, data including the second portions of hash values B, C, and G is identified in association with Array Index T. In this example, each of the hash values B, C, and G has a first portion equal to T. However, T does not necessarily have to be stored, as this information may be implicitly known from the array index value. Accordingly, only a portion of each hash value needs to be stored in the data structure.
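A minimal sketch of a FIG. 4 style layout follows. It assumes hash values are bit lists whose first `first_len` bits form the first portion, uses `None` in place of a cleared data flag, and stores each occupied slot as a Python `dict` from second portion to pointers, substituted here for the tree-based structure mentioned above. The class and method names are hypothetical.

```python
def bits_to_int(bits: list[int]) -> int:
    """Interpret a list of bits (most significant first) as an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


class PartitionedHashTable:
    """Slot i holds data only for hash values whose first portion equals i.

    The first portion is never stored: it is implied by the slot index.
    """

    def __init__(self, first_len: int):
        self.first_len = first_len
        self.slots = [None] * (1 << first_len)  # None plays the role of a cleared data flag

    def insert(self, bits: list[int], pointer) -> None:
        index = bits_to_int(bits[: self.first_len])
        second = bits_to_int(bits[self.first_len :])
        if self.slots[index] is None:
            self.slots[index] = {}
        self.slots[index].setdefault(second, []).append(pointer)

    def has_data(self, index: int) -> bool:
        return self.slots[index] is not None

    def second_portions(self, index: int) -> list[int]:
        return sorted(self.slots[index]) if self.has_data(index) else []


table = PartitionedHashTable(first_len=4)
table.insert([1, 0, 1, 0, 0, 1, 1, 0], "B")  # first portion 0b1010, second 0b0110
table.insert([1, 0, 1, 0, 1, 1, 0, 0], "C")  # same first portion, second 0b1100
print(table.has_data(0b1010), table.has_data(0b0011))  # True False
print(table.second_portions(0b1010))                   # [6, 12]
```

Only the second portions (and their pointers) occupy memory; for a k-bit first portion, the index array itself has 2^k entries.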
- If data is not associated with the array index based on the first portion, another variation of the first portion may be identified by toggling another weak bit in the first portion. However, if data is associated with the array index based on the first portion, a variation of the second portion of the query hash value may be determined by toggling weak bits in the second portion of the query hash value (Step 362). If a variation of the second portion of the query hash value is found in the data associated with the array index (Step 364), then a match is found (Step 370). If a variation of the second portion of the query hash value is not found in the data, then a determination is made whether another variation is possible (Step 366).
- The comparison of the variations of the second portion of the query hash value with the data found at an array index may be dynamic. For example, if there are fewer variations than hash value portions stored in association with the index value, then each of the variations may be searched for in the data. However, if fewer hash value portions are stored in association with the index value than there are possible variations, then the stored hash value portions may be searched for in the list of possible variations of the second portion of the query hash value.
- For example, such a search may be performed by bitwise-XORing the second portion of a stored hash value with the second portion of the query hash value, and then bitwise-ANDing the result with the negated weak bit pattern of the second portion of the query hash value (in which each weak bit is a "0" bit and every other bit is a "1" bit).
- A final output of 0 means that the second portion of that stored hash value is within the list of possible variations.
- Additional variations of the first portion may also be exhausted by checking other variations of the first portion (Step 368) in a search for the query hash value. If all variations are exhausted without finding a match, then a determination is made that no match is found (Step 372).
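Steps 358 through 372 can be sketched as below, under these assumptions: portions are integers, `table[i]` is `None` when no data is associated with index `i` and is otherwise a `dict` mapping stored second portions to pointers, and weak bits are given as bit positions within each portion. The search switches between enumerating variations and scanning stored entries with the XOR/AND-NOT test, following the dynamic comparison described above. All names are hypothetical.

```python
from itertools import product


def toggle_variations(value: int, weak_positions: list[int]):
    """All 2**w values obtained by toggling the given weak bit positions."""
    for choice in product((0, 1), repeat=len(weak_positions)):
        variant = value
        for pos, flip in zip(weak_positions, choice):
            if flip:
                variant ^= 1 << pos
        yield variant


def search(table, query_first, query_second, weak_first, weak_second):
    """Return (first, second, pointer) for a matching stored hash value, or None."""
    weak_mask = 0
    for pos in weak_second:
        weak_mask |= 1 << pos
    n_second_variations = 1 << len(weak_second)

    for first in toggle_variations(query_first, weak_first):  # Steps 358 and 368
        slot = table[first]
        if slot is None:  # Step 360: no data at this index
            continue
        if n_second_variations <= len(slot):
            # Not more variations than stored entries: look each variation up.
            for second in toggle_variations(query_second, weak_second):  # Steps 362, 366
                if second in slot:  # Step 364
                    return first, second, slot[second]  # Step 370
        else:
            # Fewer stored entries: XOR with the query, then AND with the negated
            # weak-bit pattern; a result of 0 means the entry is a valid variation.
            for stored, pointer in slot.items():
                if ((stored ^ query_second) & ~weak_mask) == 0:
                    return first, stored, pointer  # Step 370
    return None  # Step 372


table = [None] * 16
table[0b1010] = {0b0110: "B", 0b0001: "C"}
print(search(table, 0b1000, 0b0100, weak_first=[1], weak_second=[1]))  # (10, 6, 'B')
print(search(table, 0b1000, 0b0100, weak_first=[], weak_second=[1]))   # None
```

In Python, `~weak_mask` is a negative integer with every bit set except the weak bits, so ANDing it with a non-negative XOR result clears exactly the weak-bit positions.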
- Hash values may be computed by combining hash values derived from media content. Combining hash values may increase the randomness of the hash values, thus decreasing collisions of different fingerprints for the same hash value. For example, consecutive hash values, which are hash values derived from consecutive frames of media content, may be combined to obtain new hash values.
- One method of combining hash values may include taking a portion of bits from each hash value in a set of consecutive hash values and concatenating the portions from the different hash values to obtain a new hash value. For example, if the hash index is 32 bits, 4 bits may be taken from each of 8 consecutive hash values and concatenated to form a new 32-bit hash value.
- Any other suitable combination may be used to obtain a new hash value from other hash values derived from media content.
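The concatenation example (4 bits from each of 8 consecutive hash values forming a new 32-bit value) is sketched below. Which 4 bits are taken from each hash value is not specified in the text; this sketch assumes the 4 least significant bits, and it also assumes one new hash value is produced per frame by sliding the 8-frame window. The function names are hypothetical.

```python
def combine_consecutive(hashes: list[int], bits_per_hash: int = 4) -> int:
    """Concatenate `bits_per_hash` low-order bits from each hash, oldest first."""
    mask = (1 << bits_per_hash) - 1
    combined = 0
    for h in hashes:
        combined = (combined << bits_per_hash) | (h & mask)
    return combined


def combined_stream(frame_hashes: list[int], window: int = 8, bits_per_hash: int = 4) -> list[int]:
    """One new hash value per position of a sliding window over consecutive frames."""
    return [
        combine_consecutive(frame_hashes[i : i + window], bits_per_hash)
        for i in range(len(frame_hashes) - window + 1)
    ]


frame_hashes = [0xA1, 0xB2, 0xC3, 0xD4, 0xE5, 0xF6, 0x07, 0x18, 0x29]
print([hex(h) for h in combined_stream(frame_hashes)])  # ['0x12345678', '0x23456789']
```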
- In an embodiment, empty entries in a hash table are reused.
- Storing a hash value comprises partitioning the hash value into at least two portions and storing only the second portion of the hash value in association with an array index corresponding to the first portion of the hash value.
- An array index that is not associated with any data may be used to store overflow data.
- For example, Array Index S is not associated with any data because no hash values stored in the data structure have a first portion that corresponds to S. Accordingly, Array Index S may be reused to store overflow data that would normally be stored in association with another index value.
- Array Index R may be associated with an index offset value of 1, indicating that additional data associated with Array Index R is stored at index offset 1 from Array Index R.
- The offset field is narrower than the data field of the second portion.
- Hash Value Z, which has a first portion corresponding to R, may then be stored in association with Array Index S, which is at index offset 1 from Array Index R.
- The offset may also be a negative value, indicating that associated data is stored at a lower index value. Accordingly, embodiments of the invention allow a portion of a hash value to be stored at a different array index by use of an offset, without necessarily storing the entire array index value.
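Reusing empty array indices for overflow data might be sketched as follows. The sketch assumes each index holds at most `capacity` second portions, that the full set of hash values is known at build time (so unused indices such as S can be identified), and that offsets are limited to ±`max_offset` to reflect a narrow offset field. An offset of 0 means "no overflow", and overflow-only slots are flagged so they are never mistaken for data belonging to their own index. All names are hypothetical.

```python
def build_with_overflow(pairs, first_bits, capacity, max_offset):
    """pairs: iterable of (first_portion, second_portion) integers."""
    size = 1 << first_bits
    groups = {}
    for first, second in pairs:
        groups.setdefault(first, []).append(second)

    entries = [None] * size         # None: index not associated with any data
    offsets = [0] * size            # signed offset to an overflow index; 0 = none
    overflow_only = [False] * size  # True: index reused to hold another index's data
    for first, seconds in groups.items():
        entries[first] = seconds[:capacity]

    for first, seconds in sorted(groups.items()):
        overflow = seconds[capacity:]
        if not overflow:
            continue
        # Nearest unused index within the offset range, trying +k before -k.
        candidates = (first + d for k in range(1, max_offset + 1) for d in (k, -k))
        target = next((j for j in candidates if 0 <= j < size and entries[j] is None), None)
        if target is None or len(overflow) > capacity:
            raise ValueError(f"no room for overflow of index {first}")
        entries[target] = overflow
        overflow_only[target] = True
        offsets[first] = target - first
    return entries, offsets, overflow_only


def contains(table, first, second):
    """Check the home index, then the index its offset points to."""
    entries, offsets, overflow_only = table
    if not overflow_only[first] and entries[first] and second in entries[first]:
        return True
    if offsets[first]:
        return second in entries[first + offsets[first]]
    return False


pairs = [(2, 10), (2, 11), (2, 12), (1, 5), (4, 7)]  # index 2 plays "R" and overflows
table = build_with_overflow(pairs, first_bits=3, capacity=2, max_offset=1)
print(table[1][2])             # 1: the overflow of index 2 lives at index 3 ("S")
print(contains(table, 2, 12))  # True
print(contains(table, 3, 12))  # False: index 3 only holds data for index 2
```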
- In an embodiment, a database storing hash values is built by sorting the hash values into groups and storing each group of hash values into the database, one group at a time.
- Sorting the hash values into groups may include partitioning the hash values into two or more portions. Thereafter, the hash values may be sorted into groups based on the first portion of each hash value. For example, all hash values with the same first portion are sorted into one group.
- Storing each group of hash values into a database includes storing the hash values into the database one group at a time.
- Only the second portion of each hash value is stored into a data structure associated with that group (e.g., the data structure may be associated with the common first portion shared by all hash values in that group).
- Storing a group of hash values may include flushing the hash values from a Random Access Memory (RAM).
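Building the database one group at a time can be sketched as follows. The sketch assumes integer hash values whose high-order `first_len` bits form the first portion, and it uses an in-memory `dict` as a stand-in for the database; a real system might instead flush each group from RAM to storage. The names are hypothetical.

```python
from itertools import groupby


def build_grouped(records, first_len, total_len):
    """records: (hash_value, pointer) pairs. Returns {first_portion: [(second, pointer), ...]}."""
    second_len = total_len - first_len
    second_mask = (1 << second_len) - 1

    def first_portion(record):
        return record[0] >> second_len

    database = {}
    # Sorting by the first portion makes every group contiguous for groupby.
    for first, group in groupby(sorted(records, key=first_portion), key=first_portion):
        # Store the whole group at once; only second portions and pointers are kept.
        database[first] = [(h & second_mask, pointer) for h, pointer in group]
    return database


records = [(0xAB, "p1"), (0x3C, "p3"), (0xA1, "p2")]
print(build_grouped(records, first_len=4, total_len=8))
# {3: [(12, 'p3')], 10: [(11, 'p1'), (1, 'p2')]}
```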
- FIG. 5 depicts a block diagram that illustrates a computer system 500 upon which an embodiment of the present invention may be implemented.
- Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and a processor 504 coupled with bus 502 for processing information.
- Computer system 500 also includes a main memory 506, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504.
- Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504.
- Computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504.
- A storage device 510, such as a magnetic disk or optical disk, is provided and coupled to bus 502 for storing information and instructions.
- Computer system 500 may be coupled via bus 502 to a display 512, such as a cathode ray tube (CRT), liquid crystal display (LCD), plasma screen display, or the like, for displaying information to a computer user.
- An input device 514, including alphanumeric (or non-alphabet based writing systems and/or non-Arabic number based) and other keys, is coupled to bus 502 for communicating information and command selections to processor 504.
- Cursor control 516, such as a mouse, a trackball, or cursor direction keys, may be used for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512.
- This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allows the device to specify positions in a plane.
- Embodiments may relate to the use of computer system 500 for implementing techniques described herein. According to an embodiment of the invention, such techniques are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another machine-readable medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
- The term "machine-readable medium" refers to any storage medium that participates in providing data that causes a machine to operate in a specific fashion.
- Various machine-readable media are involved, for example, in providing instructions to processor 504 for execution.
- Such a medium may take many forms, including but not limited to storage media and transmission media.
- Storage media includes both non-volatile media and volatile media.
- Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510.
- Volatile media includes dynamic memory, such as main memory 506.
- Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502.
- Transmission media can also take the form of acoustic or electromagnetic waves, such as those generated during radio-wave and infra-red and other optical data communications. Such media are tangible to enable the instructions carried by the media to be detected by a physical mechanism that reads the instructions into a machine.
- Machine-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, other legacy media, or any other physical medium with patterns of holes or darkened spots, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
- Various forms of machine-readable media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution.
- For example, the instructions may initially be carried on a magnetic disk of a remote computer.
- The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
- A modem local to computer system 500 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
- An infra-red detector can receive the data carried in the infra-red signal, and appropriate circuitry can place the data on bus 502.
- Bus 502 carries the data to main memory 506, from which processor 504 retrieves and executes the instructions.
- The instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504.
- Computer system 500 also includes a communication interface 518 coupled to bus 502.
- Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522.
- For example, communication interface 518 may be an integrated services digital network (ISDN) card, or a digital subscriber line (DSL) or cable modem (traditionally a modulator/demodulator), to provide a data communication connection to a corresponding type of telephone line.
- As another example, communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
- Wireless links may also be implemented.
- Communication interface 518 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
- Network link 520 typically provides data communication through one or more networks to other data devices.
- For example, network link 520 may provide a connection through local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526.
- ISP 526 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 528.
- Internet 528 uses electrical, electromagnetic or optical signals that carry digital data streams.
- The signals through the various networks, and the signals on network link 520 and through communication interface 518, which carry the digital data to and from computer system 500, are exemplary forms of carrier waves transporting the information.
- Computer system 500 can send messages and receive data, including program code, through the network(s), network link 520 and communication interface 518.
- For example, a server 530 might transmit requested code for an application program through Internet 528, ISP 526, local network 522, and communication interface 518.
- The received code may be executed by processor 504 as it is received, and/or stored in storage device 510 or other non-volatile storage for later execution. In this manner, computer system 500 may obtain application code in the form of a carrier wave.
- FIG. 6 depicts an example IC device 600, with which a possible embodiment of the present invention may be implemented.
- IC device 600 may have an input/output (I/O) feature 601.
- I/O feature 601 receives input signals and routes them via routing fabric 610 to a central processing unit (CPU) 602, which functions with storage 603.
- I/O feature 601 also receives output signals from other component features of IC device 600 and may control a part of the signal flow over routing fabric 610.
- a digital signal processing (DSP) feature performs at least a function relating to digital signal processing.
- An interface 605 accesses external signals and routes them to I/O feature 601, and allows IC device 600 to export signals. Routing fabric 610 routes signals and power between the various component features of IC device 600.
- Configurable and/or programmable processing elements (CPPE) 611 such as arrays of logic gates may perform dedicated functions of IC device 600, which in an embodiment may relate to deriving and processing media fingerprints that generally correspond to media content.
- Storage 612 dedicates sufficient memory cells for CPPE 611 to function efficiently.
- CPPE may include one or more dedicated DSP features 614.
- a method comprising: computing a query hash value; partitioning the query hash value into at least a first portion and a second portion; locally assigning weak bits within the first portion of the query hash value and weak bits within the second portion of the query hash value; determining one or more variations of the query hash value by toggling one or more weak bits in the first portion and one or more weak bits in the second portion of the query hash value; and identifying a target hash value that is identical to a variation of the query hash value; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.
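The method above can be illustrated with a minimal Python sketch. It rests on a few assumptions that are not taken from the source. The hash is a plain integer whose first portion is its high-order bits. Each portion's weak bits are given as bit positions local to that portion. The number of toggled bits per portion is capped by a hypothetical `max_flips` parameter. The names `toggle_variations`, `query_variations`, and `find_targets` are illustrative only.

```python
from itertools import combinations


def toggle_variations(value: int, weak_bits: list, max_flips: int) -> list:
    """Return `value` plus every variant with up to `max_flips` of `weak_bits` toggled."""
    variants = []
    for k in range(max_flips + 1):
        for combo in combinations(weak_bits, k):
            v = value
            for bit in combo:
                v ^= 1 << bit  # toggle one weak bit (position is local to the portion)
            variants.append(v)
    return variants


def query_variations(query: int, split: int, weak_first: list,
                     weak_second: list, max_flips: int = 1) -> list:
    """Toggle weak bits separately in each portion, then recombine the portions."""
    first = query >> split               # assumed: first portion = high-order bits
    second = query & ((1 << split) - 1)  # second portion = low-order `split` bits
    return [(f << split) | s
            for f in toggle_variations(first, weak_first, max_flips)
            for s in toggle_variations(second, weak_second, max_flips)]


def find_targets(query: int, split: int, weak_first: list, weak_second: list,
                 targets: set, max_flips: int = 1) -> set:
    """Target hash values identical to some variation of the query hash value."""
    return targets.intersection(
        query_variations(query, split, weak_first, weak_second, max_flips))


# Query 0b1010_0110: first portion 0b1010 (weak bit 0), second portion 0b0110 (weak bit 1).
print(query_variations(0b10100110, 4, [0], [1]))          # [166, 164, 182, 180]
print(find_targets(0b10100110, 4, [0], [1], {180, 999}))  # {180}
```

Because the weak bits are assigned per portion, each portion contributes its own toggles. The number of variations is therefore the product of the per-portion variant counts.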
- identifying the target hash value that is identical to the variation of the query hash value comprises: determining an index value, of an array, based on a variation of the first portion of the query hash value, wherein data is associated with the index value of the array; identifying a second portion of the target hash value in the data associated with the index value of the array; and determining that the second portion of the target hash value is identical to a variation of the second portion of the query hash value.
- identifying the target hash value that is identical to the variation of the query hash value comprises: identifying a first portion of the target hash value that is identical to a first portion of the variation of the query hash value; identifying a second portion of the target hash value associated with the first portion of the target hash value; and determining that the second portion of the target hash value is identical to the second portion of the variation of the query hash value.
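The two lookup variants above amount to an array indexed by the first portion, where each slot holds the second portions of the stored hashes. The sketch below is a minimal version under stated assumptions. Each variation of the query's first portion is used directly as an array index, and every second portion found there is compared against the variations of the query's second portion. The function names are hypothetical, and the array is assumed to have one slot per possible first-portion value.

```python
from itertools import combinations


def flips(value, weak_bits, max_flips):
    """Yield `value` with every combination of up to `max_flips` weak bits toggled."""
    for k in range(max_flips + 1):
        for combo in combinations(weak_bits, k):
            v = value
            for bit in combo:
                v ^= 1 << bit
            yield v


def build_index(hashes, total_bits, split):
    """Array indexed by the first portion; each slot lists the stored second portions."""
    mask = (1 << split) - 1
    index = [[] for _ in range(1 << (total_bits - split))]
    for h in hashes:
        index[h >> split].append(h & mask)
    return index


def lookup(index, query, split, weak_first, weak_second, max_flips=1):
    mask = (1 << split) - 1
    second_variants = set(flips(query & mask, weak_second, max_flips))
    matches = []
    for f in flips(query >> split, weak_first, max_flips):  # each variation gives an index value
        for s in index[f]:                                   # data associated with that index
            if s in second_variants:                         # second portion matches a variation
                matches.append((f << split) | s)
    return matches


index = build_index([0b10110100, 0b00010001], total_bits=8, split=4)
print(lookup(index, 0b10100110, 4, [0], [1]))  # [180]
```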
- Enumerated Example Embodiment 6: the method as recited in Enumerated Example Embodiment 1, wherein locally assigning weak bits within the first portion comprises: selecting a predetermined number of feature values that are closest to a threshold value, wherein the feature values are derived from media content; determining whether the distance of each selected feature value from the threshold value is within a predetermined range; and assigning bits in the query hash value as the one or more weak bits if the distance of the corresponding feature value from the threshold value is within the predetermined range.
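The selection rule in Embodiment 6 can be sketched in a few lines, applied to one portion at a time. The sketch assumes that bit i of a portion was obtained by comparing feature i with the threshold. Under that assumption, a feature lying near the threshold yields an unreliable bit. The function name and the example values are hypothetical.

```python
def assign_weak_bits(features, threshold, num_candidates, max_distance):
    """Local weak-bit assignment for ONE portion of the hash.

    Assumes bit i of the portion was derived from features[i] by comparison
    with `threshold`, so a feature close to the threshold gives an unreliable bit.
    """
    distance = [abs(f - threshold) for f in features]
    # Select the predetermined number of features closest to the threshold...
    closest = sorted(range(len(features)), key=lambda i: distance[i])[:num_candidates]
    # ...and keep only those within the predetermined range.
    return sorted(i for i in closest if distance[i] <= max_distance)


features = [0.9, 0.52, 0.1, 0.47, 0.8, 0.3, 0.54, 0.05]
first, second = features[:4], features[4:]      # two equal portions
print(assign_weak_bits(first, 0.5, 2, 0.05))    # [1, 3]
print(assign_weak_bits(second, 0.5, 2, 0.05))   # [2]
```

Suppose the two closest features were chosen over the whole hash instead. That would pick bits 1 and 3, both in the first portion, leaving the second portion with no weak bits at all. Assigning weak bits locally gives each portion its own candidates.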
- computing the query hash value comprises: deriving one or more feature values from media content; and for at least one feature value, determining two or more bits in the query hash value from a single feature value.
- computing the query hash value comprises: deriving one or more feature values from media content; partitioning the one or more feature values derived from the media content into three or more intervals; and assigning at least one bit in the query hash value for each feature value based on an interval corresponding to that feature value.
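The two embodiments above describe deriving more than one hash bit from a single feature by quantizing it into three or more intervals. One way to do this is shown in the minimal sketch below. The interval edges are a hypothetical input, and the Gray coding of interval indices is an assumption of this sketch rather than something the source specifies. Gray coding means that a feature drifting into a neighbouring interval flips only one hash bit.

```python
from bisect import bisect_right


def multi_bit_hash(features, edges):
    """Quantize each feature into one of len(edges) + 1 intervals and emit
    ceil(log2(#intervals)) bits per feature. Interval indices are Gray-coded
    (an assumption of this sketch) so neighbouring intervals differ in one bit."""
    bits_per_feature = max(1, len(edges).bit_length())
    h = 0
    for f in features:
        interval = bisect_right(edges, f)  # 0 .. len(edges)
        gray = interval ^ (interval >> 1)
        h = (h << bits_per_feature) | gray
    return h


# Four intervals (three edges) give two bits per feature: 00 01 11 10.
print(multi_bit_hash([-1.0, -0.2, 0.3, 0.9], [-0.5, 0.0, 0.5]))  # 30
```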
- partitioning the query hash value into at least a first portion and a second portion comprises partitioning the query hash value into equal sized portions.
- partitioning the query hash value into at least a first portion and a second portion comprises partitioning the query hash value into unequal sized portions.
- Enumerated Example Embodiment 12: the method as recited in Enumerated Example Embodiment 1, wherein the query hash value is partitioned into three or more portions, wherein, subsequent to partitioning the query hash value, weak bits are locally assigned to each portion of the query hash value, and wherein determining one or more variations of the query hash value comprises toggling one or more weak bits in each of the three or more portions of the query hash value.
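The partitioning embodiments above (equal or unequal sizes, and three or more portions) generalize the two-portion case. The minimal sketch below assumes the portions are listed from most to least significant. It also assumes each portion carries its own list of local weak-bit positions. All function names are hypothetical.

```python
from itertools import combinations, product


def partition(h, sizes):
    """Split h into portions of the given bit widths, most significant first."""
    portions, shift = [], sum(sizes)
    for size in sizes:
        shift -= size
        portions.append((h >> shift) & ((1 << size) - 1))
    return portions


def join(portions, sizes):
    """Inverse of partition: concatenate the portions back into one hash."""
    h = 0
    for p, size in zip(portions, sizes):
        h = (h << size) | p
    return h


def toggles(value, weak_bits, max_flips):
    out = []
    for k in range(max_flips + 1):
        for combo in combinations(weak_bits, k):
            v = value
            for bit in combo:
                v ^= 1 << bit
            out.append(v)
    return out


def all_variations(h, sizes, weak_per_portion, max_flips=1):
    """Toggle weak bits in every portion independently and recombine."""
    parts = partition(h, sizes)
    choices = [toggles(p, w, max_flips) for p, w in zip(parts, weak_per_portion)]
    return [join(combo, sizes) for combo in product(*choices)]


h = 0b1101_011_011                                    # 10-bit hash, unequal portions 4 + 3 + 3
print(partition(h, [4, 3, 3]))                        # [13, 3, 3]
print(all_variations(h, [4, 3, 3], [[0], [2], []]))   # [859, 891, 795, 827]
```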
- Enumerated Example Embodiment 13: the method as recited in Enumerated Example Embodiment 1, wherein the method further comprises: prior to partitioning the query hash value into at least a first portion and a second portion, reordering hash bits in the query hash value.
- reordering the hash bits in the query hash value comprises transferring hash bits that are expected to have a high bit error rate from the first portion of the hash value to the second portion of the hash value.
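The reordering step above can be modeled as a fixed bit permutation. The minimal sketch below assumes that a per-bit expected bit error rate is known in advance, for example from training data; the rate values shown are hypothetical. Bits are sorted from most to least reliable, so the unreliable ones end up in the low-order second portion. The same permutation would have to be applied to the stored target hashes.

```python
def order_by_ber(ber):
    """Bit positions (0 = most significant) sorted from lowest to highest expected error rate."""
    return sorted(range(len(ber)), key=lambda i: ber[i])


def reorder(h, n_bits, order):
    """Build a new hash whose i-th bit (counting from the MSB) is bit order[i] of h."""
    out = 0
    for src in order:
        out = (out << 1) | ((h >> (n_bits - 1 - src)) & 1)
    return out


ber = [0.30, 0.01, 0.25, 0.02]    # hypothetical per-bit error rates
order = order_by_ber(ber)         # [1, 3, 2, 0]
print(reorder(0b1010, 4, order))  # 3, i.e. 0b0011: bits 2 and 0 now form the second portion
```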
- a method comprising: deriving consecutive hash values from a plurality of consecutive media content frames; selecting at least a portion from each hash value of the consecutive hash values to obtain a plurality of portions extracted from consecutive hash values; and generating a new hash value based on the plurality of portions extracted from consecutive hash values; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.
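The method above builds a new hash from pieces of several consecutive frame hashes. A minimal sketch follows. It assumes that the selected portion is the `take` most significant bits of each fixed-width frame hash, which is an illustrative choice since the source only says "at least a portion".

```python
def combine_consecutive(frame_hashes, bits, take):
    """Concatenate the `take` most significant bits of each consecutive frame hash."""
    out = 0
    for h in frame_hashes:
        out = (out << take) | (h >> (bits - take))
    return out


print(hex(combine_consecutive([0xAB, 0xCD, 0xEF], bits=8, take=4)))  # 0xace
```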
- a method comprising: computing a query hash value; partitioning the query hash value into at least a first portion and a second portion; identifying a first array index of an array based on the first portion of the query hash value; determining a second array index of the array based on an offset stored in data associated with the first array index; and determining that the second portion of the query hash value is stored in data associated with the second array index; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.
- Enumerated Example Embodiment 17: the method as recited in Enumerated Example Embodiment 16, further comprising determining that the second portion of the query hash value is not stored in data associated with the first array index prior to determining the second array index based on the offset stored in data associated with the first array index.
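The offset-based lookup in the two embodiments above can be sketched as follows. The slot layout is an assumption: each slot holds a set of second portions plus a relative offset to one overflow slot, with 0 meaning "no overflow". The `Slot` class and `find_slot` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    seconds: set = field(default_factory=set)  # second portions stored at this index
    offset: int = 0                            # hypothetical relative jump to an overflow slot; 0 = none


def find_slot(table, query, split):
    """Return the array index whose data holds the query's second portion, or None."""
    first = query >> split
    second = query & ((1 << split) - 1)
    slot = table[first]                 # first array index, from the first portion
    if second in slot.seconds:
        return first
    if slot.offset:                     # not found there: follow the stored offset
        other = (first + slot.offset) % len(table)
        if second in table[other].seconds:
            return other
    return None


table = [Slot() for _ in range(16)]
table[11] = Slot({1}, offset=2)
table[13] = Slot({4})
print(find_slot(table, 0b10110001, 4))  # 11 (found at the first array index)
print(find_slot(table, 0b10110100, 4))  # 13 (found via the offset)
print(find_slot(table, 0b10110111, 4))  # None
```

In this sketch, slot 13 could also hold second portions belonging to hashes whose own first portion is 13. A practical layout would need some way to tell the two cases apart, which is not shown here.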
- a method comprising: obtaining a plurality of hash values; partitioning each hash value in the plurality of hash values into at least a first portion and a second portion; sorting each hash value in the plurality of hash values into a plurality of groups based on the first portion of that hash value; and, subsequent to sorting each hash value in the plurality of hash values into the plurality of groups, storing each group of hash values; wherein storing each group of hash values comprises: identifying a data structure associated with the group of hash values; and storing the second portion of each hash value in the data structure associated with the group; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.
- each hash value in the plurality of hash values is associated with a corresponding pointer value; and wherein the second portion of each hash value is stored in the data structure with the corresponding pointer value of that hash value.
- storing the second portion of each hash value in the data structure associated with the group comprises: storing the second portion of each hash value on a Random Access Memory (RAM) buffer; and subsequent to storing the pointer value associated with each hash value in the group on the RAM buffer, flushing at least a portion of the RAM buffer to non-volatile solid state memory.
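The storage embodiments above can be illustrated with a minimal sketch. Hashes are grouped by their first portion, and only the second portion is kept together with its pointer value. Records pass through a RAM buffer that is flushed to persistent storage. The sketch makes several assumptions. A Python dict stands in for the non-volatile solid state memory. The buffer flushes whenever it reaches a fixed record count. The names `group_by_first_portion`, `RamBuffer`, and `store_groups` are hypothetical.

```python
from collections import defaultdict


def group_by_first_portion(entries, split):
    """Sort (hash, pointer) pairs and group them by the hash's first portion."""
    mask = (1 << split) - 1
    groups = defaultdict(list)
    for h, pointer in sorted(entries):                  # sorting by hash also orders the groups
        groups[h >> split].append((h & mask, pointer))  # only the second portion is kept
    return dict(groups)


class RamBuffer:
    """Stand-in for a RAM write buffer in front of solid state storage (`sink`)."""

    def __init__(self, sink, capacity):
        self.sink, self.capacity, self.pending = sink, capacity, []

    def write(self, group, second, pointer):
        self.pending.append((group, (second, pointer)))
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self):
        for group, record in self.pending:              # append to the group's data structure
            self.sink.setdefault(group, []).append(record)
        self.pending.clear()


def store_groups(groups, buffer):
    for group, records in sorted(groups.items()):
        for second, pointer in records:
            buffer.write(group, second, pointer)
    buffer.flush()                                      # flush whatever remains buffered


groups = group_by_first_portion([(180, "a"), (166, "b"), (17, "c")], split=4)
print(groups)  # {1: [(1, 'c')], 10: [(6, 'b')], 11: [(4, 'a')]}
flash = {}
store_groups(groups, RamBuffer(flash, capacity=2))
print(flash)   # {1: [(1, 'c')], 10: [(6, 'b')], 11: [(4, 'a')]}
```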
- a computer readable storage medium having encoded instructions which, when executed by one or more processors, cause performance of the steps of a method as recited in any of Enumerated Example Embodiments 1-21.
- a system comprising: one or more processors; and a computer readable storage medium having encoded instructions which, when executed by the one or more processors, cause performance of a method as recited in any of Enumerated Example Embodiments 1-21.
- a system comprising means for performing steps of a method as recited in any of Enumerated Example Embodiments 1-21.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to identifying a target hash value. Identifying target hash values comprises computing a query hash value, partitioning the query hash value into at least a first portion and a second portion, and locally assigning weak bits within the first portion of the query hash value and weak bits within the second portion of the query hash value. The method further comprises determining one or more variations of the query hash value by toggling one or more weak bits in the first portion and one or more weak bits in the second portion of the query hash value, and identifying a target hash value that is identical to a variation of the query hash value.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17970609P | 2009-05-19 | 2009-05-19 | |
US61/179,706 | 2009-05-19 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010135082A1 (fr) | 2010-11-25 |
Family ID: 42942079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2010/033657 (WO2010135082A1) | Localized weak bit assignment | 2009-05-19 | 2010-05-05 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2010135082A1 (fr) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105144141A (zh) * | 2013-03-15 | 2015-12-09 | Cognitive Media Networks, Inc. | Systems and methods for addressing a media database using distance associative hashing |
EP3001871A4 (fr) * | 2013-03-15 | 2017-02-22 | Cognitive Media Networks, Inc. | Systems and methods for addressing a media database using distance associative hashing |
US9838753B2 (en) | 2013-12-23 | 2017-12-05 | Inscape Data, Inc. | Monitoring individual viewing of television events using tracking pixels and cookies |
US9906834B2 (en) | 2009-05-29 | 2018-02-27 | Inscape Data, Inc. | Methods for identifying video segments and displaying contextually targeted content on a connected television |
US9955192B2 (en) | 2013-12-23 | 2018-04-24 | Inscape Data, Inc. | Monitoring individual viewing of television events using tracking pixels and cookies |
US10080062B2 (en) | 2015-07-16 | 2018-09-18 | Inscape Data, Inc. | Optimizing media fingerprint retention to improve system resource utilization |
US10116972B2 (en) | 2009-05-29 | 2018-10-30 | Inscape Data, Inc. | Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device |
US10169455B2 (en) | 2009-05-29 | 2019-01-01 | Inscape Data, Inc. | Systems and methods for addressing a media database using distance associative hashing |
US10192138B2 (en) | 2010-05-27 | 2019-01-29 | Inscape Data, Inc. | Systems and methods for reducing data density in large datasets |
US10375451B2 (en) | 2009-05-29 | 2019-08-06 | Inscape Data, Inc. | Detection of common media segments |
US10405014B2 (en) | 2015-01-30 | 2019-09-03 | Inscape Data, Inc. | Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device |
US10482349B2 (en) | 2015-04-17 | 2019-11-19 | Inscape Data, Inc. | Systems and methods for reducing data density in large datasets |
US10606879B1 (en) | 2016-02-29 | 2020-03-31 | Gracenote, Inc. | Indexing fingerprints |
US10873788B2 (en) | 2015-07-16 | 2020-12-22 | Inscape Data, Inc. | Detection of common media segments |
US10902048B2 (en) | 2015-07-16 | 2021-01-26 | Inscape Data, Inc. | Prediction of future views of video segments to optimize system resource utilization |
US10949458B2 (en) | 2009-05-29 | 2021-03-16 | Inscape Data, Inc. | System and method for improving work load management in ACR television monitoring system |
US10983984B2 (en) | 2017-04-06 | 2021-04-20 | Inscape Data, Inc. | Systems and methods for improving accuracy of device maps using media viewing data |
US11272248B2 (en) | 2009-05-29 | 2022-03-08 | Inscape Data, Inc. | Methods for identifying video segments and displaying contextually targeted content on a connected television |
US11308144B2 (en) | 2015-07-16 | 2022-04-19 | Inscape Data, Inc. | Systems and methods for partitioning search indexes for improved efficiency in identifying media segments |
- 2010-05-05: application PCT/US2010/033657 filed with WO (WO2010135082A1, status: active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002065782A1 (fr) * | 2001-02-12 | 2002-08-22 | Koninklijke Philips Electronics N.V. | Generating and matching hashes of multimedia content |
WO2003067466A2 (fr) * | 2002-02-05 | 2003-08-14 | Koninklijke Philips Electronics N.V. | Efficient storage of fingerprints |
WO2007148290A2 (fr) * | 2006-06-20 | 2007-12-27 | Koninklijke Philips Electronics N.V. | Generating fingerprints of information signals |
Non-Patent Citations (3)
Title |
---|
HAITSMA J ET AL: "Robust Audio Hashing for Content Identification", CONTENT-BASED MULTIMEDIA INDEXING (CBMI 2001), BRESCIA, ITALY, 1 January 2001 (2001-01-01), 8 pages, XP002264645 * |
OOSTVEEN J ET AL: "FEATURE EXTRACTION AND A DATABASE STRATEGY FOR VIDEO FINGERPRINTING", PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS: SECOND INTERNATIONAL SYMPOSIUM, ISPA 2004 PROCEEDINGS, HONG KONG, CHINA, DECEMBER 13 - 15, 2004 (IN: LECTURE NOTES IN COMPUTER SCIENCES), SPRINGER, DE LNKD- DOI:10.1007/3-540-45925-1_11, vol. 2314, 11 March 2002 (2002-03-11), pages 117 - 128, XP009017770, ISBN: 978-3-540-24128-7 * |
YU-XIN ZHAO ET AL: "Robust Hashing Based on Persistent Points for Video Copy Detection", COMPUTATIONAL INTELLIGENCE AND SECURITY, 2008. CIS '08. INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 13 December 2008 (2008-12-13), pages 305 - 308, XP031379129, ISBN: 978-0-7695-3508-1 * |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10116972B2 (en) | 2009-05-29 | 2018-10-30 | Inscape Data, Inc. | Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device |
US11272248B2 (en) | 2009-05-29 | 2022-03-08 | Inscape Data, Inc. | Methods for identifying video segments and displaying contextually targeted content on a connected television |
US9906834B2 (en) | 2009-05-29 | 2018-02-27 | Inscape Data, Inc. | Methods for identifying video segments and displaying contextually targeted content on a connected television |
US11080331B2 (en) | 2009-05-29 | 2021-08-03 | Inscape Data, Inc. | Systems and methods for addressing a media database using distance associative hashing |
US10375451B2 (en) | 2009-05-29 | 2019-08-06 | Inscape Data, Inc. | Detection of common media segments |
US10169455B2 (en) | 2009-05-29 | 2019-01-01 | Inscape Data, Inc. | Systems and methods for addressing a media database using distance associative hashing |
US10185768B2 (en) | 2009-05-29 | 2019-01-22 | Inscape Data, Inc. | Systems and methods for addressing a media database using distance associative hashing |
US10271098B2 (en) | 2009-05-29 | 2019-04-23 | Inscape Data, Inc. | Methods for identifying video segments and displaying contextually targeted content on a connected television |
US10949458B2 (en) | 2009-05-29 | 2021-03-16 | Inscape Data, Inc. | System and method for improving work load management in ACR television monitoring system |
US10820048B2 (en) | 2009-05-29 | 2020-10-27 | Inscape Data, Inc. | Methods for identifying video segments and displaying contextually targeted content on a connected television |
US10192138B2 (en) | 2010-05-27 | 2019-01-29 | Inscape Data, Inc. | Systems and methods for reducing data density in large datasets |
EP3001871A4 (fr) * | 2013-03-15 | 2017-02-22 | Cognitive Media Networks, Inc. | Systems and methods for addressing a media database using distance associative hashing |
CN105144141A (zh) * | 2013-03-15 | 2015-12-09 | Cognitive Media Networks, Inc. | Systems and methods for addressing a media database using distance associative hashing |
US11039178B2 (en) | 2013-12-23 | 2021-06-15 | Inscape Data, Inc. | Monitoring individual viewing of television events using tracking pixels and cookies |
US10284884B2 (en) | 2013-12-23 | 2019-05-07 | Inscape Data, Inc. | Monitoring individual viewing of television events using tracking pixels and cookies |
US9838753B2 (en) | 2013-12-23 | 2017-12-05 | Inscape Data, Inc. | Monitoring individual viewing of television events using tracking pixels and cookies |
US10306274B2 (en) | 2013-12-23 | 2019-05-28 | Inscape Data, Inc. | Monitoring individual viewing of television events using tracking pixels and cookies |
US9955192B2 (en) | 2013-12-23 | 2018-04-24 | Inscape Data, Inc. | Monitoring individual viewing of television events using tracking pixels and cookies |
US11711554B2 (en) | 2015-01-30 | 2023-07-25 | Inscape Data, Inc. | Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device |
US10405014B2 (en) | 2015-01-30 | 2019-09-03 | Inscape Data, Inc. | Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device |
US10945006B2 (en) | 2015-01-30 | 2021-03-09 | Inscape Data, Inc. | Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device |
US10482349B2 (en) | 2015-04-17 | 2019-11-19 | Inscape Data, Inc. | Systems and methods for reducing data density in large datasets |
US10080062B2 (en) | 2015-07-16 | 2018-09-18 | Inscape Data, Inc. | Optimizing media fingerprint retention to improve system resource utilization |
US10902048B2 (en) | 2015-07-16 | 2021-01-26 | Inscape Data, Inc. | Prediction of future views of video segments to optimize system resource utilization |
US10873788B2 (en) | 2015-07-16 | 2020-12-22 | Inscape Data, Inc. | Detection of common media segments |
US10674223B2 (en) | 2015-07-16 | 2020-06-02 | Inscape Data, Inc. | Optimizing media fingerprint retention to improve system resource utilization |
US11308144B2 (en) | 2015-07-16 | 2022-04-19 | Inscape Data, Inc. | Systems and methods for partitioning search indexes for improved efficiency in identifying media segments |
US11451877B2 (en) | 2015-07-16 | 2022-09-20 | Inscape Data, Inc. | Optimizing media fingerprint retention to improve system resource utilization |
US11659255B2 (en) | 2015-07-16 | 2023-05-23 | Inscape Data, Inc. | Detection of common media segments |
US11971919B2 (en) | 2015-07-16 | 2024-04-30 | Inscape Data, Inc. | Systems and methods for partitioning search indexes for improved efficiency in identifying media segments |
US11436271B2 (en) | 2016-02-29 | 2022-09-06 | Gracenote, Inc. | Indexing fingerprints |
US10606879B1 (en) | 2016-02-29 | 2020-03-31 | Gracenote, Inc. | Indexing fingerprints |
US12045277B2 (en) | 2016-02-29 | 2024-07-23 | Gracenote, Inc. | Indexing fingerprints |
US10983984B2 (en) | 2017-04-06 | 2021-04-20 | Inscape Data, Inc. | Systems and methods for improving accuracy of device maps using media viewing data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2010135082A1 (fr) | | Localized weak bit assignment |
EP3767483B1 (fr) | | Image retrieval method, device, system and server, and storage medium |
US8478730B2 (en) | | Scalable deduplication system with small blocks |
US9971770B2 (en) | | Inverted indexing |
US9361307B2 (en) | | Rejecting rows when scanning a collision chain that is associated with a page filter |
US10783163B2 (en) | | Instance-based distributed data recovery method and apparatus |
US10698912B2 (en) | | Method for processing a database query |
US8423562B2 (en) | | Non-transitory, computer readable storage medium, search method, and search apparatus |
US20150220684A1 (en) | | System and method for characterizing biological sequence data through a probabilistic data structure |
CN111247518A (zh) | | Database sharding |
CN111552692B (zh) | | An addition-subtraction cuckoo filter |
CN108027713A (zh) | | Deduplication for solid state drive controllers |
CN111552693B (zh) | | A tag cuckoo filter |
CN112148928A (zh) | | A cuckoo filter based on fingerprint families |
US10757227B2 (en) | | Security-oriented compression |
Chen et al. | | A high-throughput FPGA accelerator for short-read mapping of the whole human genome |
GB2433336A (en) | | Metadata verification during measurement processing |
US8868584B2 (en) | | Compression pattern matching |
US20150278543A1 (en) | | System and Method for Optimizing Storage of File System Access Control Lists |
CN104123102B (zh) | | An IP hard disk and data processing method thereof |
CN112783971B (zh) | | Transaction recording method, transaction query method, electronic device, and storage medium |
CN112241336A (zh) | | Method, device and computer program product for backing up data |
CN107704472B (zh) | | Method and apparatus for searching for a data block |
US20230168830A1 (en) | | Method and apparatus for data access of nand flash file, and storage medium |
KR101666758B1 (ko) | | Data search method using an improved Bloom filter |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10723444; country of ref document: EP; kind code of ref document: A1 |
DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101) | |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 10723444; country of ref document: EP; kind code of ref document: A1 |