US20160019671A1 - Identifying multimedia objects based on multimedia fingerprint - Google Patents

Identifying multimedia objects based on multimedia fingerprint

Info

Publication number
US20160019671A1
Authority
US
Grant status
Application
Patent type
Prior art keywords
fingerprint
fingerprints
reference
training
derived
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US14869554
Inventor
Claus Bauer
Lie Lu
Mingqing Hu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0021Image watermarking
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/30017Multimedia data retrieval; Retrieval of more than one type of audiovisual media
    • G06F17/30023Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor ; File system structures therefor in structured data stores
    • G06F17/30345Update requests
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor ; File system structures therefor in structured data stores
    • G06F17/30587Details of specialised database models
    • G06F17/30595Relational databases
    • G06F17/30598Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00624Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
    • G06K9/00711Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
    • G06K9/00744Extracting features from the video content, e.g. video "fingerprints", or characteristics, e.g. by automatic extraction of representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6288Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • G06K9/6292Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion of classification results, e.g. of classification results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/30244Information retrieval; Database structures therefor ; File system structures therefor in image databases
    • G06F17/30247Information retrieval; Database structures therefor ; File system structures therefor in image databases based on features automatically derived from the image data
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/3074Audio data retrieval
    • G06F17/30743Audio data retrieval using features automatically derived from the audio content, e.g. descriptors, fingerprints, signatures, MEP-cepstral coefficients, musical score, tempo
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor ; File system structures therefor
    • G06F17/30781Information retrieval; Database structures therefor ; File system structures therefor of video data
    • G06F17/30784Information retrieval; Database structures therefor ; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre

Abstract

Embodiments of identifying multimedia objects based on multimedia fingerprints are provided. Query fingerprints are derived from a multimedia object according to differing fingerprint algorithms. For each fingerprint algorithm, decisions are calculated through at least one classifier corresponding to the fingerprint algorithm based on the query fingerprint and reference fingerprints, the reference fingerprints being derived from reference multimedia objects according to the same fingerprint algorithm. Each of the decisions indicates a possibility that the query fingerprint and the reference fingerprint are not derived from the same multimedia content. For each of the reference multimedia objects, a distance is calculated as a weighted sum of the decisions relating to the reference fingerprints. The multimedia object is identified as matching the reference multimedia object with the smallest distance less than a threshold.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application claims benefit as a Divisional of application Ser. No. 13/854,276, filed 1 Apr. 2013, which claims priority to U.S. Provisional Patent Application No. 61/625,889 filed on 18 Apr. 2012 entitled “Identifying Multimedia Objects Based on Multimedia Fingerprint” by Claus Bauer et al., the entire contents of the aforementioned are hereby incorporated by reference as if fully set forth herein, under 35 U.S.C. §120 and 121. The applicant(s) hereby rescind any disclaimer of claim scope in the parent application(s) or the prosecution history thereof and advise the USPTO that the claims in this application may be broader than any claim in the parent application(s).
  • TECHNICAL FIELD
  • The present invention relates generally to digital signal processing. More specifically, embodiments of the present invention relate to identifying multimedia objects based on multimedia fingerprints.
  • BACKGROUND
  • A multimedia (e.g., audio, video or image) fingerprint is a content-based compact signature that summarizes a multimedia recording. Multimedia fingerprinting technologies have been widely investigated and are increasingly used for various applications since they allow the monitoring of multimedia objects independently of their format and without the need for metadata or watermark embedding. In an example application, given a fingerprint derived from a multimedia recording (e.g., audio or video), a matching algorithm searches a database of fingerprints to find the best match.
  • Various fingerprint algorithms to derive multimedia fingerprints have been proposed. Multimedia fingerprints can be described as low-bit rate identifiers that uniquely identify even small segments of a multimedia recording such as an audio file, a video file or an image file. A recording or segment from which a fingerprint is derived is also called a multimedia object hereafter.
  • Fingerprints based on different fingerprint algorithms differ in the degree of robustness to content modifications and sensitivity to content change. In general, fingerprints are designed in a way that increased fingerprint robustness leads to a decrease in content sensitivity and vice-versa. It is difficult to achieve high robustness and high sensitivity with one fingerprint algorithm.
  • SUMMARY
  • According to an embodiment of the invention, a method of identifying a multimedia object is provided. According to the method, query fingerprints fq,1 to fq,T are acquired. The query fingerprints are derived from the multimedia object according to fingerprint algorithms F1 to FT respectively. The fingerprint algorithms F1 to FT are different from each other, and T>1. For each fingerprint algorithm Ft, decisions are calculated using at least one classifier corresponding to the fingerprint algorithm Ft based on the query fingerprint fq,t and reference fingerprints derived from a plurality of reference multimedia objects according to the fingerprint algorithm Ft. Each of the decisions may indicate a possibility that the query fingerprint and the reference fingerprint for calculating the decision are not derived from the same multimedia content. For each of the reference multimedia objects, a distance D is calculated as a weighted sum of the decisions relating to the reference fingerprints derived from the reference multimedia object according to the fingerprint algorithms F1 to FT respectively. Accordingly, the multimedia object is identified as matching the reference multimedia object with the smallest distance which is less than a threshold THc. The matching between two multimedia objects means that the multimedia objects can be identified as the same multimedia content.
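  • The combining and identifying steps of this method can be illustrated with a minimal Python sketch. The function name `identify`, the list-based layout of the decisions, and the example weights are assumptions made for illustration only; they are not taken from the application.

```python
from typing import List, Optional


def identify(decisions: List[List[float]],
             weights: List[float],
             th_c: float) -> Optional[int]:
    """Return the index j of the best-matching reference object, or None.

    decisions[j][i] is the decision of classifier i for reference object j
    (larger values mean "less likely the same content").
    weights[i] is the weight given to classifier i (hypothetical values).
    th_c plays the role of the threshold THc.
    """
    best_j, best_d = None, None
    for j, row in enumerate(decisions):
        # Distance D for reference j: weighted sum of its decisions.
        d = sum(w * h for w, h in zip(weights, row))
        # Keep the smallest distance that is also below the threshold.
        if d < th_c and (best_d is None or d < best_d):
            best_j, best_d = j, d
    return best_j
```

  • With decisions `[[0, 0, 1], [1, 1, 1]]` and weights `[0.5, 0.3, 0.2]`, the first reference object has the smaller distance, so it is reported as the match only if that distance is below the chosen threshold; otherwise no match is reported.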
  • According to an embodiment of the invention, an apparatus for identifying a multimedia object is provided. The apparatus includes an acquiring unit, a plurality of classifying units, a combining unit and an identifying unit. Each fingerprint algorithm Ft corresponds to at least one of the classifying units. The acquiring unit is configured to acquire query fingerprints fq,1 to fq,T. The query fingerprints are derived from the multimedia object according to the fingerprint algorithms F1 to FT respectively. The fingerprint algorithms F1 to FT are different from each other, and T>1. Each of the classifying units is configured to calculate decisions through a classifier based on the query fingerprint fq,t and reference fingerprints derived from a plurality of reference multimedia objects according to the fingerprint algorithm Ft. Each of the decisions may indicate a possibility that the query fingerprint and the reference fingerprint for calculating the decision are not derived from the same multimedia content. The combining unit is configured to, for each of the reference multimedia objects, calculate a distance D as a weighted sum of the decisions relating to the reference fingerprints derived from the reference multimedia object according to the fingerprint algorithms F1 to FT respectively. The identifying unit is configured to identify the multimedia object as matching the reference multimedia object with the smallest distance which is less than a threshold THc.
  • According to an embodiment of the invention, a method of training a model for identifying multimedia objects is provided. This method uses training data provided as samples. According to the method, each of one or more samples includes a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object or not. For each sample, training query fingerprints are derived from the training query multimedia object according to fingerprint algorithms F1 to FG respectively. The fingerprint algorithms F1 to FG are different from each other, and G>1. For each sample, also, training reference fingerprints are derived from the training reference multimedia object according to the fingerprint algorithms F1 to FG respectively. For each fingerprint algorithm Ft, at least one candidate classifier is generated based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft. The candidate classifier is adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft. The decision may indicate a possibility that the two fingerprints are not derived from the same multimedia content. The model is generated as including a weighted sum of classifiers selected from the candidate classifiers and a threshold THc for evaluating the weighted sum, such that the identifying error obtained by applying the model to the training query fingerprints and the training reference fingerprints derived from the samples is minimized. The selected classifiers in the generated model may or may not correspond to more than one fingerprinting algorithm. It is possible that, for some of the fingerprint algorithms, no classifier is selected.
  • According to an embodiment of the invention, an apparatus for training a model for identifying multimedia objects is provided. The apparatus includes a fingerprint calculator and a training unit. The apparatus is provided with a set of samples. Each of one or more samples includes a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object or not. For each sample, the fingerprint calculator is configured to derive training query fingerprints from the training query multimedia object according to fingerprint algorithms F1 to FG respectively. The fingerprint algorithms F1 to FG are different from each other, and G>1. The fingerprint calculator is also configured to derive training reference fingerprints from the training reference multimedia object according to the fingerprint algorithms F1 to FG respectively. For each fingerprint algorithm Ft, the training unit is configured to generate at least one candidate classifier based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft. The candidate classifier is adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft. The decision may indicate a possibility that the two fingerprints are not derived from the same multimedia content. The training unit is further configured to generate the model including a weighted sum of classifiers selected from the candidate classifiers and a threshold THc for evaluating the weighted sum, such that the identifying error obtained by applying the model to the training query fingerprints and the training reference fingerprints derived from the samples is minimized. The selected classifiers in the generated model may or may not correspond to more than one fingerprinting algorithm. 
It is possible that, for some of the fingerprint algorithms, no classifier is selected.
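  • One way to picture such a training procedure is a discrete AdaBoost loop over threshold classifiers, since the drawings refer to an Adaboosting process. The sketch below is an assumption-laden illustration, not the procedure of the application: the names `train_adaboost`, `stump_decision` and `is_match`, the representation of a sample as per-algorithm distances plus a label, and the choice THc = half the sum of the weights are all hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float]          # (fingerprint-algorithm index t, distance threshold)
Sample = Tuple[List[float], int]   # (distance per algorithm, label: 1 = non-match, 0 = match)


def stump_decision(stump: Stump, dists: List[float]) -> int:
    """Hard decision of one candidate classifier: 1 = not the same content."""
    t, th = stump
    return 1 if dists[t] > th else 0


def train_adaboost(samples: List[Sample], candidates: List[Stump],
                   rounds: int) -> Tuple[List[Tuple[float, Stump]], float]:
    """Select weighted classifiers from the candidates; return (model, th_c)."""
    n = len(samples)
    w = [1.0 / n] * n                      # per-sample weights
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Pick the candidate with the lowest weighted error.
        best, best_err = None, None
        for c in candidates:
            err = sum(wi for wi, (d, y) in zip(w, samples)
                      if stump_decision(c, d) != y)
            if best_err is None or err < best_err:
                best, best_err = c, err
        if best is None or best_err >= 0.5:
            break                          # no candidate better than chance
        eps = min(max(best_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, best))
        if best_err == 0:
            break                          # training set already separated
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(alpha if stump_decision(best, d) != y else -alpha)
             for wi, (d, y) in zip(w, samples)]
        total = sum(w)
        w = [wi / total for wi in w]
    th_c = 0.5 * sum(a for a, _ in model)  # assumed choice of threshold
    return model, th_c


def is_match(model: List[Tuple[float, Stump]], th_c: float,
             dists: List[float]) -> bool:
    """Apply the model: match when the weighted sum of decisions is below th_c."""
    return sum(a * stump_decision(s, dists) for a, s in model) < th_c
```

  • In this sketch a candidate that never helps is simply never selected, which mirrors the remark that some fingerprint algorithms may end up with no classifier in the generated model.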
  • Further features and advantages of the invention, as well as the structure and operation of various embodiments of the invention, are described in detail below with reference to the accompanying drawings. It is noted that the invention is not limited to the specific embodiments described herein. Such embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
  • FIG. 1 is a block diagram illustrating an example apparatus for identifying a multimedia object according to an embodiment of the invention;
  • FIG. 2 is a flow chart illustrating an example method of identifying a multimedia object according to an embodiment of the invention;
  • FIG. 3 depicts pseudocode illustrating an example process of searching a tree according to an example;
  • FIG. 4 is a block diagram illustrating an example apparatus for training a model for identifying multimedia objects according to an embodiment of the invention;
  • FIG. 5 is a flow chart illustrating an example process of an Adaboosting method according to an embodiment of the invention;
  • FIG. 6 is a flow chart illustrating an example method of training a model for identifying multimedia objects according to an embodiment of the invention; and
  • FIG. 7 is a block diagram illustrating an exemplary system for implementing embodiments of the present invention.
  • DETAILED DESCRIPTION
  • The embodiments of the present invention are described below with reference to the drawings. It is to be noted that, for purposes of clarity, representations and descriptions of components and processes that are known to those skilled in the art but are not necessary to understand the present invention are omitted from the drawings and the description.
  • As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, a device (e.g., a cellular telephone, portable media player, personal computer, television set-top box, or digital video recorder, or any media player), a method or a computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
  • A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired line, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • FIG. 1 is a block diagram illustrating an example apparatus 100 for identifying a multimedia object q according to an embodiment of the invention.
  • A fingerprint algorithm may capture features, perceptual or imperceptible, of a multimedia object and represent them in a bit sequence called a fingerprint of the multimedia object. The accuracy of a fingerprint algorithm may be defined in terms of robustness and sensitivity.
  • The robustness refers to the degree to which the fingerprint is robust against content-preserving modification of the multimedia object from which it is derived according to the fingerprint algorithm. For an audio file, e.g., the modification may be a transcoding of the content. For a video or image file, e.g., the modification may be a rotation or a cropping of the video or image. A content-preserving modification of the multimedia object does not prevent a human being from recognizing that the modified multimedia object and the unmodified multimedia object contain the same content, and leads to only a relatively small change of the fingerprint.
  • The sensitivity refers to the degree to which the fingerprint is sensitive to changes of content. If the sensitivity is higher, fingerprints derived from different multimedia contents can differ more significantly.
  • Various multimedia fingerprint algorithms have been proposed. These fingerprint algorithms differ in robustness to content-preserving modifications and in sensitivity to content changes. In general, fingerprint algorithms are designed in a way that increased fingerprint robustness leads to a decrease in content sensitivity, and vice-versa. Further, fingerprint algorithms may differ by bitrate, i.e., the number of bits needed to uniquely represent a multimedia object under specific robustness and sensitivity requirements.
  • By using multiple fingerprint algorithms, it is possible to identify a multimedia object as matching or not matching another multimedia object with increased robustness and sensitivity, because the fingerprint algorithms may complement each other in either robustness or sensitivity. In most query scenarios, there is a set of multimedia objects which are constant or temporarily constant. If another multimedia object is present, a query request is generated to identify whether the other multimedia object matches one of the multimedia objects from the set. Because the multimedia objects of the set act as references in the query scenarios, they are also called reference multimedia objects in the present context. Correspondingly, the multimedia object to be identified is also called a query multimedia object in the present context. Therefore, fingerprints derived from reference multimedia objects and query multimedia objects are also called reference fingerprints and query fingerprints respectively.
  • As illustrated in FIG. 1, the apparatus 100 includes an acquiring unit 101, a number C>1 of classifying units 102-1 to 102-C, a combining unit 103 and an identifying unit 104.
  • The acquiring unit 101 is configured to acquire query fingerprints fq,1 to fq,T which are derived from the multimedia object q according to fingerprint algorithms F1 to FT respectively, where T>1. In other words, each query fingerprint fq,t, 1≦t≦T is derived from the multimedia object q according to the fingerprint algorithm Ft. The fingerprint algorithms F1 to FT are different from each other. In an embodiment of the apparatus 100, the query fingerprints fq,1 to fq,T may be derived by the acquiring unit 101. Alternatively, in another embodiment of the apparatus 100, the query fingerprints fq,1 to fq,T may be derived at a location such as a client device external to the acquiring unit 101, and the acquiring unit 101 receives the query fingerprints fq,1 to fq,T from the location via a connection such as bus, network or application-specific link.
  • Each fingerprint algorithm Ft may correspond to at least one of the classifying units 102-1 to 102-C. One fingerprint algorithm may correspond to only one classifying unit, or may correspond to more than one classifying unit. In an example, one or more of the fingerprint algorithms F1 to FT may each correspond to only one classifying unit. In another example, one or more of the fingerprint algorithms F1 to FT may each correspond to at least two classifying units. In case that a fingerprint algorithm corresponds to a classifying unit, the classifying unit may be applied to fingerprints derived according to the fingerprint algorithm.
  • There is a set of reference multimedia objects r1 to rN. For each reference multimedia object rj, reference fingerprints frj,1 to frj,T are derived from the reference multimedia object rj according to each fingerprint algorithm Ft of the fingerprint algorithms F1 to FT. Accordingly, each classifying unit 102-i, 1≦i≦C, corresponding to one of the fingerprint algorithms Ft, 1≦t≦T, is configured to calculate decisions hi(q,rj), 1≦j≦N, through a classifier based on the query fingerprint fq,t and the reference fingerprints fr1,t to frN,t derived from the reference multimedia objects r1 to rN according to the fingerprint algorithm Ft. Each decision hi(q,rj) indicates a possibility that the query fingerprint fq,t and the reference fingerprint frj,t are not derived from the same multimedia content. Consequently, all the classifying units may calculate N×C decisions hi(q,rj), where 1≦i≦C, 1≦j≦N.
  • The classifier may be achieved through any algorithm for identifying whether a fingerprint and another fingerprint are derived from the same multimedia content or not. Such algorithms include, but are not limited to, identifying algorithms based on machine learning and identifying algorithms based on searching.
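  • The N×C layout of the decisions can be sketched as follows. This is only an illustration under stated assumptions: fingerprints are stored as Python integers, every classifier is a Hamming-distance threshold described by a hypothetical pair (t, threshold), and the names `hamming` and `decision_matrix` are not from the application.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded fingerprints."""
    return bin(a ^ b).count("1")


def decision_matrix(query_fps: List[int],
                    reference_fps: List[List[int]],
                    classifiers: List[Tuple[int, int]]) -> List[List[int]]:
    """Return decisions[j][i] for N reference objects and C classifiers.

    query_fps[t]        : query fingerprint for algorithm index t (0-based)
    reference_fps[j][t] : fingerprint of reference object j for algorithm t
    classifiers[i]      : (t, threshold) for classifying unit i (assumed form)
    """
    return [
        [1 if hamming(query_fps[t], ref[t]) > th else 0
         for (t, th) in classifiers]
        for ref in reference_fps
    ]
```

  • Here two classifiers may share the same algorithm index t with different thresholds, which corresponds to one fingerprint algorithm having more than one classifying unit.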
  • In an example of identifying algorithms based on machine learning, distances are calculated between the fingerprints derived from positive and negative training samples, each of which includes a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object. A positive sample includes a mark indicating that the training query multimedia object matches the training reference multimedia object, and a negative sample includes a mark indicating that the training query multimedia object does not match the training reference multimedia object. Parameters of the algorithms which can minimize the identifying error are learned from the distances derived from the objects, and corresponding classifiers are thereby generated. When identifying through such a classifier, the distance required by the classifier is calculated from a query fingerprint and a reference fingerprint, and a decision is calculated with the classifier based on the distance. Alternatively, the parameters may be determined empirically, without learning.
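  • As a minimal sketch of learning one such parameter, the following Python function picks the distance threshold that misclassifies the fewest training samples. It assumes a single scalar distance per sample and an exhaustive scan over observed distances; the name `learn_threshold` and this particular learning rule are illustrative assumptions, not the method of the application.

```python
from typing import List, Tuple


def learn_threshold(distances: List[float],
                    is_match: List[bool]) -> Tuple[float, int]:
    """Return (threshold, training_errors) minimizing the identifying error.

    A pair is predicted to match when its distance is <= threshold.
    distances[k] is the fingerprint distance of training sample k and
    is_match[k] is its mark (True for a positive sample).
    """
    # Candidate thresholds: every observed distance, plus one value below
    # the minimum so that "nothing matches" is also considered.
    candidates = [min(distances) - 1.0] + sorted(set(distances))
    best_th, best_err = candidates[0], len(distances) + 1
    for th in candidates:
        err = sum((d <= th) != m for d, m in zip(distances, is_match))
        if err < best_err:
            best_th, best_err = th, err
    return best_th, best_err
```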
  • In an example of identifying algorithms based on searching, a set of training reference fingerprints are searched to find one or more of them matching a training query fingerprint. Fingerprints may be derived from positive and negative training samples, each of which includes a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object. Parameters of the algorithms which can minimize the identifying error are learned from the distances derived from the objects, and therefore, corresponding classifiers are generated. In case of identifying through such a classifier, the reference fingerprints are searched to find one or more of them matching a query fingerprint. Therefore, for the reference fingerprint and the query fingerprint found as matching, it is possible to make a decision that the reference fingerprint and the query fingerprint are, or are likely derived from the same multimedia content. For the reference fingerprint and the query fingerprint not found as matching, it is possible to make a decision that the reference fingerprint and the query fingerprint are not, or are not likely derived from the same multimedia content. Alternatively, the parameters may be determined experientially, without selecting by comparison.
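  • A search-based classifier of this kind can be sketched with a plain linear scan. The sketch assumes integer-encoded fingerprints, a Hamming distance, and two hypothetical parameters (`max_distance` and `top_k`); it deliberately uses a brute-force scan rather than any tree or index structure, and the name `search_decisions` is illustrative.

```python
from typing import List


def search_decisions(query_fp: int, reference_fps: List[int],
                     max_distance: int, top_k: int = 1) -> List[int]:
    """Search the references and return one hard decision per reference.

    The top_k nearest references within max_distance are treated as
    "found as matching" (decision 0); all others get decision 1.
    """
    # Rank references by Hamming distance to the query (brute-force search).
    scored = sorted((bin(query_fp ^ r).count("1"), j)
                    for j, r in enumerate(reference_fps))
    hits = {j for d, j in scored[:top_k] if d <= max_distance}
    return [0 if j in hits else 1 for j in range(len(reference_fps))]
```

  • In this sketch the parameters `max_distance` and `top_k` are the quantities that could either be learned from training samples or set empirically.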
  • In case that at least two classifiers correspond to the same fingerprint algorithm, the classifiers may include classifiers based on the same identifying algorithm but having different parameter configurations (e.g., different thresholds for classifying), and/or classifiers based on different identifying algorithms. Alternatively, each fingerprint algorithm Ft may correspond to only one classifying unit.
  • As an example of the identifying algorithm, in case of a classifying unit 102-i adopting a classifier corresponding to the fingerprint algorithm Ft, a query fingerprint fq,t and a reference fingerprint fr j ,t are derived according to the fingerprint algorithm Ft, a distance di(fq,t,fr j ,t), such as Hamming distance, between the query fingerprint fq,t and the reference fingerprint fr j ,t is calculated. di( ) is a distance function about two fingerprints derived according to the fingerprint algorithm Ft. Then the decision hi(q,rj) is calculated as
  • $$h_i(q, r_j) = \begin{cases} 1 & \text{if } d_i(f_{q,t}, f_{r_j,t}) > Th_i \\ 0 & \text{if } d_i(f_{q,t}, f_{r_j,t}) \le Th_i \end{cases} \qquad (1)$$
  • where 1 indicates that the query fingerprint fq,t and the reference fingerprint fr j ,t are not derived from the same multimedia content, 0 indicates that the query fingerprint fq,t and the reference fingerprint fr j ,t are derived from the same multimedia content, and Thi is a threshold associated with the classifier.
  • The decisions may be hard decisions indicating that the query fingerprint and the reference fingerprint are not derived from the same multimedia content (e.g., 1), or that the query fingerprint and the reference fingerprint are derived from the same multimedia content (e.g., 0). The decisions may also be soft decisions indicating a probability that the query fingerprint and the reference fingerprint are not derived from the same multimedia content.
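  • The hard decision of equation (1) can be sketched as follows. This is a minimal illustration, assuming fingerprints are stored as Python integers and that di( ) is the Hamming distance; the function names are hypothetical and not part of the disclosure.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints stored as integers."""
    return bin(a ^ b).count("1")


def hard_decision(query_fp: int, reference_fp: int, threshold: int) -> int:
    """Equation (1): 1 means 'not the same content', 0 means 'same content'."""
    return 1 if hamming_distance(query_fp, reference_fp) > threshold else 0
```

  • For instance, with a threshold of 1, two fingerprints that differ in a single bit yield the decision 0, while fingerprints that differ in four bits yield the decision 1.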
  • For each reference multimedia object rj, the combining unit 103 is configured to calculate a distance Dj as a weighted sum of the decisions h1(q,rj) to hC(q,rj) relating to the reference fingerprints fr j ,1 to fr j ,T derived from the reference multimedia object rj according to the fingerprint algorithms F1 to FT respectively, that is,
  • $$D_j = \sum_{i=1}^{C} w_i \, h_i(q, r_j) \qquad (2)$$
  • where wi is the weight for the decision hi(q,rj). In an example, all the weights wi may be equal. In another example, the weights wi may be pre-trained based on training samples.
  • For the N reference multimedia objects, N distances D1 to DN are calculated. The identifying unit 104 is configured to identify the multimedia object q as matching the reference multimedia object x with the smallest distance Dx of the distances D1 to DN, which is less than a threshold THc.
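  • The combining and identifying steps can be sketched as follows. This is only an illustration of equation (2) and the comparison against THc; the table layout (one row of C decisions per reference object) and the function names are assumptions.

```python
from typing import List, Optional, Sequence


def combined_distance(decisions: Sequence[float], weights: Sequence[float]) -> float:
    """Equation (2): weighted sum of the C decisions for one reference object."""
    return sum(w * h for w, h in zip(weights, decisions))


def identify(decision_table: List[Sequence[float]],
             weights: Sequence[float],
             th_c: float) -> Optional[int]:
    """Return the index of the reference with the smallest distance below th_c.

    decision_table[j] holds h_1(q, r_j) .. h_C(q, r_j). None means no match.
    """
    if not decision_table:
        return None
    distances = [combined_distance(row, weights) for row in decision_table]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best if distances[best] < th_c else None
```

  • Because a decision of 0 indicates matching content, a smaller Dj indicates that more (or more heavily weighted) classifiers voted for a match.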
  • FIG. 2 is a flow chart illustrating an example method 200 of identifying a multimedia object q (also called query multimedia object q hereafter) according to an embodiment of the invention.
  • As illustrated in FIG. 2, the method 200 starts from step 201. At step 203, query fingerprints fq,1 to fq,T are acquired. The query fingerprints fq,1 to fq,T are derived from the query multimedia object q according to fingerprint algorithms F1 to FT respectively. In other words, each query fingerprint fq,t is derived from the query multimedia object q according to the fingerprint algorithm Ft. The fingerprint algorithms F1 to FT are different from each other. In an embodiment of the method 200, the query fingerprints fq,1 to fq,T may be derived at step 203. Alternatively, in another embodiment of the method 200, the query fingerprints fq,1 to fq,T may be derived at a location such as a client device, and at step 203, the query fingerprints fq,1 to fq,T are received from the location via a connection such as bus, network or link.
  • At step 205, for each fingerprint algorithm Ft, decisions hi(q,r1) to hi(q,rN) are calculated using at least one classifier corresponding to the fingerprint algorithm Ft based on the query fingerprint fq,t and reference fingerprints fr 1 ,t to fr N ,t derived from a plurality of reference multimedia objects r1 to rN according to the fingerprint algorithm Ft. Each fingerprint algorithm Ft may correspond to at least one classifier. One fingerprint algorithm may correspond to only one classifying unit, or may correspond to more than one classifying unit. In an example, one or more of the fingerprint algorithms F1 to FT may each correspond to only one classifier. In another example, one or more of the fingerprint algorithms F1 to FT may each correspond to at least two classifiers. In case that a fingerprint algorithm corresponds to a classifier, the classifier may be applied to fingerprints derived according to the fingerprint algorithm. The decision hi(q,rj) indicates a possibility that the query fingerprint fq,t and the corresponding reference fingerprint fr j ,t are not derived from the same multimedia content. Accordingly, N×C decisions hi(q,rj) are calculated, where 1≦i≦C, 1≦j≦N. The classifier may be achieved through any algorithm for identifying whether a fingerprint and another fingerprint are derived from the same multimedia content or not. Such algorithms include, but are not limited to, identifying algorithms based on machine learning, and identifying algorithms based on searching. In case that at least two classifiers correspond to the same fingerprint algorithm, the classifiers may include classifiers based on the same identifying algorithm but having different parameter configurations (e.g., different thresholds for classifying), and/or classifiers based on different identifying algorithms. Alternatively, each fingerprint algorithm Ft may correspond to only one classifier. The decisions may be hard decisions indicating that the query fingerprint and the reference fingerprint are not derived from the same multimedia content, or that the query fingerprint and the reference fingerprint are derived from the same multimedia content. The decisions may also be soft decisions indicating a probability that the query fingerprint and the reference fingerprint are not derived from the same multimedia content.
  • At step 207, for each reference multimedia object rj, a distance Dj is calculated as a weighted sum of the decisions h1(q,rj) to hC(q,rj) relating to the reference fingerprints fr j ,1 to fr j ,T derived from the reference multimedia object rj according to the fingerprint algorithms F1 to FT respectively as described in equation (2).
  • At step 209, the multimedia object q is identified as matching the reference multimedia object x with the smallest distance Dx of the distances D1 to DN, which is less than a threshold THc.
  • The method 200 ends at step 211.
  • According to the apparatus 100 and the method 200, usage of more than one fingerprint algorithm can be beneficial to support specific applications. In particular, different fingerprint algorithms are characterized by different trade-offs between robustness and sensitivity. Combining these fingerprints with different performance characteristics can be useful to derive more intelligent decisions and more accurate information for the target applications. In particular, different fingerprint algorithms with different robustness and sensitivity characteristics can be used jointly to offset each other's performance weaknesses and arrive at a more reliable decision.
  • The apparatus 100 and the method 200 may be applied in various applications. In an example application, the reference multimedia objects include those corresponding to various contents. The apparatus 100 and the method 200 can be applied to find the reference multimedia object having the same content with a query multimedia object. In another example application, the reference multimedia objects include those corresponding to the same content but at different positions in the content. The apparatus 100 and the method 200 can be applied to find the reference multimedia object matching a query multimedia object, so as to determine the position in the content which is synchronous with the query multimedia object.
  • In a further embodiment (Embodiment A) of the apparatus 100 or the method 200, for each of a subset or the whole set of the classifiers, in the corresponding classifying unit 102-i, or at step 205, the decisions hi(q,r1) to hi(q,rN) are calculated through the classifier based on the query fingerprint fq,t and the reference fingerprints fr 1 ,t to fr N ,t. In the calculation, a tree is searched to find at least one leaf node for which the bit error rate between the query fingerprint and the reference fingerprint represented by the leaf node is less than a maximum tolerable error rate. The reference fingerprints have a fixed length of L=S×K bits, where S and K are positive integers. The tree is a 2^K-ary tree having S levels, and each node in the l-th level, 0≦l≦S, represents a bit sequence of K×l bits, and therefore, the reference fingerprints can be represented by corresponding leaf nodes in the tree. Each level has a look-up table. For a node of the level reached during the searching, the look-up table defines an estimated bit error rate rt between the query fingerprint and its closest reference fingerprint under the reached node, such that the probability of observing at least E errors between b bits represented by the reached node and first b bits of the query fingerprint is greater than a threshold pt.
  • The look-up table may be computed in advance based on the following observation. During searching the tree, if any node at level l is reached, b bits of the query fingerprint can be examined against the b bits represented by the node, and e errors may be seen between the first b bits of the query fingerprint and the first b bits represented by the node. Then, the probability p(e|b,r) of observing e errors in b bits with a bit error rate (BER) of r is a certain distribution (e.g., binomial distribution, assuming that the bit errors are uniformly distributed over the entire fingerprint). The probability p′ of observing at least E errors in b bits is simply one minus the cumulative probability of p(e|b,r) where e ranges from 0 to E. In this way, having observed e errors in b bits, it is possible to calculate the bit error rate rt, between the query fingerprint and the closest reference fingerprint under the node, such that the probability of observing at least e errors is greater than a threshold pt. That is, rt is such that p′=pt. This means that if e errors have been observed in b bits, it is reasonably certain that the eventual overall bit error rate will be greater than rt. During the searching, if a non-leaf node is reached, e and b may be determined for the node, and corresponding rt may be found in the look-up table. If rt is greater than the threshold pt, this means that no reference fingerprint having a bit error rate less than the threshold pt may be found under the node, and therefore, this node can be excluded from the searching scope. The threshold pt is the maximum tolerable error rate.
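  • One way to precompute such a look-up table is sketched below, under the binomial assumption mentioned above. The sketch reads "at least e errors" literally, so the cumulative sum runs over 0 to e−1 errors; that reading, the bisection solver, the tolerance and all function names are assumptions made for illustration.

```python
from math import comb
from typing import List


def prob_at_least(e: int, b: int, r: float) -> float:
    """P(at least e errors in b bits) under a binomial model with bit error rate r."""
    return 1.0 - sum(comb(b, k) * r**k * (1 - r)**(b - k) for k in range(e))


def estimated_ber(e: int, b: int, p_t: float, tol: float = 1e-9) -> float:
    """Find r_t by bisection such that prob_at_least(e, b, r_t) equals p_t."""
    if e == 0:
        return 0.0  # observing at least 0 errors is certain for any rate
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # prob_at_least grows with r, so move toward the side where it crosses p_t
        if prob_at_least(e, b, mid) < p_t:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def build_lookup_table(b: int, p_t: float) -> List[float]:
    """Table entry e holds r_t for e observed errors in the first b bits."""
    return [estimated_ber(e, b, p_t) for e in range(b + 1)]
```

  • One such table would be built per level, with b equal to K×l for level l. The entries grow with e, so a node with many early errors is mapped to a high estimated bit error rate and can be pruned sooner.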
  • Also in the calculation, the decisions hi(q,r1) to hi(q,rN) are calculated by deciding that only the reference fingerprints represented by leaf nodes, which are found during the search, and the query fingerprint are derived from the same multimedia content. For example, if a found leaf node represents a reference fingerprint fr x ,t, the decision hi(q,rx) is calculated as indicating that the reference fingerprint fr x ,t and the query fingerprint fq,t are derived from the same multimedia content. For the reference fingerprint represented by a leaf node which is not found during the search, the corresponding decision is calculated as indicating that the reference fingerprint and the query fingerprint are not derived from the same multimedia content. In a different example, not all, but only a subset of all the reference fingerprints represented by leaf nodes found in the search are decided as being derived from the same multimedia content as the query fingerprint. Different methods based on thresholds or absolute number of candidates can be used to decide how to determine the subset.
  • Depending on specific performance requirements, there can be various stop criteria for the searching. In a first example, the searching may stop upon finding the first leaf node having a bit error rate less than the threshold pt. In this case, the at least one leaf node found in the search includes only one leaf node. In a second example, the searching may stop upon finding the leaf node having the smallest bit error rate less than the threshold pt. In a third example, the searching may stop upon finding all the leaf nodes or a predetermined number of leaf nodes having bit error rates less than the threshold pt.
  • FIG. 3 depicts pseudocode illustrating an example process of the searching according to the second example. Function main( ) is the main process of the search. For function search(node), the variable “node” is an object having an attribute “errs” for representing e, and an attribute “level” for representing b.
  • In the second example described above, at least two leaf nodes having the smallest bit error rate may be found. In this case, it is possible to select one of the leaf nodes with a probability as the searching result. For example, if the probability is 0.5, the leaf node is selected randomly, if the probability is less than 0.5, the first node is selected, and if the probability is greater than 0.5, the last node is selected.
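  • The tree search of the second example can be sketched as follows. This is not the pseudocode of FIG. 3; it is a hedged illustration in which fingerprints are strings of '0' and '1', the tree is a nested dictionary, and the look-up table is replaced by a simple stand-in bound (errors so far divided by the fingerprint length) unless a caller supplies an estimate function. When several leaves tie, the sketch keeps the first one it meets rather than selecting with a probability. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def build_tree(references: List[str], k: int) -> Dict:
    """Build a 2^K-ary tree as nested dicts; each edge consumes K bits."""
    root: Dict = {}
    for fp in references:
        node = root
        for pos in range(0, len(fp), k):
            node = node.setdefault(fp[pos:pos + k], {})
        node["leaf"] = fp  # chunk keys contain only '0'/'1', so no clash
    return root


def search_best(root: Dict, query: str, k: int, max_ber: float,
                estimate: Optional[Callable[[int, int], float]] = None
                ) -> Optional[Tuple[str, float]]:
    """Return (fingerprint, BER) of the leaf with the smallest BER below max_ber."""
    total = len(query)
    if estimate is None:
        # Stand-in for the look-up table: a lower bound on the final BER.
        estimate = lambda errs, bits: errs / total
    best: Optional[Tuple[str, float]] = None
    stack = [(root, 0, 0)]  # (node, bits examined b, errors observed e)
    while stack:
        node, bits, errs = stack.pop()
        if "leaf" in node:
            ber = errs / total
            if ber < max_ber and (best is None or ber < best[1]):
                best = (node["leaf"], ber)
            continue
        for chunk, child in node.items():
            new_errs = errs + sum(a != b for a, b in zip(chunk, query[bits:bits + k]))
            # Exclude the subtree when the estimated BER exceeds the tolerance.
            if estimate(new_errs, bits + k) <= max_ber:
                stack.append((child, bits + k, new_errs))
    return best
```

  • With reference fingerprints "00000000", "11110000" and "11111111", K=2 and a query of "11110001", a tolerance of 0.25 returns "11110000" with a bit error rate of 0.125, while a tolerance of 0.1 returns no match.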
  • In a further embodiment (Embodiment B) of the apparatus 100 or the method 200, for each of a subset or the whole set of the classifiers, the fingerprints for the classifier are derived as hash values, and in the corresponding classifying unit, or at step 205, the decisions hi(q,r1) to hi(q,rN) are calculated through the classifier based on the query fingerprint and the reference fingerprints. In the calculation, a distance di(fq,t,fr j ,t) between the query fingerprint fq,t and each fr j ,t of the reference fingerprints fr 1 ,t to fr N ,t is calculated, and the decisions hi(q,r1) to hi(q,rN) are calculated by deciding that at least one of the reference fingerprints fr 1 ,t to fr N ,t with the distance di less than a threshold THi and the query fingerprint fq,t are derived from the same multimedia content. Depending on specific performance requirements, the at least one reference fingerprint which is decided to be derived from the same multimedia content with the query fingerprint may be determined according to various criteria. In an example, the at least one reference fingerprint may be determined as only the reference fingerprint first found with the distance less than the threshold THi. In another example, the at least one reference fingerprint may be determined as the reference fingerprint with the smallest distance less than the threshold THi. In another example, the at least one reference fingerprint may be determined as all the reference fingerprints or a predetermined number of reference fingerprints with the distance less than the threshold THi.
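  • The three selection criteria of Embodiment B can be sketched as follows, assuming integer hash values and Hamming distance. The mode names "first", "smallest" and "all" are labels invented for this sketch, and "all" does not cover the predetermined-number variant.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Hamming distance between two hash values stored as integers."""
    return bin(a ^ b).count("1")


def match_hashes(query: int, references: List[int], threshold: int,
                 mode: str = "smallest") -> List[int]:
    """Return decisions h(q, r_1..r_N): 0 = same content, 1 = not the same."""
    distances = [hamming(query, r) for r in references]
    candidates = [j for j, d in enumerate(distances) if d < threshold]
    if mode == "first":
        chosen = candidates[:1]
    elif mode == "smallest":
        chosen = [min(candidates, key=lambda j: distances[j])] if candidates else []
    elif mode == "all":
        chosen = candidates
    else:
        raise ValueError(f"unknown mode: {mode}")
    decisions = [1] * len(references)
    for j in chosen:
        decisions[j] = 0
    return decisions
```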
  • In a further embodiment (Embodiment C) of the apparatus 100 or the method 200, for each of a subset or the whole set of the classifiers, the fingerprints for the classifier are derived as hash values. Each of the hash values is divided into weak bits and reliable bits. The weak bits are defined as the bits which are likely to flip when the multimedia object, from which the fingerprint is derived, is modified. A modification of the content is defined as a change of the digital presentation (waveform) of the multimedia signal which preserves the perceptually relevant content of the multimedia object. Examples are transcoding, bit rate change, resampling, and specific pre- and post-processing technologies. If a song/video is modified by one of these operations, it might sound/look slightly different, but it is still easily recognized as the same song/video by a human. The weak bits are the bits that flip with a high probability when these modifications are applied. This probability is required to be above a certain threshold and might be determined by experiments and the requirements of the application for which the fingerprints are used.
  • The reliable bits are less likely to flip as a result of content modification. In the corresponding classifying unit 102-i, or at step 205, the decisions hi(q,r1) to hi(q,rN) are calculated through the classifier based on the query fingerprint and the reference fingerprints. In the calculation, a distance di(fq,t,fr j ,t) between the query fingerprint fq,t and each fr j ,t of the reference fingerprints fr 1 ,t to fr N ,t having the identical reliable bits is calculated, and the decisions hi(q,r1) to hi(q,rN) are calculated by deciding that at least one of the reference fingerprints fr 1 ,t to fr N ,t with the distance di less than a threshold TH′i and the query fingerprint fq,t are derived from the same multimedia content. Depending on specific performance requirements, the at least one reference fingerprint which is decided to be derived from the same multimedia content with the query fingerprint may be determined according to various criteria. In an example, the at least one reference fingerprint may be determined as only the reference fingerprint first found with the distance less than the threshold TH′i. In another example, the at least one reference fingerprint may be determined as the reference fingerprint with the smallest distance less than the threshold TH′i. In another example, the at least one reference fingerprint may be determined as all the reference fingerprints or a predetermined number of reference fingerprints with the distance less than the threshold TH′i.
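  • A minimal sketch of Embodiment C follows. It assumes the division into weak and reliable bits is given as a bit mask (set bits are reliable), groups reference fingerprints by their reliable bits, and applies the "all fingerprints below the threshold" criterion. The mask representation and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(references: List[int], reliable_mask: int) -> Dict[int, List[int]]:
    """Group reference indices by the value of their reliable bits."""
    index: Dict[int, List[int]] = defaultdict(list)
    for j, fp in enumerate(references):
        index[fp & reliable_mask].append(j)
    return index


def match_with_reliable_bits(query: int, references: List[int],
                             index: Dict[int, List[int]],
                             reliable_mask: int, threshold: int) -> List[int]:
    """Decisions for all references: 0 = same content, 1 = not the same.

    Only references whose reliable bits equal those of the query are compared.
    """
    decisions = [1] * len(references)
    for j in index.get(query & reliable_mask, []):
        if bin(query ^ references[j]).count("1") < threshold:
            decisions[j] = 0
    return decisions
```

  • Restricting the distance computation to one bucket trades some recall (a flipped reliable bit hides the true match) for a much smaller number of comparisons.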
  • In a further embodiment of the apparatus 100 or the method 200, the classifiers may include any combination of the classifiers described in connection with Embodiments A, B and C.
  • In a further embodiment of the apparatus 100 or the method 200, the query multimedia object q includes a number W of objects which are synchronous with each other, and each of the reference multimedia objects r1 to rN includes the number W of objects which are synchronous with each other, where W>1. In this case, for each of the W objects in the query multimedia object q and the reference multimedia objects r1 to rN, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively. The W objects may belong to different media classes like audio, video, or image. As an example, the W objects may include an audio object and a video or image object synchronous with each other. Some of the fingerprints may be derived from the audio object, and others may be derived from the video or image object. In this embodiment, fingerprint algorithms suitable for the specific media classes of the objects may be chosen. Combining the results of the different fingerprint algorithms, a more accurate search result can be obtained.
  • FIG. 4 is a block diagram illustrating an example apparatus 400 for training a model for identifying multimedia objects according to an embodiment of the invention.
  • As illustrated in FIG. 4, the apparatus 400 includes a fingerprint calculator 401 and a training unit 403.
  • To train the model, a set S of one or more training samples is provided. Each sample includes one training query multimedia object qk out of training query multimedia objects q1 to qM, one training reference multimedia object rj out of training reference multimedia objects r1 to rU, and a mark yk,j indicating whether the training query multimedia object qk matches the training reference multimedia object rj or not. The samples may include some or all the combinations of the training query multimedia objects and the training reference multimedia objects. It can be appreciated that any two samples are different in their training query multimedia object or their training reference multimedia object.
  • For each sample, the fingerprint calculator 401 is configured to derive training query fingerprints fq k ,1 to fq k ,G from the training query multimedia object qk according to fingerprint algorithms F1 to FG respectively. The fingerprint algorithms F1 to FG are different from each other, and G>1. For each sample, the fingerprint calculator 401 is further configured to derive training reference fingerprints fr j ,1 to fr j ,G from the training reference multimedia object rj according to the fingerprint algorithms F1 to FG respectively.
  • For each fingerprint algorithm Ft, 1≦t≦G, the training unit 403 is configured to generate at least one candidate classifier based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft. The at least one candidate classifier may include only one candidate classifier, or may include more than one candidate classifier. The candidate classifier is adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft. The decision indicates a possibility that the two fingerprints are not derived from the same multimedia content.
  • The candidate classifier may be achieved through any algorithm for identifying whether a fingerprint and another fingerprint are derived from the same multimedia content or not. These algorithms include, but are not limited to, identifying algorithms based on machine learning, and identifying algorithms based on searching.
  • In an example of identifying algorithms based on machine learning, distances are calculated between the fingerprints derived from positive and negative training samples, each of which includes a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object. Parameters of the algorithms which can minimize the identifying error are learned from the distances, and therefore, corresponding classifiers are generated.
  • In an example of identifying algorithms based on searching, a set of training reference fingerprints are searched to find one or more of them matching a training query fingerprint. Fingerprints may be derived from positive and negative training samples, each of which includes a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object. A positive sample means that it includes a mark indicating that the training query multimedia object matches the training reference multimedia object, and a negative sample means that it includes a mark indicating that the training query multimedia object does not match the training reference multimedia object. Parameters of the algorithms which can minimize the identifying error are learned from the distances derived from the objects, and therefore, corresponding classifiers are generated.
  • In case that at least two candidate classifiers are generated for the same fingerprint algorithm, the candidate classifiers may include candidate classifiers based on the same identifying algorithm but having different parameter configurations (e.g., different thresholds for classifying), and/or candidate classifiers based on different identifying algorithms. Alternatively, each fingerprint algorithm Ft may correspond to only one classifying unit.
  • In an example of the algorithm for identifying based on equation (1), it is possible to generate the candidate classifier by selecting the threshold Thi from a plurality of candidate thresholds, such that the identifying error with reference to the training samples is the smallest. The identifying error is generally calculated with an error function. As an example, the error function εi is calculated as
  • $$\varepsilon_i = \sum_{c \in S,\; h_i(c) \neq y_c} P(c) \qquad (3)$$
  • where S represents the set of the samples, P( ) represents a distribution of weights of the samples, yc represents the mark of sample c, and hi( ) represents a candidate classifier corresponding to fingerprint algorithm Fi. In an example, both hi( ) and yc may take values 0 and 1 (see equation (1)).
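  • Threshold selection with the error function of equation (3) can be sketched as follows. The sketch assumes the marks use the same coding as equation (1), i.e., 1 for "does not match" and 0 for "matches", and that the distance of each training sample has already been computed; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_error(decisions: Sequence[int], marks: Sequence[int],
                   sample_weights: Sequence[float]) -> float:
    """Equation (3): total weight of samples whose decision differs from the mark."""
    return sum(p for h, y, p in zip(decisions, marks, sample_weights) if h != y)


def select_threshold(distances: Sequence[float], marks: Sequence[int],
                     sample_weights: Sequence[float],
                     candidate_thresholds: List[float]) -> Tuple[float, float]:
    """Return (threshold, error) for the candidate with the smallest error."""
    best = None
    for th in candidate_thresholds:
        # Equation (1) applied to every training sample with this threshold.
        decisions = [1 if d > th else 0 for d in distances]
        err = weighted_error(decisions, marks, sample_weights)
        if best is None or err < best[1]:
            best = (th, err)
    if best is None:
        raise ValueError("no candidate thresholds given")
    return best
```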
  • In an example (Example A) of the algorithm for identifying by searching a tree as described in connection with the apparatus 100 and the method 200, for the tree, it is possible to provide at least two sets of parameters including K and an initial value of the maximum tolerable error rate. For each different set of parameters, the tree is configured differently and a different tree based classifier can be constructed. The identifying errors of the constructed classifiers can be calculated based on an error function such as equation (3) with reference to the samples. Therefore, the set resulting in the smallest error function is selected to generate the candidate classifier. In case of the second example of the algorithm for identifying by searching a tree as described in connection with the apparatus 100 and the method 200, at least two leaf nodes having the smallest bit error rate may be found. In this case, it is possible to select one of the leaf nodes with a probability as the searching result. For example, if the probability is 0.5, the leaf node is selected randomly, if the probability is less than 0.5, the first node is selected, and if the probability is greater than 0.5, the last node is selected. In this case, in addition to K and the initial value of the maximum tolerable error rate, each set of parameters may also include the probability.
  • In an example (Example B) of the algorithm for identifying based on the distance between the hash values as described in connection with the apparatus 100 and the method 200, at least two candidate thresholds for calculating the decisions hi( ) may be provided, and the one of the candidate thresholds resulting in the smallest error function is selected as the threshold THi for the candidate classifier.
  • In an example (Example C) of the algorithm for identifying based on the distance between the hash values having the identical reliable bits as described in connection with the apparatus 100 and the method 200, at least two combinations of a) threshold for calculating the decisions hi( ) and b) division of the fingerprint into weak bits and reliable bits may be provided, and the combination resulting in the smallest error function is selected for the candidate classifier. The division of the fingerprint into weak and reliable bits can be configured by changing the number of weak bits and reliable bits and by changing the method (or pattern) to determine which bits are classified as weak and reliable.
  • The training unit 403 is further configured to generate the model including a weighted sum of classifiers selected from the candidate classifiers and a threshold THc for evaluating the weighted sum, such that the identifying error obtained by applying the model to the training query fingerprints and the training reference fingerprints derived from the samples is minimized. There may be a possibility that in the generated model, the selected classifiers correspond to T>1 fingerprint algorithms. There is also another possibility that in the generated model, the selected classifiers correspond to only one fingerprint algorithm. More than one classifier may correspond to the same fingerprinting algorithm. In order to find THc, different values of THc can be tried out randomly or exhaustively, or THc can be found through an optimization algorithm.
  • In a first example of generating the model, both the classifiers and the weights are generated and selected through an Adaboost method. According to the Adaboost method, several rounds of selection may be performed. At the beginning of each round of selection, each training sample is assigned a probability value (weight). Also, in each round, candidate classifiers are generated based on training data. From the candidate classifiers generated in this round, the candidate classifier having the minimum error with reference to the training data is selected, and its weight is determined accordingly. In each round, the distribution of weights of the training data is also updated for generating candidate classifiers in the next round. The Adaboost method can be configured in different ways. In one example, all of the fingerprint algorithms can be used in each round to generate the candidate classifiers. In another example, only a subset of all fingerprint algorithms is used in each round to generate the candidate classifiers. In this case, the used fingerprint algorithms may be predetermined or selected randomly. The fingerprint algorithms used in different rounds may be identical, different in part or different totally. One or more fingerprint algorithms may be used in one or more rounds. In another example, Adaboost may select each fingerprint algorithm at most or exactly once to build a classifier. In this example, if a classifier corresponding to a fingerprint algorithm is selected, this fingerprint algorithm will not be considered in generating candidate classifiers for selection in the next iterations.
  • Alternatively, in a second example, classifiers included in the model may be predetermined, and the weights of the classifiers may be determined through an Adaboost method. In this scenario, each classifier is considered in only one iteration.
  • FIG. 5 is a flow chart illustrating an example process 500 of the Adaboost method according to an embodiment of the invention.
  • As illustrated in FIG. 5, the process 500 starts from step 501.
  • Supposing that there are M training query multimedia objects and U training reference multimedia objects, there is a set S of V=M×U samples. One may choose to use all V training samples as training data for the Adaboost algorithm. Alternatively, one may choose to use a subset of W<V training samples as training data for the Adaboost algorithm.
  • Each sample c in the set S includes one of the training query multimedia objects, one of the training reference multimedia objects and a mark yc. Different samples cannot contain both the same training query multimedia object and the same training reference multimedia object. For the s-th iteration of the Adaboost method, the weight of each sample c in the set S is denoted as Ps(c). Initially, the weights of the samples may be set equal, e.g., set to 1/(V) or 1/(W). At step 503, for each fingerprint algorithm Ft, at least one candidate classifier hi( ) is generated based on the query fingerprints and the corresponding training fingerprints derived from the samples according to the fingerprint algorithm Ft. There are multiple ways to generate these classifiers and they depend on the chosen database structure. In particular, databases can be tree based, hash based, or hash based using weak bits. For these kinds of databases, ways to generate classifiers have been described in some of the preceding paragraphs of this invention disclosure.
  • At step 505, one of the candidate classifiers having the smallest error function εi with reference to the samples is selected.
  • At step 507, a weight wi for the newly selected classifier hi( ) is calculated as
  • $$w_i = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right) \qquad (4)$$
  • At step 509, the weights Ps(c), c∈S, are updated as
  • $$P_{s+1}(c) = \frac{P_s(c) \exp\left(-w_i \, y_c \, h_i(c)\right)}{Z_i} \qquad (5)$$
  • where Zi is a normalization factor chosen such that Ps+1( ) is a probability distribution.
  • At step 511, it is determined whether this is the last iteration. In general, if a pre-defined number of iterations have been processed, or the smallest error function below a threshold has been reached, the iteration may be determined as the last iteration. If no, the process 500 returns to step 503 to execute the next iteration. If yes, the process 500 ends at step 513.
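  • Steps 505 to 511 can be sketched as follows. The sketch makes several assumptions that are not stated in the disclosure: the candidate classifiers are fixed in advance and given as lists of 0/1 decisions per sample, the 0/1 marks and decisions are mapped to −1/+1 before applying equation (5) (as in textbook AdaBoost), the error is clamped away from 0 and 1 so that equation (4) stays finite, and the process stops after a fixed number of rounds.

```python
import math
from typing import List, Sequence, Tuple


def adaboost(candidates: List[Sequence[int]], marks: Sequence[int],
             rounds: int) -> List[Tuple[int, float]]:
    """Return a list of (candidate index, weight w_i), one pair per round."""
    n = len(marks)
    p = [1.0 / n] * n  # initial sample weights P_1(c), all equal
    model: List[Tuple[int, float]] = []
    for _ in range(rounds):
        # Step 505: weighted error of each candidate, equation (3).
        errors = [sum(pc for h, y, pc in zip(dec, marks, p) if h != y)
                  for dec in candidates]
        i = min(range(len(candidates)), key=errors.__getitem__)
        eps = min(max(errors[i], 1e-10), 1 - 1e-10)
        # Step 507: classifier weight, equation (4).
        w = 0.5 * math.log((1 - eps) / eps)
        model.append((i, w))
        # Step 509: equation (5) with marks and decisions mapped to -1/+1.
        p = [pc * math.exp(-w * (2 * y - 1) * (2 * h - 1))
             for pc, y, h in zip(p, marks, candidates[i])]
        z = sum(p)  # normalization factor Z_i
        p = [pc / z for pc in p]
    return model
```

  • After each round the misclassified samples carry more weight, so the next round tends to select a candidate that corrects the previous mistakes.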
  • In another example of generating the model, all the generated candidate classifiers h1( ) to hC( ) are pre-selected by some optimization process. Such an optimization process could be a process of minimizing the error function as described some of the preceding paragraphs of this invention disclosure. The weights w1 to wC for the classifiers can be determined by minimizing the identifying error
  • $$\left\| \sum_{i=1}^{C} w_i H_i - Y \right\|_{\mathrm{Frob}} \qquad (6)$$
  • where Hi is an M×U matrix with Hi(k,j)=hi(ck,j), Y is an M×U matrix with Y(k,j)=yk,j, Frob denotes the Frobenius matrix norm measuring the distance between matrices, and the weights are required to be non-negative and to sum to one. Alternatively, the identifying error may also be calculated as
  • $$\sum_{i=1}^{C} w_i \left\| H_i - Y \right\|_{\mathrm{Frob}}. \qquad (7)$$
  • In both cases, a Laplacian gradient search may be used to solve the minimization problem. Alternatively, it is also possible to provide a discrete, possibly uniformly spaced set of possible values for the weights w1 to wC as well as the threshold values for the classifiers h1( ) to hC( ). The best solution is defined as the solution that either minimizes the expression (6) or (7) above for a specific set of threshold values for the classifiers h1( ) to hC( ). Here, the weights and the thresholds can be jointly or consecutively determined.
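  • The discrete search over weights mentioned above can be sketched as follows, as an illustration under stated assumptions rather than a definitive implementation. The sketch evaluates the norm of the weighted sum of the matrices Hi minus Y, assumed here to be the form of expression (6), over a uniformly spaced grid of non-negative weights that sum to one. It does not search over the classifier thresholds. The names frobenius_error, grid_search_weights and steps are hypothetical.

```python
import itertools
import math

def frobenius_error(weights, h_list, y):
    """Frobenius norm of (sum_i w_i * H_i - Y) for matrices given as nested lists."""
    total = 0.0
    for k in range(len(y)):
        for j in range(len(y[0])):
            combined = sum(w * h[k][j] for w, h in zip(weights, h_list))
            total += (combined - y[k][j]) ** 2
    return math.sqrt(total)

def grid_search_weights(h_list, y, steps=10):
    """Try every weight vector on a uniform grid with non-negative entries summing to one."""
    best_w, best_err = None, float("inf")
    for combo in itertools.product(range(steps + 1), repeat=len(h_list)):
        if sum(combo) != steps:
            continue                       # keep only weight vectors that sum to one
        w = [v / steps for v in combo]
        err = frobenius_error(w, h_list, y)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

  • The number of grid points grows quickly with the number C of classifiers, so an exhaustive search of this kind is practical only for small C or coarse grids.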
  • FIG. 6 is a flow chart illustrating an example method 600 of training a model for identifying multimedia objects according to an embodiment of the invention.
  • As illustrated in FIG. 6, the method 600 starts from step 601.
  • To train the model, a set S of one or more samples is provided. Each sample includes one training query multimedia object qk of training query multimedia objects q1 to qM, one training reference multimedia object rj of training reference multimedia objects r1 to rU and a mark yk,j indicating whether the training query multimedia object qk matches the training reference multimedia object rj or not.
  • At step 603, for each sample, training query fingerprints fqk,1 to fqk,G are derived from the training query multimedia object qk according to fingerprint algorithms F1 to FG respectively. The fingerprint algorithms F1 to FG are different from each other, and G>1.
  • At step 605, for each sample, training reference fingerprints frj,1 to frj,G are derived from the training reference multimedia object rj according to the fingerprint algorithms F1 to FG respectively.
  • At step 607, for each fingerprint algorithm Ft, 1≦t≦G, at least one candidate classifier is generated based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft. The at least one candidate classifier may include only one candidate classifier, or may include more than one candidate classifier. The candidate classifier is adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft. The decision indicates a possibility that the two fingerprints are not derived from the same multimedia content. The candidate classifier may be achieved through any algorithm for identifying whether a fingerprint and another fingerprint are derived from the same multimedia content or not. Such algorithms include, but are not limited to, identifying algorithms based on machine learning and identifying algorithms based on searching.
  • At step 607, also, the model is generated. The model includes a weighted sum of classifiers selected from the candidate classifiers and a threshold THc for evaluating the weighted sum, such that the identifying error obtained by applying the model to the query fingerprints and the training fingerprints derived from the samples is minimized. There may be a possibility that in the generated model, the selected classifiers correspond to T>1 fingerprint algorithms. There is also another possibility that in the generated model, the selected classifiers correspond to only one fingerprint algorithm. More than one classifier may correspond to the same fingerprinting algorithm. At step 607, the methods of generating the model described in connection with the apparatus 400 may be adopted.
  • The method 600 ends at step 609.
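  • As an illustration only, a model of the form produced by the method 600 (a weighted sum of classifier decisions evaluated against the threshold THc) might be applied to a query as sketched below. The sketch assumes that each decision is a number for which a larger value indicates a higher possibility that the two fingerprints are not derived from the same content, and that the reference object with the smallest weighted sum below THc is taken as the match. The names identify, decisions and th_c are hypothetical.

```python
def identify(decisions, weights, th_c):
    """Return the id of the best-matching reference object, or None.

    decisions: dict mapping a reference id to the list of decisions computed
               for that reference by the selected classifiers (assumed layout)
    weights:   list of weights w_i, one per selected classifier
    th_c:      threshold that the weighted sum must fall below to count as a match
    """
    best_ref, best_d = None, th_c
    for ref_id, ds in decisions.items():
        # Weighted sum of the decisions relating to this reference object.
        d = sum(w * x for w, x in zip(weights, ds))
        if d < best_d:
            best_ref, best_d = ref_id, d
    return best_ref
```

  • For example, with weights [0.5, 0.5], decisions {"a": [0.9, 0.8], "b": [0.1, 0.2]} and a threshold of 0.5, the sketch selects "b"; if no weighted sum falls below the threshold, it reports no match.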
  • In a further embodiment of the apparatus 400 or the method 600, the algorithms for identifying may include any combination of the algorithms described in connection with Examples A, B and C.
  • In a further embodiment of the apparatus 400 or the method 600, each of the training query multimedia objects includes a number W of objects which are synchronous with each other, and each of the training reference multimedia objects includes the number W of objects which are synchronous with each other, where W>1. For each of the W objects in the training query multimedia objects and the training reference multimedia objects, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively. The W objects may belong to different media classes like audio, video, or image. As an example, the W objects may include an audio object and a video or image object synchronous with each other. Some of the fingerprints may be derived from the audio object, and others may be derived from the video or image object. In this embodiment, fingerprint algorithms suitable for the specific media classes of the objects may be chosen.
  • In an alternative embodiment of the apparatus 400 or the method 600, it is also possible to provide at least two sets of candidate weights of the selected classifiers in the weighted sum, and select the set of candidate weights resulting in the smallest identifying error (e.g., expression (3), (6) or (7)) as the weights of the selected classifiers in the weighted sum. The identifying errors may be obtained by applying the model configured with the sets of weights to training samples.
  • FIG. 7 is a block diagram illustrating an exemplary system 700 for implementing embodiments of the present invention.
  • In FIG. 7, a central processing unit (CPU) 701 performs various processes in accordance with a program stored in a read only memory (ROM) 702 or a program loaded from a storage section 708 to a random access memory (RAM) 703. In the RAM 703, data required when the CPU 701 performs the various processes or the like are also stored as required.
  • The CPU 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704. An input/output interface 705 is also connected to the bus 704.
  • The following components are connected to the input/output interface 705: an input section 706 including a keyboard, a mouse, or the like; an output section 707 including a display such as a cathode ray tube (CRT), a liquid crystal display (LCD), or the like, and a loudspeaker or the like; the storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs a communication process via the network such as the internet.
  • A drive 710 is also connected to the input/output interface 705 as required. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 710 as required, so that a computer program read therefrom is installed into the storage section 708 as required.
  • In the case where the above-described steps and processes are implemented by the software, the program that constitutes the software is installed from the network such as the internet or the storage medium such as the removable medium 711.
  • The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
  • The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
  • The following exemplary embodiments (each an “EE”) are described.
  • EE 1. A method of identifying a multimedia object, comprising:
  • acquiring query fingerprints fq,1 to fq,T which are derived from the multimedia object according to fingerprint algorithms F1 to FT respectively, where the fingerprint algorithms F1 to FT are different from each other, and T>1;
  • for each fingerprint algorithm Ft, calculating decisions through each of at least one classifier corresponding to the fingerprint algorithm Ft based on the query fingerprint and reference fingerprints derived from a plurality of reference multimedia objects according to the fingerprint algorithm Ft, each of the decisions indicating a possibility that the query fingerprint and the reference fingerprint for calculating the decision are not derived from the same multimedia content;
  • for each of the reference multimedia objects, calculating a distance D as a weighted sum of the decisions relating to the reference fingerprints derived from the reference multimedia object according to the fingerprint algorithms F1 to FT respectively; and
  • identifying the multimedia object as matching the reference multimedia object with the smallest distance which is less than a threshold THc.
  • EE 2. The method according to EE 1, wherein for each of the fingerprint algorithms, the at least one classifier comprises only one classifier.
  • EE 3. The method according to EE 1, wherein for each of at least one of the classifiers, the calculating of the decisions through the classifier based on the query fingerprint and the reference fingerprints comprises:
  • searching a tree to find at least one leaf node having a bit error rate between the query fingerprint and the reference fingerprint represented by the leaf node less than a maximum tolerable error rate; and
  • calculating the decisions by deciding that only the reference fingerprints represented by the at least one leaf node and the query fingerprint are derived from the same multimedia content,
  • wherein the reference fingerprints have a fixed length L=S×K bits, and S and K are positive integers,
  • wherein the tree is a 2^K-ary tree having S levels, and each node in the l-th level, 0≦l≦S, represents a bit sequence of K×l bits,
  • wherein each level has a look-up table defining an estimated bit error rate between the query fingerprint and its closest reference fingerprint under a reached node of the level, such that the probability of observing at least E errors between b bits represented by the reached node and first b bits of the query fingerprint is greater than a threshold pt.
  • EE 4. The method according to EE 3, wherein the at least one leaf node comprises only one leaf node.
  • EE 5. The method according to EE 4, wherein the only one leaf node has the smallest bit error rate.
  • EE 6. The method according to EE 1, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, and the calculating of the decisions through the classifier based on the query fingerprint and the reference fingerprints comprises:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 7. The method according to EE 3, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, and the calculating of the decisions through the classifier based on the query fingerprint and the reference fingerprints comprises:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 8. The method according to EE 1, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the calculating of the decisions through the classifier based on the query fingerprint and the reference fingerprints comprises:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints having the identical reliable bits; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 9. The method according to EE 3, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the calculating of the decisions through the classifier based on the query fingerprint and the reference fingerprints comprises:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints having the identical reliable bits; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 10. The method according to EE 6, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the calculating of the decisions through the classifier based on the query fingerprint and the reference fingerprints comprises:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints having the identical reliable bits; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 11. The method according to EE 7, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the calculating of the decisions through the classifier based on the query fingerprint and the reference fingerprints comprises:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints having the identical reliable bits; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 12. The method according to EE 1, wherein the multimedia object includes a number W of objects which are synchronous with each other, and each of the reference multimedia objects includes the number W of objects which are synchronous with each other, where W>1, and
  • wherein for each of the W objects in the multimedia object and the reference multimedia objects, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively.
  • EE 13. The method according to EE 12, wherein the number W of objects include an audio object and a video or image object, and audio fingerprints are taken from audio objects and video or image fingerprints are taken from video or image objects.
  • EE 14. An apparatus for identifying a multimedia object, comprising:
  • an acquiring unit configured to acquire query fingerprints fq,1 to fq,T which are derived from the multimedia object according to fingerprint algorithms F1 to FT respectively, where the fingerprint algorithms F1 to FT are different from each other, and T>1;
  • a plurality of classifying units, wherein each fingerprint algorithm Ft corresponds to at least one of the classifying units, and each of the classifying units is configured to calculate decisions through a classifier based on the query fingerprint and reference fingerprints derived from a plurality of reference multimedia objects according to the fingerprint algorithm Ft, each of the decisions indicating a possibility that the query fingerprint and the reference fingerprint for calculating the decision are not derived from the same multimedia content;
  • a combining unit configured to, for each of the reference multimedia objects, calculate a distance D as a weighted sum of the decisions relating to the reference fingerprints derived from the reference multimedia object according to the fingerprint algorithms F1 to FT respectively; and
  • an identifying unit configured to identify the multimedia object as matching the reference multimedia object with the smallest distance which is less than a threshold THc.
  • EE 15. The apparatus according to EE 14, wherein each fingerprint algorithm Ft corresponds to only one of the classifying units.
  • EE 16. The apparatus according to EE 14, wherein for each of at least one of the classifiers, the corresponding classifying unit is further configured to calculate the decisions through the classifier based on the query fingerprint and the reference fingerprints by:
  • searching a tree to find at least one leaf node having a bit error rate between the query fingerprint and the reference fingerprint represented by the leaf node less than a maximum tolerable error rate; and
  • calculating the decisions by deciding that only the reference fingerprint represented by the at least one leaf node and the query fingerprint are derived from the same multimedia content,
  • wherein the reference fingerprints have a fixed length L=S×K bits, and S and K are positive integers,
  • wherein the tree is a 2^K-ary tree having S levels, and each node in the l-th level, 0≦l≦S, represents a bit sequence of K×l bits,
  • wherein each level has a look-up table defining an estimated bit error rate between the query fingerprint and its closest reference fingerprint under a reached node of the level, such that the probability of observing at least E errors between b bits represented by the reached node and first b bits of the query fingerprint is greater than a threshold pt.
  • EE 17. The apparatus according to EE 16, wherein the at least one leaf node comprises only one leaf node.
  • EE 18. The apparatus according to EE 17, wherein the only one leaf node has the smallest bit error rate.
  • EE 19. The apparatus according to EE 14, wherein for each of at least one of the classifiers, the fingerprints for the classifier are derived as hash values, and the corresponding classifying unit is further configured to calculate the decisions through the classifier based on the query fingerprint and the reference fingerprints by:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 20. The apparatus according to EE 16, wherein for each of at least one of the classifiers, the fingerprints for the classifier are derived as hash values, and the corresponding classifying unit is further configured to calculate the decisions through the classifier based on the query fingerprint and the reference fingerprints by:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 21. The apparatus according to EE 14, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the corresponding classifying unit is further configured to calculate the decisions through the classifier based on the query fingerprint and the reference fingerprints by:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints having the identical reliable bits; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 22. The apparatus according to EE 16, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the corresponding classifying unit is further configured to calculate the decisions through the classifier based on the query fingerprint and the reference fingerprints by:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints having the identical reliable bits; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 23. The apparatus according to EE 19, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the corresponding classifying unit is further configured to calculate the decisions through the classifier based on the query fingerprint and the reference fingerprints by:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints having the identical reliable bits; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 24. The apparatus according to EE 20, wherein for each of at least one of the classifiers,
  • the fingerprints for the classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the corresponding classifying unit is further configured to calculate the decisions through the classifier based on the query fingerprint and the reference fingerprints by:
  • calculating a distance d between the query fingerprint and each of the reference fingerprints having the identical reliable bits; and
  • calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  • EE 25. The apparatus according to EE 14, wherein the multimedia object includes a number W of objects which are synchronous with each other, and each of the reference multimedia objects includes the number W of objects which are synchronous with each other, where W>1, and
  • wherein for each of the W objects in the multimedia object and the reference multimedia objects, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively.
  • EE 26. The apparatus according to EE 25, wherein the number W of objects include an audio object and a video or image object, and audio fingerprints are taken from audio objects and video or image fingerprints are taken from video or image objects.
  • EE 27. A method of training a model for identifying multimedia objects, comprising:
  • for each of one or more samples including a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object or not,
      • deriving training query fingerprints from the training query multimedia object according to fingerprint algorithms F1 to FG respectively, where the fingerprint algorithms F1 to FG are different from each other, and G>1;
      • deriving training reference fingerprints from the training reference multimedia object according to the fingerprint algorithms F1 to FG respectively;
  • for each fingerprint algorithm Ft, generating at least one candidate classifier based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft, the candidate classifier being adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft, which indicates a possibility that the two fingerprints are not derived from the same multimedia content; and
  • generating the model including a weighted sum of classifiers selected from the candidate classifiers and a threshold THS for evaluating the weighted sum such that the identifying error obtained by applying the model to the training query fingerprints and the training reference fingerprints derived from the samples is minimized.
  • EE 28. The method according to EE 27, wherein the selected classifiers in the generated model correspond to only one fingerprint algorithm.
  • EE 29. The method according to EE 27, wherein the classifiers are generated and selected through an Adaboost method.
  • EE 30. The method according to EE 27, wherein weights of the selected classifiers in the weighted sum are determined through the Adaboost method.
  • EE 31. The method according to EE 27, wherein the generation of the model comprises:
  • providing at least two sets of candidate weights of the selected classifiers in the weighted sum; and
  • selecting the set of candidate weights resulting in the smallest identifying error as the weights of the selected classifiers in the weighted sum.
  • EE 32. The method according to EE 27, wherein for each fingerprint algorithm Ft, only one classifier is selected.
  • EE 33. The method according to EE 29, wherein for each fingerprint algorithm Ft, only one classifier is selected.
  • EE 34. The method according to EE 30, wherein for each fingerprint algorithm Ft, only one classifier is selected.
  • EE 35. The method according to EE 31, wherein for each fingerprint algorithm Ft, only one classifier is selected.
  • EE 36. The method according to EE 27, wherein for each of at least one of the candidate classifiers, the candidate classifier is adapted to:
  • search a tree to find at least one leaf node having a bit error rate between the training query fingerprint and the training reference fingerprint represented by the leaf node less than a maximum tolerable error rate; and
  • calculate the decision by deciding that only the training reference fingerprint represented by the at least one leaf node and the training query fingerprint are derived from the same multimedia content,
  • wherein the training reference fingerprints have a fixed length L=S×K bits, and S and K are positive integers,
  • wherein the tree is a 2^K-ary tree having S levels, and each node in the l-th level, 0≦l≦S, represents a bit sequence of K×l bits,
  • wherein each level has a look-up table defining an estimated bit error rate between the training query fingerprint and its closest training reference fingerprint under a reached node of the level, such that the probability of observing at least E errors between b bits represented by the reached node and first b bits of the training query fingerprint is greater than a threshold pt, wherein at least two sets of parameters including K and an initial value of the maximum tolerable error rate are provided for the tree, and the set resulting in the smallest identifying error is selected to generate the candidate classifier.
  • EE 37. The method according to EE 36, wherein the at least one leaf node comprises only one leaf node.
  • EE 38. The method according to EE 37, wherein the only one leaf node has the smallest bit error rate.
  • EE 39. The method according to EE 27, wherein for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, and the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content, and
  • wherein at least two candidate thresholds for calculating the decisions are provided and the candidate threshold resulting in the smallest identifying error is selected as the threshold for the candidate classifier.
  • EE 40. The method according to EE 36, wherein for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, and the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content, and
  • wherein at least two candidate thresholds for calculating the decisions are provided and the candidate threshold resulting in the smallest identifying error is selected as the threshold for the candidate classifier.
  • EE 41. The method according to EE 27, wherein for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints having the identical reliable bits, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content,
  • wherein at least two combinations of threshold for calculating the decisions and division of weak bits and reliable bits are provided and the combination resulting in the smallest identifying error is selected for the candidate classifier.
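The weak-bit and reliable-bit scheme of EE 41 can be sketched in the same way. The sketch assumes that the division into weak and reliable bits is expressed as a bit mask (1 = reliable bit), that only references whose reliable bits are identical to those of the query are compared, and that d is the Hamming distance. The names `matches` and `select_combination` and the mask representation are assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[int, int, bool]   # (query hash, reference hash, mark)
Combination = Tuple[int, int]    # (reliable-bit mask, distance threshold)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash values."""
    return bin(a ^ b).count("1")


def matches(query: int, reference: int, reliable_mask: int, threshold: int) -> bool:
    """True if the reliable bits are identical and the distance d is below the threshold."""
    if (query ^ reference) & reliable_mask:
        return False  # a reliable bit differs, so the reference is not a candidate
    return hamming(query, reference) < threshold


def select_combination(samples: List[Sample],
                       combinations: Iterable[Combination]) -> Combination:
    """Return the (mask, threshold) combination with the smallest identifying error."""

    def error(combination: Combination) -> int:
        mask, threshold = combination
        return sum(
            1
            for query, reference, is_match in samples
            if matches(query, reference, mask, threshold) != is_match
        )

    return min(combinations, key=error)
```

Restricting the comparison to references with identical reliable bits is what lets the reliable bits serve as an exact look-up key, while the threshold only has to absorb flips in the weak bits.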
  • EE 42. The method according to EE 36, wherein for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints having the identical reliable bits, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content,
  • wherein at least two combinations of threshold for calculating the decisions and division of weak bits and reliable bits are provided and the combination resulting in the smallest identifying error is selected for the candidate classifier.
  • EE 43. The method according to EE 39, wherein for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints having the identical reliable bits, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content,
  • wherein at least two combinations of threshold for calculating the decisions and division of weak bits and reliable bits are provided and the combination resulting in the smallest identifying error is selected for the candidate classifier.
  • EE 44. The method according to EE 40, wherein for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints having the identical reliable bits, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content,
  • wherein at least two combinations of threshold for calculating the decisions and division of weak bits and reliable bits are provided and the combination resulting in the smallest identifying error is selected for the candidate classifier.
  • EE 45. The method according to EE 27, wherein each of the training query multimedia objects includes a number W of objects which are synchronous with each other, and each of the training reference multimedia objects includes the number W of objects which are synchronous with each other, where W>1, and
  • wherein for each of the W objects in the training query multimedia objects and the training reference multimedia objects, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively.
  • EE 46. The method according to EE 45, wherein the number W of objects include an audio object and a video or image object, and audio fingerprints are taken from audio objects and video or image fingerprints are taken from video or image objects.
  • EE 47. An apparatus for training a model for identifying multimedia objects, comprising:
  • a fingerprint calculator configured to, for each of one or more samples including a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object or not, derive training query fingerprints from the training query multimedia object according to fingerprint algorithms F1 to FG respectively, where the fingerprint algorithms F1 to FG are different from each other, and G>1, and derive training reference fingerprints from the training reference multimedia object according to the fingerprint algorithms F1 to FG respectively; and
  • a training unit configured to:
      • for each fingerprint algorithm Ft, generate at least one candidate classifier based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft, the candidate classifier being adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft, which indicates a possibility that the two fingerprints are not derived from the same multimedia content; and
      • generate the model including a weighted sum of classifiers selected from the candidate classifiers and a threshold THc for evaluating the weighted sum such that the identifying error obtained by applying the model to the training query fingerprints and the training reference fingerprints derived from the samples is minimized.
  • EE 48. The apparatus according to EE 47, wherein the selected classifiers in the generated model correspond to more than one fingerprint algorithm.
  • EE 49. The apparatus according to EE 47, wherein the classifiers are generated and selected through an Adaboost method.
  • EE 50. The apparatus according to EE 49, wherein weights of the selected classifiers in the weighted sum are determined through the Adaboost method.
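How an Adaboost method can both select classifiers and determine their weights, as in EE 49 and EE 50, can be illustrated with the textbook discrete AdaBoost procedure. The sketch below is that standard procedure under stated assumptions, not necessarily the variant used in the embodiments: candidate classifier outputs and marks are encoded as +1 (match) or -1 (no match), and the names `adaboost` and `predict` are hypothetical.

```python
import math
from typing import List, Tuple


def adaboost(decisions: List[List[int]], labels: List[int],
             rounds: int) -> List[Tuple[int, float]]:
    """Select classifiers and weights with discrete AdaBoost.

    decisions[c][i] is the +1/-1 output of candidate classifier c on sample i;
    labels[i] is the +1/-1 mark of sample i.
    Returns a list of (classifier index, weight) pairs.
    """
    n = len(labels)
    sample_w = [1.0 / n] * n
    model: List[Tuple[int, float]] = []
    for _ in range(rounds):
        # Weighted error of every candidate under the current sample weights.
        errs = [
            sum(w for w, d, y in zip(sample_w, dec, labels) if d != y)
            for dec in decisions
        ]
        best = min(range(len(decisions)), key=lambda c: errs[c])
        eps = min(max(errs[best], 1e-10), 1.0 - 1e-10)
        if eps >= 0.5:
            break  # no candidate is better than chance
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        model.append((best, alpha))
        # Increase the weight of misclassified samples, then renormalize.
        sample_w = [
            w * math.exp(-alpha * y * d)
            for w, d, y in zip(sample_w, decisions[best], labels)
        ]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return model


def predict(model: List[Tuple[int, float]], decisions: List[List[int]], i: int) -> int:
    """Sign of the weighted sum of the selected classifiers on sample i."""
    score = sum(alpha * decisions[c][i] for c, alpha in model)
    return 1 if score >= 0 else -1
```

In this encoding the weighted sum is compared against zero; a model using 0/1 decisions and a threshold THc, as in EE 47, would express the same comparison with a shifted threshold.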
  • EE 51. The apparatus according to EE 47, wherein the generation of the model comprises:
  • providing at least two sets of candidate weights of the selected classifiers in the weighted sum; and
  • selecting the set of candidate weights resulting in the smallest identifying error as the weights of the selected classifiers in the weighted sum.
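The selection among candidate weight sets in EE 51 can be sketched as a simple search over the provided sets. The sketch assumes each decision is 0 (same content) or 1 (not the same content), that a sample is identified as a match when the weighted sum is less than THc, and that THc is fixed while the weights are compared. The names `weighted_distance`, `identifying_error` and `select_weights` are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], bool]  # (decisions of the selected classifiers, mark)


def weighted_distance(decisions: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of the decisions of the selected classifiers."""
    return sum(w * d for w, d in zip(weights, decisions))


def identifying_error(samples: List[Sample], weights: Sequence[float], thc: float) -> int:
    """Number of samples whose identification disagrees with the mark."""
    return sum(
        1
        for decisions, is_match in samples
        if (weighted_distance(decisions, weights) < thc) != is_match
    )


def select_weights(samples: List[Sample],
                   candidate_weight_sets: List[Sequence[float]],
                   thc: float) -> Sequence[float]:
    """Return the candidate weight set with the smallest identifying error."""
    return min(candidate_weight_sets,
               key=lambda ws: identifying_error(samples, ws, thc))
```

In the example data used to check this sketch, the first classifier agrees with every mark and the second does not, so the weight set that favors the first classifier is selected.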
  • EE 52. The apparatus according to EE 47, wherein for each fingerprint algorithm Ft, only one classifier is selected.
  • EE 53. The apparatus according to EE 49, wherein for each fingerprint algorithm Ft, only one classifier is selected.
  • EE 54. The apparatus according to EE 50, wherein for each fingerprint algorithm Ft, only one classifier is selected.
  • EE 55. The apparatus according to EE 51, wherein for each fingerprint algorithm Ft, only one classifier is selected.
  • EE 56. The apparatus according to EE 47, wherein
  • for each of at least one of the candidate classifiers, the candidate classifier is adapted to:
  • search a tree to find at least one leaf node having a bit error rate between the training query fingerprint and the training reference fingerprint represented by the leaf node less than a maximum tolerable error rate; and
  • calculate the decision by deciding that only the training reference fingerprint represented by the at least one leaf node and the training query fingerprint are derived from the same multimedia content,
  • wherein the training reference fingerprints have a fixed length L=S×K bits, and S and K are positive integers,
  • wherein the tree is a 2^K-ary tree having S levels, and each node in the l-th level, 0≦l≦S, represents a bit sequence of K×l bits,
  • wherein each level has a look-up table defining an estimated bit error rate between the training query fingerprint and its closest training reference fingerprint under a reached node of the level, such that the probability of observing at least E errors between b bits represented by the reached node and first b bits of the training query fingerprint is greater than a threshold pt, wherein at least two sets of parameters including K and an initial value of the maximum tolerable error rate are provided for the tree, and the set resulting in the smallest identifying error is selected to generate the candidate classifier.
  • EE 57. The apparatus according to EE 56, wherein the at least one leaf node comprises only one leaf node.
  • EE 58. The apparatus according to EE 57, wherein the only one leaf node has the smallest bit error rate.
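The tree search of EE 56 to EE 58 can be illustrated with a simplified prefix-tree search. This sketch deliberately omits the per-level look-up tables of estimated bit error rates and replaces them with a plainer pruning rule: a branch is abandoned once the bit errors accumulated along it can no longer stay below the maximum tolerable error rate over the full length L=S×K. Fingerprints are assumed to be integers of L bits, and the names `build_tree` and `search` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_tree(references: Iterable[int], s: int, k: int) -> Dict:
    """Build a nested-dict tree with S levels; each edge is a K-bit chunk (up to 2**K children)."""
    root: Dict = {}
    mask = (1 << k) - 1
    for ref in references:
        node = root
        for level in range(s):
            chunk = (ref >> ((s - 1 - level) * k)) & mask
            node = node.setdefault(chunk, {})
    return root


def search(root: Dict, query: int, s: int, k: int,
           max_error_rate: float) -> Optional[Tuple[int, float]]:
    """Return (reference, bit error rate) of the leaf with the smallest rate below the maximum."""
    total_bits = s * k
    mask = (1 << k) - 1
    budget = max_error_rate * total_bits  # bit errors must stay strictly below this
    best: Optional[Tuple[int, int]] = None

    def visit(node: Dict, level: int, prefix: int, errors: int) -> None:
        nonlocal best
        if level == s:
            if best is None or errors < best[1]:
                best = (prefix, errors)
            return
        q_chunk = (query >> ((s - 1 - level) * k)) & mask
        for chunk, child in node.items():
            e = errors + bin(chunk ^ q_chunk).count("1")
            if e < budget:  # otherwise no leaf under this node can qualify
                visit(child, level + 1, (prefix << k) | chunk, e)

    visit(root, 0, 0, 0)
    if best is None:
        return None
    return best[0], best[1] / total_bits
```

Returning only the leaf with the smallest bit error rate corresponds to the single-leaf case of EE 57 and EE 58; returning every qualifying leaf would correspond to the more general "at least one leaf node" wording.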
  • EE 59. The apparatus according to EE 47, wherein
  • for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, and the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content, and
  • wherein at least two candidate thresholds for calculating the decisions are provided and the candidate threshold resulting in the smallest identifying error is selected as the threshold for the candidate classifier.
  • EE 60. The apparatus according to EE 56, wherein
  • for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, and the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content, and
  • wherein at least two candidate thresholds for calculating the decisions are provided and the candidate threshold resulting in the smallest identifying error is selected as the threshold for the candidate classifier.
  • EE 61. The apparatus according to EE 47, wherein
  • for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints having the identical reliable bits, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content,
  • wherein at least two combinations of threshold for calculating the decisions and division of weak bits and reliable bits are provided and the combination resulting in the smallest identifying error is selected for the candidate classifier.
  • EE 62. The apparatus according to EE 56, wherein
  • for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints having the identical reliable bits, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content,
  • wherein at least two combinations of threshold for calculating the decisions and division of weak bits and reliable bits are provided and the combination resulting in the smallest identifying error is selected for the candidate classifier.
  • EE 63. The apparatus according to EE 59, wherein
  • for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints having the identical reliable bits, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content,
  • wherein at least two combinations of threshold for calculating the decisions and division of weak bits and reliable bits are provided and the combination resulting in the smallest identifying error is selected for the candidate classifier.
  • EE 64. The apparatus according to EE 60, wherein
  • for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification, and
  • the candidate classifier is adapted to:
  • calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints having the identical reliable bits, and
  • calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content,
  • wherein at least two combinations of threshold for calculating the decisions and division of weak bits and reliable bits are provided and the combination resulting in the smallest identifying error is selected for the candidate classifier.
  • EE 65. The apparatus according to EE 47, wherein each of the training query multimedia objects includes a number W of objects which are synchronous with each other, and each of the training reference multimedia objects includes the number W of objects which are synchronous with each other, where W>1, and
  • wherein for each of the W objects in the training query multimedia objects and the training reference multimedia objects, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively.
  • EE 66. The apparatus according to EE 65, wherein the number W of objects include an audio object and a video or image object, and audio fingerprints are taken from audio objects and video or image fingerprints are taken from video or image objects.
  • EE 67. A computer-readable medium having computer program instructions recorded thereon for enabling a processor to perform a method of identifying a multimedia object, the method comprising:
  • acquiring query fingerprints fq,1 to fq,T which are derived from the multimedia object according to fingerprint algorithms F1 to FT respectively, where the fingerprint algorithms F1 to FT are different from each other, and T>1;
  • for each fingerprint algorithm Ft, calculating decisions through each of at least one classifier corresponding to the fingerprint algorithm Ft based on the query fingerprint and reference fingerprints derived from a plurality of reference multimedia objects according to the fingerprint algorithm Ft, each of the decisions indicating a possibility that the query fingerprint and the reference fingerprint for calculating the decision are not derived from the same multimedia content;
  • for each of the reference multimedia objects, calculating a distance D as a weighted sum of the decisions relating to the reference fingerprints derived from the reference multimedia object according to the fingerprint algorithms F1 to FT respectively; and
  • identifying the multimedia object as matching the reference multimedia object with the smallest distance which is less than a threshold THc.
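The combining and identifying steps of EE 67 can be sketched as follows. The sketch assumes the per-algorithm decisions have already been calculated for every reference multimedia object, with each decision being 0 (same content) or 1 (not the same content), and that one weight is given per fingerprint algorithm F1 to FT. The name `identify` and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Sequence


def identify(decisions_by_reference: Dict[str, Sequence[int]],
             weights: Sequence[float],
             thc: float) -> Optional[str]:
    """Return the reference with the smallest distance D below THc, or None if there is none."""
    best_id: Optional[str] = None
    best_distance: Optional[float] = None
    for ref_id, decisions in decisions_by_reference.items():
        # Distance D: weighted sum of the decisions for this reference object.
        distance = sum(w * d for w, d in zip(weights, decisions))
        if distance < thc and (best_distance is None or distance < best_distance):
            best_id, best_distance = ref_id, distance
    return best_id
```

Returning `None` covers the case in which no reference multimedia object has a distance below THc, so the query object is not identified as matching any reference.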
  • EE 68. A computer program product including computer program instructions for enabling a processor to perform a method of identifying a multimedia object, the method comprising:
  • acquiring query fingerprints fq,1 to fq,T which are derived from the multimedia object according to fingerprint algorithms F1 to FT respectively, where the fingerprint algorithms F1 to FT are different from each other, and T>1;
  • for each fingerprint algorithm Ft, calculating decisions through each of at least one classifier corresponding to the fingerprint algorithm Ft based on the query fingerprint and reference fingerprints derived from a plurality of reference multimedia objects according to the fingerprint algorithm Ft, each of the decisions indicating a possibility that the query fingerprint and the reference fingerprint for calculating the decision are not derived from the same multimedia content;
  • for each of the reference multimedia objects, calculating a distance D as a weighted sum of the decisions relating to the reference fingerprints derived from the reference multimedia object according to the fingerprint algorithms F1 to FT respectively; and
  • identifying the multimedia object as matching the reference multimedia object with the smallest distance which is less than a threshold THc.
  • EE 69. A computer-readable medium having computer program instructions recorded thereon for enabling a processor to perform a method of training a model for identifying multimedia objects, the method comprising:
  • for each of one or more samples including a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object or not,
      • deriving training query fingerprints from the training query multimedia object according to fingerprint algorithms F1 to FG respectively, where the fingerprint algorithms F1 to FG are different from each other, and G>1;
      • deriving training reference fingerprints from the training reference multimedia object according to the fingerprint algorithms F1 to FG respectively; and
  • for each fingerprint algorithm Ft, generating at least one candidate classifier based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft, the candidate classifier being adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft, which indicates a possibility that the two fingerprints are not derived from the same multimedia content;
  • generating the model including a weighted sum of classifiers selected from the candidate classifiers and a threshold THc for evaluating the weighted sum such that the identifying error obtained by applying the model to the training query fingerprints and the training reference fingerprints derived from the samples is minimized.
  • EE 70. A computer program product including computer program instructions for enabling a processor to perform a method of training a model for identifying multimedia objects, the method comprising:
  • for each of one or more samples including a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object or not,
      • deriving training query fingerprints from the training query multimedia object according to fingerprint algorithms F1 to FG respectively, where the fingerprint algorithms F1 to FG are different from each other, and G>1;
      • deriving training reference fingerprints from the training reference multimedia object according to the fingerprint algorithms F1 to FG respectively; and
  • for each fingerprint algorithm Ft, generating at least one candidate classifier based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft, the candidate classifier being adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft, which indicates a possibility that the two fingerprints are not derived from the same multimedia content;
  • generating the model including a weighted sum of classifiers selected from the candidate classifiers and a threshold THc for evaluating the weighted sum such that the identifying error obtained by applying the model to the training query fingerprints and the training reference fingerprints derived from the samples is minimized.

Claims (24)

    We claim:
  1. An apparatus for identifying a multimedia object, comprising:
    an acquiring unit, implemented at least in part by one or more computing processors, that acquires query fingerprints fq,1 to fq,T which are derived from the multimedia object according to fingerprint algorithms F1 to FT respectively, where the fingerprint algorithms F1 to FT are different from each other, and T>1;
    a plurality of classifying units, wherein each fingerprint algorithm Ft corresponds to at least one of the classifying units, and each of the classifying units, implemented at least in part by one or more computing processors, calculates decisions through a classifier based on the query fingerprint fq,t and reference fingerprints derived from a plurality of reference multimedia objects according to the fingerprint algorithm Ft, each of the decisions indicating a possibility that the query fingerprint and the reference fingerprint for calculating the decision are not derived from the same multimedia content; and
    a combining unit, implemented at least in part by one or more computing processors, that, for each of the reference multimedia objects, calculates a distance D as a weighted sum of the decisions relating to the reference fingerprints derived from the reference multimedia object according to the fingerprint algorithms F1 to FT respectively; and
    an identifying unit, implemented at least in part by one or more computing processors, that identifies the multimedia object as matching the reference multimedia object with the smallest distance which is less than a threshold THc;
    wherein for each of at least one of the classifiers, the fingerprints for the classifier are derived as hash values, and the corresponding classifying unit further calculates the decisions through the classifier based on the query fingerprint and the reference fingerprints by:
    calculating a distance d between the query fingerprint and each of the reference fingerprints; and
    calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  2. The apparatus according to claim 1, wherein each fingerprint algorithm Ft corresponds to only one of the classifying units.
  3. The apparatus according to claim 1, wherein the multimedia object includes a number W of objects which are synchronous with each other, and each of the reference multimedia objects includes the number W of objects which are synchronous with each other, where W>1, and
    wherein for each of the W objects in the multimedia object and the reference multimedia objects, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively.
  4. The apparatus according to claim 1, wherein each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification.
  5. An apparatus for training a model for identifying multimedia objects, comprising:
    a fingerprint calculator, implemented at least in part by one or more computing processors, that, for each of one or more samples including a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object or not, derives training query fingerprints from the training query multimedia object according to fingerprint algorithms F1 to FG respectively, where the fingerprint algorithms F1 to FG are different from each other, and G>1, and derives training reference fingerprints from the training reference multimedia object according to the fingerprint algorithms F1 to FG respectively; and
    a training unit, implemented at least in part by one or more computing processors, that:
    for each fingerprint algorithm Ft, generates at least one candidate classifier based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft, the candidate classifier being adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft, which indicates a possibility that the two fingerprints are not derived from the same multimedia content; and
    generates the model including a weighted sum of classifiers selected from the candidate classifiers and a threshold THc for evaluating the weighted sum such that the identifying error obtained by applying the model to the training query fingerprints and the training reference fingerprints derived from the samples is minimized;
    wherein for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, and the candidate classifier is adapted to:
    calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints, and
    calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content, and
    wherein at least two candidate thresholds for calculating the decisions are provided and the candidate threshold resulting in the smallest identifying error is selected as the threshold for the candidate classifier.
  6. The apparatus according to claim 5, wherein the selected classifiers in the generated model correspond to more than one fingerprint algorithm.
  7. The apparatus according to claim 5, wherein the classifiers are generated and selected through an Adaboost method.
  8. The apparatus according to claim 7, wherein weights of the selected classifiers in the weighted sum are determined through the Adaboost method.
  9. The apparatus according to claim 5, wherein the generation of the model comprises:
    providing at least two sets of candidate weights of the selected classifiers in the weighted sum; and
    selecting the set of candidate weights resulting in the smallest identifying error as the weights of the selected classifiers in the weighted sum.
  10. The apparatus according to claim 5, wherein for each fingerprint algorithm Ft, only one classifier is selected.
  11. The apparatus according to claim 5, wherein the fingerprints for generating the candidate classifier are derived as hash values, each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification.
  12. The apparatus according to claim 5, wherein each of the training query multimedia objects includes a number W of objects which are synchronous with each other, and each of the training reference multimedia objects includes the number W of objects which are synchronous with each other, where W>1, and
    wherein for each of the W objects in the training query multimedia objects and the training reference multimedia objects, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively.
  13. A method of identifying a multimedia object, comprising:
    acquiring query fingerprints fq,1 to fq,T which are derived from the multimedia object according to fingerprint algorithms F1 to FT respectively, where the fingerprint algorithms F1 to FT are different from each other, and T>1;
    for each fingerprint algorithm Ft, calculating decisions through each of at least one classifier corresponding to the fingerprint algorithm Ft based on the query fingerprint fq,t and reference fingerprints derived from a plurality of reference multimedia objects according to the fingerprint algorithm Ft, each of the decisions indicating a possibility that the query fingerprint and the reference fingerprint for calculating the decision are not derived from the same multimedia content;
    for each of the reference multimedia objects, calculating a distance D as a weighted sum of the decisions relating to the reference fingerprints derived from the reference multimedia object according to the fingerprint algorithms F1 to FT respectively; and
    identifying the multimedia object as matching the reference multimedia object with the smallest distance which is less than a threshold THc;
    wherein for each of at least one of the classifiers, the fingerprints for the classifier are derived as hash values, and wherein the decisions are calculated based on the query fingerprint and the reference fingerprints by:
    calculating a distance d between the query fingerprint and each of the reference fingerprints; and
    calculating the decisions by deciding that at least one of the reference fingerprints with the distance d less than a threshold and the query fingerprint are derived from the same multimedia content.
  14. The method according to claim 13, wherein each fingerprint algorithm Ft corresponds to only one classifying unit.
  15. The method according to claim 13, wherein the multimedia object includes a number W of objects which are synchronous with each other, and each of the reference multimedia objects includes the number W of objects which are synchronous with each other, where W>1, and
    wherein for each of the W objects in the multimedia object and the reference multimedia objects, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively.
  16. The method according to claim 13, wherein each of the hash values is divided into weak bits and reliable bits, where the weak bits are likely to flip when the multimedia object, from which the fingerprint is derived, is modified, and the reliable bits are less likely to flip as a result of content modification.
  17. A method of training a model for identifying multimedia objects, comprising:
    for each of one or more samples including a training query multimedia object, a training reference multimedia object and a mark indicating whether the training query multimedia object matches the training reference multimedia object or not,
    deriving training query fingerprints from the training query multimedia object according to fingerprint algorithms F1 to FG respectively, where the fingerprint algorithms F1 to FG are different from each other, and G>1;
    deriving training reference fingerprints from the training reference multimedia object according to the fingerprint algorithms F1 to FG respectively; and
    for each fingerprint algorithm Ft, generating at least one candidate classifier based on the training query fingerprints and the training reference fingerprints derived according to the fingerprint algorithm Ft, the candidate classifier being adapted to calculate a decision for any two fingerprints derived according to the fingerprint algorithm Ft, which indicates a possibility that the two fingerprints are not derived from the same multimedia content;
    generating the model including a weighted sum of classifiers selected from the candidate classifiers and a threshold THc for evaluating the weighted sum such that the identifying error obtained by applying the model to the training query fingerprints and the training reference fingerprints derived from the samples is minimized;
    wherein for each of at least one of the candidate classifiers, the fingerprints for generating the candidate classifier are derived as hash values, and the candidate classifier is adapted to:
    calculate a distance d between the training query fingerprint and each of a set of training reference fingerprints, and
    calculate the decisions by deciding that at least one of the training reference fingerprints with the distance d less than a threshold and the training query fingerprint are derived from the same multimedia content, and
    wherein at least two candidate thresholds for calculating the decisions are provided and the candidate threshold resulting in the smallest identifying error is selected as the threshold for the candidate classifier.
  18. The method according to claim 17, wherein each fingerprint algorithm Ft corresponds to only one classifying unit.
  19. The method according to claim 17, wherein the multimedia object includes a number W of objects which are synchronous with each other, and each of the reference multimedia objects includes the number W of objects which are synchronous with each other, where W>1, and
    wherein for each of the W objects in the multimedia object and the reference multimedia objects, at least one of the fingerprints is derived from the object according to the same fingerprint algorithm respectively.
  20. The method according to claim 17, wherein the selected classifiers in the generated model correspond to more than one fingerprint algorithm.
  21. The method according to claim 17, wherein the classifiers are generated and selected through an Adaboost method.
  22. The method according to claim 21, wherein weights of the selected classifiers in the weighted sum are determined through the Adaboost method.
  23. The method according to claim 17, wherein the generation of the model comprises:
    providing at least two sets of candidate weights of the selected classifiers in the weighted sum; and
    selecting the set of candidate weights resulting in the smallest identifying error as the weights of the selected classifiers in the weighted sum.
  24. The method according to claim 17, wherein for each fingerprint algorithm Ft, only one classifier is selected.
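
The identification steps of claim 13 and the threshold selection of claim 17 can be illustrated with a minimal sketch. It is not the applicants' implementation, and it rests on several assumptions that the claims do not state: fingerprints are integer hash values, the distance d is the Hamming distance, each decision is a hard 0/1 value (0 meaning "same content"), and there is exactly one classifier per fingerprint algorithm as in claim 14. The function names hamming, decision, identify and select_threshold are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hash values (assumed distance d)."""
    return bin(a ^ b).count("1")


def decision(query_fp: int, ref_fp: int, threshold: int) -> int:
    """Return 0 if the pair is judged to share the same content, otherwise 1.

    The claims describe a decision as the possibility that two fingerprints
    are NOT derived from the same content; a hard 0/1 value is an assumption.
    """
    return 0 if hamming(query_fp, ref_fp) < threshold else 1


def identify(
    query_fps: Sequence[int],
    references: Dict[str, Sequence[int]],
    thresholds: Sequence[int],
    weights: Sequence[float],
    th_c: float,
) -> Optional[str]:
    """Return the reference with the smallest weighted distance D below th_c.

    query_fps[t] and references[ref_id][t] are assumed to be derived with the
    same fingerprint algorithm Ft. Returns None when no reference qualifies.
    """
    best_id: Optional[str] = None
    best_d: Optional[float] = None
    for ref_id, ref_fps in references.items():
        # D is the weighted sum of the per-algorithm decisions.
        d = sum(
            w * decision(q, r, th)
            for q, r, th, w in zip(query_fps, ref_fps, thresholds, weights)
        )
        if d < th_c and (best_d is None or d < best_d):
            best_id, best_d = ref_id, d
    return best_id


def select_threshold(
    pairs: Sequence[Tuple[int, int]],
    labels: Sequence[bool],
    candidates: Sequence[int],
) -> int:
    """Pick the candidate threshold with the smallest identifying error.

    labels[i] is True when pairs[i] is marked as a match in the training sample.
    """
    def error(th: int) -> int:
        # Count pairs whose predicted "same content" flag disagrees with the mark.
        return sum(
            1
            for (q, r), same in zip(pairs, labels)
            if (decision(q, r, th) == 0) != same
        )

    return min(candidates, key=error)


# Toy data: two fingerprint algorithms, two reference objects.
references = {"a": [0b1111, 0b0000], "b": [0b1010, 0b0110]}
match = identify([0b1011, 0b0111], references, [2, 2], [0.5, 0.5], 0.5)
chosen = select_threshold(
    [(0b1011, 0b1010), (0b0000, 0b1111), (0b1100, 0b1111)],
    [True, False, True],
    [1, 3, 5],
)
```

In this toy data the query agrees with reference "b" under both algorithms, so its weighted distance is 0 and it is the only reference below THc. The Adaboost selection of classifiers and weights in claims 21 and 22 is not shown; the sketch takes the weights as given.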
US14869554 2012-04-18 2015-09-29 Identifying multimedia objects based on multimedia fingerprint Pending US20160019671A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US201261625889 2012-04-18 2012-04-18
US13854276 US9202255B2 (en) 2012-04-18 2013-04-01 Identifying multimedia objects based on multimedia fingerprint
US14869554 US20160019671A1 (en) 2012-04-18 2015-09-29 Identifying multimedia objects based on multimedia fingerprint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14869554 US20160019671A1 (en) 2012-04-18 2015-09-29 Identifying multimedia objects based on multimedia fingerprint

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13854276 Division US9202255B2 (en) 2012-04-18 2013-04-01 Identifying multimedia objects based on multimedia fingerprint

Publications (1)

Publication Number Publication Date
US20160019671A1 (en) 2016-01-21

Family

ID=48190097

Family Applications (2)

Application Number Title Priority Date Filing Date
US13854276 Active 2033-09-06 US9202255B2 (en) 2012-04-18 2013-04-01 Identifying multimedia objects based on multimedia fingerprint
US14869554 Pending US20160019671A1 (en) 2012-04-18 2015-09-29 Identifying multimedia objects based on multimedia fingerprint

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13854276 Active 2033-09-06 US9202255B2 (en) 2012-04-18 2013-04-01 Identifying multimedia objects based on multimedia fingerprint

Country Status (2)

Country Link
US (2) US9202255B2 (en)
EP (1) EP2657884B1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9846696B2 (en) 2012-02-29 2017-12-19 Telefonaktiebolaget Lm Ericsson (Publ) Apparatus and methods for indexing multimedia content
US9609373B2 (en) * 2013-10-25 2017-03-28 Avago Technologies General Ip (Singapore) Pte. Ltd. Presentation timeline synchronization across audio-video (AV) streams
WO2015183148A1 (en) * 2014-05-27 2015-12-03 Telefonaktiebolaget L M Ericsson (Publ) Fingerprinting and matching of content of a multi-media file
US9996603B2 (en) * 2014-10-14 2018-06-12 Adobe Systems Inc. Detecting homologies in encrypted and unencrypted documents using fuzzy hashing
EP3096243A1 (en) * 2015-05-22 2016-11-23 Thomson Licensing Methods, systems and apparatus for automatic video query expansion

Citations (2)

Publication number Priority date Publication date Assignee Title
US20020178410A1 (en) * 2001-02-12 2002-11-28 Haitsma Jaap Andre Generating and matching hashes of multimedia content
WO2010021965A1 (en) * 2008-08-17 2010-02-25 Dolby Laboratories Licensing Corporation Signature derivation for images

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
US7421376B1 (en) * 2001-04-24 2008-09-02 Auditude, Inc. Comparison of data signals using characteristic electronic thumbprints
US6931413B2 (en) * 2002-06-25 2005-08-16 Microsoft Corporation System and method providing automated margin tree analysis and processing of sampled data
RU2006134049A (en) * 2004-02-26 2008-04-10 Медиагайд A method and apparatus for automatically detecting and identifying the transmitted audio signal or video
US8103050B2 (en) 2006-01-16 2012-01-24 Thomson Licensing Method for computing a fingerprint of a video sequence
WO2007091243A3 (en) * 2006-02-07 2009-04-16 Mobixell Networks Ltd Matching of modified visual and audio media
US7840540B2 (en) 2006-04-20 2010-11-23 Datascout, Inc. Surrogate hashing
EP2168061A1 (en) * 2007-06-06 2010-03-31 Dolby Laboratories Licensing Corporation Improving audio/video fingerprint search accuracy using multiple search combining
CN101743512B (en) * 2007-06-27 2012-09-05 杜比实验室特许公司 Incremental construction of search tree with signature pointers for identification of multimedia content
US8934545B2 (en) 2009-02-13 2015-01-13 Yahoo! Inc. Extraction of video fingerprints and identification of multimedia using video fingerprinting
US8892570B2 (en) * 2009-12-22 2014-11-18 Dolby Laboratories Licensing Corporation Method to dynamically design and configure multimedia fingerprint databases
US8655878B1 (en) * 2010-05-06 2014-02-18 Zeitera, Llc Scalable, adaptable, and manageable system for multimedia identification
US8805827B2 (en) * 2011-08-23 2014-08-12 Dialogic (Us) Inc. Content identification using fingerprint matching

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US20020178410A1 (en) * 2001-02-12 2002-11-28 Haitsma Jaap Andre Generating and matching hashes of multimedia content
WO2010021965A1 (en) * 2008-08-17 2010-02-25 Dolby Laboratories Licensing Corporation Signature derivation for images
US20110142348A1 (en) * 2008-08-17 2011-06-16 Dolby Laboratories Licensing Corporation Signature Derivation for Images

Non-Patent Citations (3)

Title
Miller et al. "Audio fingerprinting: nearest neighbor search in high dimensional binary spaces." Journal of VLSI signal processing systems for signal, image and video technology 41.3 (2005): 285-291. *
Yang et al., "A robust hashing algorithm based on SURF for video copy detection", Computers & Security Volume 31, Issue 1, February 2012, 33–39 *
Yeo et al., "Rate-efficient visual correspondences using random projections", Image Processing, 2008. ICIP 2008., 217-220 *

Also Published As

Publication number Publication date Type
EP2657884A3 (en) 2014-10-08 application
EP2657884B1 (en) 2018-09-12 grant
US20130279740A1 (en) 2013-10-24 application
US9202255B2 (en) 2015-12-01 grant
EP2657884A2 (en) 2013-10-30 application

Similar Documents

Publication Publication Date Title
US7107254B1 (en) Probablistic models and methods for combining multiple content classifiers
US20090060351A1 (en) Visual Language Modeling for Image Classification
US9031999B2 (en) System and methods for generation of a concept based database
US20100185691A1 (en) Scalable semi-structured named entity detection
US20120124034A1 (en) Co-selected image classification
US20130226850A1 (en) Method and apparatus for adapting a context model
US20100030780A1 (en) Identifying related objects in a computer database
US20120290621A1 (en) Generating a playlist
US20080319973A1 (en) Recommending content using discriminatively trained document similarity
US9324323B1 (en) Speech recognition using topic-specific language models
US20080159622A1 (en) Target object recognition in images and video
US20050165732A1 (en) System and method providing automated margin tree analysis and processing of sampled data
US20120078894A1 (en) Trend Analysis in Content Identification Based on Fingerprinting
US20110219012A1 (en) Learning Element Weighting for Similarity Measures
US20150371633A1 (en) Speech recognition using non-parametric models
US20100125447A1 (en) Language identification for documents containing multiple languages
US8396286B1 (en) Learning concepts for video annotation
US20090319449A1 (en) Providing context for web articles
US7933859B1 (en) Systems and methods for predictive coding
US20090037175A1 (en) Confidence measure generation for speech related searching
US8156132B1 (en) Systems for comparing image fingerprints
US20060218192A1 (en) Method and System for Providing Information Services Related to Multimodal Inputs
US20120136812A1 (en) Method and system for machine-learning based optimization and customization of document similarities calculation
US20110208744A1 (en) Methods for detecting and removing duplicates in video search results
US9087049B2 (en) System and method for context translation of natural language

Legal Events

Date Code Title Description
AS Assignment

Owner name: DOLBY LABORATORIES LICENSING CORPORATION, CALIFORN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BAUER, CLAUS;LU, LIE;HU, MINGQING;SIGNING DATES FROM 20120420 TO 20120421;REEL/FRAME:036768/0170