WO2013017306A1 - Copy detection - Google Patents

Copy detection

Info

Publication number
WO2013017306A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth
measure
video
query
reference video
Prior art date
Application number
PCT/EP2012/059988
Other languages
English (en)
Inventor
Mohamed Hefeeda
Naghmeh Khodabakhshi
Original Assignee
Qatar Foundation
Hoarton, Lloyd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB1113282.6A external-priority patent/GB2493514B/en
Priority claimed from US13/197,573 external-priority patent/US8509600B2/en
Application filed by Qatar Foundation, Hoarton, Lloyd filed Critical Qatar Foundation
Priority to EP12724334A priority Critical patent/EP2569722A1/fr
Publication of WO2013017306A1 publication Critical patent/WO2013017306A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • G06V20/647Three-dimensional objects by matching two-dimensional images to three-dimensional objects

Definitions

  • the present invention relates to copy detection, and, in particular, to copy detection for 3D video.
  • a 3D (three-dimensional) video is one that is designed to enhance the illusion of depth perception.
  • a camera system is typically used to record images from two perspectives (or computer-generated imagery generates the two perspectives in post-production), and special projection hardware and/or eyewear are used to provide the illusion of depth when viewing.
  • Such 3D video content can include film theatrical releases, television broadcasts and online content.
  • Producing 3D content is expensive, typically more so than producing two-dimensional (2D) content. Accordingly, content owners are interested in protecting their content from illegal copying and distribution in order to protect the investments being made.
  • A method for detecting whether query video data appears in a reference video, the method comprising: determining a measure of depth from a portion of the query video data; comparing the measure against a measure of depth for the reference video to perform a depth match; and, if a match is determined, comparing a visual signature derived from the query video data against a visual signature of the reference video to perform a visual match, to determine a measure representing the likelihood that the query video data derives from the reference video.
  • Comparing a visual signature can include generating a visual signature for the query video, and comparing the visual signature against a measure of visual content of the reference video.
  • the measure of depth represents 3D content of a video.
  • Determining a measure of depth can include extracting a depth map from the query video data representing information relating to the distance of surfaces of scene objects from a viewpoint.
  • comparing the measure can include comparing against multiple frames of a reference video. The multiple frames can represent different views for the reference video.
  • a method can further include computing a matching score representing a measure of distance between query video data and the reference data.
  • the matching score can comprise first and second components relating to first and second matching scores from a depth match and a visual match respectively.
  • a matching score is determined using a distance threshold measure above which a matching score is set to a first value, and below which a matching score is set to a second value.
  • A copy detection system for a 3D reference video, comprising: an extraction engine to determine a measure of depth from a portion of query video data; and a comparison engine to perform a depth match by comparing the measure against a measure of depth for the reference video, and to perform a visual match by comparing a visual signature derived from the query video data against a visual signature of the reference video, to determine a measure representing the likelihood that the query video data derives from the reference video.
  • the reference video can include reference video data including depth information representing information relating to the distance of surfaces of scene objects from a viewpoint.
  • the comparison engine is further operable to determine a match for a portion of reference video over multiple frames of the reference video. The multiple frames can relate to multiple views for the reference video.
  • the comparison engine can be operable to compare the measure of depth from the query video data to multiple depth measures from respective multiple reference videos, and further operable to generate a matching score representing a measure of similarity between the query video data and a reference video.
  • the matching score can include components representing a measure for the depth match and for the visual match performed by the comparison engine.
  • the system is further operable to locate the position, within the reference video, of a matching portion of query video data.
  • a computer program embedded on a non-transitory tangible computer readable storage medium including machine readable instructions that, when executed by a processor, implement a method for detecting copied portions of a reference video, comprising determining a measure of depth from a portion of query video data, comparing the measure against a measure of depth for the reference video to perform a depth match and, if a match is determined, comparing a visual signature derived from the query video data against a visual signature of the reference video to perform a visual match to determine a measure representing the likelihood that the query video data derives from the reference video.
  • Determining a measure of depth can include using an existing depth map or deriving a depth map from existing video data for the reference video or query video data.
  • Figure 1 is a schematic block diagram of a method according to an example
  • Figure 2 is a schematic block diagram of a method for generating a depth signature according to an example
  • Figure 3 is a schematic block diagram of a method for detecting whether query video data appears in a reference video according to an example
  • Figure 4 is a diagram depicting a general case in which query and reference videos have multiple views
  • Figure 5 is a diagram depicting a method according to an example
  • Figure 6 is a schematic block diagram of a copy detection system according to an example.
  • watermarking embeds known information in content prior to distribution.
  • copies of marked content contain the watermark, which can later be extracted and used to prove the existence of copied material.
  • In content-based copy detection, additional information beyond the content itself is not required.
  • a video contains enough unique information that can be used for detecting copies. For example, if an owner of a video suspects that the video content is being illegally distributed or hosted on the Internet, the owner can pose a query to a copy detection system which can perform a comparison to determine if a copy is present.
  • Content-based copy detection can also be a complementary approach to watermarking. After a suitable copy detector provides a creator or a distributor with a suspect list, the actual owner of the media can use a watermark or other authentication techniques to prove ownership, for example.
  • FIG. 1 is a schematic block diagram of a method according to an example.
  • Data for a reference video 101 which represents an original video including 3D content is processed in block 103 in order to extract depth information in the form of a depth map 104.
  • For example, a measure of depth can be extracted from a portion of the reference video data representing 3D content.
  • a depth map provides a representation of information relating to the distance of surfaces of scene objects from a particular viewpoint, such as a camera for example.
  • Multi-view video, in which a video has multiple views, a subset of which is displayed to the user depending on the angle of viewing.
  • Video plus depth, in which a video is encoded in 2D and a separate depth map is created.
  • the depth map allows the creation of many virtual (synthesized) views, which can add flexibility and support wider viewing angles for users for example.
  • a 3D video can be encoded in multi-view plus depth, where several views are used with a depth map to create more virtual views.
  • depth information for each video frame is typically represented as a grayscale image showing the depth of each pixel in that video frame. That is, a difference in gray level of a pixel compared to others represents a difference in depth for those pixels in the image, or rather a difference in depth for the underlying structure making up the image at those pixel locations. Such depth information can be used in order to provide a depth map for the frame.
  • A method for estimating depth information to provide a depth map, based on stereo matching for example, can be used. More specifically, human eyes are typically horizontally separated by about 50-75 mm, depending on the individual.
  • disparity between a stereo pair of images can be used to extract the depth information since the amount of disparity is inversely proportional to the distance from the observer. Therefore, according to an example, generating disparity images can be performed by taking two or more images and estimating a 3D model of the underlying scene for those images by finding corresponding pixels in the images and converting their 2D positions into 3D depths. A grayscale image can be used to represent a depth map determined using such a method as before.
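  • As an illustration only, the following minimal Python sketch estimates a disparity map from a stereo pair and converts it to a grayscale depth map, exploiting the inverse relationship between disparity and distance described above. It assumes the OpenCV and NumPy libraries; the matcher parameters and function names are illustrative assumptions, not taken from this disclosure:

      import cv2
      import numpy as np

      def depth_map_from_stereo(left_path, right_path):
          left = cv2.imread(left_path, cv2.IMREAD_GRAYSCALE)
          right = cv2.imread(right_path, cv2.IMREAD_GRAYSCALE)

          # Block-matching stereo correspondence; numDisparities must be
          # a multiple of 16 and blockSize odd.
          matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
          # compute() returns fixed-point disparities scaled by 16.
          disparity = matcher.compute(left, right).astype(np.float32) / 16.0

          # Disparity is inversely proportional to distance from the
          # observer, so invert it and rescale to a grayscale image in
          # which gray level represents depth.
          disparity[disparity <= 0] = 0.1  # avoid division by zero
          depth = 1.0 / disparity
          depth = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX)
          return depth.astype(np.uint8)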
  • depth information can be extracted from a single frame, or equivalent, or consecutively over several frames forming a portion of video content.
  • the depth information is typically in the form of a grayscale image.
  • the depth information can be in the form of a set of grayscale images, in which case multiple such pieces of depth information can be combined in any suitable way in order to provide a single measure of the depth information over the portion in question.
  • Where a single video frame comprises a pair of images which are used to form a 3D image frame for consumption, each image from the stereo pair may provide depth information.
  • the information may then be combined into a single grayscale image such as by addition, subtraction or any other suitable method for combining the information from the images.
  • depth information which has been extracted from video data representing a reference video is used to generate a depth signature for the reference video as will be explained below.
  • the depth signature is indexed in a signature database 109.
  • the generated depth signature can be indexed and stored with other depth signatures from other reference videos in order to form a repository of such signatures for a collection of reference videos, and which can be queried in order to compare signatures as will be explained below.
  • a reference video herein is an original video. Videos that are checked against reference videos are query videos herein. If a query video matches one of the reference videos, that query video is called a copied video herein. Other alternatives are possible.
  • FIG. 2 is a schematic block diagram of a method for generating a depth signature according to an example.
  • depth map 104 is divided into a grid.
  • the division into a grid can be uniform, i.e., into equal size blocks, or non-uniform to account for different importance in regions in the depth map.
  • salient image portions can be segmented differently to account for their relative importance compared to non-salient regions in a frame.
  • the level of granularity of segmentation can be finer over salient portions of a depth map.
  • the number of blocks in a grid is a configurable parameter which trades off computational complexity with copy detection accuracy.
  • the number can be determined automatically, such as based on a measure of saliency for example, or can be a manually provided number.
  • segmented blocks of the depth map 104 grid are mapped to a vector 204.
  • each element of the vector can represent one block, although groups of blocks may map to one vector element.
  • a group of non-salient segments may be combined in a suitable way and mapped to one or more elements of a vector, thereby reducing the number of dimensions of the vector.
  • Various metrics can be used to map the depth information. For example, the mean, mode, or median of the depth values in a segment can be used as a measure for the block in question with that value being mapped to the vector. More complex metrics that are composed of multiple components, e.g., the mean and standard deviation, can also be used.
  • the vector 204 takes the form <d_1, d_2, ..., d_i, ..., d_D> in an example, where D is the total number of blocks in the depth map grid, and d_i is a metric summarizing the depth information in block i.
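  • By way of illustration, a minimal Python sketch of this mapping, assuming a uniform grid and the mean of the depth values as the per-block metric (function and parameter names are hypothetical):

      import numpy as np

      def depth_signature(depth_map, rows=4, cols=4, metric=np.mean):
          # Map a grayscale depth map to <d_1, ..., d_D>, D = rows * cols,
          # where each element summarizes one grid block.
          h, w = depth_map.shape
          signature = []
          for r in range(rows):
              for c in range(cols):
                  block = depth_map[r * h // rows:(r + 1) * h // rows,
                                    c * w // cols:(c + 1) * w // cols]
                  signature.append(metric(block))
          return np.array(signature)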
  • a depth signature can be created for every frame in a video. It can also be created for only a subset of the frames in order to reduce the computational complexity. This subset of the frames can be chosen deterministically, e.g., each 10th frame is chosen, or randomly.
  • the subset of the frames can be keyframes of the video, where a keyframe is a representative frame for a sequence of video frames containing similar visual information, which is referred to as a video shot.
  • shot boundary detection algorithms can be employed to identify when a shot starts and ends.
  • Known keyframe selection algorithms can be used to select key frames.
  • the depth signature for a video is composed of the depth signatures of its frames, or the chosen subset of frames for which the depth signatures are created.
  • the depth signature can be formed by concatenating the vectors for example.
  • the metrics summarizing respective blocks can be grouped in the signature to provide a signature of the form <d_{1,1}, d_{1,2}, ..., d_{j,i}, ..., d_{m,D}>, where, over m frames, D is the total number of blocks in the depth map grid, and d_{j,i} is the metric summarizing the depth information in block i of frame j.
  • depth signatures can be generated for videos other than reference videos. That is to say, a query video, which is a candidate copy of a reference video and which it is desired to compare against reference videos, can be processed in order to generate a depth signature for it.
  • depth signatures which are vectors with multiple dimensions, can be compared against depth vectors from other videos in order to find potential copies.
  • depth signatures are indexed in order to facilitate such comparisons.
  • Locality sensitive hashing (LSH) can be used for this indexing.
  • the basic premise is to hash high dimensional vectors to integer hash values in such a way that when the vectors are 'close' to each other in the original space, their hash values are likely to be close to each other as well.
  • a hash function h ∈ H is called (r_1, r_2, p_1, p_2)-sensitive for a distance measure D, which can be Euclidean distance, if for any v, q ∈ S: if v ∈ B(q, r_1) then Pr[h(v) = h(q)] ≥ p_1, and if v ∉ B(q, r_2) then Pr[h(v) = h(q)] ≤ p_2, where B(q, r) denotes the set of points within distance r of q.
  • each hash function maps a d-dimensional vector v to an integer using h_{a,b}(v) = floor((a · v + b) / w), where a is a d-dimensional vector with entries chosen independently from an s-stable distribution, and b is a real number chosen uniformly from the range [0, w].
  • the parameter w depends on the desired distance between the query and nearest points, and on the distance measure.
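  • A minimal Python sketch of one such hash function, assuming a Gaussian (i.e. 2-stable) distribution for a, which suits Euclidean distance; the default bucket width w is an illustrative assumption:

      import numpy as np

      class PStableHash:
          # h(v) = floor((a . v + b) / w), with a drawn from a 2-stable
          # (Gaussian) distribution and b uniform in [0, w].
          def __init__(self, dim, w=4.0, seed=None):
              rng = np.random.default_rng(seed)
              self.a = rng.normal(size=dim)
              self.b = rng.uniform(0.0, w)
              self.w = w

          def __call__(self, v):
              return int(np.floor((np.dot(self.a, v) + self.b) / self.w))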
  • FIG. 3 is a schematic block diagram of a method for detecting whether query video data appears in a reference video according to an example.
  • depth information for a query video is extracted, and a depth signature as described above is generated in block 303.
  • the depth signature for the query video is compared against depth signatures in the signature database 109 in block 305 (described in more detail below). If a match is not found, the process ends in block 307. If a match is found, a visual signature for the query video is generated in block 309.
  • visual features should be robust to (i.e. they should not change because of) various transformations such as scaling, rotation, change in viewpoint, and change in illumination for example.
  • Different types of visual features can be used, including but not limited to SURF (Speeded Up Robust Feature) and SIFT (Scale-Invariant Feature Transform).
  • SIFT features can be extracted from a video frame as follows:
  • Scale-space extrema detection: search over all scales and image locations (using a difference-of-Gaussians function) to identify potential interest points that are invariant to scale and orientation.
  • Keypoint localization: at each candidate location, a detailed model is fit to determine location and scale, and keypoints are selected based on measures of their stability.
  • Orientation assignment: one or more orientations are assigned to each keypoint location based on local image gradient directions. (All subsequent operations are performed on image data transformed relative to the assigned orientation, scale, and location for each feature, so the features are invariant to these transformations.)
  • Keypoint descriptor: local image gradients are measured at the selected scale in the region around each keypoint. These are then transformed into a representation that allows significant levels of local shape distortion and change in illumination.
  • the descriptor for each keypoint is usually a 128-element vector.
  • the number of visual features extracted from each video frame can be controlled by configuring a feature extraction algorithm. Reducing the number of extracted features results in a reduction of the computational complexity, but might introduce some errors in the detection process.
  • the peak threshold and edge threshold parameters of the SIFT feature extraction algorithm are set to control the number of SIFT features. In one example, the number of SIFT features extracted from each frame can be set to around 200.
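  • For illustration, a short Python sketch using OpenCV (an assumption; no particular library is named here). OpenCV's contrastThreshold and edgeThreshold parameters roughly correspond to the peak and edge thresholds mentioned above, and the values and file name shown are illustrative:

      import cv2

      # Raising contrastThreshold (the peak threshold) or lowering
      # edgeThreshold reduces the number of extracted SIFT features.
      sift = cv2.SIFT_create(contrastThreshold=0.08, edgeThreshold=10)

      frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
      keypoints, descriptors = sift.detectAndCompute(frame, None)
      # descriptors has shape (num_keypoints, 128): one 128-element
      # vector per keypoint.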
  • the visual signature for a video frame takes the form <v_1, v_2, ..., v_i, ..., v_V>, where V is the total number of visual features in a frame, and v_i is the value of visual feature i.
  • Each visual feature v_i has multiple elements. For example, a SIFT feature typically has 128 elements.
  • the visual signature can be created for every frame in the video. It can also be created for only a subset of the frames in order to reduce the computational complexity. This subset of the frames can be chosen deterministically, e.g., each 10th frame is chosen, or randomly.
  • the subset of the frames can be the keyframes of the video, where a keyframe is a representative frame for a sequence of video frames containing similar visual information, which is referred to as a video shot.
  • the visual signature of a video is composed of the visual signatures of its frames, or the chosen subset of frames for which the visual signatures are computed.
  • a set of visual signatures for reference videos is generated and stored in the signature repository 109. Accordingly, in block 311 the visual signature of a query video is compared against the visual signatures of reference videos stored in the repository 109. If a match is not found, the process ends at block 313. If a match is found, a score representing the match, along with other data such as an identification of the query video and/or an identification of the matching portion, is returned in block 315.
  • visual signatures can be hashed into V buckets, with an entry stored in each bucket.
  • Such an entry has the following fields according to an example: ⁇ VideoID, FrameID, ViewID, FeatureID, VisualSignature >.
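  • One possible in-memory representation of such entries and buckets, sketched in Python; the field types and helper names beyond the fields listed above are assumptions:

      from collections import defaultdict
      from typing import NamedTuple

      class Entry(NamedTuple):
          video_id: str
          frame_id: int
          view_id: int
          feature_id: int
          visual_signature: tuple

      # hash value -> list of entries falling in that bucket
      buckets = defaultdict(list)

      def index_feature(hash_value, entry):
          buckets[hash_value].append(entry)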
  • a depth signature is first computed from a query video.
  • the methods used to extract depth information and create depth signatures are the same as the ones used to process reference videos as described above.
  • the depth signature of the query video is then compared against the depth signatures in the repository 109, which can store signatures for reference videos in the form of a database.
  • a visual signature is computed from the query video and compared against visual signatures in the reference video database 109.
  • a combined score is then computed based on a depth signature matching score as well as a visual signature matching score. The combined score is used to decide whether the query video is a copy of one of the videos in the reference video database.
  • This method is computationally efficient, as it eliminates many query videos by checking their depth signatures first, which are typically more compact and faster to compare than visual signatures. Since modifications to the depth values of copied videos can damage depth perception and create undesirable visual artefacts, such modifications are unlikely to be performed on copied videos, so the loss in copy detection accuracy with this method is not significant.
  • In an alternative method, depth and visual signatures can both be extracted from a query video. Then, both the depth and visual signatures can be compared against the signatures in the reference database 109 and a combined matching score computed, which can then be used to determine whether the query video is a copy or not.
  • This method requires more computations, but it can tolerate depth transformations applied to the copied videos. If a query video is found to be a potential copy of a reference video or part of it, the location of the copied part in the reference video can be identified.
  • determining potential copied query videos using depth signatures can take place in two steps: frame-level comparison and video-level comparison.
  • the best matching frames in the signature database 109 for each query frame are determined and a score between each matched pair is computed.
  • the temporal aspects of the video frames can be taken into account, and a matching score between a query video and each reference video can be computed.
  • For each query frame, the depth signatures that are closest to it, based on a measure such as Euclidean distance, can be found using a nearest neighbour search method such as LSH.
  • the L buckets of the hashed values of the depth signature of the query video frame are identified.
  • the distances between the query depth signature and all depth signatures in those buckets are computed.
  • distances can be used as matching scores.
  • a threshold can be used such that scores for distances exceeding the threshold are set to zero and other scores are set to 1, for example. This can reduce the computation needed to compare scores. It should be noted that frames found in this way may belong to different videos.
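  • A minimal Python sketch of this thresholding; the default threshold value is an illustrative assumption:

      import numpy as np

      def frame_scores(query_sig, candidate_sigs, threshold=10.0):
          # Scores for distances exceeding the threshold are set to 0,
          # other scores are set to 1.
          dists = np.linalg.norm(candidate_sigs - query_sig, axis=1)
          return (dists <= threshold).astype(int)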
  • 3D videos can have multiple views, and a signature from the query frame should be checked against frames from different views.
  • Figure 4 is a diagram depicting a general case in which query and reference videos have multiple views. As such, q views in the query video and r views in the reference video are matched against each other. Two frames are considered a match if at least one of their views matches. Finally, a score can be computed for each matched pair of frames based on the distance between their views. For example, the score can be the number of views that match. The number of matched frames in each reference video can then be counted. Reference videos with a number of matched frames exceeding a threshold can be considered in a subsequent step, with other videos no longer considered.
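  • A short Python sketch of this view-level frame matching; the view_match predicate is a hypothetical stand-in for the per-view signature comparison described above:

      def match_frames(query_views, ref_views, view_match):
          # Two frames match if at least one pair of their views
          # matches; the score is the number of matching view pairs.
          score = sum(1 for q in query_views
                        for r in ref_views if view_match(q, r))
          return score > 0, score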
  • Temporal characteristics include the timing and order of the frames in the query and reference videos. For example, if frame x in a query video matches frame y in a reference video, then frame x+1 in the query video should match frame y+1 in the reference video.
  • Copied videos are typically clips with contiguous frames taken from reference videos. Also, a copied video can be embedded in other videos.
  • a matching matrix is computed for each candidate reference video and the query video according to an example.
  • the columns of the matrix represent reference video frames, and the rows represent query video frames. Entries in the matrix are the relevance scores of the frames.
  • Figure 5 is a diagram of a matching matrix according to an example, in which dark squares represent matched frames.
  • the longest diagonal sequence in the matrix with the largest number of matching frames is considered as a potential copy.
  • the number of matched frames is referred to as the depth matching score and is denoted by S_depth. It is worth mentioning that frame dropping and occasional frame mismatches caused by possible transformations must be taken into account.
  • the diagonal sequence mentioned before is not a strictly linear one, and gaps may exist.
  • FIG. 5 is a diagram depicting a method according to an example.
  • This band, depicted by dashed diagonal lines, starts sweeping the matrix from the top-left-most position and moves one block each time in the direction shown by arrow A.
  • the temporal score of the longest diagonal sequence of matched frames inside the band is computed.
  • the position with the greatest temporal score is considered the potential copied location, and its score is considered the temporal score of the reference video.
  • the arrow B in figure 5 shows the longest diagonal sequence for the example matrix.
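  • A simplified Python sketch of the band sweep; the band width is an illustrative assumption, and treating a query frame as matched when any entry inside the band on its row is 1 is a simplification of the description above:

      import numpy as np

      def best_diagonal(match, band_width=5):
          # match: 2D 0/1 array, rows = query frames, columns = reference
          # frames. Returns (score, offset) for the band position whose
          # diagonal sequence contains the most matched frames; the band
          # tolerates gaps from dropped or occasionally mismatched frames.
          n_q, n_r = match.shape
          best_score, best_offset = 0, 0
          for offset in range(-(n_q - 1), n_r):  # sweep one block at a time
              score = 0
              for i in range(n_q):
                  j0 = max(offset + i, 0)
                  j1 = min(offset + i + band_width, n_r)
                  if j0 < j1 and match[i, j0:j1].max() > 0:
                      score += 1
              if score > best_score:
                  best_score, best_offset = score, offset
          return best_score, best_offset

  • The offset returned by such a sweep also indicates where the matched sequence starts in the reference video, which relates to locating the copied clip as described below.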
  • Comparison of visual signatures can likewise occur in two steps: frame level and video level.
  • For each query frame, the visual signatures that are closest to it, based on a measure such as Euclidean distance, are detected using a nearest neighbour search method such as LSH.
  • the V buckets of the hashed values of the visual signature vector <v_1, v_2, ..., v_V> are searched, and entries in the form <VideoID, FrameID, ViewID, FeatureID, VisualSignature> are returned. These features may belong to different frames of different videos. Accordingly, to find the matching keyframes, the number of times that features of a frame are returned is counted, and the frames with the greatest count are considered a match. Then, keyframe level matching can be performed, similarly to the process for depth signatures as described above.
  • temporal characteristics are taken into account, and a temporal score is computed between the query video and each potential reference video. Finally, the videos which best match based on their temporal scores are considered as potential copies.
  • a final matching score is computed for each potential video based on the depth and visual matching scores, using a weighted sum: Score = w_1 · S_depth + w_2 · S_visual, where w_1 and w_2 are the weights given to the depth and visual matching scores respectively.
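  • A one-line Python sketch of this combination; the weights and decision threshold are illustrative assumptions:

      def combined_score(s_depth, s_visual, w1=0.5, w2=0.5, threshold=0.8):
          # Score = w1 * S_depth + w2 * S_visual; the query is flagged as
          # a copy when the combined score reaches the threshold.
          score = w1 * s_depth + w2 * s_visual
          return score, score >= threshold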
  • copied videos can be small clips of a reference video. It is therefore useful to automatically identify the location of a copied clip in the reference video.
  • the matching matrix shown in Figure 5 can be used to identify the location of the copied clip. Note that there will be two matching matrices: one from matching depth signatures and the other from matching visual signatures. Either one of them or both can be used.
  • the longest diagonal sequence with the greatest score in each case is determined. The start and end of this longest sequence provides the start and end locations of the clip in the reference video which has been copied in the query video.
  • FIG. 6 is a schematic block diagram of a copy detection system according to an example.
  • Apparatus 600 includes one or more processors, such as processor 601, providing an execution platform for executing machine readable instructions such as software. Commands and data from the processor 601 are communicated over a communication bus 399.
  • the system 600 also includes a main memory 602, such as a Random Access Memory (RAM), where machine readable instructions may reside during runtime, and a secondary memory 605.
  • the secondary memory 605 includes, for example, a hard disk drive 607 and/or a removable storage drive 630, representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, etc., or a non-volatile memory where a copy of machine readable instructions or software may be stored.
  • the secondary memory 605 may also include ROM (read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM).
  • Data representing any one or more of video data (such as query or reference video data), depth information (such as data representing a depth map, for example), and visual or depth signatures may be stored in the main memory 602 and/or the secondary memory 605.
  • the removable storage drive 630 reads from and/or writes to a removable storage unit 609 in a well-known manner.
  • a user interfaces with the system 600 with one or more input devices 611, such as a keyboard, a mouse, a stylus, and the like in order to provide user input data.
  • the display adaptor 615 interfaces with the communication bus 399 and the display 617, and receives display data from the processor 601 and converts the display data into display commands for the display 617.
  • a network interface 619 is provided for communicating with other systems and devices via a network (not shown).
  • the system can include a wireless interface 621 for communicating with wireless devices in the wireless community.
  • the system 600 shown in figure 6 is provided as an example of a possible platform that may be used, and other types of platforms may be used as is known in the art.
  • One or more of the steps described above may be implemented as instructions embedded on a computer readable medium and executed on the system 600.
  • the steps may be embodied by a computer program, which may exist in a variety of forms both active and inactive. For example, they may exist as software program(s) comprised of program instructions in source code, object code, executable code or other formats for performing some of the steps.
  • any of the above may be embodied on a computer readable medium, which include storage devices and signals, in compressed or uncompressed form.
  • suitable computer readable storage devices include conventional computer system RAM (random access memory), ROM (read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), and magnetic or optical disks or tapes.
  • Examples of computer readable signals, whether modulated using a carrier or not, are signals that a computer system hosting or running a computer program may be configured to access, including signals downloaded through the Internet or other networks. Concrete examples of the foregoing include distribution of the programs on a CD ROM or via Internet download. In a sense, the Internet itself, as an abstract entity, is a computer readable medium. The same is true of computer networks in general. It is therefore to be understood that those functions enumerated above may be performed by any electronic device capable of executing the above-described functions.
  • data 603 representing a visual and/or depth signature can reside in memory 602.
  • An extraction engine 606 and a comparison engine 608 can be modules executed from memory 602 for example.
  • engines 606, 608 can be ASICs or similar which can be connected to bus 399.
  • a database 109 can reside on a HDD such as 605, or can be provided on a removable storage unit 609 for example.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for detecting whether query video data appears in a reference video, the method comprising: determining a measure of depth from a portion of the query video data; comparing the measure against a measure of depth for the reference video to perform a depth match; and, if a match is determined, comparing a visual signature derived from the query video data against a visual signature of the reference video to perform a visual match, to determine a measure representing the likelihood that the query video data derives from the reference video.
PCT/EP2012/059988 2011-08-02 2012-05-29 Copy detection WO2013017306A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP12724334A EP2569722A1 (fr) 2011-08-02 2012-05-29 Copy detection

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB1113282.6A GB2493514B (en) 2011-08-02 2011-08-02 Copy detection
GB1113282.6 2011-08-02
US13/197,573 US8509600B2 (en) 2011-08-03 2011-08-03 Copy detection
US13/197,573 2011-08-03

Publications (1)

Publication Number Publication Date
WO2013017306A1 true WO2013017306A1 (fr) 2013-02-07

Family

ID=47628647

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/059988 WO2013017306A1 (fr) 2011-08-02 2012-05-29 Copy detection

Country Status (2)

Country Link
EP (1) EP2569722A1 (fr)
WO (1) WO2013017306A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016000509A1 (fr) * 2014-06-30 2016-01-07 华为技术有限公司 Data filtering method, and data filter construction method and apparatus
CN111028283A (zh) * 2019-12-11 2020-04-17 北京迈格威科技有限公司 Image detection method, apparatus, device and readable storage medium
CN113239855A (zh) * 2021-05-27 2021-08-10 北京字节跳动网络技术有限公司 Video detection method and apparatus, electronic device, and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110122255A1 (en) * 2008-07-25 2011-05-26 Anvato, Inc. Method and apparatus for detecting near duplicate videos using perceptual video signatures

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3539788B2 (ja) * 1995-04-21 2004-07-07 パナソニック モバイルコミュニケーションズ株式会社 Inter-image correspondence method
FR2895188A1 (fr) * 2005-12-16 2007-06-22 Thomson Licensing Sas Method and device for temporal realignment of multimedia documents
US8224157B2 (en) * 2009-03-30 2012-07-17 Electronics And Telecommunications Research Institute Method and apparatus for extracting spatio-temporal feature and detecting video copy based on the same in broadcasting communication system
GB2478156A (en) * 2010-02-26 2011-08-31 Sony Corp Method and apparatus for generating a disparity map for stereoscopic images

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110122255A1 (en) * 2008-07-25 2011-05-26 Anvato, Inc. Method and apparatus for detecting near duplicate videos using perceptual video signatures

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KUCUKTUNC O ET AL: "Video copy detection using multiple visual cues and MPEG-7 descriptors", JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, ACADEMIC PRESS, INC, US, vol. 21, no. 8, 1 November 2010 (2010-11-01), pages 838 - 849, XP027429939, ISSN: 1047-3203, [retrieved on 20100713], DOI: 10.1016/J.JVCIR.2010.07.001 *
NAGHMEH KHODABAKHSHI ET AL: "Copy Detection of 3D Videos", MMSYS'12, 24 February 2012 (2012-02-24), Chapel Hill, North Carolina, USA, pages 131 - 142, XP055034653, Retrieved from the Internet <URL:http://delivery.acm.org/10.1145/2160000/2155578/p131-khodabakhshi.pdf?ip=145.64.134.242&acc=ACTIVE SERVICE&CFID=135513402&CFTOKEN=52890846&__acm__=1343996770_4a39085f82bec6ca730e651008c56e9b> [retrieved on 20120803] *
RAMACHANDRA V ET AL: "3D Video Fingerprinting", 3DTV CONFERENCE: THE TRUE VISION - CAPTURE, TRANSMISSION AND DISPLAY OF 3D VIDEO, 2008, IEEE, PISCATAWAY, NJ, USA, 28 May 2008 (2008-05-28), pages 81 - 84, XP031275216, ISBN: 978-1-4244-1760-5 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016000509A1 (fr) * 2014-06-30 2016-01-07 华为技术有限公司 Data filtering method, and data filter construction method and apparatus
US9755616B2 (en) 2014-06-30 2017-09-05 Huawei Technologies Co., Ltd. Method and apparatus for data filtering, and method and apparatus for constructing data filter
CN111028283A (zh) * 2019-12-11 2020-04-17 北京迈格威科技有限公司 Image detection method, apparatus, device and readable storage medium
CN111028283B (zh) * 2019-12-11 2024-01-12 北京迈格威科技有限公司 Image detection method, apparatus, device and readable storage medium
CN113239855A (zh) * 2021-05-27 2021-08-10 北京字节跳动网络技术有限公司 Video detection method and apparatus, electronic device, and storage medium
CN113239855B (zh) * 2021-05-27 2023-04-18 抖音视界有限公司 Video detection method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
EP2569722A1 (fr) 2013-03-20

Similar Documents

Publication Publication Date Title
US8509600B2 (en) Copy detection
GB2493514A (en) Using a measure of depth to detect if video data derives from a reference video
Sattler et al. Large-scale location recognition and the geometric burstiness problem
Christlein et al. On rotation invariance in copy-move forgery detection
Gill et al. A review paper on digital image forgery detection techniques
JP5878238B2 (ja) Method and apparatus for comparing videos
WO2013104432A1 (fr) Video copy detection
CN109842811A (zh) Method, apparatus and electronic device for embedding push information in a video
US20190311744A1 (en) Comparing frame data to generate a textless version of a multimedia production
Kim et al. Adaptive weighted fusion with new spatial and temporal fingerprints for improved video copy detection
CN114244538A (zh) Digital watermarking method for generating media content perceptual hashes based on multiple attacks
KR20120121424A (ko) Image retrieval apparatus and method
Pandey et al. Passive copy move forgery detection using SURF, HOG and SIFT features
Khodabakhshi et al. Spider: A system for finding 3D video copies
Mahmoudpour et al. Synthesized view quality assessment using feature matching and superpixel difference
Ren et al. ESRNet: Efficient search and recognition network for image manipulation detection
Mani et al. A survey on digital image forensics: Metadata and image forgeries
EP2569722A1 (fr) Copy detection
CN113298871B (zh) Map generation method, positioning method and system thereof, and computer-readable storage medium
Buyssens et al. Depth-aware patch-based image disocclusion for virtual view synthesis
Joly New local descriptors based on dissociated dipoles
Chaitra et al. Digital image forgery: taxonomy, techniques, and tools–a comprehensive study
Gengembre et al. A probabilistic framework for fusing frame-based searches within a video copy detection system
CN112991419B (zh) Disparity data generation method and apparatus, computer device and storage medium
Saleem A key-point based robust algorithm for detecting cloning forgery

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 2012724334; Country of ref document: EP)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 12724334; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)