WO2010087127A1 - Video Identifier Generation Device - Google Patents
Video Identifier Generation Device
- Publication number
- WO2010087127A1 (PCT/JP2010/000283; JP2010000283W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- visual feature
- collation
- feature amount
- reliability
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/91—Television signal processing therefor
Definitions
- the present invention relates to an apparatus, a method, and a program for generating a video identifier for video search for detecting a similar or the same moving image section from a large number of moving images.
- Non-Patent Document 1 describes an example of a device that extracts and collates feature amounts from moving images.
- FIG. 9 is a block diagram showing the device described in Non-Patent Document 1.
- the block unit feature amount extraction unit 1000 extracts feature amounts in block units from the input first video, and outputs the first feature amount to the collation unit 1030.
- the block unit feature quantity extraction unit 1010 extracts the feature quantity in block units from the input second video, and outputs the second feature quantity to the collation unit 1030.
- the weighting coefficient calculation means 1020 calculates the weight value of each block based on the input learning video and outputs the weighting coefficient to the matching means 1030.
- the matching unit 1030 collates the first feature amount output from the block unit feature amount extraction unit 1000 with the second feature amount output from the block unit feature amount extraction unit 1010, using the weighting coefficient output from the weighting coefficient calculation unit 1020, and outputs the collation result.
- the block unit feature quantity extraction unit 1000 divides each frame of the input first video into block units, and calculates a feature quantity for identifying the video from each block. Specifically, the edge type is determined for each block, and the type is calculated as the feature amount of each block. Then, for each frame, a feature vector consisting of the edge type of each block is constructed. This feature quantity vector is calculated for each frame, and the obtained feature quantity is output to the matching unit 1030 as the first feature quantity.
- the operation of the block unit feature quantity extraction unit 1010 is the same as that of the block unit feature quantity extraction unit 1000: the second feature quantity is calculated from the input second video, and the obtained second feature quantity is output to the matching unit 1030.
- the weighting coefficient calculation means 1020 calculates the probability that a telop is inserted in each block in the frame using the learning video in advance. Based on the calculated probability, a weighting coefficient for each block is calculated. Specifically, in order to increase robustness against telop superimposition, the weighting coefficient is calculated such that the lower the probability that the telop is superimposed, the higher the weight. The obtained weighting coefficient is output to the matching unit 1030.
- the collating unit 1030 collates the first feature amount output from the block unit feature amount extracting unit 1000 with the second feature amount output from the block unit feature amount extracting unit 1010, using the weighting coefficient output from the weighting coefficient calculating unit 1020. Specifically, for each frame, the feature values of the blocks at the same position are compared, and the score of each block is calculated as 1 if they are the same and 0 otherwise. The obtained block unit scores are weighted with the weighting coefficients and summed to calculate a frame matching score (similarity in frame units). This is performed for each frame, and a collation result between the first video and the second video is calculated.
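The weighted block-wise collation just described can be sketched as follows; the integer edge-type codes and the per-block weights are illustrative assumptions, not values from the source:

```python
import numpy as np

def frame_match_score(blocks1, blocks2, weights):
    """Weighted frame matching score between two frames.

    blocks1/blocks2: per-block edge-type codes (hypothetical integer
    labels); weights: per-block weighting coefficients. Each block
    scores 1 when the codes are the same and 0 otherwise; the scores
    are weighted, summed, and normalized by the total weight.
    """
    same = (np.asarray(blocks1) == np.asarray(blocks2)).astype(float)
    w = np.asarray(weights, dtype=float)
    return float((same * w).sum() / w.sum())
```

Blocks likely to carry telops would receive small weights, so a mismatch there barely lowers the frame score.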
- the object of the present invention is to provide a video identifier generation device that solves the problem that collation accuracy is lowered when there are video patterns that appear in common in many videos, or video patterns whose feature values cannot be obtained stably.
- a video identifier generation device includes a visual feature amount extraction means that extracts, based on feature amounts of a plurality of partial region pairs in a video, a visual feature amount used for identifying the video, and a reliability calculation means that calculates the reliability of the visual feature amount such that, when the video is a specific video, a lower reliability is calculated than when the video is other than the specific video.
- according to the present invention, it is possible to prevent the collation accuracy from being lowered due to a video pattern that appears in common in many videos or a video pattern whose feature value cannot be obtained stably.
- FIG. 7 is a flowchart illustrating the operation of the common video pattern learning means 250 in FIG. 3.
- FIG. 8 is a flowchart explaining the operation of the robustness-reduced video pattern learning means 350 in FIG. 4.
- FIG. 9 is a block diagram for explaining a technique related to the present invention.
- a video identifier extraction apparatus which comprises a feature quantity extraction unit 130, a specific video pattern detection unit 110, and a reliability calculation unit 120.
- the feature amount extraction unit 130 extracts a feature amount from the input video and outputs a visual feature amount.
- the specific video pattern detection unit 110 detects a specific pattern from the input video and outputs a specific pattern detection result to the reliability calculation unit 120.
- the reliability calculation unit 120 calculates the reliability based on the specific pattern detection result output from the specific video pattern detection unit 110 and outputs the reliability information.
- the video identifier of the input video is composed of the visual feature amount output from the feature amount extraction unit 130 and the reliability information output from the reliability calculation unit 120.
- the visual feature quantity and the reliability information may be independent as long as the correspondence between them is clarified, or may be integrated as in an embodiment using multiplexing means described later.
- the video is input to the feature quantity extraction means 130.
- data is input in units of pictures after being decoded by a decoder.
- a picture is a unit constituting a screen, and usually consists of a frame or a field.
- the picture is not limited to these, and any picture may be used as long as it is a unit constituting the screen.
- a partial image obtained by cutting out a part of the screen may also be used as a picture.
- for example, when a black belt is present, the screen excluding the black belt may be used as a picture.
- here, the black belt refers to the black margin areas inserted at the top and bottom or the left and right of the screen by, for example, aspect-ratio conversion between 4:3 and 16:9.
- the feature quantity extraction unit 130 calculates a feature quantity vector for each picture.
- a picture is regarded as one still image, and a vector of visual feature quantities indicating features such as colors, patterns, and shapes is extracted.
- for example, for the local region pair associated with each dimension of the feature vector, a difference in feature amount between the two regions is calculated (for instance, the average pixel value is obtained for each region of the pair and the difference between the average values is taken), and a feature quantity vector whose value in each dimension is the quantized difference may be used.
- the feature quantity vector calculated for each picture is output as a visual feature quantity.
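A minimal sketch of such a region-pair feature follows; the region layout and the quantization threshold are hypothetical, since the source fixes neither:

```python
import numpy as np

def region_pair_features(picture, region_pairs, threshold=1.0):
    """Per-picture feature vector from differences of region averages.

    picture: 2-D array of pixel values; region_pairs: list of
    (region_a, region_b) pairs of slice tuples, one per feature
    dimension (hypothetical layout). Each dimension is the difference
    of the two regions' mean pixel values, quantized to -1, 0, or +1.
    """
    feats = []
    for (a, b) in region_pairs:
        diff = picture[a].mean() - picture[b].mean()
        if diff > threshold:
            feats.append(1)
        elif diff < -threshold:
            feats.append(-1)
        else:
            feats.append(0)
    return feats
```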
- the input video is also input to the specific video pattern detection means 110.
- there, a video pattern undesirable for video identification is detected, and a specific pattern detection result is output.
- an undesirable video pattern is a video pattern (scene) that happens to become almost identical even though the original videos are completely different.
- a fade-out to a black frame frequently used in movies is a typical example.
- the video editing technique called fade-out is used in many completely different videos, and regardless of the content of the original video, the picture becomes a black scene after the fade-out, leaving no difference between the videos. In this sense, it is a video pattern that occurs in common among a number of completely different videos.
- such a video pattern causes a problem in identification regardless of the type of feature amount used.
- there are also video patterns for which the feature amount becomes unstable and robustness is lost.
- which kinds of video reduce robustness depends on the feature amount, but whatever feature amount is used, there exist video patterns that specifically reduce its robustness. For example, a color-related feature amount loses robustness on black-and-white video, whereas a feature amount representing patterns loses robustness on flat images.
- the specific video pattern detection means 110 detects such a specific video pattern that is not desirable for video identification.
- the detection method depends on the video pattern; for example, the above-described fade-out scene can be determined using the average luminance value of the entire image together with a measure of flatness. As a measure of flatness, for example, the variance of luminance values can be used: if the variance is sufficiently small, and the average luminance value is below a certain threshold and sufficiently close to black, the image can be determined to be a black image after a fade-out.
- the fade-out may be determined based on measurement of a change in luminance value over time.
- the variance and average of the luminance values in the screen are calculated for each picture in time series; if the variance gradually decreases toward 0 and the average value also gradually decreases over time, a fade-out to a black image can be determined.
- although the fade-out to a black image has been described above, fade-outs toward other pixel values can be detected similarly: the variance is treated in the same way, and the average value is checked for convergence to the specific value.
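A sketch of the luminance-statistics test described above; the variance and mean thresholds here are illustrative assumptions, not values from the source:

```python
import numpy as np

def detect_black_fadeout(lumas, var_thresh=25.0, mean_thresh=16.0):
    """Per-picture fade-out-to-black detection from luminance statistics.

    lumas: sequence of 2-D luminance arrays, one per picture.
    Returns 1.0 for pictures whose luminance variance is near zero
    (flat image) and whose mean is close to black, else 0.0. A graded
    score in [0, 1] could be substituted for the binary decision.
    """
    scores = []
    for pic in lumas:
        flat = np.var(pic) < var_thresh    # nearly uniform picture
        dark = np.mean(pic) < mean_thresh  # close to black
        scores.append(1.0 if (flat and dark) else 0.0)
    return scores
```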
- the specific pattern detection result may be a binary value indicating whether the pattern has been detected; for example, 1 when detected and 0 otherwise. Alternatively, it may be a continuous value between 0 and 1 (or a level value expressing the likelihood in several levels) according to the probability of detection. The result is output for each picture, or detection results may be output collectively for every fixed period.
- the specific pattern detection result is output to the reliability calculation means 120.
- the reliability calculation means 120 calculates and outputs the reliability of the feature quantity of each picture according to the specific pattern detection result output from the specific video pattern detection means 110. If the specific pattern detection result indicates no detection, the maximum reliability value is output (for example, if the reliability takes values from 0 to 1, the maximum reliability 1 is output).
- if the specific pattern detection result indicates detection, or a high possibility of detection, the reliability is lowered accordingly: when the pattern is detected, the reliability is set to the lowest level, and when the possibility of detection is judged high, the reliability is lowered according to the degree. This is performed for each picture, and the obtained value is output as the reliability. Alternatively, the reliability may be obtained and output for each fixed period of pictures.
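The mapping from detection result to reliability can be sketched as below; the linear form is an assumption, since the source only requires that the reliability fall as the detection likelihood rises:

```python
def reliability_from_detection(detection_scores):
    """Map per-picture detection scores (0 = no detection, 1 = certain
    detection) to reliability values in [0, 1]: no detection yields
    the maximum reliability 1, certain detection the minimum 0, with
    a simple linear mapping in between (illustrative choice)."""
    return [1.0 - min(max(s, 0.0), 1.0) for s in detection_scores]
```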
- the input to the specific video pattern detection means 110 may be the visual feature amount output from the feature amount extraction unit 130 instead of the video (broken line in FIG. 1).
- in that case, the specific video pattern detection means 110 estimates the specific video pattern from the input feature quantity and detects the specific pattern. Specifically, a visual feature amount is extracted from a video defined as a specific video pattern, and the specific pattern is detected by determining its similarity with the input visual feature amount. For example, in the case of the above-mentioned fade-out, the specific pattern detection result is calculated by determining whether the input feature amount is close to the feature value that arises when the luminance value is constant over the entire screen.
- when the average and variance of luminance values are used as the visual feature amount, a fade-out to a black image as described above can be determined when the variance is sufficiently small and the average value is sufficiently small. In this way, the specific video pattern can be detected, and the reliability calculated, from the feature quantity itself.
- in this way, a video pattern unfavorable for video identification is detected, and reliability information that lowers the reliability of the corresponding pictures is generated together with the feature amount.
- the collation accuracy can be improved.
- a detection method suitable for each specific video pattern can be adopted, and the detection accuracy can be improved.
- Next, a second embodiment of the present invention shown in FIG. 2 will be described with reference to the drawings.
- FIG. 2 shows a video identifier extraction apparatus according to the second embodiment of the present invention, which comprises a feature quantity extraction means 130, a specific video pattern detection means 210, and a reliability calculation means 120.
- the specific video pattern detection unit 210 detects a specific pattern from the video based on the input specific video pattern information, and outputs the specific pattern detection result to the reliability calculation unit 120.
- Video and specific video pattern information are input to the specific video pattern detection means 210.
- the specific video pattern information is information describing a video pattern that is not desirable for identification as described above, and may be, for example, the specific video itself.
- the specific video may be a single image representing the video, or a video section composed of a plurality of continuous images. Alternatively, a plurality of images obtained from the video section may be used.
- the specific video pattern information may be a visual feature amount necessary for detecting the specific video pattern. However, this visual feature amount may not necessarily be the same as the visual feature amount obtained by the feature amount extraction unit 130. For example, in the case of fading out to the black image described above, the average value and variance of the luminance values of the entire screen may be used as the feature amount.
- the specific video pattern detection means 210 detects the specific video pattern based on the similarity between the input video and the video described by the specific video pattern information. That is, when the specific video pattern information is an image itself, a visual feature amount is obtained from both each picture of the input video and the image input as the specific video pattern information, and the specific pattern is detected by comparing their similarity. As the criterion for similarity determination, either a distance between feature amounts or a similarity may be used. When the distance is small or the similarity is large, the likelihood of detection is defined according to the degree, and the result is output as the specific pattern detection result.
- when the specific video pattern information is a feature amount extracted from an image, the same kind of feature amount is extracted from the input video and collated.
- for example, when the specific video pattern information is described by an edge histogram feature amount, the edge histogram is calculated for each picture of the input video. The operation after the feature amount calculation is the same as when an image itself is input as the specific video pattern information.
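Detection driven purely by pattern information given as feature amounts can be sketched as below; the L1 metric and the threshold are illustrative assumptions:

```python
import numpy as np

def detect_by_pattern_feature(query_feats, pattern_feats, dist_thresh):
    """Detect a specific video pattern by comparing the input video's
    per-picture feature vectors against the feature vectors supplied
    as specific video pattern information. A picture is flagged when
    it is within dist_thresh (L1 distance, illustrative choice) of
    any pattern feature. Returns one 0/1 flag per picture."""
    flags = []
    for q in query_feats:
        q = np.asarray(q, dtype=float)
        hit = any(np.abs(q - np.asarray(p, dtype=float)).sum() <= dist_thresh
                  for p in pattern_feats)
        flags.append(1 if hit else 0)
    return flags
```

Swapping in a new set of pattern features changes what is detected without touching the detector itself, which is the extensibility advantage noted below.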
- the input to the specific video pattern detection unit 210 may be a visual feature amount output from the feature amount extraction unit 130 instead of a video (broken line in FIG. 2).
- the specific video pattern detection unit 210 estimates the specific video pattern from the input feature quantity and detects the specific pattern.
- when the specific video pattern information is the video itself, a feature quantity that can be collated with the feature quantity output by the feature quantity extraction unit 130 is extracted from that video and compared.
- when the specific video pattern information is a visual feature amount, it must be a feature amount that can be collated with the feature amount output by the feature amount extraction unit 130.
- this method has the advantage that a detection method need not be devised for each specific video pattern; various patterns can be handled merely by changing the information given as specific video pattern information. For this reason, even after the device has been built, the range of supported video patterns can be expanded simply by replacing the specific video pattern information.
- Next, a third embodiment of the present invention shown in FIG. 3 will be described with reference to the drawings.
- FIG. 3 shows a video identifier extraction apparatus according to the third embodiment of the present invention, which comprises a feature quantity extraction means 130, a specific video pattern detection means 210, a reliability calculation means 120, and a common video pattern learning means 250. Compared with the case of FIG. 2, the common video pattern learning means 250 is added, and the specific video pattern information it outputs is connected to the specific video pattern detection means 210. Otherwise, it is the same as the video identifier extraction device of FIG. 2.
- the operations of the feature amount extraction unit 130, the specific video pattern detection unit 210, and the reliability calculation unit 120 are the same as those in the case of FIG.
- a video group for learning is input to the common video pattern learning means 250.
- the video input here is preferably a set of videos that are produced independently of each other and have no derivation relationship with each other. That is, it is desirable that the video has no relevance, such as editing one video and generating another video.
- the common video pattern learning unit 250 extracts video sections that coincide with each other by chance. Specifically, the feature amount of each video is calculated for each picture, and the distance (similarity) between them is calculated for many video pairs. As a result, when a video section that can be regarded as almost the same despite the independent video is found, the video section is extracted as specific video pattern information. As a result, the specific video pattern can be automatically extracted by learning instead of being manually determined.
- the specific video pattern information may be a feature amount extracted from the video, not the video itself. In this case, the feature amount of the extracted video pattern is calculated and output as specific video pattern information.
- FIG. 7 is a flowchart showing the operation of the common video pattern learning means 250.
- in step S10, visual feature values are extracted from each of the input videos.
- the visual feature amount extraction method at this time is not necessarily the same as the method used by the feature amount extraction means 130.
- in step S20, the extracted visual feature values are collated, yielding a collation result between every pair of input learning videos.
- in step S30, video sections with a high similarity (or a short distance) are extracted from the collation results.
- in step S40, the extracted video section information is output as specific video pattern information.
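Steps S10 to S40 can be sketched in miniature as below, assuming the learning videos are already given as per-picture feature sequences and using a simple L1 distance on time-aligned pictures (the actual collation method is not fixed by the source):

```python
import numpy as np

def learn_common_patterns(videos, dist_thresh):
    """Collate every pair of independent learning videos picture by
    picture and collect the features of pictures that nearly coincide
    despite coming from unrelated videos, as specific video pattern
    information. videos: list of per-picture feature sequences."""
    patterns = []
    for i in range(len(videos)):
        for j in range(i + 1, len(videos)):
            a, b = videos[i], videos[j]
            for k in range(min(len(a), len(b))):
                d = np.abs(np.asarray(a[k], float)
                           - np.asarray(b[k], float)).sum()
                if d <= dist_thresh:          # chance coincidence found
                    patterns.append(list(a[k]))
    return patterns
```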
- the specific video pattern information output in this way is input to the specific video pattern detection means 210.
- according to the third embodiment, it is possible to automatically extract undesirable video patterns, in particular common video patterns occurring among a number of completely different videos, from a large number of videos.
- FIG. 4 shows a video identifier extraction apparatus according to a fourth embodiment of the present invention, which comprises a feature quantity extraction means 130, a specific video pattern detection means 210, a reliability calculation means 120, and a robustness-reduced video pattern learning means 350. Compared to the case of FIG. 3, the difference is that the robustness-reduced video pattern learning means 350 is used instead of the common video pattern learning means 250. Otherwise, it is the same as the video identifier extraction device of FIG. 3.
- the operations of the feature amount extraction unit 130, the specific video pattern detection unit 210, and the reliability calculation unit 120 are the same as those in the case of FIG.
- the learning video group is input to the robustness-reduced video pattern learning means 350.
- This learning video group is used to learn a video pattern in which the visual feature amount used by the feature amount extraction unit 130 is not very robust.
- the robustness-reduced video pattern learning unit 350 extracts visual feature amounts from the video by the same feature amount extraction method as the feature amount extraction unit 130.
- various modification processes (encoding, noise addition, telop superposition, etc.) are then applied to the video, and the feature amount is extracted from the modified video in the same way.
- the visual feature values before and after the modification process are then compared to check how much the feature values have changed; specifically, the distance or the similarity is calculated between the feature quantities before and after modification.
- when a video with a small similarity or a large distance value is found, it is extracted as specific video pattern information.
- for example, the similarity or distance value may be thresholded, extracting the cases where the similarity is smaller than a certain threshold, or the distance value is larger than a certain threshold.
- the specific video pattern can be automatically extracted by learning instead of being manually determined.
- the specific video pattern information may be a feature amount extracted from the video, not the video itself. In this case, the feature amount of the extracted video pattern is calculated and output as specific video pattern information.
- FIG. 8 is a flowchart showing the operation of the robustness-reduced video pattern learning means 350.
- in step S50, a modified video is generated.
- various modification processes assumed in advance are performed on the input video to generate the modified video. This process only needs to be completed before step S70, and may also be performed after step S60 described below.
- in step S60, the visual feature amount is extracted from the video before modification.
- this feature quantity extraction method is the same as that used by the feature quantity extraction means 130. A visual feature amount is thereby calculated for each video before modification.
- in step S70, visual feature values are extracted from each of the modified videos generated in step S50.
- this feature quantity extraction method is the same as that used by the feature quantity extraction means 130. A visual feature amount is thereby calculated for each video after modification.
- in step S80, the visual feature values before and after modification are collated, associating each unmodified picture with its modified counterpart. The collation result is output for each picture, or for each video section formed by bundling a plurality of pictures in time series.
- in step S90, video sections with a large distance between feature amounts, or a small similarity, are extracted from the collation result.
- in step S100, specific video pattern information is generated from the video of the extracted sections and output.
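Steps S50 to S100 can be sketched as below; modify and extract are caller-supplied stand-ins for the modification processes and the feature extractor, both hypothetical:

```python
import numpy as np

def learn_fragile_patterns(videos, modify, extract, dist_thresh):
    """For each learning video: generate a modified version (S50),
    extract features from the original and the modified video with
    the same extractor (S60/S70), collate picture by picture (S80),
    and keep the features of pictures whose distance exceeds
    dist_thresh (S90) as specific video pattern information (S100)."""
    fragile = []
    for video in videos:
        modified = modify(video)                    # S50
        feats = [extract(p) for p in video]         # S60
        feats_mod = [extract(p) for p in modified]  # S70
        for f1, f2 in zip(feats, feats_mod):        # S80
            d = np.abs(np.asarray(f1, float)
                       - np.asarray(f2, float)).sum()
            if d > dist_thresh:                     # S90: robustness lost
                fragile.append(list(f1))            # S100
    return fragile
```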
- the specific video pattern information output in this way is input to the specific video pattern detection means 210.
- according to the fourth embodiment, as in the third embodiment, it is possible to automatically extract undesirable video patterns from a large number of videos.
- FIG. 5 shows an embodiment of a video identifier collation device that collates video identifiers generated by the video identifier extraction devices shown in FIGS. 1 to 4; it comprises a collation parameter calculation means 410 and a collation means 400.
- the collation parameter calculation means 410 obtains a collation parameter from the first reliability information and the second reliability information, and outputs it to the collation means 400.
- the matching unit 400 uses the matching parameter output from the matching parameter calculation unit 410 to match the first visual feature quantity and the second visual feature quantity, and outputs a matching result.
- the first visual feature quantity and the first reliability information constitute the video identifier of the first video, and the second visual feature quantity and the second reliability information constitute the video identifier of the second video.
- the matching parameter calculation means 410 calculates a matching parameter used for matching between the sections of the video 1 and the video 2 from the first reliability information and the second reliability information. For example, from the first reliability information and the second reliability information, a weighting coefficient for performing matching for each picture is calculated as a matching parameter.
- for example, when the reliability of the k1-th picture of the first video is r1(k1) and the reliability of the k2-th picture of the second video is r2(k2), the weighting coefficient w(k1, k2) used when collating these pictures can be calculated by [Formula 1].
- [Formula 1] w(k1, k2) = min(r1(k1), r2(k2))
- the collating unit 400 collates the first visual feature quantity and the second visual feature quantity.
- the comparison may be made based on a similarity representing how alike the two feature amounts are, or on a distance representing the degree of difference between them.
- here, the comparison is based on the distance d calculated by [Formula 2].
- N is the number of dimensions of the feature quantity
- v 1 (i) and v 2 (i) represent the i-th dimension value of the first and second feature quantities, respectively.
- this comparison is performed picture by picture, and predetermined sections of the first video and the second video are collated.
- in doing so, the above-described weighting coefficient w(k1, k2) is used. For example, when video sections are collated using the average of the distance values obtained by picture-by-picture comparison within the section, the distance value d(k1, k2) obtained by comparing the k1-th picture of the first video with the k2-th picture of the second video is weighted by the weighting coefficient w(k1, k2) when calculating the average. That is, when a section of K pictures starting from the t1-th picture of video 1 is collated with a section of K pictures starting from the t2-th picture of video 2, the distance value is calculated by [Formula 3]. [Formula 3]
- if this value is larger than a threshold, the sections are determined not to match; if it is equal to or less than the threshold, they are determined to match.
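One plausible reading of [Formula 3] (the published formula image is not reproduced in this text) is a reliability-weighted average of per-picture distances, combining the weight w(k1, k2) = min(r1(k1), r2(k2)) of [Formula 1] with an L1 per-picture distance:

```python
import numpy as np

def section_distance(feats1, feats2, rel1, rel2, t1, t2, K):
    """Weighted average distance over a K-picture section: picture
    t1+k of video 1 is compared with picture t2+k of video 2 using L1
    distance, weighted by min(r1(t1+k), r2(t2+k)), and normalized by
    the total weight. Both the metric and the normalization are
    assumptions about the unreproduced formula."""
    num = den = 0.0
    for k in range(K):
        w = min(rel1[t1 + k], rel2[t2 + k])
        d = np.abs(np.asarray(feats1[t1 + k], float)
                   - np.asarray(feats2[t2 + k], float)).sum()
        num += w * d
        den += w
    return num / den if den > 0 else 0.0
```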
- alternatively, the number of picture pairs whose distance value is within a threshold may be counted by picture-by-picture comparison; when this number is sufficiently large relative to the number of pictures in the section, the sections are determined to be the same, and otherwise not. In this case as well, the determination can be weighted in the same way, that is, by [Formula 4]. [Formula 4]
- here, U(x) is a unit step function that is 1 when x >= 0 and 0 when x < 0, and Th is the threshold on the distance between picture feature quantities (that is, pictures whose distance is equal to or less than Th are determined to be the same, and otherwise not).
- as a method of comparing sections of arbitrary length, the collation method described in Non-Patent Document 2 can also be used. As shown in FIG. 6, a collation window of L pictures is provided and slid between the first video and the second video for comparison. If the sections inside the collation window are determined to be the same section, the window is extended by p pictures and the collation process continues. As long as the sections are determined to be the same, the process of extending the window by p pictures is repeated, obtaining the matching section of maximum length. In this way, the maximum-length matching section can be obtained efficiently.
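The window-extension search can be sketched as follows; same_section stands in for the section-level decision (for example thresholding the weighted section distance), and all names are illustrative:

```python
def max_matching_length(same_section, start1, start2, L, p, total):
    """Start with a collation window of L pictures at (start1, start2);
    while the windowed sections are judged the same (same_section
    callback), extend the window by p pictures, up to total pictures,
    and return the maximum matched length (0 if even the initial
    window does not match)."""
    length = L
    if not same_section(start1, start2, length):
        return 0
    while length + p <= total and same_section(start1, start2, length + p):
        length += p
    return length
```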
- Sim(x, y) is a function representing the closeness of x and y, whose value increases as x and y become closer. For example, if the distance between x and y is d(x, y), a function such as [Formula 6] can be used. [Formula 6]
- alternatively, Sim(x, y) may be a function like the Kronecker delta that is 1 only when x and y match and 0 otherwise. Or, when the angle (cosine value) between feature vectors is used as the similarity, the comparison is based on the similarity S calculated by [Formula 7]. [Formula 7]
- the collation parameter output from the collation parameter calculation unit 410 may also be a parameter that determines whether the collation result of a given picture is ignored. If either of the pictures being collated has low reliability, the collation result between those pictures is not very reliable; in such a case, the video sections can be collated while ignoring that picture's collation result. For example, when collating video 1 and video 2, if the reliability of the 5th to 9th pictures of video 1 is low, the video sections of video 1 and video 2 are collated ignoring the inter-picture collation results for the 5th to 9th pictures of video 1.
- The collation parameter output from the collation parameter calculation unit 410 may also be a parameter describing the permissible number of times pictures are determined to be different by inter-picture collation.
- In modification processing such as analog capture, not all pictures are captured accurately, and some pictures may be dropped.
- In such a case, collation may fail because of a dropped picture even though the videos are actually the same.
- To handle this, a permissible number of inter-picture collation failures is determined in advance; while the number of failures is within this number, collation continues as it is, and only when the number of failures exceeds it is the section determined not to match. This makes it possible to collate continuous sections successfully.
- The permissible number of inter-picture collation failures (denoted N_th) is controlled by the reliability. For example, in a section with low reliability, the value of N_th is increased according to the number of low-reliability pictures. In this way, even if pictures with low reliability continue, the section can be collated as one continuous section.
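Controlling the permissible failure count N_th by reliability might be sketched as follows (the policy of adding one failure allowance per low-reliability picture is an assumed illustration of "increased according to the number of low-reliability pictures"):

```python
def match_with_failures(dists, th, n_th_base, low_rel_count):
    """Collate a section picture by picture, tolerating up to N_th
    inter-picture mismatches; N_th is raised by the number of
    low-reliability pictures in the section.

    dists: per-picture feature distances; th: same/different threshold.
    """
    n_th = n_th_base + low_rel_count  # reliability controls the allowance
    failures = 0
    for d in dists:
        if d > th:
            failures += 1
            if failures > n_th:
                return False  # too many mismatches: not the same section
    return True
```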
- The specific video pattern detection means may detect the specific video pattern from both the input video and the visual feature quantity extracted from the input video.
- The video identifier generation device of the present invention may be provided with multiplexing means 140 that receives the visual feature quantity output from the feature quantity extraction unit 130 and the reliability information output from the reliability calculation unit 120, and outputs a video identifier.
- The multiplexing unit 140 generates and outputs a video identifier by combining the visual feature quantity output from the feature quantity extraction unit 130 and the reliability information output from the reliability calculation unit 120.
- Here, the video identifier is generated by multiplexing the two in a form that can be separated at the time of collation.
- For example, the visual feature quantities and reliability information may be interleaved and multiplexed picture by picture, or all of the reliability information may be multiplexed first, followed by the visual feature quantities (or vice versa).
- Alternatively, the reliability information and the visual feature quantity may be multiplexed at fixed intervals (for example, for each time-interval unit used in calculating the reliability information).
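The multiplexing layouts described above (per-picture interleaving versus reliability-first blocks), and their separation back into features and reliabilities at collation time, can be sketched as follows; the dict-based container is an illustrative encoding, not the patent's actual byte format:

```python
def multiplex(features, reliabilities, interleave=True):
    """Combine per-picture visual features and reliability values into a
    video identifier in a form that can be separated again later."""
    if interleave:
        # Layout 1: (reliability, feature) pairs interleaved per picture.
        return {"layout": "interleaved",
                "data": list(zip(reliabilities, features))}
    # Layout 2: all reliability information first, then all features.
    return {"layout": "blocked",
            "data": (list(reliabilities), list(features))}

def demultiplex(identifier):
    """Separate a video identifier back into (features, reliabilities)."""
    if identifier["layout"] == "interleaved":
        rels, feats = zip(*identifier["data"])
        return list(feats), list(rels)
    rels, feats = identifier["data"]
    return list(feats), list(rels)
```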
- The video identifier collation device of the present invention may be provided with demultiplexing means 420 and 430 that receive the video identifiers of the two videos to be collated and output the visual feature quantities and reliability information constituting those video identifiers.
- The demultiplexing means 420 separates the first visual feature quantity and the first reliability information from the input first video identifier and outputs them to the collation means 400 and the collation parameter calculation means 410, respectively.
- Likewise, the demultiplexing means 430 separates the second visual feature quantity and the second reliability information from the input second video identifier and outputs them to the collation means 400 and the collation parameter calculation means 410, respectively.
- The video identifier extraction device and the video identifier collation device of the present invention can be realized not only as dedicated hardware implementing the functions of the devices but also by a computer and a program.
- The program is provided recorded on a computer-readable recording medium such as a magnetic disk or a semiconductor memory, is read by the computer at startup or the like, and, by controlling the operation of the computer, causes the computer to function as the video identifier extraction device and the video identifier collation device described above.
- The present invention can be applied to uses such as searching a large collection of videos for similar or identical videos with high accuracy.
- In particular, searching for identical sections of video can be used for applications such as identifying illegally copied moving images distributed on a network, or identifying commercials (CMs) being broadcast on actual broadcast waves.
Abstract
Description
An object of the present invention is to provide a video identifier generation device that solves the problem that collation accuracy deteriorates when a video contains video patterns that commonly appear in many videos, or video patterns for which feature quantities cannot be stably obtained.
[Equation 1]
w(k1, k2) = min(r1(k1), r2(k2))
[Equation 2]
[Equation 3]
[Equation 4]
[Equation 7]
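Equation 1 gives the weight for a picture pair as the smaller of the two reliabilities. A sketch of how such a weight might enter a weighted section distance follows (the multiplicative use of the weight is an assumption, since Equations 2 to 4 are not reproduced in the text):

```python
def weight(r1_k1, r2_k2):
    """Equation 1: w(k1, k2) = min(r1(k1), r2(k2))."""
    return min(r1_k1, r2_k2)

def weighted_section_distance(feats1, feats2, rel1, rel2):
    """Weighted average of per-picture L1 distances, assuming the weight
    of Equation 1 scales each picture's contribution (illustrative)."""
    num = den = 0.0
    for f1, f2, r1, r2 in zip(feats1, feats2, rel1, rel2):
        w = weight(r1, r2)
        num += w * sum(abs(a - b) for a, b in zip(f1, f2))
        den += w
    return num / den if den else 0.0
```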
120…Reliability calculation means
130…Feature quantity extraction means
140…Multiplexing means
210…Specific video pattern detection means
250…Common video pattern learning means
350…Robustness-degradation video pattern learning means
400…Collation means
410…Collation parameter calculation means
420, 430…Demultiplexing means
Claims (38)
- A video identifier generation device comprising: visual feature extraction means for extracting, based on feature quantities of a plurality of sub-region pairs in a video, a visual feature quantity used for identifying the video; and reliability calculation means for calculating a reliability of the visual feature quantity, which calculates, when the video is a specific video, a reliability of smaller value than when the video is a video other than the specific video.
- The video identifier generation device according to claim 1, wherein the reliability is a value representing the certainty of a collation result obtained when the video is collated with another video using the visual feature quantity.
- The video identifier generation device according to claim 1 or 2, wherein the visual feature extraction means extracts the visual feature quantity based on difference values between the feature quantities of the two sub-regions forming each of the plurality of sub-region pairs in the video.
- The video identifier generation device according to claim 3, wherein the feature quantity of a sub-region is the average pixel value of the sub-region.
- The video identifier generation device according to any one of claims 1 to 4, wherein the specific video is a video having flat pixel values.
- The video identifier generation device according to any one of claims 1 to 5, wherein the specific video is a video in which the variance of luminance values over the entire video is small.
- The video identifier generation device according to any one of claims 1 to 6, wherein the specific video is a video in which the luminance value is substantially constant over the entire screen.
- The video identifier generation device according to any one of claims 1 to 7, wherein the extraction of the visual feature quantity and the calculation of the reliability are performed picture by picture.
- The video identifier generation device according to claim 8, wherein the picture is a frame.
- The video identifier generation device according to any one of claims 3 to 9, wherein the visual feature extraction means quantizes the difference values to calculate the visual feature quantity.
- The video identifier generation device according to any one of claims 1 to 10, comprising multiplexing means for outputting the visual feature quantity and the reliability together as a video identifier.
- A video identifier collation device which uses: a first visual feature quantity used for identifying a video, calculated from feature quantities of a plurality of sub-region pairs in a first video; first reliability information indicating the reliability of the first visual feature quantity, calculated so as to take a smaller value when the first video is a specific video than when the first video is a video other than the specific video; a second visual feature quantity used for identifying a second video, calculated from feature quantities of a plurality of sub-region pairs in the second video; and second reliability information indicating the reliability of the second visual feature quantity, calculated so as to take a smaller value when the second video is a specific video than when the second video is a video other than the specific video; the device comprising: collation parameter calculation means for calculating a collation parameter based on the first reliability information and the second reliability information; and collation means for collating the first visual feature quantity with the second visual feature quantity in accordance with the collation parameter and outputting a collation result.
- The video identifier collation device according to claim 12, wherein the first visual feature quantity is calculated from difference values between the feature quantities of the two sub-regions forming each of the plurality of sub-region pairs in the first video, and the second visual feature quantity is calculated from difference values between the feature quantities of the two sub-regions forming each of the plurality of sub-region pairs in the second video.
- The video identifier collation device according to claim 12 or 13, wherein the collation parameter is determined by the smaller of the first reliability and the second reliability.
- The video identifier collation device according to any one of claims 12 to 14, wherein the collation parameter calculation means calculates, as the collation parameter, a value representing a weight used in calculating the distance or similarity between the first visual feature quantity and the second visual feature quantity, and the collation means obtains the collation result by calculating the distance or similarity between the first visual feature quantity and the second visual feature quantity using the weight determined by the collation parameter.
- The video identifier collation device according to any one of claims 12 to 15, wherein, when the reliability of either the first visual feature quantity or the second visual feature quantity is low, the collation parameter calculation means outputs a specific parameter as the collation parameter, and when the collation parameter is the specific parameter, the collation means calculates the collation result excluding the distance or similarity between the first visual feature quantity and the second visual feature quantity.
- The video identifier collation device according to any one of claims 12 to 16, wherein the collation parameter calculation means outputs, as the collation parameter, a parameter defining a permissible number of failures of picture-by-picture collation when collation between the first visual feature quantity and the second visual feature quantity is performed picture by picture, and the collation means continues the collation and calculates the collation result when the number of failures of picture-by-picture collation is within the permissible number.
- A collation device which performs collation using a video identifier generated by the video identifier generation device according to any one of claims 1 to 11.
- A video identifier generation method comprising: extracting, based on feature quantities of a plurality of sub-region pairs in a video, a visual feature quantity used for identifying the video; and calculating, as the reliability of the visual feature quantity, a reliability of smaller value when the video is a specific video than when the video is a video other than the specific video.
- The video identifier generation method according to claim 19, wherein the reliability is a value representing the certainty of a collation result obtained when the video is collated with another video using the visual feature quantity.
- The video identifier generation method according to claim 19 or 20, wherein the visual feature quantity is extracted based on difference values between the feature quantities of the two sub-regions forming each of the plurality of sub-region pairs in the video.
- The video identifier generation method according to claim 21, wherein the feature quantity of a sub-region is the average pixel value of the sub-region.
- The video identifier generation method according to any one of claims 19 to 22, wherein the specific video is a video having flat pixel values.
- The video identifier generation method according to any one of claims 19 to 23, wherein the specific video is a video in which the variance of luminance values over the entire video is small.
- The video identifier generation method according to any one of claims 19 to 24, wherein the specific video is a video in which the luminance value is substantially constant over the entire screen.
- The video identifier generation method according to any one of claims 19 to 25, wherein the extraction of the visual feature quantity and the calculation of the reliability are performed picture by picture.
- The video identifier generation method according to claim 26, wherein the picture is a frame.
- The video identifier generation method according to any one of claims 21 to 27, wherein the difference values are quantized to calculate the visual feature quantity.
- The video identifier generation method according to any one of claims 19 to 28, wherein the visual feature quantity and the reliability are output together as a video identifier.
- A video identifier collation method which uses: a first visual feature quantity used for identifying a video, calculated from feature quantities of a plurality of sub-region pairs in a first video; first reliability information indicating the reliability of the first visual feature quantity, calculated so as to take a smaller value when the first video is a specific video than when the first video is a video other than the specific video; a second visual feature quantity used for identifying a second video, calculated from feature quantities of a plurality of sub-region pairs in the second video; and second reliability information indicating the reliability of the second visual feature quantity, calculated so as to take a smaller value when the second video is a specific video than when the second video is a video other than the specific video; the method comprising: calculating a collation parameter based on the first reliability information and the second reliability information; and collating the first visual feature quantity with the second visual feature quantity in accordance with the collation parameter and outputting a collation result.
- The video identifier collation method according to claim 30, wherein the first visual feature quantity is calculated from difference values between the feature quantities of the two sub-regions forming each of the plurality of sub-region pairs in the first video, and the second visual feature quantity is calculated from difference values between the feature quantities of the two sub-regions forming each of the plurality of sub-region pairs in the second video.
- The video identifier collation method according to claim 30 or 31, wherein the collation parameter is determined by the smaller of the first reliability and the second reliability.
- The video identifier collation method according to any one of claims 30 to 32, wherein a value representing a weight used in calculating the distance or similarity between the first visual feature quantity and the second visual feature quantity is calculated as the collation parameter, and the collation result is obtained by calculating the distance or similarity between the first visual feature quantity and the second visual feature quantity using the weight determined by the collation parameter.
- The video identifier collation method according to any one of claims 30 to 33, wherein, when the reliability of either the first visual feature quantity or the second visual feature quantity is low, a specific parameter is output as the collation parameter, and when the collation parameter is the specific parameter, the collation result is calculated excluding the distance or similarity between the first visual feature quantity and the second visual feature quantity.
- The video identifier collation method according to any one of claims 30 to 34, wherein a parameter defining a permissible number of failures of picture-by-picture collation when collation between the first visual feature quantity and the second visual feature quantity is performed picture by picture is output as the collation parameter, and the collation is continued and the collation result is calculated when the number of failures of picture-by-picture collation is within the permissible number.
- A collation method which performs collation using a video identifier generated by the video identifier generation method according to any one of claims 19 to 29.
- A program for causing a computer to function as: visual feature extraction means for extracting, based on feature quantities of a plurality of sub-region pairs in a video, a visual feature quantity used for identifying the video; and reliability calculation means for calculating a reliability of the visual feature quantity, which calculates, when the video is a specific video, a reliability of smaller value than when the video is a video other than the specific video.
- A program for causing a computer to function as: collation parameter calculation means for calculating a collation parameter based on first reliability information and second reliability information, using a first visual feature quantity used for identifying a video, calculated from feature quantities of a plurality of sub-region pairs in a first video, the first reliability information indicating the reliability of the first visual feature quantity, calculated so as to take a smaller value when the first video is a specific video than when the first video is a video other than the specific video, a second visual feature quantity used for identifying a second video, calculated from feature quantities of a plurality of sub-region pairs in the second video, and the second reliability information indicating the reliability of the second visual feature quantity, calculated so as to take a smaller value when the second video is a specific video than when the second video is a video other than the specific video; and collation means for collating the first visual feature quantity with the second visual feature quantity in accordance with the collation parameter and outputting a collation result.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP11189664.3A EP2423839B1 (en) | 2009-01-29 | 2010-01-20 | Video identifier creation device |
KR1020117017640A KR101290023B1 (ko) | 2009-01-29 | 2010-01-20 | 영상 시그니처 생성 디바이스 |
US13/145,076 US20110285904A1 (en) | 2009-01-29 | 2010-01-20 | Video signature generation device |
JP2010548399A JP4883227B2 (ja) | 2009-01-29 | 2010-01-20 | 映像識別子生成装置 |
EP10735599.2A EP2393290B1 (en) | 2009-01-29 | 2010-01-20 | Video identifier creation device |
CN201080005606.4A CN102301697B (zh) | 2009-01-29 | 2010-01-20 | 视频签名产生设备 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009017808 | 2009-01-29 | ||
JP2009-017808 | 2009-01-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010087127A1 true WO2010087127A1 (ja) | 2010-08-05 |
Family
ID=42395393
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/000283 WO2010087127A1 (ja) | 2009-01-29 | 2010-01-20 | 映像識別子生成装置 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20110285904A1 (ja) |
EP (2) | EP2423839B1 (ja) |
JP (2) | JP4883227B2 (ja) |
KR (1) | KR101290023B1 (ja) |
CN (1) | CN102301697B (ja) |
WO (1) | WO2010087127A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2720457A1 (en) * | 2011-06-13 | 2014-04-16 | NEC Corporation | Video processing system, video processing method, method of creating video processing database, video processing database, video processing apparatus, and control method and control program therefor |
WO2023120244A1 (ja) * | 2021-12-24 | 2023-06-29 | ソニーグループ株式会社 | 伝送装置、伝送方法、およびプログラム |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140152891A1 (en) * | 2012-12-05 | 2014-06-05 | Silicon Image, Inc. | Method and Apparatus for Reducing Digital Video Image Data |
CN104683815B (zh) * | 2014-11-19 | 2017-12-15 | 西安交通大学 | 一种基于内容的h.264压缩域视频检索方法 |
WO2017075493A1 (en) * | 2015-10-28 | 2017-05-04 | Ustudio, Inc. | Video frame difference engine |
KR101672224B1 (ko) | 2015-11-02 | 2016-11-03 | 한국지질자원연구원 | 탄산염 제조 및 이산화탄소의 저감을 위한 해수담수화 시스템 |
US11227160B2 (en) * | 2019-11-15 | 2022-01-18 | International Business Machines Corporation | Detecting scene transitions in video footage |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007128262A (ja) * | 2005-11-02 | 2007-05-24 | Omron Corp | 顔照合装置 |
JP2008310775A (ja) * | 2007-06-18 | 2008-12-25 | Canon Inc | 表情認識装置及び方法、並びに撮像装置 |
JP2009075868A (ja) * | 2007-09-20 | 2009-04-09 | Toshiba Corp | 画像から対象を検出する装置、方法およびプログラム |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6215898B1 (en) * | 1997-04-15 | 2001-04-10 | Interval Research Corporation | Data processing system and method |
US6501794B1 (en) * | 2000-05-22 | 2002-12-31 | Microsoft Corporate | System and related methods for analyzing compressed media content |
WO2002065782A1 (en) * | 2001-02-12 | 2002-08-22 | Koninklijke Philips Electronics N.V. | Generating and matching hashes of multimedia content |
JP2003134535A (ja) * | 2001-10-30 | 2003-05-09 | Nec Eng Ltd | 画質劣化検知システム |
JP2004208117A (ja) * | 2002-12-26 | 2004-07-22 | Nec Engineering Ltd | 字幕合成時における画質劣化検知システム |
JP4349160B2 (ja) * | 2004-03-05 | 2009-10-21 | 日本電気株式会社 | 画像類似度算出システム、画像検索システム、画像類似度算出方法および画像類似度算出用プログラム |
CN101473657A (zh) * | 2006-06-20 | 2009-07-01 | 皇家飞利浦电子股份有限公司 | 产生视频信号的指纹 |
JP2009017808A (ja) | 2007-07-11 | 2009-01-29 | Nippon Flour Mills Co Ltd | 製パン用小麦粉組成物及び製パン用穀粉組成物並びにこれらを使用したパン |
JP5034733B2 (ja) * | 2007-07-13 | 2012-09-26 | カシオ計算機株式会社 | 特徴点追跡装置及びプログラム |
-
2010
- 2010-01-20 KR KR1020117017640A patent/KR101290023B1/ko active IP Right Grant
- 2010-01-20 EP EP11189664.3A patent/EP2423839B1/en active Active
- 2010-01-20 WO PCT/JP2010/000283 patent/WO2010087127A1/ja active Application Filing
- 2010-01-20 US US13/145,076 patent/US20110285904A1/en not_active Abandoned
- 2010-01-20 EP EP10735599.2A patent/EP2393290B1/en active Active
- 2010-01-20 JP JP2010548399A patent/JP4883227B2/ja active Active
- 2010-01-20 CN CN201080005606.4A patent/CN102301697B/zh active Active
-
2011
- 2011-11-30 JP JP2011262739A patent/JP2012109979A/ja active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007128262A (ja) * | 2005-11-02 | 2007-05-24 | Omron Corp | 顔照合装置 |
JP2008310775A (ja) * | 2007-06-18 | 2008-12-25 | Canon Inc | 表情認識装置及び方法、並びに撮像装置 |
JP2009075868A (ja) * | 2007-09-20 | 2009-04-09 | Toshiba Corp | 画像から対象を検出する装置、方法およびプログラム |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2720457A1 (en) * | 2011-06-13 | 2014-04-16 | NEC Corporation | Video processing system, video processing method, method of creating video processing database, video processing database, video processing apparatus, and control method and control program therefor |
EP2720457A4 (en) * | 2011-06-13 | 2015-03-25 | Nec Corp | VIDEO PROCESSING SYSTEM, VIDEO PROCESSING METHOD, VIDEO PROCESSING DATABASE CREATING METHOD, VIDEO PROCESSING DATA BASE, VIDEO PROCESSING APPARATUS, AND CONTROL METHOD AND CONTROL PROGRAM THEREOF |
WO2023120244A1 (ja) * | 2021-12-24 | 2023-06-29 | ソニーグループ株式会社 | 伝送装置、伝送方法、およびプログラム |
Also Published As
Publication number | Publication date |
---|---|
US20110285904A1 (en) | 2011-11-24 |
CN102301697B (zh) | 2015-07-01 |
EP2423839A2 (en) | 2012-02-29 |
EP2393290B1 (en) | 2021-02-24 |
EP2393290A1 (en) | 2011-12-07 |
JPWO2010087127A1 (ja) | 2012-08-02 |
KR101290023B1 (ko) | 2013-07-30 |
EP2423839B1 (en) | 2021-02-24 |
EP2393290A4 (en) | 2012-12-05 |
KR20110110252A (ko) | 2011-10-06 |
JP4883227B2 (ja) | 2012-02-22 |
CN102301697A (zh) | 2011-12-28 |
EP2423839A3 (en) | 2012-12-05 |
JP2012109979A (ja) | 2012-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5573131B2 (ja) | 映像識別子抽出装置および方法、映像識別子照合装置および方法、ならびにプログラム | |
US8335251B2 (en) | Video signature extraction device | |
JP4883227B2 (ja) | 映像識別子生成装置 | |
Galvan et al. | First quantization matrix estimation from double compressed JPEG images | |
US8169497B2 (en) | Method of segmenting videos into a hierarchy of segments | |
US8995708B2 (en) | Apparatus and method for robust low-complexity video fingerprinting | |
JP5644505B2 (ja) | 照合加重情報抽出装置 | |
Hoad et al. | Video similarity detection for digital rights management | |
KR100683501B1 (ko) | 신경망 기법을 이용한 뉴스 비디오의 앵커 화면 추출 장치및 그 방법 | |
Chao | Introduction to video fingerprinting | |
Li et al. | Efficient shot boundary detection based on scale invariant features | |
Jang et al. | The Original Similarity Extraction Mechanism for Digital Content Copyright Protection in UCC Service Environment | |
US20100201889A1 (en) | Detection of wipe transitions in video | |
KR101470191B1 (ko) | 지역적 극대 필터 및 지역적 극소 필터를 이용한 비디오 내의 블록 오류 고속 검출 방법 및 장치 | |
Stiegler et al. | First version of algorithms for content analysis and automatic content pre-selection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080005606.4 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10735599 Country of ref document: EP Kind code of ref document: A1 |
|
DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | ||
ENP | Entry into the national phase |
Ref document number: 2010548399 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13145076 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2010735599 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 20117017640 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |