CN110619362A - Video content comparison method and device based on perception and aberration


Info

Publication number
CN110619362A
Authority
CN
China
Prior art keywords
image sequence
video image
target video
source video
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910874788.1A
Other languages
Chinese (zh)
Other versions
CN110619362B (en)
Inventor
姜卫平
李国华
郭忠武
王荣芳
纪军
韩煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bo Hui Science And Technology Co Ltd Of Beijing
Original Assignee
Bo Hui Science And Technology Co Ltd Of Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bo Hui Science And Technology Co Ltd Of Beijing
Priority to CN201910874788.1A
Publication of CN110619362A
Application granted
Publication of CN110619362B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a video content comparison method and device based on perception and aberration. The method comprises the following steps: acquiring a source video image sequence and a target video image sequence; extracting perceptual hash features and aberration features; judging, according to the perceptual hash features, whether the aligned source video image sequence and target video image sequence are similarly matched; if they are similarly matched, judging, according to the aberration features of the source video image sequence and of the target video image sequence, whether the similarly matched sequences are accurately matched; and outputting a comparison result generated based on the judgment results. Compared with the prior art, by combining perceptual hashing with an aberration algorithm and adopting a stepwise matching judgment over the whole image, the embodiments can simultaneously satisfy the requirements of low computational complexity and high accuracy in video content comparison.

Description

Video content comparison method and device based on perception and aberration
Technical Field
The invention relates to the technical field of image recognition, and in particular to a video content comparison method and device based on perception and aberration. (In this text, "aberration", in places also rendered "disparity", denotes the difference between a pixel's gray value and its block pixel mean, as defined in the feature extraction steps below.)
Background
Video content in television is typically transmitted by broadcast television signals. During transmission, the video content may be illegally tampered with, broadcast by mistake, or affected by signal transmission failures, all of which degrade its quality. Monitoring video content is therefore particularly important for reducing video content errors. Comparing the content of the video signals at different links of the chain is a direct and effective monitoring technique. In addition, in service scenarios such as video retrieval and video advertisement monitoring, content comparison between a sample video clip and a target video is used to determine whether the sample clip appears and, if so, its position and duration; this is another important application of video content comparison technology.
Because the amount of data to be processed during video comparison is large, a video comparison method must offer not only good robustness to distortion but also high computational performance. At present, video content comparison is mainly realized through video image matching; common methods include perceptual hashing (PHash), color moments (ColorMoment), the scale-invariant feature transform (SIFT), convolutional neural networks (CNN), and the peak signal-to-noise ratio (PSNR). Some of these methods, such as SIFT, achieve good accuracy but have high computational complexity and target only local features of the image. Existing video comparison techniques therefore cannot simultaneously satisfy the requirements of low computational complexity and high accuracy.
Disclosure of Invention
The embodiments of the present invention provide a video content comparison method and device based on perception and aberration, aiming to solve the problem that existing image comparison methods cannot simultaneously satisfy the requirements of low computational complexity and high accuracy.
In one aspect, an embodiment of the present invention provides a method for comparing video content based on perception and disparity, where the method includes: acquiring a source video image sequence and a target video image sequence; extracting perceptual hash characteristics and aberration characteristics of the source video image sequence and the target video image sequence; calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence; aligning the source video image sequence and the target video image sequence according to the alignment offset; judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence; if the aligned source video image sequence and the aligned target video image sequence are matched similarly, judging whether the source video image sequence and the target video image sequence which are matched similarly are matched accurately or not according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence; and outputting a comparison result generated based on the judgment result.
With reference to the aspect, in a first possible implementation manner, the extracting the perceptual hash feature of the source video image sequence and the perceptual hash feature of the target video image sequence includes: extracting the gray value of the source video image sequence and the gray value of the target video image sequence to generate a source video gray image sequence and a target video gray image sequence; scaling the source video grayscale image sequence and the target video grayscale image sequence; performing discrete cosine transform on the scaled source video gray level image sequence and the scaled target video gray level image sequence; respectively taking the coefficients of the upper left corner blocks of the source video gray level image sequence and the target video gray level image sequence after discrete cosine transform; calculating a coefficient mean of the block coefficients; judging the magnitude relation between each coefficient value in the block coefficients and the coefficient mean value, if the coefficient value is greater than or equal to the coefficient mean value, marking the coefficient point as 1, and if the coefficient value is less than the coefficient mean value, marking the coefficient point as 0; and generating the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence according to the marking result.
With reference to the aspect, in a second possible implementation manner, the extracting the disparity feature of the source video image sequence and the disparity feature of the target video image sequence includes: extracting the gray value of the source video image sequence and the gray value of the target video image sequence to generate a source video gray image sequence and a target video gray image sequence; scaling the source video grayscale image sequence and the target video grayscale image sequence; dividing the scaled source video grayscale image sequence and the scaled target video grayscale image sequence into N pixel blocks; calculating the block pixel mean value of all pixel points in the N pixel blocks; calculating the aberration of the gray value of each pixel point and the mean value of the pixels, if the aberration is greater than or equal to 0, marking the pixel point as 1, and if the aberration is less than 0, marking the pixel point as 0; and generating N block aberration characteristics of the source video image sequence and N block aberration characteristics of the target video image sequence according to the marking result.
With reference to the second possible implementation manner, in a third possible implementation manner, the determining whether the source video image sequence and the target video image sequence after the similar matching are accurately matched according to the aberration feature of the source video image sequence and the aberration feature of the target video image sequence includes: generating a matching result between a pixel block of the source video image sequence and a pixel block of the target video image sequence according to the N block aberration characteristics of the source video image sequence and the N block aberration characteristics of the target video image sequence; calculating the ratio of the matched pixel block to the pixel block according to the matching result; calculating the average matching block percentage of the source video image sequence and the target video image sequence according to the ratio; and judging whether the source video image sequence and the target video image sequence which are subjected to similar matching are accurately matched or not according to the size relation between the average matching block percentage and the preset matching block percentage.
With reference to the third possible implementation manner, in a fourth possible implementation manner, the determining whether the source video image sequence and the target video image sequence after the similar matching are accurately matched according to a size relationship between the average matching block percentage and a preset matching block percentage includes: if the average matching block percentage is smaller than the preset matching block percentage, the source video image sequence and the target video image sequence which are subjected to similar matching are not matched accurately; and if the average matching block percentage is greater than or equal to the preset matching block percentage, the source video image sequence and the target video image sequence which are subjected to similar matching are accurately matched.
With reference to the aspect, in a fifth possible implementation manner, calculating alignment offsets of the source video image sequence and the target video image sequence according to the perceptual hash feature of the source video image sequence and the perceptual hash feature of the target video image sequence includes: selecting a plurality of preselected alignment offsets; aligning the source video image sequence and the target video image sequence according to the preselected alignment offset; calculating the average Hamming distance between the aligned source video image sequence and the aligned target video image sequence according to the perceptual Hash characteristics of the source video image sequence and the perceptual Hash characteristics of the target video image sequence; and selecting the preselected alignment offset with the minimum average Hamming distance as an alignment offset.
With reference to the aspect, in a sixth possible implementation manner, the determining, according to the perceptual hash feature of the source video image sequence and the perceptual hash feature of the target video image sequence, whether the aligned source video image sequence and the aligned target video image sequence are similar to each other includes: calculating the average Hamming distance of the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence; and judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the size relation between the average Hamming distance and a preset threshold value.
With reference to the sixth possible implementation manner, in a seventh possible implementation manner, the determining whether the aligned source video image sequence and the aligned target video image sequence are similar to each other according to a size relationship between the average hamming distance and a preset threshold includes: if the average Hamming distance is greater than the preset threshold, the aligned source video image sequence and the target video image sequence are not matched in a similar way; and if the average Hamming distance is smaller than or equal to the preset threshold, the aligned source video image sequence and the target video image sequence are matched in a similar way.
With reference to the aspect, in an eighth possible implementation manner, calculating alignment offsets of the source video image sequence and the target video image sequence according to the perceptual hash feature of the source video image sequence and the perceptual hash feature of the target video image sequence further includes: selecting the source video image sequence and the target video image sequence within a preset time length; and calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence within a preset time length.
In a second aspect of the embodiments of the present disclosure, there is provided a video content comparison apparatus based on perception and aberration, including:
the image sequence acquisition unit is used for acquiring a source video image sequence and a target video image sequence;
the extraction unit is used for extracting the perceptual hash characteristics and the aberration characteristics of the source video image sequence and the target video image sequence;
the computing unit is used for computing the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
an alignment unit, configured to align the source video image sequence and the target video image sequence according to the alignment offset;
the first judgment unit is used for judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
a second judging unit, configured to, if the aligned source video image sequence and the aligned target video image sequence are similar, judge whether the source video image sequence and the target video image sequence that are similar to each other are accurately matched according to an aberration feature of the source video image sequence and an aberration feature of the target video image sequence;
and the result output unit is used for outputting a comparison result generated based on the judgment result.
In a third aspect of the embodiments of the present disclosure, a terminal is provided, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a source video image sequence and a target video image sequence;
extracting perceptual hash characteristics and aberration characteristics of the source video image sequence and the target video image sequence;
calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
aligning the source video image sequence and the target video image sequence according to the alignment offset;
judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
if the aligned source video image sequence and the aligned target video image sequence are matched similarly, judging whether the source video image sequence and the target video image sequence which are matched similarly are matched accurately or not according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence;
and outputting a comparison result generated based on the judgment result.
It can be seen from the above embodiments that a source video image sequence and a target video image sequence are obtained; extracting perceptual hash characteristics and aberration characteristics of the source video image sequence and the target video image sequence; calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence; aligning the source video image sequence and the target video image sequence according to the alignment offset; judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence; if the aligned source video image sequence and the aligned target video image sequence are matched similarly, judging whether the source video image sequence and the target video image sequence which are matched similarly are matched accurately or not according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence; and outputting a comparison result generated based on the judgment result. Compared with the prior art, the embodiment can simultaneously meet the requirements of low computation complexity and high accuracy of video content comparison by combining the perceptual hashing and the aberration algorithm and adopting a step matching judgment method for the whole image.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below represent only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort. The above and other objects, features and advantages of the present invention will become more apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
FIG. 1 is a flowchart of an embodiment of a method for perceptual-disparity based video content comparison according to the present invention;
FIG. 2 is a flowchart of perceptual hash feature extraction for the source video image sequence and the target video image sequence;
FIG. 3 is a flowchart illustrating aberration feature extraction for the source video image sequence and the target video image sequence;
FIG. 4 is a diagram illustrating one embodiment of aligning the sequence of source video images and the sequence of target video images according to the alignment offset;
FIG. 5 is a flowchart illustrating a process of determining whether the source video image sequence and the target video image sequence are similar according to perceptual hash characteristics;
FIG. 6 is a flowchart illustrating determining whether the source video image sequence and the target video image sequence are exactly matched according to aberration characteristics;
FIG. 7 is a diagram illustrating a video content comparison apparatus based on perception and disparity according to an embodiment of the present invention;
FIG. 8 is a block diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, which shows a flowchart of an embodiment of the video content comparison method based on perception and aberration of the present invention, the method comprises the following steps:
step 101, a source video image sequence and a target video image sequence are obtained.
Video generally refers to the various techniques for capturing, recording, processing, storing, transmitting, and reproducing a series of still images as electrical signals. When successive images change at more than 24 frames per second, the human eye, according to the persistence-of-vision principle, can no longer distinguish a single static picture; the sequence appears as a smooth, continuous visual effect, and such a continuous picture sequence is called a video. In the embodiment of the present application, the source video image sequence refers to the initial sample video segment before transmission of the broadcast television signal, and the target video image sequence refers to the target video segment after transmission; both sequences comprise a plurality of video frames. By comparing each video frame in the source video image sequence with the corresponding video frame in the target video image sequence, the present application judges whether the target video image sequence obtained from transmitting the source video image sequence suffers content errors or quality damage, that is, distortion. The source video image sequence may be obtained by capturing and decoding the source video, and the target video image sequence by capturing and decoding the target video.
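As an illustration only, not part of the patent, the following minimal Python sketch shows how step 101 might be realized by decoding a video into an image sequence with OpenCV; the function name, file names, and the optional frame cap are assumptions.

```python
import cv2

def decode_image_sequence(video_path, max_frames=None):
    """Step 101 sketch (hypothetical): decode a video file into a list of frames."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok or (max_frames is not None and len(frames) >= max_frames):
            break
        frames.append(frame)
    cap.release()
    return frames

# source_seq = decode_image_sequence("source.ts")   # paths are illustrative
# target_seq = decode_image_sequence("target.ts")
```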
And 102, extracting the perceptual hash characteristics and the aberration characteristics of the source video image sequence and the target video image sequence.
In this embodiment, specifically, referring to fig. 2, the extracting the perceptual hash feature of the source video image sequence and the perceptual hash feature of the target video image sequence includes the following steps:
step 201: and extracting the gray value of the source video image sequence and the gray value of the target video image sequence to generate a source video gray image sequence and a target video gray image sequence.
Converting a color image to grayscale discards the color information of the image and expresses its brightness information in gray levels. Each pixel of a color image occupies 3 bytes; after conversion to a grayscale image, each pixel occupies one byte whose gray value is the brightness of the corresponding color image pixel. The gray values of the source video image sequence and of the target video image sequence can be extracted with an image processing tool, generating the grayscale image frames of the source video image sequence and of the target video image sequence.
Step 202: scaling the source video grayscale image sequence and the target video grayscale image sequence. Before extracting the perceptual hash feature of the source video image sequence and the perceptual hash feature of the target video image sequence, the source video image sequence and the target video image sequence may be scaled, that is, all video frames in the source video image sequence and all video frames in the target video image sequence may be scaled, in order to facilitate the comparison process and reduce the calculation amount in the subsequent comparison process. The gray-scale image comprises a region with large brightness change, such as an object edge, namely a region with large brightness change, the region with large brightness change is called a high-frequency region, the region with small brightness change is called a low-frequency region, the gray-scale image scaling process is a process of losing high-frequency information, and image frames of a source video gray-scale image sequence and an object video gray-scale image sequence are scaled to 32 × 32 resolution.
Step 203: perform discrete cosine transform on the scaled source video grayscale image sequence and the scaled target video grayscale image sequence. After the transform, the energy of the discrete cosine transform coefficients is concentrated mainly in the upper-left corner, and most of the remaining coefficients are close to zero; the discrete cosine transform thus concentrates the image information and makes the transformed image convenient for subsequent processing.
Step 204: take the coefficients of the upper-left corner block of the discrete-cosine-transformed source and target video grayscale image sequences, respectively. The discrete cosine transform processes a two-dimensional pixel array, that is, it is a time-frequency transform of the image, and the transformed image information is concentrated in the upper-left corner. The 8 × 8 block of coefficients at the upper-left corner of the transformed image is taken, and the coefficient at coordinate [0,0] (the DC component) is set to 0.
Step 205: and calculating the coefficient mean value of the block coefficients. For example, the coefficient mean of all the coefficients in the 8 × 8 block coefficients is calculated as a.
Step 206: and judging the size relation between each coefficient value in the block coefficient and the coefficient mean value, if the coefficient value is greater than or equal to the coefficient mean value, marking the coefficient as 1, and if the coefficient value is less than the coefficient mean value, marking the coefficient as 0. By judging the magnitude relation between the coefficient value X of each coefficient and the coefficient mean value A, if X is larger than or equal to A, the coefficient is marked as 1, and if X is smaller than A, the coefficient is marked as 0.
Step 207: generate the perceptual hash features of the source video image sequence and of the target video image sequence according to the marking results. Specifically, the marks are taken in the scan order of the 8 × 8 block coefficients to form a 64-bit value, and this 64-bit value is used as the perceptual hash feature of each frame of the source video image sequence and of the target video image sequence.
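Steps 201 to 207 can be condensed into a short sketch. The following Python code, assuming OpenCV and NumPy, is one possible reading of the described perceptual hash; the function name is illustrative, and details such as the interpolation mode are assumptions not fixed by the text.

```python
import cv2
import numpy as np

def phash_64(frame_bgr):
    """Perceptual hash feature following steps 201-207: grayscale, 32x32 scaling,
    DCT, upper-left 8x8 coefficient block with the [0,0] coefficient zeroed,
    thresholding against the block mean, and packing into a 64-bit value."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)                 # step 201
    small = cv2.resize(gray, (32, 32), interpolation=cv2.INTER_AREA)   # step 202
    coeffs = cv2.dct(np.float32(small))                                # step 203
    block = coeffs[:8, :8].copy()                                      # step 204
    block[0, 0] = 0.0                                                  # zero the DC coefficient
    mean = block.mean()                                                # step 205
    bits = (block >= mean).flatten()                                   # step 206: mark 1 or 0
    return np.packbits(bits)                                           # step 207: 64 bits as 8 bytes
```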
In this embodiment, specifically, referring to fig. 3, the step of extracting the aberration feature of the source video image sequence and the aberration feature of the target video image sequence includes the following steps:
step 301: and extracting the gray value of the source video image sequence and the gray value of the target video image sequence to generate a source video gray image sequence and a target video gray image sequence.
Step 302: scale the source video grayscale image sequence and the target video grayscale image sequence. Both the image frames of the source video image sequence and those of the target video image sequence may be scaled to 176 × 144 resolution.
Step 303: divide the scaled source video grayscale image sequence and the scaled target video grayscale image sequence into N pixel blocks. Specifically, the two sequences may be divided into 8 × 8 pixel blocks, so that each image frame of the source video grayscale image sequence and each image frame of the target video grayscale image sequence has (176/8) × (144/8) = 396 blocks.
Step 304: calculate the block pixel mean of all pixel points in each of the N pixel blocks. The block pixel means of all pixel points in the 396 pixel blocks are calculated, yielding 396 block pixel means A1, A2, …, A396.
Step 305: calculate the aberration between the gray value of each pixel point and its block pixel mean; if the aberration is greater than or equal to 0, mark the pixel point as 1, and if the aberration is less than 0, mark the pixel point as 0.
Step 306: generate the N block aberration features of the source video image sequence and the N block aberration features of the target video image sequence according to the marking results. Finally, each pixel block yields a 64-bit value, giving the 396 block aberration features of each frame.
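Analogously, steps 301 to 306 can be sketched as follows; this is an interpretation of the text rather than the patent's reference implementation, and the interpolation mode is again an assumption.

```python
import cv2
import numpy as np

def block_aberration_features(frame_bgr, block=8):
    """Block aberration features following steps 301-306: grayscale, 176x144
    scaling, division into 8x8 pixel blocks, and per-block sign bits of the
    difference between each pixel and its block pixel mean (396 features)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)                    # step 301
    small = cv2.resize(gray, (176, 144),
                       interpolation=cv2.INTER_AREA).astype(np.float32)   # step 302
    feats = []
    for y in range(0, small.shape[0], block):                             # step 303
        for x in range(0, small.shape[1], block):
            b = small[y:y + block, x:x + block]
            bits = (b - b.mean() >= 0).flatten()                          # steps 304-305
            feats.append(np.packbits(bits))                               # step 306: 64-bit block feature
    return feats                                                          # 22 * 18 = 396 features
```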
Obtaining the perceptual hash feature of the source video image sequence and the perceptual hash feature of the target video image sequence according to steps 201 to 207, and obtaining the disparity feature of the source video image sequence and the disparity feature of the target video image sequence according to steps 301 to 306.
And 103, calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence.
And 104, aligning the source video image sequence and the target video image sequence according to the alignment offset.
Fig. 4 shows an example of aligning the source video image sequence and the target video image sequence according to the alignment offset. First, a plurality of preselected alignment offsets O1, O2, …, Om are selected, and the source video image sequence and the target video image sequence are aligned according to each preselected alignment offset in turn. After alignment, the average Hamming distance between the aligned source and target video image sequences is calculated from their perceptual hash features; for an offset O it may be expressed as D(O) = (1/N) · Σ DH(H_i, H'_{i+O}), summed over the aligned frame pairs i = 1, …, N, where H_i and H'_{i+O} are the perceptual hash features of the source video image sequence and of the target video image sequence, respectively; DH is the Hamming distance between the perceptual hash features of a pair of aligned image frames, that is, the number of differing bits between the two 64-bit features; and N is the number of aligned image frames in the alignment window. In this embodiment, the average Hamming distance may also be calculated over an alignment window of a certain time length; for example, the video frames within a 5-second window may be captured, and if the window contains M frames of images, then N = M − O. This yields the average Hamming distances D(O1), D(O2), …, D(Om) of the perceptual hash features under the preselected alignment offsets. The offset with the minimum distance whose number of aligned frames N is greater than or equal to 10 is taken as the video synchronization alignment offset, denoted Oo. The source video image sequence and the target video image sequence are then aligned according to the obtained offset Oo.
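A minimal sketch of this offset search, assuming nonnegative candidate offsets that shift the target sequence and one packed 64-bit hash per frame as produced above; the helper names are illustrative.

```python
import numpy as np

def hamming(a, b):
    """Number of differing bits between two packed 64-bit features."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def best_alignment_offset(src_hashes, tgt_hashes, offsets, min_frames=10):
    """Pick the preselected offset O with the smallest average Hamming
    distance D(O), requiring at least min_frames aligned frames."""
    best = None
    for o in offsets:
        n = min(len(src_hashes), len(tgt_hashes) - o)   # N = M - O within the window
        if n < min_frames:
            continue
        d = sum(hamming(src_hashes[i], tgt_hashes[i + o]) for i in range(n)) / n
        if best is None or d < best[1]:
            best = (o, d)
    return best  # (offset Oo, its average Hamming distance), or None
```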
Calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash feature of the source video image sequence and the perceptual hash feature of the target video image sequence may further include: selecting the source video image sequence and the target video image sequence within a preset time length; and calculating the alignment offset of the two sequences according to their perceptual hash features within that preset time length. By narrowing the comparison range in this way, an accurate alignment offset can be obtained while the amount of calculation is reduced.
And 105, judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence. Specifically, the step of judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence includes the following steps, as shown in fig. 5:
step 501: and calculating the average Hamming distance of the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence.
Step 502: judge whether the aligned source video image sequence and target video image sequence are similarly matched according to the relation between the average Hamming distance and a preset threshold. For example, a preset threshold T1 is selected; if the calculated average Hamming distance of the perceptual hash features is greater than T1, it is judged that the source video image sequence and the target video image sequence in the window are not similarly matched. If the calculated average Hamming distance is less than or equal to T1, it is judged that the two sequences in the window are similarly matched, and accurate matching is then further judged according to the aberration features.
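As a sketch of steps 501 and 502, reusing the hamming helper from the alignment sketch above; the threshold T1 is a tuning parameter whose value the text does not fix.

```python
def similarly_matched(src_hashes, tgt_hashes, offset, t1):
    """Steps 501-502 sketch: average Hamming distance of the aligned
    perceptual hash features compared against the preset threshold T1."""
    n = min(len(src_hashes), len(tgt_hashes) - offset)
    avg = sum(hamming(src_hashes[i], tgt_hashes[i + offset])
              for i in range(n)) / n        # step 501
    return avg <= t1                        # step 502: similarly matched or not
```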
And 106, if the aligned source video image sequence and the aligned target video image sequence are matched in a similar manner, judging whether the source video image sequence and the target video image sequence which are matched in a similar manner are matched accurately according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence. And if the source video image sequence and the target video image sequence after alignment are judged to be matched similarly according to the perceptual hash feature of the source video image sequence and the perceptual hash feature of the target video image sequence, judging whether the source video image sequence and the target video image sequence after similar matching are matched accurately or not according to the aberration feature of the source video image sequence and the aberration feature of the target video image sequence. Specifically, the step of judging whether the source video image sequence and the target video image sequence after similar matching are accurately matched according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence includes the following steps, as shown in fig. 6:
step 601: and generating a matching result between the pixel blocks of the source video image sequence and the pixel blocks of the target video image sequence after similar matching according to the N block aberration characteristics of the source video image sequence and the N block aberration characteristics of the target video image sequence. For example, for the 64-bit aberration feature of each 8 × 8 pixel block, a hamming distance (the number of different bits in 2 groups of 64 bits) is calculated, and if the hamming distance is greater than a second preset threshold T2, it can be determined that the corresponding block contents do not match, and if the hamming distance is less than or equal to a second preset threshold T2, it can be determined that the corresponding block contents match.
Step 602: calculate the ratio of matched pixel blocks according to the matching results of the N pixel blocks. From the number Y of matched pixel blocks obtained in step 601, the percentage of content-matched pixel blocks among the N pixel blocks of each frame image can be calculated. For example, if the number of matched pixel blocks Y is 360 and the total number of blocks N is 396, the matched-block percentage is 360/396 ≈ 91%.
Step 603: calculate the average matching block percentage of the source video image sequence and the target video image sequence from these ratios. The average matching block percentage Y1 is the mean of the matching block percentages over all frame images.
Step 604: judge whether the similarly matched source video image sequence and target video image sequence are accurately matched according to the relation between the average matching block percentage and a preset matching block percentage threshold T3. If Y1 is less than the threshold T3, the similarly matched source video image sequence and target video image sequence are judged not to be accurately matched; if Y1 is greater than or equal to T3, they are judged to be accurately matched.
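Steps 601 to 604 can be sketched as follows, again reusing the hamming helper; the thresholds T2 and T3 and the nested-list data layout (one list of 396 packed block features per frame) are assumptions for illustration.

```python
def accurately_matched(src_block_feats, tgt_block_feats, t2, t3):
    """Steps 601-604 sketch: per-block Hamming distances against T2 (step 601),
    per-frame matched-block percentage (step 602), average percentage Y1
    over all frames (step 603), and the final comparison against T3 (step 604)."""
    percents = []
    for src_frame, tgt_frame in zip(src_block_feats, tgt_block_feats):
        matched = sum(1 for a, b in zip(src_frame, tgt_frame)
                      if hamming(a, b) <= t2)       # step 601
        percents.append(matched / len(src_frame))   # step 602
    y1 = sum(percents) / len(percents)               # step 603: average Y1
    return y1 >= t3                                  # step 604: accurate match or not
```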
It can be seen from the above embodiments that a source video image sequence and a target video image sequence are obtained; extracting perceptual hash characteristics and aberration characteristics of the source video image sequence and the target video image sequence; calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence; aligning the source video image sequence and the target video image sequence according to the alignment offset; judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence; if the aligned source video image sequence and the aligned target video image sequence are matched similarly, judging whether the source video image sequence and the target video image sequence which are matched similarly are matched accurately or not according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence; and outputting a comparison result generated based on the judgment result. Compared with the prior art, the embodiment can simultaneously meet the requirements of low computation complexity and high accuracy of video content comparison by combining the perceptual hashing and the aberration algorithm and adopting a step matching judgment method for the whole image.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
In addition, as an implementation of the foregoing embodiments, the present disclosure further provides a video content comparison apparatus based on perception and disparity, where the apparatus is located in a terminal, as shown in fig. 7, and the apparatus includes: an image sequence acquisition unit 10, an extraction unit 20, a calculation unit 30, an alignment unit 40, a first judgment unit 50, a second judgment unit 60, and a result output unit 70, wherein:
the image sequence acquisition unit 10 is configured to acquire a source video image sequence and a target video image sequence;
the extraction unit 20 is configured to extract perceptual hash features and disparity features of the source video image sequence and the target video image sequence;
the calculating unit 30 is configured to calculate alignment offsets of the source video image sequence and the target video image sequence according to the perceptual hash features of the source video image sequence and the perceptual hash features of the target video image sequence;
the alignment unit 40 is configured to align the sequence of source video images and the sequence of target video images according to the alignment offset;
the first judging unit 50 is configured to judge whether the aligned source video image sequence and the target video image sequence are similar to each other according to the perceptual hash features of the source video image sequence and the perceptual hash features of the target video image sequence;
the second determining unit 60 is configured to determine whether the source video image sequence and the target video image sequence after similar matching are exactly matched according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence if the source video image sequence and the target video image sequence after alignment are similar;
the result output unit 70 is configured to output the comparison result generated based on the determination result.
The video content comparison device based on perception and aberration provided by the embodiment of the disclosure obtains a source video image sequence and a target video image sequence; extracting perceptual hash characteristics and aberration characteristics of the source video image sequence and the target video image sequence; calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence; aligning the source video image sequence and the target video image sequence according to the alignment offset; judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence; if the aligned source video image sequence and the aligned target video image sequence are matched similarly, judging whether the source video image sequence and the target video image sequence which are matched similarly are matched accurately or not according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence; and outputting a comparison result generated based on the judgment result. The embodiment of the disclosure combines perceptual hashing and an aberration algorithm, and adopts a step matching judgment method, so that the requirements of low computation complexity and high accuracy of video content comparison can be met simultaneously.
Fig. 8 is a block diagram illustrating a terminal 800 according to an exemplary embodiment. For example, the terminal 800 can be a messaging device, a tablet device, a computer, and the like. Referring to FIG. 8, the terminal 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input-output interface 812, and a communication component 816.
The processing component 802 generally controls the overall operation of the terminal 800, such as operations associated with display, data communication, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to perform all or part of the steps of the methods described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the terminal 800. Power components 806 provide power to the various components of terminal 800. Power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for terminal 800. The multimedia component 808 includes a screen providing an output interface between the terminal 800 and the user. The audio component 810 is configured to output and/or input audio signals. The input-output interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
Communication component 816 is configured to facilitate communications between terminal 800 and other devices in a wired or wireless manner. The terminal 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the terminal 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium in which instructions, when executed by a processor of a terminal 800, enable the terminal 800 to perform an information processing method, the method comprising:
acquiring a source video image sequence and a target video image sequence;
extracting perceptual hash characteristics and aberration characteristics of the source video image sequence and the target video image sequence;
calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
aligning the source video image sequence and the target video image sequence according to the alignment offset;
judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
if the aligned source video image sequence and the aligned target video image sequence are matched similarly, judging whether the source video image sequence and the target video image sequence which are matched similarly are matched accurately or not according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence;
and outputting a comparison result generated based on the judgment result.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment. The above-described embodiments of the present invention do not limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method for comparing video content based on perception and aberration, comprising:
acquiring a source video image sequence and a target video image sequence;
extracting perceptual hash characteristics and aberration characteristics of the source video image sequence and the target video image sequence;
calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
aligning the source video image sequence and the target video image sequence according to the alignment offset;
judging whether the aligned source video image sequence and the aligned target video image sequence are similar to each other or not according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
if the aligned source video image sequence and the aligned target video image sequence are matched similarly, judging whether the source video image sequence and the target video image sequence which are matched similarly are matched accurately or not according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence;
and outputting a comparison result generated based on the judgment result.
2. The method of claim 1, wherein extracting perceptual hash features of the sequence of source video images and perceptual hash features of the sequence of target video images comprises:
extracting the gray value of the source video image sequence and the gray value of the target video image sequence to generate a source video gray image sequence and a target video gray image sequence;
scaling the source video grayscale image sequence and the target video grayscale image sequence;
performing discrete cosine transform on the scaled source video gray level image sequence and the scaled target video gray level image sequence;
respectively taking the coefficients of the upper left corner blocks of the source video gray level image sequence and the target video gray level image sequence after discrete cosine transform;
calculating a coefficient mean of the block coefficients;
judging the size relation between each coefficient value in the block coefficient and the coefficient mean value, if the coefficient value is larger than or equal to the coefficient mean value, marking the coefficient as 1, and if the coefficient value is smaller than the coefficient mean value, marking the coefficient as 0;
and generating the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence according to the marking result.
3. The method of claim 1, wherein extracting disparity features of the source video image sequence and disparity features of the target video image sequence comprises:
extracting the gray value of the source video image sequence and the gray value of the target video image sequence to generate a source video gray image sequence and a target video gray image sequence;
scaling the source video grayscale image sequence and the target video grayscale image sequence;
dividing the scaled source video grayscale image sequence and the scaled target video grayscale image sequence into N pixel blocks;
calculating the block pixel mean value of all pixel points in the N pixel blocks;
calculating the aberration between the gray value of each pixel and the block pixel mean, marking the pixel as 1 if the aberration is greater than or equal to 0, and marking it as 0 if the aberration is less than 0;
and generating N block aberration characteristics of the source video image sequence and N block aberration characteristics of the target video image sequence according to the marking result.
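A corresponding sketch of the block aberration features of claim 3, under the same hedged assumptions (a 64x64 working size and an 8x8 grid, i.e. N = 64 pixel blocks, none of which are fixed by the claim):

import cv2
import numpy as np

def block_aberration_features(frame_bgr, grid=8, size=64):
    # Extract the gray values and scale the gray image (assumed 64x64).
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA).astype(np.float64)
    step = size // grid
    features = []
    for r in range(grid):
        for c in range(grid):
            block = small[r * step:(r + 1) * step, c * step:(c + 1) * step]
            # Aberration = pixel gray value minus the block pixel mean;
            # mark a pixel 1 where the aberration is >= 0, else 0.
            features.append((block - block.mean() >= 0).astype(np.uint8))
    return features  # N = grid*grid binary maps, one per pixel block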
4. The method of claim 3, wherein judging whether the similarly matched source video image sequence and target video image sequence are accurately matched according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence comprises:
generating a matching result between a pixel block of the source video image sequence and a pixel block of the target video image sequence according to the N block aberration characteristics of the source video image sequence and the N block aberration characteristics of the target video image sequence;
calculating, according to the matching result, the ratio of matched pixel blocks to the total number of pixel blocks;
calculating the average matching block percentage of the source video image sequence and the target video image sequence according to the ratio;
and judging whether the similarly matched source video image sequence and target video image sequence are accurately matched according to a comparison between the average matching block percentage and a preset matching block percentage.
5. The method of claim 4, wherein judging whether the similarly matched source video image sequence and target video image sequence are accurately matched according to the comparison between the average matching block percentage and the preset matching block percentage comprises:
determining that the similarly matched source video image sequence and target video image sequence are not accurately matched if the average matching block percentage is smaller than the preset matching block percentage;
and determining that the similarly matched source video image sequence and target video image sequence are accurately matched if the average matching block percentage is greater than or equal to the preset matching block percentage.
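Claims 4 and 5 leave open how an individual pixel-block pair is declared matched; the sketch below assumes a per-block bit-agreement level (0.9) and a preset matching block percentage (80%), both hypothetical parameters:

import numpy as np

def accurately_matched(src_frames, tgt_frames, block_agree=0.9, preset_percent=80.0):
    # src_frames / tgt_frames: per-frame lists of N block aberration maps,
    # e.g. as produced by the block_aberration_features sketch above.
    percents = []
    for src_blocks, tgt_blocks in zip(src_frames, tgt_frames):
        # A block pair matches when enough of its aberration bits agree.
        matched = sum(np.mean(s == t) >= block_agree
                      for s, t in zip(src_blocks, tgt_blocks))
        # Ratio of matched pixel blocks to the total number of pixel blocks.
        percents.append(100.0 * matched / len(src_blocks))
    # Accurate match when the average matching block percentage reaches
    # the preset matching block percentage.
    return float(np.mean(percents)) >= preset_percent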
6. The method of claim 1, wherein calculating the alignment offset between the source video image sequence and the target video image sequence according to the perceptual hash characteristic of the source video image sequence and the perceptual hash characteristic of the target video image sequence comprises:
selecting a plurality of preselected alignment offsets;
aligning the source video image sequence and the target video image sequence according to the preselected alignment offset;
calculating the average Hamming distance between the aligned source video image sequence and the aligned target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
and selecting the preselected alignment offset with the minimum average Hamming distance as the alignment offset.
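A sketch of the offset search of claim 6; the +/-25-frame candidate window is an assumption, since the claim only requires a plurality of preselected alignment offsets:

import numpy as np

def alignment_offset(src_hashes, tgt_hashes, candidates=range(-25, 26)):
    # src_hashes / tgt_hashes: per-frame 0/1 hash vectors (e.g. 64 bits each).
    def avg_hamming(offset):
        dists = [int(np.sum(src_hashes[i] != tgt_hashes[i + offset]))
                 for i in range(len(src_hashes))
                 if 0 <= i + offset < len(tgt_hashes)]
        return float(np.mean(dists)) if dists else float('inf')
    # Keep the preselected offset whose aligned sequences give the
    # smallest average Hamming distance.
    return min(candidates, key=avg_hamming)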
7. The method of claim 1, wherein judging whether the aligned source video image sequence and the aligned target video image sequence are similarly matched according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence comprises:
calculating the average Hamming distance between the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
and judging whether the aligned source video image sequence and the aligned target video image sequence are similarly matched according to a comparison between the average Hamming distance and a preset threshold.
8. The method of claim 7, wherein judging whether the aligned source video image sequence and the aligned target video image sequence are similarly matched according to the comparison between the average Hamming distance and the preset threshold comprises:
determining that the aligned source video image sequence and target video image sequence are not similarly matched if the average Hamming distance is greater than the preset threshold;
and determining that the aligned source video image sequence and target video image sequence are similarly matched if the average Hamming distance is smaller than or equal to the preset threshold.
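The similarity decision of claims 7 and 8 reduces to one threshold comparison; the threshold of 5 bits (against an assumed 64-bit hash) is illustrative only:

import numpy as np

def similarly_matched(src_hashes, tgt_hashes, preset_threshold=5.0):
    # Average per-frame Hamming distance between the aligned hash sequences.
    dists = [int(np.sum(s != t)) for s, t in zip(src_hashes, tgt_hashes)]
    # Similar match when the average distance does not exceed the threshold.
    return float(np.mean(dists)) <= preset_threshold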
9. The method of claim 1, wherein calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence further comprises:
selecting the source video image sequence and the target video image sequence within a preset time length;
and calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence within a preset time length.
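Combining the hypothetical helpers sketched above, the overall flow of claims 1 and 9 might read as follows; frame decoding and the preset-time-length selection of claim 9 are assumed to have been applied to the input lists already:

def compare_videos(src_frames, tgt_frames):
    # 1. Perceptual hash characteristics of both image sequences.
    src_hashes = [perceptual_hash(f) for f in src_frames]
    tgt_hashes = [perceptual_hash(f) for f in tgt_frames]
    # 2. Alignment offset from the hashes, then align the sequences.
    off = alignment_offset(src_hashes, tgt_hashes)
    idx = [i for i in range(len(src_frames)) if 0 <= i + off < len(tgt_frames)]
    if not idx:
        return 'not similar'
    a_src = [src_hashes[i] for i in idx]
    a_tgt = [tgt_hashes[i + off] for i in idx]
    # 3. Coarse similarity decision on the perceptual hashes.
    if not similarly_matched(a_src, a_tgt):
        return 'not similar'
    # 4. Fine decision on the aberration characteristics.
    src_ab = [block_aberration_features(src_frames[i]) for i in idx]
    tgt_ab = [block_aberration_features(tgt_frames[i + off]) for i in idx]
    return 'accurate match' if accurately_matched(src_ab, tgt_ab) else 'similar only'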
10. A perceptual-disparity based video content comparison apparatus, comprising:
the image sequence acquisition unit is used for acquiring a source video image sequence and a target video image sequence;
the extraction unit is used for extracting the perceptual hash characteristics and the aberration characteristics of the source video image sequence and the target video image sequence;
the computing unit is used for computing the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
an alignment unit, configured to align the source video image sequence and the target video image sequence according to the alignment offset;
the first judging unit is used for judging, according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence, whether the aligned source video image sequence and the aligned target video image sequence are similarly matched;
the second judging unit is used for judging, if the aligned source video image sequence and the aligned target video image sequence are similarly matched, whether the similarly matched source video image sequence and target video image sequence are accurately matched according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence;
and the result output unit is used for outputting a comparison result generated based on the judgment result.
11. A terminal, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a source video image sequence and a target video image sequence;
extracting perceptual hash characteristics and aberration characteristics of the source video image sequence and the target video image sequence;
calculating the alignment offset of the source video image sequence and the target video image sequence according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence;
aligning the source video image sequence and the target video image sequence according to the alignment offset;
judging, according to the perceptual hash characteristics of the source video image sequence and the perceptual hash characteristics of the target video image sequence, whether the aligned source video image sequence and the aligned target video image sequence are similarly matched;
if the aligned source video image sequence and the aligned target video image sequence are similarly matched, judging, according to the aberration characteristics of the source video image sequence and the aberration characteristics of the target video image sequence, whether the similarly matched source video image sequence and target video image sequence are accurately matched;
and outputting a comparison result generated based on the judgment result.
CN201910874788.1A 2019-09-17 2019-09-17 Video content comparison method and device based on perception and aberration Active CN110619362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910874788.1A CN110619362B (en) 2019-09-17 2019-09-17 Video content comparison method and device based on perception and aberration

Publications (2)

Publication Number Publication Date
CN110619362A true CN110619362A (en) 2019-12-27
CN110619362B CN110619362B (en) 2021-11-09

Family

ID=68923161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910874788.1A Active CN110619362B (en) 2019-09-17 2019-09-17 Video content comparison method and device based on perception and aberration

Country Status (1)

Country Link
CN (1) CN110619362B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140240516A1 (en) * 2013-02-28 2014-08-28 Apple Inc. Aligned video comparison tool
CN103747254A (en) * 2014-01-27 2014-04-23 深圳大学 Video tamper detection method and device based on time-domain perceptual hashing
CN104023249A (en) * 2014-06-12 2014-09-03 腾讯科技(深圳)有限公司 Method and device of identifying television channel
CN105975939A (en) * 2016-05-06 2016-09-28 百度在线网络技术(北京)有限公司 Video detection method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FOUAD KHELIFI et al.: "Perceptual Video Hashing for Content Identification and Authentication", IEEE Electronic Library *
WANG Jing: "Research on Video Copy Detection Algorithms Based on Visual Hashing", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837640A (en) * 2021-01-27 2021-05-25 百果园技术(新加坡)有限公司 Screen dynamic picture testing method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110619362B (en) 2021-11-09

Similar Documents

Publication Publication Date Title
US11475238B2 (en) Keypoint unwarping for machine vision applications
US9628837B2 (en) Systems and methods for providing synchronized content
US10368123B2 (en) Information pushing method, terminal and server
CN110503703B (en) Method and apparatus for generating image
CN110971929B (en) Cloud game video processing method, electronic equipment and storage medium
EP2149098B1 (en) Deriving video signatures that are insensitive to picture modification and frame-rate conversion
TWI442773B (en) Extracting features of video and audio signal content to provide a reliable identification of the signals
EP1519343A2 (en) Method and apparatus for summarizing and indexing the contents of an audio-visual presentation
US9082039B2 (en) Method and apparatus for recognizing a character based on a photographed image
GB2513218A (en) Object detection metadata
CN109784304B (en) Method and apparatus for labeling dental images
KR20120138282A (en) Method and apparatus for processing image
WO2021175040A1 (en) Video processing method and related device
CN111836118B (en) Video processing method, device, server and storage medium
WO2020108010A1 (en) Video processing method and apparatus, electronic device and storage medium
US9542976B2 (en) Synchronizing videos with frame-based metadata using video content
CN110619362B (en) Video content comparison method and device based on perception and aberration
JP2012203823A (en) Image recognition device
US9678991B2 (en) Apparatus and method for processing image
WO2016161899A1 (en) Multimedia information processing method, device and computer storage medium
CN112511890A (en) Video image processing method and device and electronic equipment
WO2018107601A1 (en) Dynamic demonstration method, device, and system for instruction manual
CN107431831B (en) Apparatus and method for identifying video sequence using video frame
CN112055246B (en) Video processing method, device and system and storage medium
CN113706429B (en) Image processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant