CN114267001B - Visual recognition system - Google Patents

Visual recognition system

Info

Publication number
CN114267001B
CN114267001B (application CN202210189631.7A)
Authority
CN
China
Prior art keywords
video information
compared
feature
sample
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210189631.7A
Other languages
Chinese (zh)
Other versions
CN114267001A (en)
Inventor
苗炜
李东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huayu Qizhi Technology Co ltd
Original Assignee
Beijing Huayu Qizhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huayu Qizhi Technology Co ltd
Priority to CN202210189631.7A
Publication of CN114267001A
Application granted
Publication of CN114267001B

Abstract

The application relates to the field of visual recognition and specifically discloses a visual recognition system comprising: an image acquisition device for acquiring n frame images from sample video information and from to-be-compared video information that is to be compared with the sample video information, where n is a nonzero natural number; an image processing device for setting feature values for m feature objects of each frame image of the sample video information and the to-be-compared video information, thereby generating a feature vector A1m for each frame image, where m is a nonzero natural number; and a visual processing device that identifies whether video content similar to the sample video information exists in the to-be-compared video information according to the statistical relationship between the feature vectors of the frame images of the sample video information and those of the to-be-compared video information. By recognizing the statistical distribution of different features of the video, the application provides a video content identification scheme with high accuracy.

Description

Visual recognition system
Technical Field
The present invention relates to the field of visual recognition, and more particularly to a visual recognition system for determining whether one video contains content similar to another.
Background
As society enters the information age, manual processing increasingly falls short in both efficiency and accuracy. For example, when large volumes of video must be processed, tasks such as identifying videos with similar content, labeling them, finding repeated content within a single video, or editing out duplicated content to save storage space consume a great deal of time when done manually, and accuracy cannot be guaranteed with limited effort.
Schemes for judging video similarity with computer vision have been proposed. However, existing video identification methods generally follow the idea of image similarity: they recognize the content of the images in a video and compare the pixel distributions of different video frames. Although this improves efficiency over manual work, it remains essentially limited to image comparison, and the overall similarity of videos is difficult to judge accurately from pixel comparison alone. For example, after a video is re-edited, the new video may retain only some features of the original; although the expressed content is similar, the overall pixel distributions of the video frames differ significantly, so conventional methods identify such videos poorly and inefficiently.
Therefore, how to provide a video recognition scheme with high efficiency and high accuracy becomes a technical problem to be solved in the field.
Disclosure of Invention
In view of the above, the present application provides a visual recognition system to provide a video recognition scheme with high efficiency and high accuracy.
According to the present application, there is provided a visual recognition system comprising: an image acquisition device for acquiring n frame images from sample video information and from to-be-compared video information that is to be compared with the sample video information, where n is a nonzero natural number; an image processing device for setting feature values for m feature objects of each frame image of the sample video information and the to-be-compared video information, thereby generating a feature vector A1m for each frame image, where m is a nonzero natural number; and a visual processing device that identifies whether video content similar to the sample video information exists in the to-be-compared video information according to the statistical relationship between the feature vectors of the frame images of the sample video information and those of the to-be-compared video information.
Preferably, the image processing device computes the feature value of each feature object using convolution model processing.
Preferably, the image processing device further performs a dimension-reduction calculation on the feature values of the feature objects after the convolution processing, so as to reduce the feature vector A1m of each frame image to a feature vector A1k, where k is a nonzero natural number less than m.
Preferably, each feature object of the feature vector A1k obtained after the dimension-reduction calculation has a respective weight value.
Preferably, the number of frame images collected from the sample video information is equal to or different from the number of frame images collected from the to-be-compared video information.
Preferably, the visual processing device is configured to combine the feature vectors of the frame images of the sample video information and those of the to-be-compared video information into respective analysis matrices Bnm, and to judge the similarity between the sample video information and the to-be-compared video information on each feature object according to the statistical relationship of the corresponding column elements of the analysis matrices.
Preferably, each row of the analysis matrix corresponds to the feature vector of one frame image, the rows of the analysis matrix are arranged according to the time sequence of the n frame images, and/or the columns of the analysis matrix are arranged according to the weight values of the feature objects corresponding to the columns.
Preferably, the visual processing device is adapted to obtain a sample cumulative distribution function curve of the feature values of each column of elements of the analysis matrix.
Preferably, the visual processing device identifies whether the sample video information and the to-be-compared video information have similarity on the feature object according to a sample cumulative distribution function curve of n frame images of the sample video information on a feature object-by-feature object basis and a sample cumulative distribution function curve of n frame images of the to-be-compared video information on a feature object-by-feature object basis.
Preferably, the visual processing device identifies the similarity between the sample video information and the to-be-compared video information according to the number and weight proportion of the feature objects, among the m feature objects, on which the sample video information and the to-be-compared video information are similar.
According to the technical solution of the present application, an image acquisition device acquires n frame images from sample video information and from the to-be-compared video information that is to be compared with it; an image processing device sets feature values for the m feature objects of each frame image and generates a feature vector A1m from those feature values; and a visual processing device identifies whether video content similar to the sample video information exists in the to-be-compared video information according to the statistical relationship between the feature vectors of the frame images of the sample video and the to-be-compared video. This dispenses with the traditional approach of comparing videos or image content as a whole, and provides a highly accurate video content identification scheme based on recognizing the statistical distribution of different features of the video.
Additional features and advantages of the present application will be described in detail in the detailed description which follows.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate an embodiment of the invention and, together with the description, serve to explain the invention. In the drawings:
FIG. 1 is a system architecture diagram of a visual recognition system in accordance with a preferred embodiment of the present application;
FIG. 2 is a schematic diagram of a sample distribution function curve according to a preferred embodiment of the present application;
FIG. 3 is a graph illustrating a comparison of sample distribution function curves according to a preferred embodiment of the present application;
FIG. 4 is an example of video identification.
Detailed Description
The technical solutions of the present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The application provides a visual recognition system that can be applied in daily life or industrial production. It uses computer vision to recognize whether the feature vectors of to-be-compared video information and sample video information are similar in statistical distribution, and thereby identifies whether video content similar to the sample video information exists in the to-be-compared video information.
The visual recognition system of the present application comprises an image acquisition device, an image processing device, and a visual processing device. As shown in fig. 1, the image acquisition device acquires n frame images from the sample video information and from the to-be-compared video information respectively, where n is a nonzero natural number; the image processing device sets feature values for m feature objects of each frame image of the sample video information and the to-be-compared video information, so as to generate a feature vector A1m for each frame image, where m is a nonzero natural number (the so-called feature vector can also be regarded as a 1-dimensional matrix); and the visual processing device identifies whether video content similar to the sample video information exists in the to-be-compared video information according to the statistical relationship between the feature vectors of the frame images of the sample video information and those of the to-be-compared video information.
Compared with the traditional approach of comparing whole content between videos or images, the visual recognition system identifies the statistical distribution of different features in the video, which allows the content of the compared video to be identified quickly and with high accuracy. For example, suppose the feature objects of the sample video information include a moving bright point whose feature values follow a normal distribution; if the to-be-compared video information contains a moving bright point with the same normal distribution, the feature objects similar to those of the sample video information can be identified from the statistical relationship between the feature vectors, even when the other feature objects differ from the sample video information.
In the visual recognition system of the present application, the image acquisition device may be a video frame acquisition device that captures n consecutive frame images from an existing video, or captures one image every one or more frames until n frame images have been collected, for example to identify whether video content similar to a sample video exists in an existing to-be-compared video. The image acquisition device may also be a video camera or another device with a video frame acquisition function: while a video is being recorded, images can be captured continuously or extracted at intervals of one or more frames, so that, given an existing sample video, the recorded content serves as the to-be-compared video at the same time as it is recorded, in order to identify whether a situation similar to the content of the sample video occurs in the recorded area. Different image acquisition devices can be selected for different application scenarios and requirements.
The sample video information can be collected in advance or updated on a recording schedule. The to-be-compared video means the video information to be compared with the sample video. In the technical solution of the present application, to identify whether video content similar to the sample video information exists in the to-be-compared video information, n frame images are acquired from the sample video and from the to-be-compared video respectively. Preferably, the number of frame images acquired from the sample video information equals the number acquired from the to-be-compared video, but the present application is not limited thereto; the number acquired from the sample video information may also be larger or smaller.
When selecting frame images from the sample video and the to-be-compared video, acquisition preferably proceeds in one of the following ways: frame by frame; at intervals of a fixed number of frames; or key frame by key frame. Frame-by-frame acquisition is the most common and retains all video content. Interval acquisition takes one frame every q frames and retains a fixed fraction of the video frames. Key-frame acquisition keeps only the key frames in the video and discards all non-key frames; more sophisticated key-frame algorithms, such as motion analysis, may be employed. One such key-frame extraction algorithm, proposed on the basis of the motion characteristics of objects, proceeds roughly as follows: analyze the optical flow of object motion within a video shot and, each time, select the video frame with the minimum amount of optical-flow motion in the shot as the extracted key frame. The motion amount of a video frame is calculated from the optical flow as follows:
M(k) = Σi Σj ( |Lx(i, j, k)| + |Ly(i, j, k)| )
In the formula, M(k) denotes the motion amount of the k-th frame, Lx(i, j, k) denotes the x component of the optical flow at pixel (i, j) of the k-th frame, and Ly(i, j, k) denotes the y component of the optical flow at pixel (i, j) of the k-th frame. After the calculation is complete, the local minima are taken as the key frames to extract. The selection criterion is:
K = { k : M(k) < M(k − 1) and M(k) < M(k + 1) }
the method can extract a proper amount of key frames from most video shots, and the extracted key frames can also effectively express the characteristics of video motion.
In this scheme, frame-by-frame acquisition and interval acquisition at a fixed number of frames are preferred. With interval acquisition, the sample video and the to-be-compared video need not use the same acquisition frequency, although in practice it is preferable that they keep the same inter-frame acquisition frequency.
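A minimal sketch of these acquisition options, assuming OpenCV for decoding; q = 1 reproduces frame-by-frame acquisition, and the function name is illustrative:

```python
# Sketch: frame acquisition at a fixed interval (assumes OpenCV).
import cv2

def acquire_frames(path, q=1):
    """Read a video and keep one frame every q frames (q=1 keeps all frames)."""
    cap = cv2.VideoCapture(path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % q == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```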
In the technical solution of the present application, as shown in fig. 1, the image acquisition device sends the n frame images acquired from the sample video information and from the to-be-compared video information to the image processing device. The image processing device generates a respective feature vector A1m (or Am1) for each frame image, from the sample video and the to-be-compared video respectively. Specifically, m feature objects are generated for each frame image and their corresponding feature values are calculated, thereby producing a feature vector A1m for each frame image, where m is a nonzero natural number. Feature objects (features) are dimensions or criteria used to describe or define frame images, such as faces, motions, brightness, edges, and the various simplified feature results of images obtained through convolution, pooling, and so on. The number m of feature objects can be chosen for the application scenario, for example 8, 16, 32, 64, 128, or 512; 512 is preferred in this application. In this way, a feature vector is generated for each frame image of the sample video information, and likewise for each frame image of the to-be-compared video information.
The image processing device calculates a corresponding feature value for each of the m feature objects of each frame image, so that each frame image can be analyzed quantitatively. The feature values can be calculated in various ways, for example according to a pre-designed scale. Preferably, the image processing device calculates the feature value of each feature object using convolution model processing, the corresponding feature value being produced automatically when the convolution model generates the feature object (the convolution processing of the convolution model may use 2D convolution feature extraction, or 3D convolution feature extraction with an added time dimension). For example, the image may be processed with a deep-learning CNN (such as ResNet or VGG), gradually extracting features from a complex image through computation layers such as convolutional and pooling layers, thereby selecting m feature objects in each frame image and generating a feature vector A1m for each frame. For example, as shown in Table 1 below, for a 480×480-pixel image, 512 feature values are extracted by the CNN, so m = 512 and a 1×512 feature vector is generated for the frame image. Each frame image then has a feature vector formed by 512 feature values (X1, X2, …, X512).
TABLE 1
[Table 1: the 1×512 feature vector (X1, X2, …, X512) extracted from one 480×480 frame image.]
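As an illustration, a minimal sketch of this per-frame extraction, assuming PyTorch and torchvision; the patent does not prescribe a particular backbone, and ResNet-18 is used here only because its penultimate layer yields a 512-dimensional vector matching the example:

```python
# Sketch: 512-d per-frame feature extraction with a CNN (assumes torchvision).
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()        # drop the classifier, keep the 512-d output
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),                      # frames assumed to be RGB uint8 arrays
    T.Resize((480, 480)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def frame_features(frames):
    """Return an n x 512 matrix: one feature vector A1m (m = 512) per frame."""
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch).numpy()
```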
Preferably, in the visual recognition system described above, the image processing device may perform a dimension-reduction calculation on the feature values of the feature objects after the convolution processing, in order to reduce the amount of computation for the visual processing device and further improve recognition efficiency. The dimension-reduction calculation can use methods such as principal component analysis (PCA) or singular value decomposition (SVD) to reduce the feature vector A1m to a lower-dimensional feature vector A1k (for example, as shown in Tables 1 and 2, 512 feature values are reduced to 200), where k is a nonzero natural number less than m.
TABLE 2
[Table 2: the 512 feature values of Table 1 reduced to 200 feature values (X1, X2, …, X200).]
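A minimal sketch of this dimension-reduction step, assuming scikit-learn; k = 200 follows the example above, and the helper name is illustrative:

```python
# Sketch: PCA dimension reduction from m = 512 to k = 200 (assumes scikit-learn).
from sklearn.decomposition import PCA

def reduce_features(feature_matrix, k=200):
    """Fit PCA on an (n_frames x 512) matrix and return (n_frames x k) features."""
    pca = PCA(n_components=k)
    reduced = pca.fit_transform(feature_matrix)
    return reduced, pca          # keep the PCA object for weights and reuse
```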
The m or k feature objects of each frame image may influence the frame image to the same degree, but in most cases their influence differs. To reflect these differing influences, the feature objects of each feature vector preferably carry respective weight values. A weight value can be set by a preset rule, or solved for during the dimension-reduction calculation. In this scheme, taking dimension reduction by principal component analysis (PCA) as an example, three things must be known to determine the weight coefficients with PCA:
1) the coefficients of the indexes in each principal component's linear combination;
2) the variance contribution rate of each principal component;
3) the normalization of the index weights.
For example: there are n principal components and m indexes; w denotes the coefficients of the principal components, wij denotes the coefficient of the j-th index in the i-th principal component, and fi denotes the variance contribution rate of the i-th principal component.
Then the weight of the qth index is:
wq = ( Σi=1…n wiq · fi ) / ( Σi=1…n fi )
the normalized calculation result is:
Wq = wq / ( Σj=1…m wj )
As shown in Table 3 below, taking the case where 200 feature values are formed for each frame image after the calculation, each feature object of the feature vectors newly generated by the principal component analysis (PCA) processing preferably has its own weight value, so that the influence of each feature object on the frame image can be judged from its weight. Depending on the application scenario, the technical solution is equally applicable to feature vectors that have not undergone dimension reduction.
TABLE 3
[Table 3: the 200 dimension-reduced feature values of one frame, each feature object accompanied by its weight value.]
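As an illustration, a minimal sketch of this weight calculation, assuming the fitted scikit-learn PCA object from the previous sketch; it implements the stated formula literally, and taking absolute values of the component coefficients is an assumption the text leaves unspecified:

```python
# Sketch: PCA-based index weights, following wq = (sum_i wiq*fi) / (sum_i fi),
# then normalized so the weights sum to 1 (assumes the fitted PCA object).
import numpy as np

def feature_weights(pca):
    f = pca.explained_variance_ratio_      # variance contribution rates f_i
    W = np.abs(pca.components_)            # coefficients w_iq, shape (k, m);
                                           # abs is an assumption, not from the patent
    raw = (W * f[:, None]).sum(axis=0) / f.sum()
    return raw / raw.sum()                 # normalized index weights
```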
The image processing device of the visual recognition system provided by the present application has been described in detail above. Once the feature vectors of the frame images of the sample video information and of the to-be-compared video information have been obtained, the judgment is not made by comparing the displayed content of individual frame images. Instead, the statistical relationship between the feature vectors (for example, the consistency of the statistical distribution of each feature object) is compared to recognize whether feature objects similar to those of the sample video information exist in the to-be-compared video information, and the similarity of the content is then evaluated.
For feature object 1, if the statistical distribution of its feature values over the frame images of the sample video information is close or similar to the statistical distribution over the frame images of the to-be-compared video information (for example, the distribution intervals are substantially the same), the sample video information and the to-be-compared video information can be judged close on feature object 1. If more than half, or more than 2/3, of the m or k feature objects are close, an overall similarity between the sample video and the to-be-compared video can be recognized. As described above, in embodiments where weight values are assigned, the weight values can be incorporated into the recognition judgment.
Preferably, as shown in Table 4 below, for convenience of numerical calculation the feature vectors of the n frame images of the sample video (or the dimension-reduced feature vectors) are combined into an analysis matrix Bnm (or Bnk), and the feature vectors of the n frame images of the to-be-compared video (or their dimension-reduced versions) form another analysis matrix, where X11 is the feature value of feature object 1 in frame image 1, and so on. The similarity between the sample video and the to-be-compared video on the feature object corresponding to a column is judged from the statistical relationship, in statistical distribution, of the corresponding column elements of the analysis matrices Bnm.
TABLE 4
[Table 4: analysis matrix Bnm; row i is the feature vector of frame image i, so Xij is the feature value of feature object j in frame i.]
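A minimal sketch of assembling such a matrix, assuming the per-frame feature vectors from the earlier sketches; rows are frames in time order and columns are feature objects:

```python
# Sketch: stack per-frame feature vectors row-wise into the analysis matrix Bnm.
import numpy as np

def analysis_matrix(frame_vectors):
    """frame_vectors: sequence of n feature vectors -> (n x m) matrix Bnm."""
    return np.vstack(frame_vectors)
```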
In the analysis matrix, each row corresponds to the feature vector of one frame image. The rows are preferably arranged in a predetermined order, for example by the time sequence of the n frame images or by the importance of the different frame images. The columns may likewise be arranged in a predetermined order, for example by the importance of the feature objects or by the weight values of the feature objects corresponding to the columns. The present application is not limited thereto, however; random arrangement is also possible.
To determine the degree of similarity of the feature objects on the basis of the analysis matrix, the visual processing device is preferably configured to obtain the sample cumulative distribution function (CDF) curve of the feature values of each column of elements of the analysis matrix. The feature values of a feature object over the frame images of the sample video can then be plotted as an expected sample distribution function curve (such as the 'expect' curve of the different feature objects in fig. 2, where the ordinate is the value of the sample cumulative distribution function, ranging from 0 to 1, the abscissa is the sequence of the different frames, and the value in parentheses after each feature object is its weight value), so that the function graph accurately reflects the statistical distribution of the feature object's values. The feature values of the feature objects over the frame images of the to-be-compared video are plotted as further sample distribution function curves (the curves in fig. 2 other than the 'expect' curve, each representing the sample distribution curve of the frame images of one to-be-compared video on that feature object; the application is not limited to a single to-be-compared video and can handle several separately). By comparing these curves, the visual processing device identifies, feature object by feature object, whether the sample video information and the to-be-compared video information are similar on each feature object, according to the sample cumulative distribution function curve of the n frame images of the sample video information and that of the n frame images of the to-be-compared video information.
Whether two distribution function curves agree can be determined, for example, with the Kolmogorov-Smirnov (KS) test. The KS test compares a frequency distribution F(x) with a theoretical distribution G(x), or compares two observed distributions. The null hypothesis H0 is that the two data distributions are consistent, or that the data follow the theoretical distribution. With D = max|F(x) − G(x)|, H0 is rejected when the observed D exceeds the critical value D(n, α), and accepted otherwise. As shown in fig. 3 (where the abscissa and ordinate have the same meaning as in fig. 2), the maximum vertical difference between the two sample distribution curves is taken as the D value (statistic D) describing the difference between the two sets of data. In the figure this D value occurs around x = 1 and equals 0.45 (0.65 − 0.25). A threshold can be set for the maximum vertical difference D: if it is exceeded, the two sample distribution function curves are judged different or dissimilar; if not, they are judged the same or similar.
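As an illustration, a minimal sketch of the column-by-column KS comparison, assuming SciPy's two-sample KS test; thresholding the statistic D directly mirrors the description above, and the threshold value 0.2 is an illustrative assumption, not from the patent:

```python
# Sketch: per-feature-object KS comparison of two analysis matrices (assumes SciPy).
from scipy.stats import ks_2samp

def similar_feature_objects(sample_B, compare_B, d_threshold=0.2):
    """Return the column indices (feature objects) whose maximum vertical
    CDF difference D stays within the threshold, i.e. judged similar."""
    similar = []
    for j in range(sample_B.shape[1]):
        d_stat, _ = ks_2samp(sample_B[:, j], compare_B[:, j])
        if d_stat <= d_threshold:
            similar.append(j)
    return similar
```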
Comparing the sample distribution function curve of the frame images of the sample video on each feature object with that of the to-be-compared video on the same feature object reveals whether the two videos are similar or dissimilar on that feature object. If more feature objects are judged similar and fewer dissimilar, the similarity between the sample video and the to-be-compared video can be judged high; otherwise it can be judged low.
Preferably, the visual processing device identifies the similarity between the sample video information and the to-be-compared video information according to the number and weight proportion of the feature objects, among the m feature objects, on which the two are similar. For example, if more than half or 2/3 of the m feature objects are judged similar or identical by the above determination, the sample video information can be considered similar to the to-be-compared video. Alternatively, grades of similarity can be distinguished by the proportion: the more feature objects that are similar or identical (and the greater their weight), the higher the similarity. For example, with 200 feature objects after dimension reduction, suppose the KS analysis finds 100 feature objects with the same distribution and 100 without. The weight values of the 100 same-distribution feature objects are accumulated, and if the accumulated result exceeds a threshold greater than 0 and less than 1 (for example 0.6; the threshold can be adjusted manually), the similarity between the original video and the to-be-compared video is judged high.
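A minimal sketch of this weighted decision, reusing the illustrative helpers above; the 0.6 threshold follows the example in the text and can be adjusted:

```python
# Sketch: accumulate the weights of the similar feature objects and compare
# the sum against a threshold in (0, 1).
import numpy as np

def videos_similar(similar_cols, weights, threshold=0.6):
    """weights: per-feature-object weight values summing to 1."""
    return float(np.asarray(weights)[similar_cols].sum()) > threshold
```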
The technical solution of the present application will be exemplified by a specific example.
As shown in fig. 4, the first video is the original video, and for the other two to-be-compared videos the aim is to identify their similarity to the original and categorize them accordingly. From a human point of view it is obvious that to-be-compared video 1 is close to the original video, while to-be-compared video 2 bears little similarity to it. The task is to have a computer make this determination quickly by algorithm.
In the conventional approach, algorithms generally adopt one of two ideas: 1) compare image similarity between the image frames of the original video and those of the to-be-compared video; 2) recognize the picture content of the original and to-be-compared videos, and judge them similar if the two videos share many recognition labels such as 'mountain' and 'green', and dissimilar otherwise. Both methods must scan the video pictures exhaustively, are time-consuming, and have low accuracy.
In the method of the present application, the picture frames of the original video and of the two to-be-compared videos are first extracted, 1000 frames per video.
The three videos then undergo feature-object extraction with a 3D convolution model (3D-ResNet), generating a 512-dimensional feature vector for every picture frame.
The 512-dimensional feature vectors of the three videos are reduced with PCA (principal component analysis), finally yielding 200-dimensional feature vectors with weight values.
The feature matrix of each of the three videos is formed with the 200 feature objects as the horizontal axis and the 1000 frames as the vertical axis, as shown in the following table.
[Table: the 1000×200 feature matrix of each video; rows are frames, columns are feature objects.]
For feature object 1, the original video and the two to-be-compared videos each have 1000 feature values, forming three sets of cumulative distribution functions (CDFs), as shown in the following table.
[Table: the three 1000-value samples of feature object 1 and their cumulative distribution functions.]
Using the KS method, the statistical distribution of the original video on feature object 1 is judged similar to that of to-be-compared video 1, and not similar to that of to-be-compared video 2. The cumulative distribution function (CDF) comparison is repeated in this way for all 200 feature objects of the three videos.
Finally, 163 of the 200 feature objects of the original video and to-be-compared video 1 turn out to have similar cumulative distribution functions, with a weight sum of 0.92; since this exceeds the threshold of 0.6, the content of the original video and to-be-compared video 1 is judged similar. By contrast, only 82 feature objects of the original video and to-be-compared video 2 have similar cumulative distribution functions, with a weight sum of 0.38; since this is below the threshold of 0.6, the original video and to-be-compared video 2 are judged dissimilar.
The preferred embodiments of the present application have been described in detail above, but the present application is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present application within the technical idea of the present application, and these simple modifications all belong to the protection scope of the present application.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described in the present application.
In addition, any combination of the various embodiments of the present application can be made, and the same shall be considered as the disclosure of the present application as long as the idea of the present application is not violated.

Claims (4)

1. A vision recognition system, the vision recognition system comprising:
an image acquisition device for acquiring n frame images from sample video information and from to-be-compared video information that is to be compared with the sample video information, where n is a nonzero natural number;
an image processing device for setting feature values for m feature objects of each frame image of the sample video information and the to-be-compared video information, thereby generating a feature vector A1m for each frame image, where m is a nonzero natural number;
a visual processing device that identifies whether video content similar to the sample video information exists in the to-be-compared video information according to a statistical relationship between feature vectors of frame images of the sample video information and frame images of the to-be-compared video information,
the image processing device realizes the calculation of the characteristic value of each characteristic object by utilizing convolution model processing, and performs dimension reduction calculation on the characteristic value of the characteristic object after the convolution processing, so as to obtain the characteristic vector A of each frame image1mDimension reduction as feature vector A1kK is a natural number which is less than m and not zero, and the eigenvector A after the dimensionality reduction calculation processing1kEach feature object of (a) has a respective weight value,
wherein the visual processing device is configured to combine the feature vectors of the frame images of the sample video information and those of the to-be-compared video information into respective analysis matrices Bnm, and to identify, feature object by feature object, whether the sample video information and the to-be-compared video information are similar on a feature object according to the sample cumulative distribution function curve of the n frame images of the sample video information and the sample cumulative distribution function curve of the n frame images of the to-be-compared video information.
2. The visual recognition system of claim 1, wherein the number of frame images acquired from the sample video information and the number of frame images acquired from the to-be-compared video information are equal or unequal.
3. The visual recognition system of claim 1,
wherein each row of the analysis matrix corresponds to a feature vector of each frame image, wherein the rows of the analysis matrix are arranged according to the time sequence of the n frame images, and/or
And arranging each column of the analysis matrix according to the weight value of the characteristic object corresponding to the column.
4. The vision recognition system of claim 1, wherein the vision processing device identifies the similarity between the sample video information and the to-be-compared video information according to the number and weight proportion of the feature objects, among the m feature objects, on which the sample video information and the to-be-compared video information are similar.
CN202210189631.7A 2022-03-01 2022-03-01 Visual recognition system Active CN114267001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210189631.7A CN114267001B (en) 2022-03-01 2022-03-01 Visual recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210189631.7A CN114267001B (en) 2022-03-01 2022-03-01 Visual recognition system

Publications (2)

Publication Number Publication Date
CN114267001A (en) 2022-04-01
CN114267001B (en) 2022-06-03

Family

ID=80833849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210189631.7A Active CN114267001B (en) 2022-03-01 2022-03-01 Visual recognition system

Country Status (1)

Country Link
CN (1) CN114267001B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383201A (en) * 2018-12-29 2020-07-07 深圳Tcl新技术有限公司 Scene-based image processing method and device, intelligent terminal and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201405045D0 (en) * 2014-03-21 2014-05-07 Secr Defence Recognition of objects within a video
CN109189991B (en) * 2018-08-17 2021-06-08 百度在线网络技术(北京)有限公司 Duplicate video identification method, device, terminal and computer readable storage medium
CN110569384B (en) * 2019-09-09 2021-02-26 深圳市乐福衡器有限公司 AI scanning method
CN112395457B (en) * 2020-12-11 2021-06-22 中国搜索信息科技股份有限公司 Video to-be-retrieved positioning method applied to video copyright protection

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383201A (en) * 2018-12-29 2020-07-07 深圳Tcl新技术有限公司 Scene-based image processing method and device, intelligent terminal and storage medium

Also Published As

Publication number Publication date
CN114267001A (en) 2022-04-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant