CN114267001A - Visual recognition system - Google Patents

Visual recognition system

Info

Publication number
CN114267001A
Authority
CN
China
Prior art keywords
video information
feature
compared
sample
video
Prior art date
Legal status
Granted
Application number
CN202210189631.7A
Other languages
Chinese (zh)
Other versions
CN114267001B (en)
Inventor
苗炜
李东
Current Assignee
Beijing Huayu Qizhi Technology Co ltd
Original Assignee
Beijing Huayu Qizhi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Huayu Qizhi Technology Co ltd
Priority to CN202210189631.7A
Publication of CN114267001A
Application granted
Publication of CN114267001B
Legal status: Active
Anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to the field of visual recognition, and specifically discloses a visual recognition system comprising: an image acquisition device for acquiring n frame images from sample video information and from to-be-compared video information that is to be compared with the sample video information, where n is a nonzero natural number; an image processing device for setting feature values for m feature objects of each frame image of the sample video information and the to-be-compared video information, thereby generating a feature vector A1m for each frame image, where m is a nonzero natural number; and a visual processing device that identifies whether video content similar to the sample video information exists in the to-be-compared video information according to the statistical relationship between the feature vectors of the frame images of the sample video information and those of the to-be-compared video information. By identifying the statistical distribution of different features of a video, the application provides a video content identification scheme with high accuracy.

Description

Visual recognition system
Technical Field
The present invention relates to the field of visual recognition, and more particularly to a visual recognition system for determining whether one video contains video information similar to another.
Background
As society enters the information age, the efficiency and accuracy of manual processing are increasingly unable to meet demand. For example, when large volumes of video must be processed, tasks such as identifying videos with similar content, tagging videos, finding repeated content within the same video, or editing out duplicate content to save storage space traditionally consume a great deal of manual time, and accuracy cannot be guaranteed with limited effort.
Schemes that judge video similarity using computer vision have been proposed. However, existing video identification methods generally follow the idea of image-similarity matching: they recognize the image content in the video and compare the pixel distributions of different video frames. Although this improves efficiency over manual work, it is essentially limited to image comparison, and the overall similarity of two videos is difficult to judge accurately from pixel comparison alone. For example, after a video is re-edited, the new video may retain only some features of the original; in that case the expressed content is similar, but the overall pixel distribution of the video frames differs significantly, so conventional methods identify such videos poorly and inefficiently.
Therefore, how to provide a video recognition scheme with both high efficiency and high accuracy has become a technical problem to be solved in the field.
Disclosure of Invention
In view of the above, the present application provides a visual recognition system to provide a video recognition scheme with high efficiency and high accuracy.
According to the present application, there is provided a visual recognition system comprising: an image capturing device for capturing n frame images from sample video information and from to-be-compared video information for comparison with the sample video information, respectively, where n is a nonzero natural number; an image processing device for setting feature values for m feature objects of each frame image of the sample video information and the to-be-compared video information, thereby generating a feature vector A1m for each frame image, where m is a nonzero natural number; and a visual processing device that identifies whether video content similar to the sample video information exists in the to-be-compared video information according to the statistical relationship between the feature vectors of the frame images of the sample video information and those of the to-be-compared video information.
Preferably, the image processing apparatus implements the calculation of the feature value of each feature object by using convolution model processing.
Preferably, the image processing apparatus further performs a dimension-reduction calculation on the feature values of the feature objects after the convolution processing, so as to reduce the feature vector A1m of each frame image to a feature vector A1k, where k is a nonzero natural number less than m.
Preferably, each feature object of the feature vector A1k obtained by the dimension-reduction calculation has a respective weight value.
Preferably, the number of frame images collected from the sample video information is equal to or different from the number of frame images collected from the to-be-compared video information.
Preferably, the visual processing device is configured to respectively combine the feature vectors of the frame images of the sample video information and of the to-be-compared video information into analysis matrices Bnm, and to judge, according to the statistical relationship of the corresponding column elements of the analysis matrices Bnm, the similarity between the sample video information and the to-be-compared video information on the feature object corresponding to that column.
Preferably, each row of the analysis matrix corresponds to the feature vector of one frame image, the rows of the analysis matrix are arranged according to the time sequence of the n frame images, and/or the columns of the analysis matrix are arranged according to the weight values of the feature objects corresponding to the columns.
Preferably, the visual processing device is adapted to obtain a sample cumulative distribution function curve of the feature values of each column of elements of the analysis matrix.
Preferably, the visual processing device identifies, feature object by feature object, whether the sample video information and the to-be-compared video information have similarity on a feature object according to the sample cumulative distribution function curves of the n frame images of the sample video information and of the n frame images of the to-be-compared video information.
Preferably, the visual processing device identifies the similarity between the sample video information and the to-be-compared video information according to the number and weight proportion of the feature objects, among the m feature objects, on which the sample video information and the to-be-compared video information are similar.
According to the technical scheme of the application, an image acquisition device acquires n frame images from sample video information and from to-be-compared video information to be compared with the sample video information; an image processing device sets feature values for m feature objects of each frame image and generates a feature vector A1m accordingly; and the visual processing device identifies whether video content similar to the sample video information exists in the to-be-compared video information according to the statistical relationship between the feature vectors of the frame images of the sample video and of the to-be-compared video. This departs from the traditional mode of comparing videos or images as a whole, and provides a highly accurate video content identification scheme by identifying the statistical distribution of different features of the video.
Additional features and advantages of the present application will be described in detail in the detailed description which follows.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate an embodiment of the invention and, together with the description, serve to explain the invention. In the drawings:
FIG. 1 is a system architecture diagram of a visual recognition system in accordance with a preferred embodiment of the present application;
FIG. 2 is a schematic diagram of a sample distribution function curve according to a preferred embodiment of the present application;
FIG. 3 is a graph illustrating a comparison of sample distribution function curves according to a preferred embodiment of the present application;
FIG. 4 is an example of video recognition.
Detailed Description
The technical solutions of the present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The application provides a visual recognition system that can be applied in daily life or industrial production. Using computer vision, it identifies whether the feature vectors of the to-be-compared video information and of the sample video information are similar in statistical distribution, and thereby identifies whether video content similar to the sample video information exists in the to-be-compared video information.
The visual recognition system of the present application includes an image acquisition device, an image processing device, and a visual processing device. As shown in FIG. 1, the image capturing device is configured to capture n frame images from sample video information and from to-be-compared video information for comparison with the sample video information, respectively, where n is a nonzero natural number. The image processing device is configured to set feature values for m feature objects of each frame image of the sample video information and the to-be-compared video information, so as to generate a feature vector A1m for each frame image, where m is a nonzero natural number (the so-called feature vector can also be regarded as a 1-dimensional matrix). The visual processing device identifies whether video content similar to the sample video information exists in the to-be-compared video information according to the statistical relationship between the feature vectors of the frame images of the sample video information and those of the to-be-compared video information.
According to the visual recognition system of the application, compared with the traditional mode of comparing the whole content of videos or images, video content can be identified quickly and with high accuracy by identifying the statistical distribution of different features in the video. For example, suppose the feature objects of the sample video information include a moving bright point whose feature values follow a normal distribution. If the same normally distributed moving bright point also exists in the to-be-compared video information, then even if other feature objects differ from those of the sample video information, the feature objects similar to the sample video information can still be identified from the statistical relationship between the feature vectors.
In the visual recognition system of the present application, the image capturing device may be a video-frame capturing device that captures n consecutive frame images from an existing video, or captures one image every one or more frames until n frame images have been captured, for example to identify whether video content similar to a sample video exists in an existing to-be-compared video. The image capturing device may also be a video camera or a similar device with a frame-capturing function: while video is being recorded, images can be captured continuously or extracted at intervals of one or more frames, so that, given an existing sample video, the recorded content serves as the to-be-compared video at the moment of recording, in order to identify whether a situation similar to the sample video's content occurs in the recording area. Different image capturing devices can be selected according to the application scenario and requirements.
The sample video information can be collected in advance or updated according to a recording schedule. The to-be-compared video means the video information that is to be compared with the sample video. In the technical scheme of the application, in order to identify whether video content similar to the sample video information exists in the to-be-compared video information, n frame images are collected from the sample video and from the to-be-compared video respectively. Preferably the number of frame images collected from the sample video information equals the number collected from the to-be-compared video, but the application is not limited thereto, and the former may be larger or smaller.
When selecting frame images from the sample video and the to-be-compared video, acquisition preferably proceeds in one of the following ways: frame-by-frame acquisition; acquisition at intervals of a certain number of frames; or key-frame-by-key-frame acquisition. Frame-by-frame acquisition is the most common and retains all video content. Interval acquisition takes one frame every fixed number of frames and retains a certain number of video frames. Key-frame acquisition keeps only the key frames of the video and discards all non-key frames; more sophisticated key-frame extraction algorithms, such as motion analysis, may be employed here. One such algorithm, proposed on the basis of the motion characteristics of objects, works roughly as follows: analyze the optical flow of object motion within a video shot and, each time, select the video frame with the minimum amount of optical-flow motion in the shot as the extracted key frame. The amount of motion of a video frame is calculated by the optical-flow method as follows:
M(k) = Σi Σj ( |Lx(i, j, k)| + |Ly(i, j, k)| )
where M(k) represents the amount of motion of the k-th frame, Lx(i, j, k) represents the x component of the optical flow at pixel (i, j) of the k-th frame, and Ly(i, j, k) represents the y component of the optical flow at pixel (i, j) of the k-th frame. After the calculation, the frames at local minima of M(k) are taken as the key frames to be extracted. The selection formula is:
K = { k | M(k) < M(k-1) and M(k) < M(k+1) }
the method can extract a proper amount of key frames from most video shots, and the extracted key frames can also effectively express the characteristics of video motion.
In this scheme, frame-by-frame acquisition and interval acquisition at a specific frame spacing are preferred. With interval acquisition, the sample video and the to-be-compared video need not use the same acquisition frequency; in practice, however, it is preferable that they keep the same inter-frame acquisition frequency.
In the technical solution of the present application, as shown in FIG. 1, the image capturing device sends the n frame images captured from the sample video information and from the to-be-compared video information to the image processing device. The image processing device generates a respective feature vector A1m (or Am1) for each frame image (from the sample video and from the to-be-compared video respectively). Specifically, m feature objects are generated for each frame image and the corresponding feature values are calculated, thereby producing a feature vector A1m for each frame image, where m is a nonzero natural number. Feature objects (features) are dimensions or criteria used to describe or define a frame image, such as faces, motions, brightness, edges, and the various simplified feature results of an image obtained through convolution, pooling, and the like. The number m of feature objects can be chosen for the application scenario, for example 8, 16, 32, 64, 128 or 512; 512 is preferred in this application. In this way, respective feature vectors are generated for the multiple frame images of the sample video information, and likewise for the multiple frame images of the to-be-compared video information.
The image processing device calculates corresponding feature values for the m feature objects of each frame image, so that each frame image can be analyzed quantitatively. The feature values can be calculated in various ways, for example according to a pre-designed scale. Preferably, the image processing apparatus calculates the feature value of each feature object using convolution model processing, in which case the feature values are produced automatically as the convolution model generates the feature objects (the convolution may be 2D feature extraction, or 3D convolution feature extraction with an added temporal dimension). For example, the image may be processed with a deep-learning CNN (such as ResNet or VGG), gradually extracting features from a complex image through computation layers such as convolutional and pooling layers, thereby selecting m feature objects for each frame image and generating its feature vector A1m. As shown in Table 1 below, when the image is 480 x 480 pixels and 512 feature values are extracted by the CNN, m = 512, so a (1 x 512) feature vector is generated for the frame image. Each frame image then has a feature vector formed by 512 feature values (X1, X2, ..., X512).
TABLE 1
  Feature object:  1    2    3    ...    512
  Feature value:   X1   X2   X3   ...    X512
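For illustration, a minimal sketch of such per-frame feature extraction follows; it assumes a pretrained torchvision ResNet-18, whose pooled output happens to be 512-dimensional as in this example, though the patent does not prescribe any particular network:

```python
# Sketch: map one video frame to a 512-d feature vector with a pretrained CNN.
import torch
import torchvision.models as models
import torchvision.transforms as T

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()   # drop the classifier, keep 512-d features
resnet.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((480, 480)),         # image size from the example in the text
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def frame_to_vector(frame_bgr):
    """frame_bgr: (H, W, 3) uint8 BGR frame (as read by OpenCV) -> (512,)."""
    rgb = frame_bgr[:, :, ::-1].copy()    # BGR -> RGB
    x = preprocess(rgb).unsqueeze(0)      # (1, 3, 480, 480)
    return resnet(x).squeeze(0).numpy()   # feature vector A1m, m = 512
```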
Preferably, in order to further reduce the computational load of the visual processing device, the image processing device may perform a dimension-reduction calculation on the feature values of the feature objects after convolution processing. Methods such as Principal Component Analysis (PCA) or Singular Value Decomposition (SVD) can reduce the feature vector A1m to a lower-dimensional feature vector A1k (for example, as shown in Tables 1 and 2, 512 feature values are reduced to 200 feature values), where k is a nonzero natural number less than m.
TABLE 2
  Feature object:  1    2    3    ...    200
  Feature value:   X1   X2   X3   ...    X200
The m or k feature objects of a frame image may influence it to the same degree, but in most cases the degrees differ. To reflect the different influence of the feature objects, each feature object of each feature vector preferably carries a respective weight value. A weight value may be set according to a preset rule, or solved for while the dimension-reduction calculation is executed. Taking dimension reduction by Principal Component Analysis (PCA) as an example, three things must be known when determining the weight coefficients with PCA:
1) the coefficient of each index in each principal component's linear combination;
2) the variance contribution rate of each principal component;
3) the normalization of the index weights.
For example: with n principal components and m indexes, wij represents the coefficient of the j-th index in the i-th principal component, and fi represents the variance contribution rate of the i-th principal component.
Then the weight of the qth index is:
wq = ( Σi wiq · fi ) / ( Σi fi ),  with the sums taken over i = 1, ..., n
The normalized calculation result is:

Wq = wq / ( Σj wj ),  with the sum taken over j = 1, ..., m

As shown in Table 3 below, taking the case where 200 feature values are formed for each frame image after the calculation, each feature object of the newly generated feature vectors processed by the Principal Component Analysis (PCA) method preferably has a respective weight value, so that the influence of a feature object on the frame image can be judged from its weight value. Depending on the application scenario, the technical scheme is equally applicable to feature vectors that have not undergone dimension reduction.
TABLE 3
  Feature object:  1    2    3    ...    200
  Feature value:   X1   X2   X3   ...    X200
  Weight value:    w1   w2   w3   ...    w200
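As one plausible realization of this dimension-reduction and weighting step (a sketch using scikit-learn's PCA; taking each retained feature object to be a principal component and its weight to be the normalized variance contribution rate fi is our assumption, not a detail fixed by the text):

```python
# Sketch: reduce a video's (n_frames, 512) feature matrix to 200 dimensions
# and attach one weight per new feature object (normalized variance
# contribution rates f_i -- an assumed choice, see the note above).
import numpy as np
from sklearn.decomposition import PCA

def reduce_with_weights(X, k=200):
    """X: (n_frames, m) feature matrix -> ((n_frames, k) reduced matrix A1k
    per frame, (k,) weights summing to 1)."""
    pca = PCA(n_components=k).fit(X)
    f = pca.explained_variance_ratio_   # f_i: variance contribution rates
    return pca.transform(X), f / f.sum()
```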
The image processing apparatus of the visual recognition system provided by the application has been described in detail above. After the feature vectors of the frame images of the sample video information and of the to-be-compared video information are obtained, the judgment is not made by comparing the displayed content of individual frame images. Instead, the statistical relationship between the feature vectors (for example, the consistency of the statistical distribution of each feature object) is compared to identify whether feature objects similar to those of the sample video information exist in the to-be-compared video information, and the similarity of the content is then evaluated.
For feature object 1, if the statistical distribution of its feature values over the frame images of the sample video information is close or similar to the corresponding statistical distribution for the to-be-compared video information (for example, the distribution intervals are substantially the same), it can be determined that the two are close on feature object 1. If, of the m or k feature objects, more than half or more than 2/3 are close, an overall similarity between the sample video and the to-be-compared video can be identified. As described above, in embodiments where weight values are designed, the weight values can be incorporated into the recognition decision.
Preferably, as shown in Table 4 below, to facilitate numerical calculation, the feature vectors of the n frame images of the sample video (or their dimension-reduced versions) form an analysis matrix Bnm (or Bnk), and the feature vectors of the n frame images of the to-be-compared video (or their dimension-reduced versions) form another analysis matrix; here X11 denotes the feature value of feature object 1 of frame image 1, and so on. According to the statistical relationship (in terms of statistical distribution) of the corresponding column elements of the analysis matrices Bnm, the similarity between the sample video and the to-be-compared video on the feature object corresponding to that column is judged.
TABLE 4
             Feature object 1   Feature object 2   ...   Feature object m
  Frame 1    X11                X12                ...   X1m
  Frame 2    X21                X22                ...   X2m
  ...        ...                ...                ...   ...
  Frame n    Xn1                Xn2                ...   Xnm
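As a small illustration (a sketch, not part of the patent), stacking the per-frame feature vectors in time order yields exactly this matrix:

```python
# Stack per-frame feature vectors (time order) into the analysis matrix Bnm;
# column q then holds the n samples of feature object q.
import numpy as np

def analysis_matrix(frame_vectors):
    """frame_vectors: iterable of (m,) arrays, one per frame -> (n, m)."""
    return np.vstack(list(frame_vectors))
```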
In the analysis matrix, each row corresponds to the feature vector of one frame image. The rows are preferably arranged in a predetermined order, for example in the time sequence of the n frame images, or by the importance of the different frame images. The columns may likewise be arranged in a predetermined order, for example by the importance of the feature objects, or by the weight values of the feature objects corresponding to the columns. The application is not limited to this, however, and the arrangement may also be random.
In order to determine the degree of similarity of the feature objects on the basis of the analysis matrix, the visual processing device is preferably configured to obtain a sample cumulative distribution function (CDF) of the feature values of each column of elements of the analysis matrix. The feature values of a feature object over the frame images of the sample video can thus be plotted as an expected sample distribution function curve (e.g., the "expect" curves of the different feature objects in FIG. 2; in FIG. 2 the ordinate is the value of the sample cumulative distribution function, ranging from 0 to 1, the abscissa is the sequence of frames, and the value in parentheses after each feature object is its feature weight value). Such a function graph accurately reflects the statistical distribution of the feature object's values. The feature values of the feature objects over the frame images of the to-be-compared video are plotted as further sample distribution function curves (the curves other than the "expect" curves in FIG. 2, each representing the sample distribution curve of the multi-frame images of one to-be-compared video on that feature object; the application is thus not limited to a single to-be-compared video, and several to-be-compared videos can be processed separately). By comparison, the visual processing device identifies, feature object by feature object, whether the sample video information and the to-be-compared video information are similar on a feature object, according to the sample cumulative distribution function curves of the n frame images of the sample video information and of the n frame images of the to-be-compared video information.
The agreement of two distribution function curves can be determined, for example, with the Kolmogorov-Smirnov (KS) test. The KS test compares a frequency distribution f(x) with a theoretical distribution g(x), or two observed distributions with each other. The null hypothesis H0 is that the two data distributions are consistent (or that the data follow the theoretical distribution). With D = max|f(x) - g(x)|, H0 is rejected when the observed D exceeds the critical value D(n, α), and accepted otherwise. As shown in FIG. 3 (where the axes have the same meaning as in FIG. 2), the maximum vertical difference between the two sample distribution curves is calculated as the D value (statistic D) describing the difference between the two sets of data. In the figure this D value occurs around x = 1 and equals 0.45 (0.65 - 0.25). A threshold can be set for the maximum vertical difference D: if the threshold is exceeded, the two sample distribution function curves are identified as different or dissimilar; if not, they are identified as the same or similar.
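For illustration, the per-feature-object comparison can be sketched with SciPy's ks_2samp, which computes exactly this statistic D = max|F1(x) - F2(x)| between two empirical CDFs; the threshold value below is a tunable placeholder, not a value fixed by the text:

```python
# Sketch: KS-compare each column (feature object) of the two analysis
# matrices and flag the feature objects whose distributions agree.
import numpy as np
from scipy.stats import ks_2samp

def similar_features(B_sample, B_compare, d_threshold=0.2):
    """B_sample, B_compare: (n, m) analysis matrices. Returns a boolean
    array with True where feature object q counts as 'similar'."""
    flags = np.empty(B_sample.shape[1], dtype=bool)
    for q in range(B_sample.shape[1]):
        d = ks_2samp(B_sample[:, q], B_compare[:, q]).statistic
        flags[q] = d <= d_threshold   # small max vertical difference D
    return flags
```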
By comparing, for each feature object, the sample distribution function curve of the multi-frame images of the sample video with that of the to-be-compared video, it can be learned whether the two videos are similar or dissimilar on that feature object. If more feature objects are identified as similar and fewer as dissimilar, a high degree of similarity between the sample video and the to-be-compared video can be identified; conversely, the similarity is low.
Preferably, the visual processing device identifies the similarity between the sample video information and the to-be-compared video information according to the number and the weight proportion of the feature objects, among the m feature objects, on which the two are similar. For example, if more than half or more than 2/3 of the m feature objects are identified as similar or identical by the above determination, the sample video information can be considered similar to the to-be-compared video. Alternatively, degrees of similarity can be distinguished by proportion: the more similar or identical feature objects (and the greater their weight), the higher the similarity. For example, suppose there are 200 feature objects after dimension reduction, 100 of which have the same distribution under the KS analysis and 100 of which do not. The weight values of the 100 feature objects with the same distribution are accumulated; if the accumulated result exceeds a threshold greater than 0 and less than 1 (for example 0.6, manually adjustable), the original video and the to-be-compared video are judged to be highly similar.
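A matching sketch of this weighted decision, with the 0.6 threshold taken from the example above:

```python
# Sketch: accumulate the weights of the feature objects flagged as similar
# and compare against the (manually adjustable) threshold from the text.
import numpy as np

def videos_similar(flags, weights, threshold=0.6):
    """flags: boolean (m,) array from the KS step; weights: (m,) weights."""
    return float(np.sum(weights[flags])) > threshold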
The technical solution of the present application will be exemplified by a specific example.
As shown in FIG. 4, the first video is the original video; for the other two to-be-compared videos the aim is to identify their similarity to the original video and then categorize them accordingly. From a human point of view it is obvious that to-be-compared video 1 is closer to the original video, while to-be-compared video 2 bears little similarity to it. In this task, the computer should make such a determination quickly by algorithm.
In traditional methods, algorithms generally follow one of two ideas: 1) compare image similarity directly between image frames of the original video and image frames of the to-be-compared video; 2) recognize the picture content in the original and to-be-compared videos and judge them similar if the two videos share many recognition labels such as "mountain" and "green", and dissimilar otherwise. Both methods must search the video pictures exhaustively, are time-consuming, and have low accuracy.
In the method of the application, the picture frames of the original video and of the two to-be-compared videos are first extracted, 1000 frames per video.
The three videos are subjected to feature object extraction processing by using a 3D convolution model (3D-Resnet), and 512-dimensional feature vectors of all the picture frames are generated.
PCA is used to reduce the 512-dimensional feature vectors of the three videos, finally obtaining 200-dimensional feature vectors with weight values.
Feature matrices are formed for each of the three videos, with the 200 feature objects along the horizontal axis and the 1000 frames along the vertical axis, as shown in the following table.
(Feature matrix of each video: rows are frames 1 ... 1000, columns are the 200 weighted feature objects, entries are the corresponding feature values.)
For feature object 1, the original video and the two to-be-compared videos each have 1000 feature values; as shown in the following table, three cumulative distribution functions (CDFs) are formed respectively.
(For feature object 1: the 1000 feature values of the original video and of each to-be-compared video, and the three resulting cumulative distribution function (CDF) value series.)
Using the KS method, the original video and to-be-compared video 1 are judged similar in the statistical distribution of feature object 1, while the original video and to-be-compared video 2 are judged not similar. This comparison of cumulative distribution functions (CDFs) is repeated for all 200 feature objects of the three videos.
Finally, it is found that 163 of the 200 feature objects of the original video and to-be-compared video 1 have similar cumulative distribution functions, with a weight sum of 0.92; since this exceeds the threshold of 0.6, the content of the original video and to-be-compared video 1 can be judged similar. In contrast, only 82 feature objects of the original video and to-be-compared video 2 have similar cumulative distribution functions, with a weight sum of 0.38; since this is below the threshold of 0.6, the original video and to-be-compared video 2 can be judged dissimilar.
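Putting the pieces together, this worked example corresponds to a pipeline like the following sketch, reusing the hypothetical helpers from the earlier sketches (frame_to_vector, similar_features, videos_similar); fitting a single PCA over both videos' features so that they share one 200-dimensional space is our assumption, since the text does not specify that detail:

```python
# End-to-end sketch of the worked example: 1000 frames per video, 512-d CNN
# features, PCA to 200 weighted dimensions, per-feature KS test, weighted vote.
import numpy as np
from sklearn.decomposition import PCA

def compare_videos(frames_a, frames_b, k=200, weight_threshold=0.6):
    A = np.vstack([frame_to_vector(f) for f in frames_a])   # (1000, 512)
    B = np.vstack([frame_to_vector(f) for f in frames_b])
    pca = PCA(n_components=k).fit(np.vstack([A, B]))        # shared 200-d space
    f = pca.explained_variance_ratio_
    weights = f / f.sum()                                   # feature weights
    flags = similar_features(pca.transform(A), pca.transform(B))
    return videos_similar(flags, weights, weight_threshold)
```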
The preferred embodiments of the present application have been described in detail above, but the present application is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present application within the technical idea of the present application, and these simple modifications all belong to the protection scope of the present application.
It should be noted that the various features described in the foregoing embodiments may be combined in any suitable manner; to avoid unnecessary repetition, the possible combinations are not separately described in the application.
In addition, any combination of the various embodiments of the present application is also possible, and such combinations should likewise be considered disclosed in the application, as long as they do not depart from the idea of the application.

Claims (10)

1. A visual recognition system, comprising:
an image acquisition device for acquiring n frame images from sample video information and from to-be-compared video information for comparison with the sample video information, respectively, wherein n is a nonzero natural number;
an image processing device for setting feature values for m feature objects of each frame image of the sample video information and the to-be-compared video information, thereby generating a feature vector A1m for each frame image, wherein m is a nonzero natural number; and
a visual processing device that identifies whether video content similar to the sample video information exists in the to-be-compared video information according to the statistical relationship between the feature vectors of the frame images of the sample video information and those of the to-be-compared video information.
2. The visual recognition system of claim 1, wherein the image processing device calculates the feature value of each feature object using convolution model processing.
3. The visual recognition system of claim 2, wherein the image processing device further performs a dimension-reduction calculation on the feature values of the feature objects subjected to the convolution processing, so as to reduce the feature vector A1m of each frame image to a feature vector A1k, wherein k is a nonzero natural number less than m.
4. The visual recognition system of claim 3, wherein each feature object of the feature vector A1k obtained by the dimension-reduction calculation has a respective weight value.
5. The visual recognition system of claim 1, wherein the number of frame images acquired from the sample video information and the number of frame images acquired from the to-be-compared video information are equal or unequal.
6. The visual recognition system of claim 4, wherein the visual processing device is configured to respectively combine the feature vectors of the frame images of the sample video information and of the to-be-compared video information into an analysis matrix Bnm, and to judge, according to the statistical relationship of the corresponding column elements of the analysis matrices Bnm, the similarity between the sample video information and the to-be-compared video information on the feature object corresponding to that column.
7. The visual recognition system of claim 6,
wherein each row of the analysis matrix corresponds to the feature vector of one frame image, wherein the rows of the analysis matrix are arranged according to the time sequence of the n frame images, and/or
the columns of the analysis matrix are arranged according to the weight values of the feature objects corresponding to the columns.
8. The visual recognition system of claim 6, wherein the visual processing device is configured to obtain a sample cumulative distribution function curve of the feature values of each column of elements of the analysis matrix.
9. The visual recognition system of claim 8, wherein the visual processing device identifies whether the sample video information and the to-be-compared video information have similarity on a feature object according to the sample cumulative distribution function curve, feature object by feature object, of the n frame images of the sample video information and the sample cumulative distribution function curve, feature object by feature object, of the n frame images of the to-be-compared video information.
10. The visual recognition system of claim 9, wherein the visual processing device identifies the similarity between the sample video information and the to-be-compared video information according to the number and weight proportion of the feature objects, among the m feature objects, on which the sample video information and the to-be-compared video information are similar.
CN202210189631.7A 2022-03-01 2022-03-01 Visual recognition system Active CN114267001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210189631.7A CN114267001B (en) 2022-03-01 2022-03-01 Visual recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210189631.7A CN114267001B (en) 2022-03-01 2022-03-01 Visual recognition system

Publications (2)

Publication Number Publication Date
CN114267001A 2022-04-01
CN114267001B 2022-06-03

Family

ID=80833849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210189631.7A Active CN114267001B (en) 2022-03-01 2022-03-01 Visual recognition system

Country Status (1)

Country Link
CN (1) CN114267001B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201405045D0 (en) * 2014-03-21 2014-05-07 Secr Defence Recognition of objects within a video
CN109189991A (en) * 2018-08-17 2019-01-11 百度在线网络技术(北京)有限公司 Repeat video frequency identifying method, device, terminal and computer readable storage medium
CN110569384A (en) * 2019-09-09 2019-12-13 深圳市乐福衡器有限公司 AI scanning method
CN111383201A (en) * 2018-12-29 2020-07-07 深圳Tcl新技术有限公司 Scene-based image processing method and device, intelligent terminal and storage medium
CN112395457A (en) * 2020-12-11 2021-02-23 中国搜索信息科技股份有限公司 Video to-be-retrieved positioning method applied to video copyright protection

Also Published As

Publication number Publication date
CN114267001B (en) 2022-06-03

Similar Documents

Publication Publication Date Title
CN111310731B (en) Video recommendation method, device, equipment and storage medium based on artificial intelligence
CN108304820B (en) Face detection method and device and terminal equipment
JP5213486B2 (en) Object tracking device and object tracking method
CN113112519B (en) Key frame screening method based on interested target distribution
JP2004199669A (en) Face detection
CN104778481A (en) Method and device for creating sample library for large-scale face mode analysis
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN112734747B (en) Target detection method and device, electronic equipment and storage medium
CN111488943A (en) Face recognition method and device
CN106529441A (en) Fuzzy boundary fragmentation-based depth motion map human body action recognition method
CN112001280B (en) Real-time and online optimized face recognition system and method
CN103426005B (en) Automatic database creating video sectioning method for automatic recognition of micro-expressions
CN111881803A (en) Livestock face recognition method based on improved YOLOv3
JP4369308B2 (en) Representative image selection device, representative image selection method, and representative image selection program
CN114267001B (en) Visual recognition system
CN111860368A (en) Pedestrian re-identification method, device, equipment and storage medium
CN116721288A (en) Helmet detection method and system based on YOLOv5
CN111222473A (en) Analysis and recognition method for clustering faces in video
CN115527147A (en) Multi-mode target re-recognition method
CN111950586B (en) Target detection method for introducing bidirectional attention
CN112580442B (en) Behavior identification method based on multi-dimensional pyramid hierarchical model
JP4449483B2 (en) Image analysis apparatus, image analysis method, and computer program
CN118470577B (en) Inspection scene identification method and system based on big data
CN118570708B (en) Intelligent video analysis method and system
CN118587444B (en) Digital media data segmentation method, system, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant