WO2021027325A1 - Video similarity acquisition method, device, computer equipment and storage medium - Google Patents
Video similarity acquisition method, device, computer equipment and storage medium
- Publication number
- WO2021027325A1 (PCT/CN2020/087033)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- picture
- compared
- feature vector
- target picture
- Prior art date: 2019-08-15
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Description
- This application relates to the field of artificial intelligence technology, and in particular to a method, device, computer equipment and storage medium for acquiring video similarity.
- The embodiments of the present application provide a method, device, computer equipment and storage medium for obtaining video similarity, aiming to solve the prior-art problem that, when judging the similarity of two videos, the user compares them with the naked eye, making it difficult to accurately capture the performer's micro-expressions for an accurate video similarity judgment.
- In a first aspect, an embodiment of the present application provides a method for acquiring video similarity, which includes: receiving a video to be compared uploaded by an uploading terminal, and obtaining an original video corresponding to the video to be compared; performing video splitting on both the video to be compared and the original video to obtain a set of pictures to be compared corresponding to the video to be compared, and an original picture set corresponding to the original video; preprocessing the video to be compared by an optical flow method to obtain a first target picture set corresponding to the video to be compared; preprocessing the original video by the optical flow method to obtain a second target picture set corresponding to the original video; obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the first target picture set to form a first picture feature vector set; obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the second target picture set to form a second picture feature vector set; obtaining the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set to obtain the average similarity between the video to be compared and the original video; and sending the average similarity between the video to be compared and the original video to the uploading terminal.
- In a second aspect, an embodiment of the present application provides a video similarity acquisition device, which includes:
- the video receiving unit is configured to receive the video to be compared uploaded by the uploader, and obtain the original video corresponding to the video to be compared;
- a video splitting unit configured to perform video splitting on both the video to be compared and the original video to obtain a set of pictures to be compared corresponding to the video to be compared, and an original picture set corresponding to the original video;
- a first picture set obtaining unit configured to preprocess the video to be compared by an optical flow method to obtain a first target picture set corresponding to the video to be compared;
- a second picture set acquiring unit configured to preprocess the original video by an optical flow method to obtain a second target picture set corresponding to the original video
- the first extraction unit is configured to obtain a target picture feature vector corresponding to each target picture in the first target picture set through a convolutional neural network to form a first picture feature vector set;
- the second extraction unit is configured to obtain a target picture feature vector corresponding to each target picture in the second target picture set through a convolutional neural network to form a second picture feature vector set;
- the average similarity obtaining unit is configured to obtain the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set, so as to obtain the average similarity between the video to be compared and the original video;
- the average similarity sending unit is configured to send the average similarity between the video to be compared and the original video to the uploader.
- In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video similarity acquisition method described in the first aspect.
- In a fourth aspect, the embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the video similarity acquisition method described in the first aspect.
- The embodiments of the present application provide a method, device, computer equipment, and storage medium for acquiring video similarity. The method combines micro-expression recognition technology with video image comparison to determine the similarity between videos, improving the recognition accuracy of inter-video similarity.
- FIG. 1 is a schematic diagram of an application scenario of a video similarity acquisition method provided by an embodiment of the application;
- FIG. 2 is a schematic flowchart of the video similarity acquisition method provided by an embodiment of the application;
- FIG. 3 is a schematic diagram of a sub-flow of the video similarity acquisition method provided by an embodiment of the application;
- FIG. 4 is a schematic diagram of another sub-flow of the video similarity acquisition method provided by an embodiment of the application;
- FIG. 5 is a schematic block diagram of a video similarity acquisition device provided by an embodiment of the application;
- FIG. 6 is a schematic block diagram of subunits of the video similarity acquisition device provided by an embodiment of the application;
- FIG. 7 is a schematic block diagram of another subunit of the video similarity acquisition device provided by an embodiment of the application;
- FIG. 8 is a schematic block diagram of a computer device provided by an embodiment of the application.
- Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic diagram of an application scenario of the video similarity acquisition method provided by an embodiment of this application, and FIG. 2 is a schematic flowchart of that method. The method is applied to a server and is executed by application software installed in the server.
- the method includes steps S110 to S180.
- S110: Receive a video to be compared uploaded by the uploading terminal, and obtain an original video corresponding to the video to be compared.
- the uploading terminal is a smart terminal used by the user, such as a smart phone, a tablet computer, etc.
- In this embodiment, when the user needs to compare a video to be compared with a corresponding original video stored in the server, the user first selects a certain original video in the server, and the storage path of that original video in the server is obtained. After the video to be compared is uploaded to the cache area of the server, the original video is obtained according to its storage path, and a series of video data processing is then performed to analyze the similarity between the two videos.
- In an embodiment, obtaining the original video corresponding to the video to be compared in step S110 includes: obtaining the corresponding original video from the server according to the storage path corresponding to the video to be compared.
- Specifically, when the uploading terminal uploads the video to be compared to the server, the user first selects the original video on the user interaction interface of the uploading terminal (the user interaction interface is the communication medium between the server and the uploading terminal). After the original video is determined, the server can determine the storage path of the original video and sends that path to the uploading terminal; the default storage path in the server for the video to be compared obtained from the uploading terminal is the storage path of the original video. In this way, the server can obtain the corresponding original video according to the default storage path of the video to be compared in the server.
- Since the user selects the original video to be compared against before uploading the video to be compared, and the video to be compared and the original video are stored under the same storage path on the server, data in the same data partition can be compared quickly without migrating data across partitions.
- S120: Perform video splitting on both the video to be compared and the original video to obtain a set of pictures to be compared corresponding to the video to be compared, and an original picture set corresponding to the original video; a sketch of this splitting step follows.
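- The sketch below is a minimal illustration of such video splitting with OpenCV; the function name `split_video` and the frame-stride parameter are assumptions made for this sketch, not details fixed by the application:

```python
import cv2

def split_video(video_path, stride=1):
    """Split a video file into a list of frames (a picture set).

    stride: keep every stride-th frame; 1 keeps all frames.
    """
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:                      # end of stream
            break
        if index % stride == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames

# Usage: build the picture set to be compared and the original picture set.
pictures_to_compare = split_video("video_to_compare.mp4")
original_pictures = split_video("original_video.mp4")
```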
- S130: Preprocess the video to be compared by an optical flow method to obtain a first target picture set corresponding to the video to be compared.
- micro-expression analysis may be performed by the optical flow method to obtain the first target picture set corresponding to the video to be compared.
- In an embodiment, step S130 includes: obtaining the velocity vector feature corresponding to each pixel of each frame of the video to be compared; and, if the velocity vector feature of at least one frame in the video to be compared does not keep changing continuously, composing the corresponding pictures into the first target picture set of the video to be compared.
- When an object moves, its scene forms a series of continuously changing images on the retina of the human eye, and this series of continuously changing information constantly "flows through" the retina (that is, the image plane), like a kind of light "flow"; hence the name optical flow.
- Optical flow expresses the change of the image; it contains information about the target's motion and can be used to determine how the target is moving.
- Optical flow has three elements: first, a motion velocity field, which is a necessary condition for the formation of optical flow; second, parts carrying optical characteristics, such as gray-scale pixels, which can convey motion information; and third, an imaging projection from the scene onto the image plane, which makes the flow observable.
- Optical flow is defined on points. Specifically, let (u, v) be the optical flow of the image point (x, y); then (x, y, u, v) is called an optical flow point, and the collection of all optical flow points is called the optical flow field. When an object with optical characteristics moves in three-dimensional space, a corresponding image motion field (or image velocity field) is formed on the image plane; in an ideal case, the optical flow field corresponds to this motion field.
- Through the optical flow field, the image can be dynamically analyzed. If there is no moving target in the image, the optical flow vector changes continuously over the whole image area. When there is a moving object in the image (when the user shows a micro-expression, the face moves, which is equivalent to a moving object), there is relative motion between the target and the background; the velocity vector formed by the moving object necessarily differs from the background velocity vector, so the position of the moving object can be calculated. Preprocessing by the optical flow method thus yields the first target picture set corresponding to the video to be compared; a minimal sketch of this step appears below.
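- As an illustration of this preprocessing step, the following sketch uses dense Farneback optical flow from OpenCV to flag frames whose velocity field changes abruptly (a moving target, such as a face showing a micro-expression) and collects them as the target picture set. The choice of the Farneback algorithm, the threshold value, and the name `select_target_pictures` are assumptions of this sketch; the application only specifies "the optical flow method":

```python
import cv2
import numpy as np

def select_target_pictures(frames, threshold=1.0):
    """Keep frames whose dense optical flow magnitude jumps, i.e. whose
    velocity vector field does not keep changing continuously.
    Assumes `frames` holds at least two BGR images."""
    target_pictures = []
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    prev_mean_mag = 0.0
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # (u, v) velocity vector for every pixel of the frame
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        mean_mag = np.linalg.norm(flow, axis=2).mean()
        # an abrupt change in mean flow magnitude marks a moving target
        if abs(mean_mag - prev_mean_mag) > threshold:
            target_pictures.append(frame)
        prev_gray, prev_mean_mag = gray, mean_mag
    return target_pictures
```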
- S140: Preprocess the original video by the optical flow method to obtain a second target picture set corresponding to the original video.
- The process of preprocessing the original video by the optical flow method to obtain the second target picture set is the same as that used for the video to be compared. The first total number of pictures included in the first target picture set does not exceed the second total number of pictures included in the second target picture set; generally, the first total number of pictures included in the first target picture set is equal to the second total number of pictures included in the second target picture set.
- S150: Obtain a target picture feature vector corresponding to each target picture in the first target picture set through a convolutional neural network to form a first picture feature vector set.
- Since the similarity between each picture in the first target picture set and the corresponding picture in the second target picture set needs to be compared, the target picture feature vector corresponding to each target picture in the first target picture set can be obtained through the convolutional neural network; these feature vectors then serve as the vectorized data for analyzing the similarity with the corresponding pictures in the second target picture set.
- In an embodiment, step S150 includes:
- S151: Preprocess each target picture in the first target picture set to obtain a preprocessed picture corresponding to each target picture, and a picture pixel matrix corresponding to each preprocessed picture; here, preprocessing a target picture means sequentially performing grayscale conversion, edge detection, and binarization on the target picture;
- S152: Input the picture pixel matrix corresponding to each preprocessed picture into the input layer of the convolutional neural network model to obtain a feature map corresponding to each preprocessed picture;
- S153: Input each feature map into the pooling layer of the convolutional neural network model to obtain a one-dimensional vector corresponding to each feature map;
- S154: Input the one-dimensional vector corresponding to each feature map into the fully connected layer of the convolutional neural network model to obtain the target picture feature vector corresponding to each feature map, so as to form the first picture feature vector set.
- Grayscale conversion, edge detection, and binarization are performed sequentially on each target picture in the first target picture set to obtain the preprocessed picture corresponding to each target picture.
- Grayscale conversion of color images is a basic method of image processing that is widely used in the field of pattern recognition; reasonable grayscale conversion greatly helps the extraction and subsequent processing of image information, saves storage space, and speeds up processing.
- Edge detection examines the changes in gray level of the image pixels within a certain area and identifies the points in the digital image where brightness changes sharply.
- Image edge detection can greatly reduce the amount of data, eliminate irrelevant information, and preserve the important structural attributes of the image.
- Besides the Sobel operator, operators used for edge detection include the Laplacian edge detection operator, the Canny edge detection operator, and so on.
- Binarization is a type of image thresholding. According to how the threshold is selected, binarization methods can be divided into the global threshold method, the dynamic threshold method, and the local threshold method.
- The maximum between-class variance method (also called the Otsu algorithm) is commonly used for thresholding; it eliminates pixels with smaller gradient values, and after binarization each pixel value of the image is either 0 or 255. A minimal sketch of this preprocessing chain follows.
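- The chain of grayscale conversion, edge detection, and Otsu binarization can be sketched with OpenCV as below; the Sobel-based edge step and the name `preprocess_picture` are choices made here for illustration (the application permits Sobel, Laplacian, or Canny operators):

```python
import cv2

def preprocess_picture(picture):
    """Sequentially apply grayscale conversion, edge detection, and
    Otsu binarization; return the picture pixel matrix (values 0 or 255)."""
    gray = cv2.cvtColor(picture, cv2.COLOR_BGR2GRAY)
    # Sobel gradients in x and y, combined into an edge-strength image
    grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edges = cv2.convertScaleAbs(cv2.magnitude(grad_x, grad_y))
    # Otsu thresholding (maximum between-class variance) discards
    # pixels with small gradient values; output pixels are 0 or 255
    _, binary = cv2.threshold(edges, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```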
- When obtaining the target picture feature vector of a target picture, first obtain the picture pixel matrix corresponding to the preprocessed picture, then use that picture pixel matrix as the input of the input layer of the convolutional neural network model to obtain the feature map; next, input the feature map into the pooling layer to obtain the one-dimensional vector corresponding to the maximum values of the feature map; finally, input that one-dimensional vector into the fully connected layer to obtain the target picture feature vector corresponding to the preprocessed picture. Since each target picture in the first target picture set yields a corresponding target picture feature vector, the first picture feature vector set can be formed. A sketch of such a network appears below.
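- The following PyTorch sketch mirrors the input layer → pooling layer → fully connected layer pipeline described above. The layer sizes, the 64×64 input, and the 128-dimensional output are arbitrary assumptions for illustration; the application does not fix a concrete network architecture:

```python
import torch
import torch.nn as nn

class PictureFeatureExtractor(nn.Module):
    """Maps a preprocessed picture pixel matrix to a target picture feature
    vector via convolution, max pooling, and a fully connected layer."""
    def __init__(self, feature_dim=128):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # input layer -> feature maps
        self.pool = nn.MaxPool2d(2)                  # pooling layer: keep maximum values
        self.fc = nn.Linear(8 * 32 * 32, feature_dim)  # fully connected layer

    def forward(self, pixel_matrix):
        # pixel_matrix: (batch, 1, 64, 64) binarized picture, scaled to [0, 1]
        feature_map = torch.relu(self.conv(pixel_matrix))
        pooled = self.pool(feature_map)              # (batch, 8, 32, 32)
        flat = pooled.flatten(start_dim=1)           # one-dimensional vector per picture
        return self.fc(flat)                         # target picture feature vector

extractor = PictureFeatureExtractor()
batch = torch.rand(10, 1, 64, 64)                    # e.g. ten preprocessed pictures
first_feature_vector_set = extractor(batch)          # shape (10, 128)
```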
- S160: Obtain a target picture feature vector corresponding to each target picture in the second target picture set through a convolutional neural network to form a second picture feature vector set.
- The process is the same as that of obtaining, through the convolutional neural network, the target picture feature vector corresponding to each target picture in the first target picture set. Since each target picture in the second target picture set yields a corresponding target picture feature vector, the second picture feature vector set can be formed.
- S170: Obtain the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set to obtain the average similarity between the video to be compared and the original video.
- Since the video to be compared is an imitation video of the original video, large body movements are generally quite similar between the two. The first picture feature vector set and the second picture feature vector set can be obtained separately, and the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set is then computed to obtain the average similarity between the video to be compared and the original video.
- In an embodiment, step S170 includes: obtaining the Euclidean distance between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set, then calculating the average Euclidean distance value, and using the average Euclidean distance value as the similarity between the first picture feature vector set and the second picture feature vector set.
- For example, suppose the first picture feature vector set includes 10 picture feature vectors, denoted a1-a10, and the second picture feature vector set also includes 10 picture feature vectors, denoted b1-b10. The Euclidean distance between a1 and b1 is calculated as the first similarity, the Euclidean distance between a2 and b2 as the second similarity, ..., and the Euclidean distance between a10 and b10 as the tenth similarity; the average of the first through tenth similarities is then taken as the average similarity between the video to be compared and the original video.
- This calculation method obtains a specific similarity between two similar videos based on micro-expression similarity, which is more accurate than judging video similarity with the naked eye. A minimal sketch of the computation follows.
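- In NumPy, the pairwise Euclidean distances and their average can be computed as below. Note that Euclidean distance grows as vectors diverge, so the average distance used here as a "similarity" behaves as a dissimilarity score (smaller means more similar). The helper name `average_similarity` is an assumption of the sketch:

```python
import numpy as np

def average_similarity(first_set, second_set):
    """first_set, second_set: arrays of shape (n, d) holding the picture
    feature vectors a1..an and b1..bn. Returns the average Euclidean
    distance between corresponding vectors (smaller = more similar)."""
    distances = np.linalg.norm(first_set - second_set, axis=1)  # |ai - bi|
    return distances.mean()

a = np.random.rand(10, 128)   # first picture feature vector set, a1-a10
b = np.random.rand(10, 128)   # second picture feature vector set, b1-b10
print(average_similarity(a, b))
```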
- S180: Send the average similarity between the video to be compared and the original video to the uploading terminal.
- The average similarity between the video to be compared and the original video is sent to the uploading terminal as a video similarity score, thereby notifying the uploading terminal of the score.
- In an embodiment, after step S180 the method further includes: if the originator citation information corresponding to the video to be compared is not the same as the originator information of the original video, sending notification information for correcting the originator citation information corresponding to the video to be compared to the uploading terminal.
- When the server detects that the originator citation information corresponding to the video to be compared differs from the originator information of the original video, the user has not edited the originator citation information correctly, which may create legal risks later. Notification information for correcting the originator citation information corresponding to the video to be compared is therefore sent to the uploading terminal, informing the user to modify the originator citation information in time and thereby effectively avoiding subsequent legal risks; a minimal sketch of this check follows.
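- Assuming the videos carry metadata dictionaries and a `notify` callable for reaching the uploading terminal (both hypothetical names for this sketch), the check could look like:

```python
def check_originator_citation(video_meta, original_meta, notify):
    """Compare the originator citation of the video to be compared with the
    originator of the original video; request a correction if they differ."""
    cited = video_meta.get("originator_citation")
    actual = original_meta.get("originator")
    if cited != actual:
        # ask the uploading terminal to fix its citation in time
        notify(f"Please correct the originator citation to: {actual}")
```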
- This method thus combines micro-expression recognition technology with video image comparison to determine the similarity between videos, improving the recognition accuracy of inter-video similarity.
- An embodiment of the present application also provides a video similarity acquisition device, which is used to execute any embodiment of the foregoing video similarity acquisition method.
- FIG. 5 is a schematic block diagram of a video similarity obtaining apparatus provided by an embodiment of the present application.
- the video similarity acquisition device 100 may be configured in a server.
- the video similarity acquisition device 100 includes a video receiving unit 110, a video splitting unit 120, a first picture set acquiring unit 130, a second picture set acquiring unit 140, a first extraction unit 150, a second extraction unit 160, an average similarity obtaining unit 170, and an average similarity sending unit 180.
- the video receiving unit 110 is configured to receive the video to be compared uploaded by the uploader, and obtain the original video corresponding to the video to be compared.
- the uploading terminal is a smart terminal used by the user, such as a smart phone, a tablet computer, etc.
- In this embodiment, when the user needs to compare a video to be compared with a corresponding original video stored in the server, the user first selects a certain original video in the server, and the storage path of that original video in the server is obtained. After the video to be compared is uploaded to the cache area of the server, the original video is obtained according to its storage path, and a series of video data processing is then performed to analyze the similarity between the two videos.
- In an embodiment, the video receiving unit 110 is further configured to: obtain the corresponding original video from the server according to the storage path corresponding to the video to be compared.
- Specifically, when the uploading terminal uploads the video to be compared to the server, the user first selects the original video on the user interaction interface of the uploading terminal (the user interaction interface is the communication medium between the server and the uploading terminal). After the original video is determined, the server can determine the storage path of the original video and sends that path to the uploading terminal; the default storage path in the server for the video to be compared obtained from the uploading terminal is the storage path of the original video. In this way, the server can obtain the corresponding original video according to the default storage path of the video to be compared in the server.
- Since the user selects the original video to be compared against before uploading the video to be compared, and the video to be compared and the original video are stored under the same storage path on the server, data in the same data partition can be compared quickly without migrating data across partitions.
- the video splitting unit 120 is configured to perform video splitting on the video to be compared and the original video to obtain a set of pictures to be compared corresponding to the video to be compared, and an original picture set corresponding to the original video .
- the first picture set acquiring unit 130 is configured to preprocess the video to be compared by an optical flow method to obtain a first target picture set corresponding to the video to be compared.
- micro-expression analysis may be performed by the optical flow method to obtain the first target picture set corresponding to the video to be compared.
- the first picture set acquiring unit 130 includes:
- the vector feature obtaining unit 131 is configured to obtain the velocity vector feature corresponding to each pixel of each frame of the picture in the video to be compared;
- the first target picture set acquiring unit 132 is configured to, if the velocity vector feature of at least one frame of pictures in the video to be compared does not keep changing continuously, compose the corresponding pictures into the first target picture set of the video to be compared.
- When an object moves, its scene forms a series of continuously changing images on the retina of the human eye, and this series of continuously changing information constantly "flows through" the retina (that is, the image plane), like a kind of light "flow"; hence the name optical flow.
- Optical flow expresses the change of the image; it contains information about the target's motion and can be used to determine how the target is moving.
- Optical flow has three elements: first, a motion velocity field, which is a necessary condition for the formation of optical flow; second, parts carrying optical characteristics, such as gray-scale pixels, which can convey motion information; and third, an imaging projection from the scene onto the image plane, which makes the flow observable.
- Optical flow is defined on points. Specifically, let (u, v) be the optical flow of the image point (x, y); then (x, y, u, v) is called an optical flow point, and the collection of all optical flow points is called the optical flow field. When an object with optical characteristics moves in three-dimensional space, a corresponding image motion field (or image velocity field) is formed on the image plane; in an ideal case, the optical flow field corresponds to this motion field.
- Through the optical flow field, the image can be dynamically analyzed. If there is no moving target in the image, the optical flow vector changes continuously over the whole image area. When there is a moving object in the image (when the user shows a micro-expression, the face moves, which is equivalent to a moving object), there is relative motion between the target and the background; the velocity vector formed by the moving object necessarily differs from the background velocity vector, so the position of the moving object can be calculated. Preprocessing by the optical flow method thus yields the first target picture set corresponding to the video to be compared.
- the second picture set obtaining unit 140 is configured to preprocess the original video by the optical flow method to obtain a second target picture set corresponding to the original video.
- The process of preprocessing the original video by the optical flow method to obtain the second target picture set is the same as that used for the video to be compared. The first total number of pictures included in the first target picture set does not exceed the second total number of pictures included in the second target picture set; generally, the first total number of pictures included in the first target picture set is equal to the second total number of pictures included in the second target picture set.
- the first extraction unit 150 is configured to obtain a target picture feature vector corresponding to each target picture in the first target picture set through a convolutional neural network to form a first picture feature vector set.
- Since the similarity between each picture in the first target picture set and the corresponding picture in the second target picture set needs to be compared, the target picture feature vector corresponding to each target picture in the first target picture set can be obtained through the convolutional neural network; these feature vectors then serve as the vectorized data for analyzing the similarity with the corresponding pictures in the second target picture set.
- the first extraction unit 150 includes:
- the preprocessing unit 151 is configured to preprocess each target picture in the first target picture set to obtain a preprocessed picture corresponding to each target picture and a picture pixel matrix corresponding to each preprocessed picture; here, preprocessing a target picture means sequentially performing grayscale conversion, edge detection, and binarization on the target picture;
- the convolution unit 152 is configured to input the picture pixel matrix corresponding to each preprocessed picture to the input layer of the convolutional neural network model to obtain a feature map corresponding to each preprocessed picture;
- the pooling unit 153 is configured to input each feature map to the pooling layer in the convolutional neural network model to obtain a one-dimensional vector corresponding to each feature map;
- the fully connected unit 154 is configured to input the one-dimensional vector corresponding to each feature map to the fully connected layer of the convolutional neural network model to obtain the target image feature vector corresponding to each feature map to form a first image feature vector set.
- Grayscale conversion, edge detection, and binarization are performed sequentially on each target picture in the first target picture set to obtain the preprocessed picture corresponding to each target picture.
- Grayscale conversion of color images is a basic method of image processing that is widely used in the field of pattern recognition; reasonable grayscale conversion greatly helps the extraction and subsequent processing of image information, saves storage space, and speeds up processing.
- Edge detection examines the changes in gray level of the image pixels within a certain area and identifies the points in the digital image where brightness changes sharply.
- Image edge detection can greatly reduce the amount of data, eliminate irrelevant information, and preserve the important structural attributes of the image.
- Besides the Sobel operator, operators used for edge detection include the Laplacian edge detection operator, the Canny edge detection operator, and so on.
- Binarization is a type of image thresholding. According to how the threshold is selected, binarization methods can be divided into the global threshold method, the dynamic threshold method, and the local threshold method.
- The maximum between-class variance method (also called the Otsu algorithm) is commonly used for thresholding; it eliminates pixels with smaller gradient values, and after binarization each pixel value of the image is either 0 or 255.
- When obtaining the target picture feature vector of a target picture, first obtain the picture pixel matrix corresponding to the preprocessed picture, then use that picture pixel matrix as the input of the input layer of the convolutional neural network model to obtain the feature map; next, input the feature map into the pooling layer to obtain the one-dimensional vector corresponding to the maximum values of the feature map; finally, input that one-dimensional vector into the fully connected layer to obtain the target picture feature vector corresponding to the preprocessed picture. Since each target picture in the first target picture set yields a corresponding target picture feature vector, the first picture feature vector set can be formed.
- the second extraction unit 160 is configured to obtain a target picture feature vector corresponding to each target picture in the second target picture set through a convolutional neural network to form a second picture feature vector set.
- The process is the same as that of obtaining, through the convolutional neural network, the target picture feature vector corresponding to each target picture in the first target picture set. Since each target picture in the second target picture set yields a corresponding target picture feature vector, the second picture feature vector set can be formed.
- the average similarity obtaining unit 170 is configured to obtain the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set to obtain the video to be compared and The average similarity of the original video.
- Since the video to be compared is an imitation video of the original video, large body movements are generally quite similar between the two. The first picture feature vector set and the second picture feature vector set can be obtained separately, and the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set is then computed to obtain the average similarity between the video to be compared and the original video.
- In an embodiment, the average similarity obtaining unit 170 is further configured to: obtain the Euclidean distance between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set, then calculate the average Euclidean distance value, and use the average Euclidean distance value as the similarity between the first picture feature vector set and the second picture feature vector set.
- For example, suppose the first picture feature vector set includes 10 picture feature vectors, denoted a1-a10, and the second picture feature vector set also includes 10 picture feature vectors, denoted b1-b10. The Euclidean distance between a1 and b1 is calculated as the first similarity, the Euclidean distance between a2 and b2 as the second similarity, ..., and the Euclidean distance between a10 and b10 as the tenth similarity; the average of the first through tenth similarities is then taken as the average similarity between the video to be compared and the original video.
- This calculation method obtains a specific similarity between two similar videos based on micro-expression similarity, which is more accurate than judging video similarity with the naked eye.
- the average similarity sending unit 180 is configured to send the average similarity between the video to be compared and the original video to the uploader.
- The average similarity between the video to be compared and the original video is sent to the uploading terminal as a video similarity score, thereby notifying the uploading terminal of the score.
- the video similarity acquisition device 100 further includes:
- the original information comparison unit 190 is configured to, if the originator citation information corresponding to the video to be compared is not the same as the originator information of the original video, send notification information for correcting the originator citation information corresponding to the video to be compared to the uploading terminal.
- When the server detects that the originator citation information corresponding to the video to be compared differs from the originator information of the original video, the user has not edited the originator citation information correctly, which may create legal risks later. Notification information for correcting the originator citation information corresponding to the video to be compared is therefore sent to the uploading terminal, informing the user to modify the originator citation information in time and thereby effectively avoiding subsequent legal risks.
- The device thus combines micro-expression recognition technology with video image comparison to determine the similarity between videos, improving the recognition accuracy of inter-video similarity.
- the above-mentioned video similarity acquisition device can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 8.
- FIG. 8 is a schematic block diagram of a computer device according to an embodiment of the present application.
- the computer device 500 is a server, and the server may be an independent server or a server cluster composed of multiple servers.
- the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
- The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When the computer program 5032 is executed, the processor 502 can execute the video similarity acquisition method.
- the processor 502 is used to provide calculation and control capabilities, and support the operation of the entire computer device 500.
- The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 can execute the video similarity acquisition method.
- the network interface 505 is used for network communication, such as providing data information transmission.
- The structure shown in FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution is applied; the specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
- the processor 502 is configured to run the computer program 5032 stored in the memory to implement the video similarity acquisition method disclosed in the embodiments of the present application.
- The embodiment of the computer device shown in FIG. 8 does not constitute a limitation on the specific configuration of the computer device; the computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement. For example, in some embodiments the computer device may only include a memory and a processor; in such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 8 and will not be repeated here.
- the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
- a computer-readable storage medium may be a non-volatile computer-readable storage medium, or may be a volatile computer-readable storage medium.
- the computer-readable storage medium stores a computer program, where the computer program is executed by a processor to implement the video similarity acquisition method disclosed in the embodiments of the present application.
- the disclosed equipment, device, and method may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division; in actual implementation, there may be other division methods, or units with the same function may be combined into one unit; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a storage medium.
- The technical solution of this application, in essence or in the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application.
- The aforementioned storage media include: USB flash disk, removable hard disk, read-only memory (ROM, Read-Only Memory), magnetic disk, optical disk, and other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Claims (20)
- 1. A video similarity acquisition method, comprising: receiving a video to be compared uploaded by an uploading terminal, and acquiring an original video corresponding to the video to be compared; performing video splitting on both the video to be compared and the original video to obtain a set of pictures to be compared corresponding to the video to be compared, and an original picture set corresponding to the original video; preprocessing the video to be compared by an optical flow method to obtain a first target picture set corresponding to the video to be compared; preprocessing the original video by the optical flow method to obtain a second target picture set corresponding to the original video; obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the first target picture set to form a first picture feature vector set; obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the second target picture set to form a second picture feature vector set; obtaining the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set to obtain the average similarity between the video to be compared and the original video; and sending the average similarity between the video to be compared and the original video to the uploading terminal.
- 2. The video similarity acquisition method according to claim 1, wherein the acquiring the original video corresponding to the video to be compared comprises: acquiring the corresponding original video from a server according to the storage path corresponding to the video to be compared.
- 3. The video similarity acquisition method according to claim 1, wherein the preprocessing the video to be compared by the optical flow method to obtain the first target picture set corresponding to the video to be compared comprises: obtaining a velocity vector feature corresponding to each pixel of each frame of the video to be compared; and, if the velocity vector feature of at least one frame in the video to be compared does not keep changing continuously, composing the corresponding pictures into the first target picture set of the video to be compared.
- 4. The video similarity acquisition method according to claim 1, wherein the obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the first target picture set to form a first picture feature vector set comprises: preprocessing each target picture in the first target picture set to obtain a preprocessed picture corresponding to each target picture and a picture pixel matrix corresponding to each preprocessed picture, wherein preprocessing a target picture means sequentially performing grayscale conversion, edge detection and binarization on the target picture; inputting the picture pixel matrix corresponding to each preprocessed picture into the input layer of a convolutional neural network model to obtain a feature map corresponding to each preprocessed picture; inputting each feature map into the pooling layer of the convolutional neural network model to obtain a one-dimensional vector corresponding to each feature map; and inputting the one-dimensional vector corresponding to each feature map into the fully connected layer of the convolutional neural network model to obtain the target picture feature vector corresponding to each feature map, so as to form the first picture feature vector set.
- 5. The video similarity acquisition method according to claim 1, wherein the obtaining the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set comprises: obtaining the Euclidean distance between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set, then calculating the average Euclidean distance value, and using the average Euclidean distance value as the similarity between the first picture feature vector set and the second picture feature vector set.
- 6. The video similarity acquisition method according to claim 1, wherein, after the sending the average similarity between the video to be compared and the original video to the uploading terminal, the method further comprises: if the originator citation information corresponding to the video to be compared is not the same as the originator information of the original video, sending notification information for correcting the originator citation information corresponding to the video to be compared to the uploading terminal.
- 7. A video similarity acquisition device, comprising: a video receiving unit configured to receive a video to be compared uploaded by an uploading terminal and acquire an original video corresponding to the video to be compared; a video splitting unit configured to perform video splitting on both the video to be compared and the original video to obtain a set of pictures to be compared corresponding to the video to be compared, and an original picture set corresponding to the original video; a first picture set acquiring unit configured to preprocess the video to be compared by an optical flow method to obtain a first target picture set corresponding to the video to be compared; a second picture set acquiring unit configured to preprocess the original video by the optical flow method to obtain a second target picture set corresponding to the original video; a first extraction unit configured to obtain, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the first target picture set to form a first picture feature vector set; a second extraction unit configured to obtain, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the second target picture set to form a second picture feature vector set; an average similarity obtaining unit configured to obtain the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set to obtain the average similarity between the video to be compared and the original video; and an average similarity sending unit configured to send the average similarity between the video to be compared and the original video to the uploading terminal.
- 8. The video similarity acquisition device according to claim 7, wherein the first picture set acquiring unit comprises: a vector feature obtaining unit configured to obtain a velocity vector feature corresponding to each pixel of each frame of the video to be compared; and a first target picture set acquiring unit configured to, if the velocity vector feature of at least one frame in the video to be compared does not keep changing continuously, compose the corresponding pictures into the first target picture set of the video to be compared.
- 9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps: receiving a video to be compared uploaded by an uploading terminal, and acquiring an original video corresponding to the video to be compared; performing video splitting on both the video to be compared and the original video to obtain a set of pictures to be compared corresponding to the video to be compared, and an original picture set corresponding to the original video; preprocessing the video to be compared by an optical flow method to obtain a first target picture set corresponding to the video to be compared; preprocessing the original video by the optical flow method to obtain a second target picture set corresponding to the original video; obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the first target picture set to form a first picture feature vector set; obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the second target picture set to form a second picture feature vector set; obtaining the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set to obtain the average similarity between the video to be compared and the original video; and sending the average similarity between the video to be compared and the original video to the uploading terminal.
- 10. The computer device according to claim 9, wherein the processor, when executing the computer program, implements the step of acquiring the original video corresponding to the video to be compared by: acquiring the corresponding original video from the server according to the storage path corresponding to the video to be compared.
- 11. The computer device according to claim 9, wherein the processor, when executing the computer program, implements the step of preprocessing the video to be compared by the optical flow method to obtain the first target picture set corresponding to the video to be compared by: obtaining a velocity vector feature corresponding to each pixel of each frame of the video to be compared; and, if the velocity vector feature of at least one frame in the video to be compared does not keep changing continuously, composing the corresponding pictures into the first target picture set of the video to be compared.
- 12. The computer device according to claim 9, wherein the processor, when executing the computer program, implements the step of obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the first target picture set to form a first picture feature vector set by: preprocessing each target picture in the first target picture set to obtain a preprocessed picture corresponding to each target picture and a picture pixel matrix corresponding to each preprocessed picture, wherein preprocessing a target picture means sequentially performing grayscale conversion, edge detection and binarization on the target picture; inputting the picture pixel matrix corresponding to each preprocessed picture into the input layer of a convolutional neural network model to obtain a feature map corresponding to each preprocessed picture; inputting each feature map into the pooling layer of the convolutional neural network model to obtain a one-dimensional vector corresponding to each feature map; and inputting the one-dimensional vector corresponding to each feature map into the fully connected layer of the convolutional neural network model to obtain the target picture feature vector corresponding to each feature map, so as to form the first picture feature vector set.
- 13. The computer device according to claim 9, wherein the processor, when executing the computer program, implements the step of obtaining the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set by: obtaining the Euclidean distance between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set, then calculating the average Euclidean distance value, and using the average Euclidean distance value as the similarity between the first picture feature vector set and the second picture feature vector set.
- 14. The computer device according to claim 9, wherein, after implementing the step of sending the average similarity between the video to be compared and the original video to the uploading terminal, the processor, when executing the computer program, further implements the following step: if the originator citation information corresponding to the video to be compared is not the same as the originator information of the original video, sending notification information for correcting the originator citation information corresponding to the video to be compared to the uploading terminal.
- 15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following steps: receiving a video to be compared uploaded by an uploading terminal, and acquiring an original video corresponding to the video to be compared; performing video splitting on both the video to be compared and the original video to obtain a set of pictures to be compared corresponding to the video to be compared, and an original picture set corresponding to the original video; preprocessing the video to be compared by an optical flow method to obtain a first target picture set corresponding to the video to be compared; preprocessing the original video by the optical flow method to obtain a second target picture set corresponding to the original video; obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the first target picture set to form a first picture feature vector set; obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the second target picture set to form a second picture feature vector set; obtaining the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set to obtain the average similarity between the video to be compared and the original video; and sending the average similarity between the video to be compared and the original video to the uploading terminal.
- 16. The storage medium according to claim 15, wherein the computer program, when executed by the processor, causes the processor to perform the step of acquiring the original video corresponding to the video to be compared by: acquiring the corresponding original video from the server according to the storage path corresponding to the video to be compared.
- 17. The storage medium according to claim 15, wherein the computer program, when executed by the processor, causes the processor to perform the step of preprocessing the video to be compared by the optical flow method to obtain the first target picture set corresponding to the video to be compared by: obtaining a velocity vector feature corresponding to each pixel of each frame of the video to be compared; and, if the velocity vector feature of at least one frame in the video to be compared does not keep changing continuously, composing the corresponding pictures into the first target picture set of the video to be compared.
- 18. The storage medium according to claim 15, wherein the obtaining, through a convolutional neural network, a target picture feature vector corresponding to each target picture in the first target picture set to form a first picture feature vector set comprises: preprocessing each target picture in the first target picture set to obtain a preprocessed picture corresponding to each target picture and a picture pixel matrix corresponding to each preprocessed picture, wherein preprocessing a target picture means sequentially performing grayscale conversion, edge detection and binarization on the target picture; inputting the picture pixel matrix corresponding to each preprocessed picture into the input layer of a convolutional neural network model to obtain a feature map corresponding to each preprocessed picture; inputting each feature map into the pooling layer of the convolutional neural network model to obtain a one-dimensional vector corresponding to each feature map; and inputting the one-dimensional vector corresponding to each feature map into the fully connected layer of the convolutional neural network model to obtain the target picture feature vector corresponding to each feature map, so as to form the first picture feature vector set.
- 19. The storage medium according to claim 15, wherein the computer program, when executed by the processor, causes the processor to perform the step of obtaining the similarity between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set by: obtaining the Euclidean distance between each picture feature vector in the first picture feature vector set and the corresponding picture feature vector in the second picture feature vector set, then calculating the average Euclidean distance value, and using the average Euclidean distance value as the similarity between the first picture feature vector set and the second picture feature vector set.
- 20. The storage medium according to claim 15, wherein, after the step of sending the average similarity between the video to be compared and the original video to the uploading terminal, the computer program, when executed by the processor, further causes the processor to perform the following step: if the originator citation information corresponding to the video to be compared is not the same as the originator information of the original video, sending notification information for correcting the originator citation information corresponding to the video to be compared to the uploading terminal.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910752907.6A CN110674837A (zh) | 2019-08-15 | 2019-08-15 | Video similarity acquisition method, device, computer equipment and storage medium |
CN201910752907.6 | 2019-08-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021027325A1 (zh) | 2021-02-18 |
Family
ID=69075353
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/087033 WO2021027325A1 (zh) | 2019-08-15 | 2020-04-26 | Video similarity acquisition method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110674837A (zh) |
WO (1) | WO2021027325A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118097501A (zh) * | 2024-02-29 | 2024-05-28 | 广州米麦文化传媒有限公司 | Video processing method and video processing system |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674837A (zh) | 2019-08-15 | 2020-01-10 | 深圳壹账通智能科技有限公司 | Video similarity acquisition method, device, computer equipment and storage medium |
CN111553218A (zh) | 2020-04-20 | 2020-08-18 | 南京医科大学 | Intelligent medical skill teaching and monitoring system based on human posture recognition |
CN111601115B (zh) | 2020-05-12 | 2022-03-01 | 腾讯科技(深圳)有限公司 | Video detection method, related apparatus, device and storage medium |
CN111586473B (zh) | 2020-05-20 | 2023-01-17 | 北京字节跳动网络技术有限公司 | Video cropping method, apparatus, device and storage medium |
CN111723868B (zh) | 2020-06-22 | 2023-07-21 | 海尔优家智能科技(北京)有限公司 | Method, apparatus and server for removing homologous pictures |
CN114627560A (zh) | 2022-05-13 | 2022-06-14 | 浙江大华技术股份有限公司 | Action recognition method, action recognition model training method and related apparatus |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140369610A1 (en) * | 2011-12-29 | 2014-12-18 | Rakuten, Inc. | Image search system, image search method, image search device, program, and information recording medium |
CN105141903A (zh) * | 2015-08-13 | 2015-12-09 | 中国科学院自动化研究所 | Method for object retrieval in video based on color information |
CN106991373A (zh) * | 2017-03-02 | 2017-07-28 | 中国人民解放军国防科学技术大学 | Copy video detection method based on deep learning and graph theory |
CN110674837A (zh) * | 2019-08-15 | 2020-01-10 | 深圳壹账通智能科技有限公司 | Video similarity acquisition method, device, computer equipment and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017107188A1 (zh) * | 2015-12-25 | 2017-06-29 | 中国科学院深圳先进技术研究院 | Method and device for fast recognition of video classification |
CN106548494A (zh) * | 2016-09-26 | 2017-03-29 | 浙江工商大学 | Depth extraction method for film and television images based on a scene sample library |
CN109214238B (zh) * | 2017-06-30 | 2022-06-28 | 阿波罗智能技术(北京)有限公司 | Multi-target tracking method, apparatus, device and storage medium |
CN109947991A (zh) * | 2017-10-31 | 2019-06-28 | 腾讯科技(深圳)有限公司 | Key frame extraction method, apparatus and storage medium |
CN109857893A (zh) * | 2019-01-16 | 2019-06-07 | 平安科技(深圳)有限公司 | Picture retrieval method, apparatus, computer device and storage medium |
CN109886130B (zh) * | 2019-01-24 | 2021-05-28 | 上海媒智科技有限公司 | Method and apparatus for determining target object, storage medium and processor |
2019
- 2019-08-15: CN application CN201910752907.6A filed; published as CN110674837A (status: Pending)
2020
- 2020-04-26: PCT application PCT/CN2020/087033 filed; published as WO2021027325A1 (status: Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN110674837A (zh) | 2020-01-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021027325A1 (zh) | | Video similarity acquisition method, device, computer equipment and storage medium |
US11727577B2 | | Video background subtraction using depth |
US20220027669A1 | | Objects and Features Neural Network |
US11527105B2 | | System and method for scalable cloud-robotics based face recognition and face analysis |
US9864901B2 | | Feature detection and masking in images based on color distributions |
JP7476428B2 | | Image gaze correction method, apparatus, electronic device, computer-readable storage medium and computer program |
US9547908B1 | | Feature mask determination for images |
CN111415358B | | Image segmentation method, apparatus, electronic device and storage medium |
JP6961797B2 | | Method and apparatus for blurring preview photos, and storage medium |
CN108960045A | | Eye tracking method, electronic device and non-transitory computer-readable recording medium |
CN108921782A | | Image processing method, apparatus and storage medium |
CN110807427B | | Gaze tracking method, apparatus, computer device and storage medium |
WO2021027329A1 | | Information push method and apparatus based on image recognition, and computer device |
JP2009526495A | | Inter-mode region-of-interest image object segmentation |
JP2002342756A | | Method for detecting positions of eyes and mouth in a digital image |
US20220164988A1 | | Methods and Systems for Calibrating Surface Data Capture Devices |
CN111008935B | | Face image enhancement method, apparatus, system and storage medium |
CN111079613B | | Posture recognition method and apparatus, electronic device and storage medium |
CN112883940A | | Silent liveness detection method, apparatus, computer device and storage medium |
WO2021008068A1 | | Image processing method and apparatus |
CN114299363A | | Training method for image processing model, and image classification method and apparatus |
CN115953813B | | Expression driving method, apparatus, device and storage medium |
CN110321009B | | AR expression processing method, apparatus, device and storage medium |
CN113012030A | | Image stitching method, apparatus and device |
CN113128277A | | Method for generating a face keypoint detection model and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20852667; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20852667; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.08.2022) |