WO2022033252A1 - Video matching method, blockchain-based infringement evidence storage method and apparatus


Info

Publication number
WO2022033252A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
candidate
videos
target video
target
Prior art date
Application number
PCT/CN2021/105214
Other languages
English (en)
French (fr)
Inventor
蒋晨
张伟
王清
程远
徐富荣
黄凯明
张晓博
钱烽
杨旭东
潘覃
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2022033252A1
Priority to US18/149,552 (published as US11954152B2)

Classifications

    • G06F16/783 Retrieval of video data characterised by using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/75 Clustering; Classification (of video data)
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06F16/732 Query formulation (for video data)
    • G06F16/7867 Retrieval characterised by using metadata generated manually, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G06F16/9024 Graphs; Linked lists (indexing; data structures therefor)
    • G06F18/2415 Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • G06F21/602 Providing cryptographic facilities or services
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G06N3/091 Active learning
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/48 Matching video sequences

Definitions

  • This document relates to the field of computer technology, and in particular to a video matching method and a blockchain-based infringement evidence storage method and apparatus.
  • The embodiments of this specification provide a video matching method and a blockchain-based infringement evidence storage method and apparatus, so as to handle false matches and missed matches of various features between videos, support infringement localization of multiple video clips, and improve the efficiency of video matching, thereby reducing the cost of manual review.
  • A video matching method, which includes: based on multiple feature vectors of a target video, retrieving candidate videos similar to the target video from a video database; based on the target video and a candidate video, constructing a temporal similarity matrix feature map between the target video and the candidate video; using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate video that match the target video and the corresponding similarity; and, when the similarity corresponding to the video clips matching the target video in the candidate video is greater than or equal to a preset similarity threshold, uploading the infringement evidence including the abstract of the target video, the matching video clips in the candidate video, and the corresponding similarity to the blockchain.
  • The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
  • A blockchain-based method for storing evidence of an infringing video, comprising: obtaining multiple feature vectors of a target video; based on the multiple feature vectors of the target video, retrieving candidate videos similar to the target video from a video database; based on the target video and a candidate video, constructing a temporal similarity matrix feature map between the target video and the candidate video; and using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate video that match the target video and the corresponding similarity. The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
  • A blockchain-based infringement evidence storage apparatus, including: a candidate video retrieval module, which retrieves candidate videos similar to the target video from a video database based on multiple feature vectors of the target video; a feature map construction module, which constructs a temporal similarity matrix feature map between the target video and a candidate video based on the target video and the candidate video; a model output module, which uses the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate video that match the target video and the corresponding similarity; and an evidence upload module, which, when the similarity corresponding to the video clips in the candidate video that match the target video is greater than or equal to the preset similarity threshold, uploads the infringement evidence containing the abstract of the target video, the matching video clips in the candidate video, and the corresponding similarity to the blockchain.
  • The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
  • A video matching apparatus, including: a feature vector acquisition module, which acquires multiple feature vectors of a target video; a candidate video retrieval module, which retrieves candidate videos similar to the target video from a video database based on the multiple feature vectors of the target video; a feature map construction module, which constructs a temporal similarity matrix feature map between the target video and a candidate video based on the target video and the candidate video; and a model output module, which uses the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate video that match the target video and the corresponding similarity. The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
  • An electronic device, comprising: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations: based on multiple feature vectors of a target video, retrieve candidate videos similar to the target video from a video database; based on the target video and a candidate video, construct a temporal similarity matrix feature map between the target video and the candidate video; use the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate video that match the target video and the corresponding similarity; and, when the similarity corresponding to the video clips matching the target video is greater than or equal to a preset similarity threshold, upload the infringement evidence containing the abstract of the target video, the matching video clips in the candidate video, and the corresponding similarity to the blockchain.
  • The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
  • A computer-readable storage medium stores one or more programs that, when executed by an electronic device including multiple application programs, cause the electronic device to perform the following operations: based on multiple feature vectors of a target video, retrieve candidate videos similar to the target video from a video database; based on the target video and a candidate video, construct a temporal similarity matrix feature map between the target video and the candidate video; use the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate video that match the target video and the corresponding similarity; and, when the similarity corresponding to the video clips matching the target video in the candidate video is greater than or equal to a preset similarity threshold, upload the infringement evidence containing the abstract of the target video, the matching video clips in the candidate video, and the corresponding similarity to the blockchain.
  • The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
  • An electronic device, comprising: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations: obtain multiple feature vectors of a target video; based on the multiple feature vectors of the target video, retrieve candidate videos similar to the target video from a video database; based on the target video and a candidate video, construct a temporal similarity matrix feature map between the target video and the candidate video; and use the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate video that match the target video and the corresponding similarity. The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
  • A computer-readable storage medium stores one or more programs that, when executed by an electronic device including multiple application programs, cause the electronic device to perform the following operations: obtain multiple feature vectors of a target video; based on the multiple feature vectors of the target video, retrieve candidate videos similar to the target video from a video database; based on the target video and a candidate video, construct a temporal similarity matrix feature map between the target video and the candidate video; and use the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate video that match the target video and the corresponding similarity. The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
  • In this way, candidate videos similar to the target video can be retrieved from the video database based on multiple feature vectors of the target video; then, based on the target video and a candidate video, the temporal similarity matrix feature map between the target video and the candidate video is constructed; finally, the temporal similarity matrix feature map is used as the input of the deep learning detection model, whose output gives the video clips in the candidate video that match the target video and the corresponding similarity, and the infringement evidence containing the matching video clips and the corresponding similarity is uploaded to the blockchain.
  • The methods provided in the embodiments of this specification use a deep learning detection model. On the one hand, in terms of infringement localization efficiency, the model can detect any number of infringing segments of a potentially infringing video, and combining vector retrieval with the detection model can greatly improve the efficiency of detecting infringing videos; on the other hand, it also reduces the cost of manual review.
  • The tamper-proof property of the blockchain is also used: the abstract (digest) of the infringing target video, the video clips in the candidate videos that match the target video, and the corresponding similarity are uploaded to the blockchain, so that evidence of the target video's infringement can later be obtained from the blockchain when it is needed.
  • FIG. 1 is a schematic flowchart of the implementation of a blockchain-based infringement certificate storage method provided by an embodiment of this specification.
  • FIG. 2 is a schematic flowchart of a video matching method according to an embodiment of the present specification.
  • FIG. 3 is a schematic flowchart of a video matching method applied to a scenario provided by an embodiment of the present specification.
  • FIG. 4 is a schematic diagram of a temporal similarity matrix feature map drawn in a video matching method according to an embodiment of the present specification.
  • FIG. 5 is a schematic structural diagram of a blockchain-based infringement certificate storage device according to an embodiment of the present specification.
  • FIG. 6 is a schematic structural diagram of a video matching apparatus according to an embodiment of the present specification.
  • FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present specification.
  • FIG. 8 is a schematic structural diagram of another electronic device according to an embodiment of the present specification.
  • FIG. 1 is a schematic diagram of the implementation flow of a blockchain-based infringement video certificate storage method provided by an embodiment of this specification, including:
  • S110: based on a plurality of feature vectors of the target video, retrieve candidate videos similar to the target video from a video database.
  • The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
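The training samples described above (a query video paired with candidate videos, labeled with matched clips and infringement marks) can be sketched as a simple record. This is only an illustrative data layout; the field names and clip representation are assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TrainingSample:
    # One group of sample videos: a query video paired with one candidate video.
    query_video_id: str
    candidate_video_id: str
    # Label: matched clip boundaries in the candidate video (start, end in
    # seconds), plus the infringement mark described in the specification.
    matched_clips: List[Tuple[float, float]] = field(default_factory=list)
    is_infringing: bool = False

# Hypothetical IDs; a real pipeline would enumerate many such groups.
sample = TrainingSample("q_001", "c_042", [(3.0, 18.5)], True)
```

A model trained on such labels learns to map a temporal similarity matrix feature map to clip boundaries and an infringement score.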
  • The preset similarity threshold can be set according to an empirical value and is used to decide whether the target video is infringed; for example, it can be set to 60%. It should be understood that, because storage space in the blockchain is limited, when the embodiments of this specification record the infringement evidence of the target video, the target video can be converted into a hash value through a hash algorithm, and this hash value, together with the video clips in the candidate videos that match the target video and the corresponding similarity, is uploaded to the blockchain. The nodes in the blockchain that have evidence-storage authority then perform a consensus operation on the infringement evidence, and after consensus it is recorded in a newly generated block. When the infringement evidence needs to be retrieved, the evidence containing the hash value of the target video can be downloaded from the blockchain based on that hash value.
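The evidence-packaging step above can be sketched as follows. SHA-256 is an assumption (the specification only says "a hash encryption algorithm"), and the chain client itself is out of scope here; only the fixed-size digest, not the video, would go on-chain.

```python
import hashlib
import json

def package_infringement_evidence(video_bytes, matched_clips, similarity):
    # Hash the target video so only a fixed-size digest is stored on-chain.
    digest = hashlib.sha256(video_bytes).hexdigest()
    evidence = {
        "target_video_digest": digest,
        "matched_clips": matched_clips,   # e.g. [(start_s, end_s), ...]
        "similarity": similarity,
    }
    # Serialize deterministically so every consensus node hashes
    # byte-identical evidence.
    return json.dumps(evidence, sort_keys=True)

evidence_json = package_infringement_evidence(b"fake video bytes", [(10.0, 42.5)], 0.87)
```

Later retrieval then only needs the digest as a lookup key, matching the "download evidence by the target video's hash value" step described above.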
  • In this way, candidate videos similar to the target video can be retrieved from the video database; then, based on the target video and a candidate video, the temporal similarity matrix feature map between the target video and the candidate video is constructed and used as the input of the deep learning detection model, whose output gives the video clips in the candidate video that match the target video and the corresponding similarity. When the similarity corresponding to the video clips matching the target video in a candidate video is greater than or equal to the preset similarity threshold, the infringement evidence containing the abstract of the target video, the matching video clips in the candidate video, and the corresponding similarity is uploaded to the blockchain.
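The matrix-construction step above can be sketched as pairwise similarity between the per-frame (or per-clip) feature vectors of the two videos. Cosine similarity and the helper names are assumptions; the specification does not fix a particular similarity measure.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def temporal_similarity_matrix(target_feats, candidate_feats):
    # Rows follow the target video's timeline, columns the candidate's.
    # A diagonal band of high values indicates a matching segment, which
    # is what the detection model localizes.
    return [[cosine(t, c) for c in candidate_feats] for t in target_feats]

# Toy 2-D feature vectors standing in for real frame embeddings.
target = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 1.0]]
matrix = temporal_similarity_matrix(target, candidate)
```

The resulting matrix is what gets rendered as the "temporal similarity matrix feature map" fed to the detection model.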
  • The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and the candidate videos corresponding to the query videos, and the labels corresponding to the sample videos include the video clips matched by the query videos in the corresponding candidate videos and infringement marks.
  • The method provided in the embodiments of this specification utilizes the tamper-proof property of the blockchain to upload the abstract of the infringing target video, the video clips in the candidate videos that match the target video, and the corresponding similarity to the blockchain, so that evidence of the target video's infringement can be obtained from the blockchain when infringement evidence is needed.
  • The obtained vector retrieval result will contain the matching results of N candidate videos.
  • A highly robust algorithm is therefore needed to deal with false matches and missed matches of feature vectors.
  • Because the retrieval result contains a large set of videos only roughly sorted by the search engine, high efficiency is required.
  • The video matching algorithm should also support infringement localization of multiple video clips, so as to reduce the cost of manual review.
  • The dynamic programming algorithms commonly used in industry and in CCF competitions, among other schemes, are easily affected by noise in the feature-vector retrieval results, so they are not robust enough, and their video matching efficiency drops sharply as the duration of the infringing video increases.
  • For this reason, the embodiments of this specification also propose a video matching method, which obtains multiple feature vectors of the target video; based on these feature vectors, retrieves candidate videos similar to the target video from the video database; and then, based on the target video and a candidate video, constructs the temporal similarity matrix feature map between the target video and the candidate video. The temporal similarity matrix feature map is used as the input of the deep learning detection model, whose output gives the video clips in the candidate video that match the target video and the corresponding similarity. The deep learning detection model is trained on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips matched by the query video in the corresponding candidate video and an infringement mark.
  • The methods provided in the embodiments of this specification use a deep learning detection model. On the one hand, such a model can detect any number of infringing segments of a potentially infringing video, and combining vector retrieval with the detection model can greatly improve the efficiency of detecting infringing videos; on the other hand, it also reduces the cost of manual review.
  • The execution body of the method may be, but is not limited to, at least one of the devices that can be configured to execute the method provided by the embodiments of this specification, such as a personal computer or a server.
  • For ease of description, the following takes a server capable of executing the method as the execution body. It can be understood that taking a server as the execution body is only an exemplary description and should not be construed as limiting the method.
  • As shown in FIG. 2, a schematic flowchart of a video matching method provided by one or more embodiments of this specification includes: S210: acquiring multiple feature vectors of a target video.
  • The target video may be a suspected infringing video, and the candidate videos described later can serve as evidence of the suspected infringement.
  • Acquiring multiple feature vectors of the target video may specifically include dividing the target video into multiple video segments and then extracting one or more feature vectors from each video segment.
  • Alternatively, multiple video frames can be extracted from the target video: the key frames of the target video can be extracted, multiple video frames can be sampled at random, or one video frame can be extracted every preset time period. One or more feature vectors are then extracted from the extracted video frames.
  • Each feature vector corresponds to one feature extraction algorithm.
  • The multiple feature vectors of the target video may specifically include: multiple feature vectors corresponding to multiple video clips or video frames of the target video, with one video clip or video frame corresponding to one feature vector; or multiple feature vectors of the target video extracted by multiple feature extraction algorithms; or multiple feature vectors obtained by applying multiple feature extraction algorithms to each of the multiple video clips or video frames of the target video, with one video clip or video frame corresponding to multiple feature vectors.
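The "one feature vector per sampled frame" idea above can be sketched with a toy feature. A real system would use CNN embeddings or video fingerprints; the normalized intensity histogram below is purely a stand-in, and all names are hypothetical.

```python
def frame_histogram(frame, bins=4):
    # Toy per-frame feature vector: a normalized intensity histogram.
    # `frame` is a flat list of 0-255 pixel intensities.
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]

def video_feature_vectors(frames, every_nth=1):
    # Sample one frame per `every_nth` (the "preset time period" sampling
    # mentioned above) and extract one feature vector per sampled frame.
    return [frame_histogram(f) for f in frames[::every_nth]]

# Two tiny fake frames of 4 pixels each.
frames = [[0, 0, 255, 255], [128, 128, 128, 128]]
feats = video_feature_vectors(frames)
```

Swapping `frame_histogram` for a learned embedding changes nothing structurally: the output is still one vector per frame, which is what the retrieval step consumes.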
  • S220 based on the multiple feature vectors of the target video, retrieve a candidate video similar to the target video from the video database.
  • the video database contains a large number of videos, each video corresponds to one or more feature vectors, and a feature vector corresponds to a feature extraction algorithm.
  • the eigenvectors that match with each eigenvector of the target video can be retrieved from the video database, and then the video corresponding to these matching eigenvectors is determined. for candidate videos.
  • retrieving candidate videos similar to the target video from the video database includes: from the video database, obtaining a feature vector retrieval result similar to the multiple feature vectors of the target video; The retrieval results of feature vectors similar to multiple feature vectors of the target video are obtained, and candidate videos similar to the target video are obtained from the video database.
  • The retrieval results of feature vectors similar to the multiple feature vectors of the target video may specifically include: the top several feature vectors that match each feature vector, or the single feature vector that best matches each feature vector.
  • A feature vector that best matches each of the multiple feature vectors of the target video may be obtained from the video database, and the candidate video corresponding to each best-matching feature vector is then determined. That is, the retrieval results of one feature vector may correspond to multiple matching feature vectors of one candidate video, or to different matching feature vectors of multiple candidate videos.
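The retrieval step can be illustrated with a brute-force nearest-neighbor search over the database vectors. This is a sketch only (a production system would use an approximate vector index); the function name and the cosine-similarity metric are assumptions, since the specification does not fix a similarity measure.

```python
import numpy as np

def retrieve_best_matches(query_vectors, db_vectors, db_video_ids, top_k=1):
    """For each query vector of the target video, return the top_k most similar
    database vectors as (video_id, db_index, similarity) tuples.
    Vectors are assumed to be L2-normalized, so the dot product is cosine similarity."""
    db = np.stack(db_vectors)                 # (N, d)
    results = []
    for q in query_vectors:
        sims = db @ q                         # (N,) cosine similarities
        top = np.argsort(sims)[::-1][:top_k]  # indices of the best matches
        results.append([(db_video_ids[i], int(i), float(sims[i])) for i in top])
    return results
```

The `video_id` attached to each matched vector is what allows the matching vectors to be grouped back into candidate videos, as described above.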
  • FIG. 3 is a schematic diagram of applying the video matching method provided in the embodiments of this specification to an actual scenario.
  • In FIG. 3, q1 to qn are the multiple feature vectors of the target video, and V3 and V1 are the vector retrieval results of two candidate videos retrieved from the video database that are similar to the target video. V3,q1 is the similarity value at the position in candidate video V3 that matches feature vector q1 of the target video; V3,q2 is the similarity value at the position matching feature vector q2; and V3,qn is the similarity value at the position matching feature vector qn. Likewise, V1,q1 is the similarity value at the position in candidate video V1 that matches feature vector q1 of the target video, and V1,qn is the similarity value at the position matching feature vector qn.
  • The vector retrieval results between the multiple feature vectors of the target video and the multiple feature vectors of the candidate videos include the matching positions (that is, the similar positions) between the feature vectors of the target video and those of the candidate videos shown in FIG. 3, together with the corresponding similarity values. To help the deep learning detection model accurately learn the matching video clips between the target video and the candidate videos and the corresponding similarities, the embodiments of this specification construct a temporal similarity matrix feature map based on the vector retrieval results between the target video and the candidate videos.
  • Constructing a temporal similarity matrix feature map between the target video and the candidate video includes: constructing, based on the vector retrieval results between the multiple feature vectors of the target video and the multiple feature vectors of the candidate video, a similarity matrix between these feature vectors; and then, in the temporal dimension, constructing the temporal similarity matrix feature map between the target video and the candidate video from that similarity matrix.
  • The multiple feature vectors of the target video can be compared with the feature vectors of the candidate video, and the resulting similarity matrix is plotted in a two-dimensional feature map. FIG. 4 shows a temporal similarity matrix feature map between a target video and a candidate video drawn in the video matching method provided in the embodiments of this specification.
  • In FIG. 4, the abscissa is the time-domain axis of the target video and the ordinate is the time-domain axis of the candidate video. The triangle-shaped pattern corresponds to one feature vector of the target video and the candidate video, and the square-shaped pattern corresponds to another feature vector of the target video and the candidate video; the value of each pattern is the similarity score in the vector retrieval results.
  • the different feature vectors can be drawn in the same time-domain similarity matrix feature map.
  • Different feature vectors can also be drawn in different temporal similarity matrix feature maps, as shown on the left side of the lower half of FIG. 3. The temporal similarity matrix feature map drawn for each feature vector serves as one channel input of the deep learning detection model; thus, when the target video has multiple feature vectors, there will be multiple temporal similarity matrix feature maps serving as the multiple channel inputs of the deep learning detection model.
  • The embodiments of this specification may construct the temporal similarity matrix feature map between the target video and the candidate video according to the temporal correspondence between them. Specifically, constructing the temporal similarity matrix feature map includes: drawing the similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of the candidate video on a two-dimensional feature map according to the time-domain correspondence between the target video and the candidate video, to obtain the temporal similarity matrix feature map between the target video and the candidate video.
  • When there are multiple candidate videos, drawing the similarity matrices on two-dimensional feature maps to obtain the temporal similarity matrix feature map includes: according to the time-domain correspondence between the target video and the multiple candidate videos, drawing the similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of each candidate video on multiple two-dimensional feature maps, to obtain multiple temporal similarity matrix feature maps between the target video and the multiple candidate videos; and splicing these multiple temporal similarity matrix feature maps to obtain the temporal similarity matrix feature map between the target video and the multiple candidate videos.
  • That is, the similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of each candidate video can be drawn on multiple two-dimensional feature maps, yielding multiple temporal similarity matrix feature maps between the target video and the multiple candidate videos. These feature maps can then be spliced into one temporal similarity matrix feature map. For example, when there are four candidate videos, four temporal similarity matrix feature maps between the target video and the four candidate videos are obtained and then spliced into a 2×2 temporal similarity matrix feature map, which is used as the input of the deep learning detection model.
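The 2×2 splicing in the four-candidate example can be sketched as below. This is an illustrative fragment: the zero-padding step is an assumption added here, since candidate videos generally differ in length and the maps must share a common size before splicing.

```python
import numpy as np

def splice_2x2(maps):
    """Splice four temporal similarity matrix feature maps into one 2x2 mosaic.
    Each map is zero-padded (bottom/right) to the largest height and width first."""
    assert len(maps) == 4
    h = max(m.shape[0] for m in maps)
    w = max(m.shape[1] for m in maps)
    padded = [np.pad(m, ((0, h - m.shape[0]), (0, w - m.shape[1]))) for m in maps]
    top = np.hstack(padded[:2])      # candidates 1 and 2
    bottom = np.hstack(padded[2:])   # candidates 3 and 4
    return np.vstack([top, bottom])  # (2h, 2w) spliced feature map
```

The spliced map lets a single forward pass of the detection model localize matching clips against all four candidates at once.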
  • The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels. The sample videos include query videos and the candidate videos corresponding to the query videos; the label corresponding to a sample video includes the video clips in the corresponding candidate videos that match the query video, together with an infringement flag identifying whether the matched video clip is infringing. It should be understood that the number of candidate videos corresponding to a query video may be one or more.
  • When the sample video includes the query video and multiple candidate videos corresponding to the query video, the label corresponding to the sample video includes the video clips in the corresponding candidate videos that match the query video, together with the infringement marks. That is, each label comprises a video clip and a corresponding tag, where the tag is infringement or non-infringement.
  • The label used to train the deep learning detection model is usually a discretized label, that is, "yes" or "no", which in the embodiments of this specification corresponds to "infringement" or "non-infringement". The output of the model is a detection frame position [x1, y1, x2, y2], where [x1, x2] corresponds to the matching time segment in the target video, [y1, y2] corresponds to the matching time segment in the candidate video, and the confidence corresponding to [x1, y1, x2, y2] is used to characterize the similarity of the matching time segments.
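The mapping from a detection frame on the similarity matrix to matching time segments can be sketched as follows. The function name is illustrative; the only assumption is that the map's axes index sampled frames at a known sampling period, consistent with the axis description of FIG. 4.

```python
def box_to_time_segments(box, frame_interval=1.0):
    """Convert a detection frame [x1, y1, x2, y2] on the temporal similarity
    matrix into time segments. The x axis indexes target-video frames, the
    y axis candidate-video frames; frame_interval is the sampling period (s)."""
    x1, y1, x2, y2 = box
    target_segment = (x1 * frame_interval, x2 * frame_interval)     # in target video
    candidate_segment = (y1 * frame_interval, y2 * frame_interval)  # in candidate video
    return target_segment, candidate_segment
```

The detection confidence attached to the same box then serves directly as the similarity of the matched clips.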
  • The temporal similarity matrix feature map of each group of sample videos is the temporal similarity matrix feature map between that group's query video and its corresponding candidate videos; it is obtained in the same way as described above, which is not repeated here.
  • The deep learning detection models in the embodiments of this specification include but are not limited to the following: the faster region-based convolutional neural network detection model Faster-RCNN; the mask region-based convolutional neural network detection model Mask-RCNN; the real-time object detection model YOLO; and the single-shot multibox detection model SSD.
  • The training process of the deep learning detection model Faster-RCNN is as follows: input the image; feed the whole image into a convolutional neural network for feature extraction; use the RPN to generate a set of anchor boxes, crop and filter them, and use softmax to judge whether each anchor belongs to the foreground or background (that is, object or not, a binary classification); in parallel, a bounding-box regression branch corrects the anchor boxes to form more accurate proposals (more accurate relative to the later box regression of the fully connected layers); map the proposal windows onto the last convolutional feature map of the network; generate a fixed-size feature map for each RoI through the RoI pooling layer; and jointly train the classification probability and bounding-box regression with Softmax Loss (detection classification probability) and Smooth L1 Loss (detection bounding-box regression).
  • The deep learning detection model Mask-RCNN builds on the Faster-RCNN prototype and adds a branch for the segmentation task: an FCN (fully convolutional network, in which the fully connected layers are converted into convolutional layers) performs semantic segmentation on each proposal box of Faster-RCNN, so the segmentation task is carried out alongside the localization and classification tasks.
  • The deep learning detection model YOLO (You Only Look Once) is an object detection model. YOLO has a concise architecture based on CNNs and anchor boxes, and is a real-time object detection technique in common use.
  • YOLO divides the image into 13×13 cells, each of which is responsible for predicting 5 bounding boxes; a bounding box describes the rectangle enclosing an object. YOLO also outputs a confidence (that is, the similarity in the embodiments of this specification), which indicates the degree to which the predicted bounding box actually contains an object.
  • Previous detection systems used classifiers or localizers for detection, applying the model at multiple locations and scales of the image and taking high-scoring regions as detections. YOLO takes a completely different approach: it applies a single neural network to the entire image, divides the image into regions, predicts bounding boxes and probabilities for each region, and weights all boxes according to those probabilities.
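The arithmetic behind the 13×13 grid can be made concrete. This sketch only computes the shape of the prediction tensor; the 13×13 grid and 5 boxes per cell come from the text above, while `num_classes=20` is an assumption (the PASCAL VOC class count commonly used with this configuration).

```python
def yolo_output_shape(grid=13, boxes_per_cell=5, num_classes=20):
    """Each cell predicts `boxes_per_cell` boxes; each box carries
    4 coordinates, 1 confidence score, and `num_classes` class probabilities."""
    per_box = 4 + 1 + num_classes
    return (grid, grid, boxes_per_cell * per_box)
```

With the assumed defaults this gives a (13, 13, 125) tensor, i.e. 13×13×5 = 845 candidate boxes per image from a single forward pass.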
  • SSD does not generate proposals, which greatly improves detection speed. The traditional approach is to first transform the image into different sizes (an image pyramid), detect on each scale separately, and finally combine the results (NMS). The SSD algorithm achieves the same effect by combining the feature maps of different convolutional layers.
  • The main network structure of the algorithm is VGG16, with the last two fully connected layers changed to convolutional layers and four additional convolutional layers appended. The outputs (feature maps) of five different convolutional layers are each convolved with two different 3×3 convolution kernels: one outputs the confidences for classification, each default box generating 21 category confidences; the other outputs the localization for regression, each default box generating 4 coordinate values (x, y, w, h).
  • Using the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarity includes: using the temporal similarity matrix feature map as the input of the deep learning detection model to output the interval ranges, in the temporal dimension, of the video clips in the candidate videos that match the target video, together with the similarity between the matching video clips.
  • The deep learning detection model outputs the position and confidence of the detection frame on each temporal similarity matrix feature map, thereby locating the infringement of the target video: [x1, x2] is the matching time segment in the target video, [y1, y2] is the time segment in the candidate video, and the similarity between the matched video clips can be characterized by the confidence. In this way, the detection frame matching the target video, the matching detection frame in the candidate video, and the similarity between them can be output.
  • In summary, multiple feature vectors of the target video can be obtained; based on these feature vectors, candidate videos similar to the target video are retrieved from the video database; based on the target video and the candidate videos, the temporal similarity matrix feature map between them is constructed; and finally, the temporal similarity matrix feature map is used as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarity. The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, where the sample videos include query videos and their corresponding candidate videos, and the labels include the video clips in the corresponding candidate videos that match the query video, together with infringement marks. The methods provided in the embodiments of this specification use a deep learning detection model: on the one hand, in terms of infringement location efficiency, they can detect any number of infringing clips in a possibly infringing video, and combining vector retrieval with the detection model greatly improves the efficiency of detecting infringing videos; on the other hand, they also reduce the cost of manual review.
  • FIG. 5 is a schematic structural diagram of a blockchain-based infringement evidence storage apparatus 500 provided by one or more embodiments of this specification, including: a candidate video retrieval module 510, which retrieves, based on multiple feature vectors of the target video, candidate videos similar to the target video from the video database; a feature map construction module 520, which constructs, based on the target video and the candidate videos, the temporal similarity matrix feature map between the target video and the candidate videos; a model output module 530, which uses the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarity; and an evidence upload module 540, which, when the similarity corresponding to a video clip in the candidate video that matches the target video is greater than or equal to a preset similarity threshold, uploads the infringement evidence containing the abstract of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarity to the blockchain.
  • the deep learning detection model is obtained by training based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, wherein the sample videos include query videos and candidate videos corresponding to the query videos, and the sample videos correspond to The tags include the video clips and infringement marks matched by the query video in the sample video in the corresponding candidate video.
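The evidence-building step of module 540 can be sketched as follows. This is an illustrative fragment: the record fields, the SHA-256 digest as the "abstract" of the target video, and the default threshold are assumptions, and the actual blockchain upload call is omitted.

```python
import hashlib
import time

def build_evidence_record(target_video_bytes, matched_segments, similarity,
                          threshold=0.8):
    """If the similarity reaches the preset threshold, build an evidence record
    containing the target video's digest for upload to the chain; otherwise
    return None (no evidence is stored)."""
    if similarity < threshold:
        return None
    digest = hashlib.sha256(target_video_bytes).hexdigest()  # abstract of the video
    return {
        "target_digest": digest,
        "matched_segments": matched_segments,  # [(target_seg, candidate_seg), ...]
        "similarity": similarity,
        "timestamp": int(time.time()),
    }
```

Storing only the digest (rather than the video itself) on-chain keeps the record small while still allowing the target video to be verified against the evidence later.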
  • The blockchain-based infringement evidence storage apparatus 500 can implement the methods of the method embodiments shown in FIG. 1 and FIG. 2.
  • FIG. 6 is a schematic structural diagram of a video matching apparatus 600 provided by one or more embodiments of this specification, including: a feature vector acquisition module 610, which acquires multiple feature vectors of a target video; a candidate video retrieval module 620, which retrieves, based on the multiple feature vectors of the target video, candidate videos similar to the target video from the video database; a feature map construction module 630, which constructs, based on the target video and the candidate videos, the temporal similarity matrix feature map between the target video and the candidate videos; and a model output module 640, which uses the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarity.
  • The candidate video retrieval module 620 is configured to: obtain, from the video database, retrieval results of feature vectors similar to the multiple feature vectors of the target video; and obtain, based on these retrieval results, candidate videos similar to the target video from the video database.
  • The feature map construction module 630 is configured to: construct, based on the vector retrieval results between the multiple feature vectors of the target video and the multiple feature vectors of each candidate video, the similarity matrix between the feature vectors of the target video and those of the candidate video; and, based on that similarity matrix, construct in the time-domain dimension the temporal similarity matrix feature map between the target video and the candidate video.
  • The feature map construction module 630 is further configured to: according to the temporal correspondence between the target video and the candidate video, draw the similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of the candidate video on a two-dimensional feature map, to obtain the temporal similarity matrix feature map between the target video and the candidate video.
  • the feature map construction module 630 is configured to: according to the temporal correspondence between the target video and the multiple candidate videos , draw the similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of each candidate video in the multiple candidate videos on multiple two-dimensional feature maps to obtain the target Multiple temporal similarity matrix feature maps between the video and the multiple candidate videos; splicing multiple temporal similarity matrix feature maps between the target video and the multiple candidate videos to obtain the A temporal similarity matrix feature map between the target video and the multiple candidate videos.
  • The model output module 640 is configured to: use the temporal similarity matrix feature map as the input of a deep learning detection model, so as to output the video clips in the candidate videos that match the target video and the corresponding similarity.
  • The deep learning detection model includes at least one of the following: the faster region-based convolutional neural network detection model Faster-RCNN; the mask region-based convolutional neural network detection model Mask-RCNN; the real-time object detection model YOLO; and the single-shot multibox detection model SSD.
  • the video matching apparatus 600 can implement the methods of the method embodiments shown in FIGS. 2 to 4 .
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present specification.
  • the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory.
  • The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage.
  • the electronic equipment may also include hardware required for other services.
  • The processor, network interface, and memory can be connected to each other through an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, etc.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one bidirectional arrow is used in FIG. 7, but it does not mean that there is only one bus or one type of bus.
  • the program may include program code, and the program code includes computer operation instructions.
  • The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
  • The processor reads the corresponding computer program from the non-volatile memory into the internal memory and runs it, forming a blockchain-based infringement evidence storage apparatus at the logical level.
  • The processor executes the program stored in the memory and is specifically configured to perform the following operations: based on multiple feature vectors of the target video, retrieve candidate videos similar to the target video from the video database; based on the target video and the candidate videos, construct the temporal similarity matrix feature map between the target video and the candidate videos; use the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarity; and when the similarity corresponding to a video clip in the candidate video that matches the target video is greater than or equal to the preset similarity threshold, upload the infringement evidence containing the abstract of the target video, the video clips in the candidate video that match the target video, and the corresponding similarity to the blockchain.
  • the deep learning detection model is obtained by training based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, wherein the sample videos include query videos and candidate videos corresponding to the query videos, and the sample videos correspond to The tags include the video clips and infringement marks matched by the query video in the sample video in the corresponding candidate video.
  • the blockchain-based infringement certificate storage method disclosed in the embodiment shown in FIG. 1 of the present specification can be applied to a processor, or implemented by a processor.
  • a processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • The above-mentioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with one or more embodiments of this specification may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the electronic device can also execute the block chain-based infringement certificate storage method shown in FIG. 1 , which will not be repeated in this specification.
  • The electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware. That is to say, the execution subject of the following processing procedure is not limited to each logic unit; it can also be a hardware or logic device.
  • The embodiments of this specification also provide a computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by a portable electronic device including multiple application programs, cause the portable electronic device to execute the method of the embodiment shown in FIG. 1, and specifically to perform the following operations: based on the multiple feature vectors of the target video, retrieve candidate videos similar to the target video from the video database; based on the target video and the candidate videos, construct the temporal similarity matrix feature map between the target video and the candidate videos; use the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarity; and when the similarity corresponding to a video clip in the candidate video that matches the target video is greater than or equal to the preset similarity threshold, upload the infringement evidence containing the abstract of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarity to the blockchain.
  • the deep learning detection model is obtained by training based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, wherein the sample videos include query videos and candidate videos corresponding to the query videos, and the sample videos correspond to The tags include the video clips and infringement marks matched by the query video in the sample video in the corresponding candidate video.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present specification.
  • the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory.
  • The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage.
  • the electronic equipment may also include hardware required for other services.
  • The processor, network interface, and memory can be connected to each other through an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, etc.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one bidirectional arrow is used in FIG. 8, but it does not mean that there is only one bus or one type of bus.
  • the program may include program code, and the program code includes computer operation instructions.
  • The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
  • the processor reads the corresponding computer program from the non-volatile memory into the memory and runs it, forming a video matching device on a logical level.
  • The processor executes the program stored in the memory and is specifically configured to perform the following operations: acquire multiple feature vectors of the target video; based on the multiple feature vectors of the target video, retrieve candidate videos similar to the target video from the video database; based on the target video and the candidate videos, construct the temporal similarity matrix feature map between the target video and the candidate videos; and use the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarity. The deep learning detection model is obtained by training based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels; the sample videos include the query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips in the corresponding candidate videos that match the query video, together with the infringement marks.
  • With this electronic device, when locating infringement for a target video, multiple feature vectors of the target video can be obtained; based on these feature vectors, candidate videos similar to the target video are retrieved from the video database; the temporal similarity matrix feature map between the target video and the candidate videos is then constructed; and finally the feature map is used as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarity. On the one hand, in terms of infringement location efficiency, this can detect any number of infringing clips in a possibly infringing video, and combining vector retrieval with the detection model greatly improves the efficiency of detecting infringing videos; on the other hand, it also reduces the cost of manual review.
  • each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with one or more embodiments of this specification may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, or other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the electronic device can also perform the video matching methods shown in FIGS. 2 to 4, which are not repeated in this specification.
  • the electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flows is not limited to the logic units and can also be hardware or a logic device.
  • the embodiments of the present specification also provide a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by a portable electronic device including a plurality of application programs, cause the portable electronic device to execute the methods of the embodiments shown in FIG. 2 to FIG. 4, and are specifically configured to perform the following operations: acquire multiple feature vectors of a target video; retrieve, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; construct, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; and use the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities.
  • the deep learning detection model is trained based on the temporal similarity matrix feature maps of multiple groups of sample videos and the corresponding labels, wherein a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • Computer-readable media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Technology Law (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

This specification discloses a video matching method and a blockchain-based infringement evidence storage method and apparatus. The blockchain-based infringement evidence storage method includes: acquiring multiple feature vectors of a target video; retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and, when a similarity is greater than or equal to a preset similarity threshold, uploading to a blockchain infringement evidence that includes a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities.

Description

Video matching method, blockchain-based infringement evidence storage method and apparatus

Technical Field

This document relates to the field of computer technology, and in particular to a video matching method and a blockchain-based infringement evidence storage method and apparatus.
Background

At present, when locating infringement in a suspected infringing video, multiple kinds of features are usually first extracted from the video, and after retrieval by a search engine, matching results of multiple candidate videos that match the video are obtained. To finally locate the infringement in the suspected infringing video, the similarity between each candidate video and the suspected infringing video must also be computed. This calls for a highly robust algorithm that can cope with false matches and missed matches among the various features between videos, and that supports infringement localization over multiple video clips.

However, existing video matching methods are easily affected by noise in the feature retrieval results, and as video duration grows, the efficiency of video-to-video matching drops sharply. In addition, storing evidence of infringing videos and their infringement proof is also an urgent problem for the industry.
Summary

The embodiments of this specification provide a video matching method and a blockchain-based infringement evidence storage method and apparatus, to cope with false matches and missed matches among the various features between videos, to support infringement localization over multiple video clips, and to improve the efficiency of video matching, thereby reducing the cost of manual review.
To solve the above technical problem, the embodiments of this specification are implemented as follows. In a first aspect, a video matching method is provided, including: retrieving, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and, when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploading to a blockchain infringement evidence that includes a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities.

Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
In a second aspect, a blockchain-based infringement video evidence storage method is provided, including: acquiring multiple feature vectors of a target video; retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; and using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
In a third aspect, a blockchain-based infringement evidence storage apparatus is provided, including: a candidate video retrieval module that retrieves, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database; a feature map construction module that constructs, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; a model output module that uses the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and an evidence upload module that, when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploads to a blockchain infringement evidence that includes a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities.

Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
In a fourth aspect, a video matching apparatus is provided, including: a feature vector acquisition module that acquires multiple feature vectors of a target video; a candidate video retrieval module that retrieves, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; a feature map construction module that constructs, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; and a model output module that uses the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
In a fifth aspect, an electronic device is provided, including: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations: retrieving, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and, when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploading to a blockchain infringement evidence that includes a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities.

Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
In a sixth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores one or more programs that, when executed by an electronic device including multiple application programs, cause the electronic device to perform the following operations: retrieving, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and, when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploading to a blockchain infringement evidence that includes a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities.

Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
In a seventh aspect, an electronic device is provided, including: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations: acquiring multiple feature vectors of a target video; retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; and using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
In an eighth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores one or more programs that, when executed by an electronic device including multiple application programs, cause the electronic device to perform the following operations: acquiring multiple feature vectors of a target video; retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; and using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
By adopting the above technical solutions, the embodiments of this specification can achieve at least the following technical effects. When locating infringement in a target video, candidate videos similar to the target video can be retrieved from a video database based on multiple feature vectors of the target video; a temporal similarity matrix feature map between the target video and the candidate videos is then constructed based on the target video and the candidate videos; finally, the temporal similarity matrix feature map is used as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and, when the similarity corresponding to a matching video clip is greater than or equal to a preset similarity threshold, infringement evidence that includes a digest of the target video, the matching video clips in the candidate videos, and the corresponding similarities is uploaded to a blockchain. By using a deep learning detection model, the method provided by the embodiments of this specification can, in terms of infringement-localization efficiency, detect any number of infringing clips of a potentially infringing video, and combining vector retrieval with the detection model greatly improves the detection efficiency for infringing videos; it also reduces the cost of manual review. In addition, taking advantage of the tamper-proof nature of the blockchain, the digest of the infringing target video, the matching video clips in the candidate videos, and the corresponding similarities are uploaded to the blockchain, so that evidence of the target video's infringement can be obtained from the blockchain when infringement is asserted.
Brief Description of the Drawings

The accompanying drawings described here are provided for a further understanding of this specification and constitute a part of it; the illustrative embodiments of this specification and their descriptions are used to explain this specification and do not constitute an improper limitation on it. In the drawings:

FIG. 1 is a schematic flowchart of an implementation of a blockchain-based infringement evidence storage method provided by an embodiment of this specification.

FIG. 2 is a schematic flowchart of an implementation of a video matching method provided by an embodiment of this specification.

FIG. 3 is a schematic flowchart of the video matching method provided by an embodiment of this specification as applied in one scenario.

FIG. 4 is a schematic diagram of a temporal similarity matrix feature map drawn in the video matching method provided by an embodiment of this specification.

FIG. 5 is a schematic structural diagram of a blockchain-based infringement evidence storage apparatus provided by an embodiment of this specification.

FIG. 6 is a schematic structural diagram of a video matching apparatus provided by an embodiment of this specification.

FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of this specification.

FIG. 8 is a schematic structural diagram of another electronic device provided by an embodiment of this specification.
Detailed Description

To make the purposes, technical solutions, and advantages of this document clearer, the technical solutions of this specification are described clearly and completely below with reference to specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of this document, not all of them. Based on the embodiments in this specification, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this document.

The technical solutions provided by the embodiments of this specification are described in detail below with reference to the drawings.
Storing evidence of infringing videos and their infringement proof is an urgent problem for the industry. The embodiments of this specification introduce a blockchain and, taking advantage of its tamper-proof nature, write information about the target video, information about the candidate videos, and whether the target video infringes into the blockchain, thereby guaranteeing the credibility of the infringement information on the chain and enabling quick evidence collection on whether the target video infringes. Specifically, FIG. 1 is a schematic flowchart of an implementation of a blockchain-based infringement video evidence storage method provided by an embodiment of this specification, including:
S110: Retrieve, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database.

S120: Construct, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos.

S130: Use the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities.

Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.

S140: When the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, upload to a blockchain infringement evidence that includes a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities.
The preset similarity threshold can be obtained from empirical values and is used to determine whether the target video infringes; for example, it can be set to 60%. It should be understood that, because storage space in the blockchain is limited, when storing the infringement evidence of the target video, the embodiments of this specification can convert the target video into a hash value using a hash algorithm, and upload the hash value of the target video, together with the video clips in the candidate videos that match the target video and the corresponding similarities, to the blockchain; nodes in the blockchain with evidence-storage permission perform a consensus operation on the infringement evidence and record it into a newly generated block after consensus. When the infringement evidence needs to be obtained, the infringement evidence containing the hash value of the target video can be downloaded from the blockchain based on that hash value.
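The digest-plus-clips evidence record described above can be sketched as follows. This is a minimal illustration: the record layout, field names, and the 60% threshold are assumptions of the sketch, and the consensus and block-writing steps performed by blockchain nodes are omitted entirely.

```python
import hashlib
import json

def build_evidence(video_bytes, matched_clips, threshold=0.6):
    """Build an infringement-evidence record for on-chain storage.

    The video itself is replaced by its SHA-256 digest to save chain
    space; only clips whose similarity reaches the threshold are kept.
    Returns None when no clip meets the threshold (nothing to store).
    """
    digest = hashlib.sha256(video_bytes).hexdigest()
    kept = [c for c in matched_clips if c["similarity"] >= threshold]
    if not kept:
        return None
    return json.dumps({"video_digest": digest, "clips": kept}, sort_keys=True)

# hypothetical example data: one clip above threshold, one below
evidence = build_evidence(
    b"\x00\x01fake-video-bytes",
    [{"candidate": "V3", "clip": [12.0, 30.5], "similarity": 0.83},
     {"candidate": "V1", "clip": [0.0, 5.0], "similarity": 0.41}])
```

The digest also serves as the lookup key when the evidence later needs to be downloaded from the chain.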
For the specific implementation of the relevant steps of the embodiment shown in FIG. 1, refer to the specific implementation of the corresponding steps of the embodiment shown in FIG. 2 described below; one or more embodiments of this specification do not repeat them here.

When locating infringement in a target video, candidate videos similar to the target video can be retrieved from a video database based on multiple feature vectors of the target video; a temporal similarity matrix feature map between the target video and the candidate videos is then constructed based on the target video and the candidate videos; finally, the temporal similarity matrix feature map is used as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and, when the similarity corresponding to a matching video clip is greater than or equal to a preset similarity threshold, infringement evidence that includes a digest of the target video, the matching video clips in the candidate videos, and the corresponding similarities is uploaded to the blockchain. Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag. The method provided by the embodiments of this specification takes advantage of the tamper-proof nature of the blockchain to upload the digest of the infringing target video, the matching video clips in the candidate videos, and the corresponding similarities to the blockchain, so that evidence of the target video's infringement can be obtained from the blockchain when infringement is asserted.
As stated in the Background, when locating infringement in an infringing video, the multiple kinds of feature vectors extracted from the infringing video are retrieved by a designated search engine, and the resulting vector retrieval results contain matching results for N candidate videos. For each of these results, the similarity with the infringing video must be computed and the infringement localized. A highly robust algorithm is therefore needed to cope with false matches and missed matches of the feature vectors; at the same time, if the retrieval results contain a large coarsely ranked set of videos from the search engine, high efficiency is required.

In addition, the video matching algorithm must support infringement localization over multiple video clips to reduce the cost of manual review. However, the dynamic programming algorithms commonly used in the industry and solutions from CCF competitions are easily affected by noise in the feature vector retrieval results and are not robust enough, and as the duration of the infringing video grows, the efficiency of video matching drops sharply.
To solve the problem that existing infringing-video detection has low efficiency and accuracy, the embodiments of this specification further propose a video matching method that can acquire multiple feature vectors of a target video; retrieve, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; construct, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; and finally, use the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. The deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.

By using a deep learning detection model, the method provided by the embodiments of this specification can, in terms of infringement-localization efficiency, detect any number of infringing clips of a potentially infringing video, and combining vector retrieval with the detection model greatly improves the detection efficiency for infringing videos; it also reduces the cost of manual review.
The execution subject of the video matching method provided by the embodiments of this specification may be, but is not limited to, at least one of a personal computer, a server, or any other apparatus that can be configured to execute the method provided by the embodiments of the present invention.

For ease of description, the following introduces an implementation of the method taking a server capable of executing the method as the execution subject. It should be understood that the server as the execution subject is only an illustrative example and should not be construed as limiting the method.
Specifically, FIG. 2 shows a schematic flowchart of an implementation of a video matching method provided by one or more embodiments of this specification, including: S210: Acquire multiple feature vectors of a target video.

The target video may specifically be a suspected infringing video, and the candidate videos described below can serve as evidence of its infringement.
Optionally, acquiring multiple feature vectors of the target video may specifically involve splitting the target video into multiple video clips and extracting one or more kinds of feature vectors from each clip. Alternatively, frames can be extracted from the target video to obtain multiple video frames: the key frames of the target video can be extracted, multiple frames can be sampled at random, or one frame can be extracted every preset time interval; one or more kinds of feature vectors are then extracted from the sampled frames. Each kind of feature vector corresponds to one feature extraction algorithm.

Therefore, the multiple feature vectors of the target video may specifically include multiple feature vectors corresponding to multiple video clips or frames of the target video, with one feature vector per clip or frame; or they may include multiple kinds of feature vectors of the target video extracted by multiple feature extraction algorithms; or they may include multiple feature vectors extracted by multiple feature extraction algorithms from the multiple video clips or frames of the target video, with multiple kinds of feature vectors per clip or frame.
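The interval-based sampling option above (one frame every preset interval, then one vector per sampled frame) can be sketched as follows. The toy two-element "feature" merely stands in for a real extractor, such as a CNN embedding, which the text does not specify.

```python
def sample_frames(num_frames, interval):
    """Pick one frame index every `interval` frames (uniform sampling)."""
    return list(range(0, num_frames, interval))

def toy_feature(frame_index):
    """Stand-in for a per-frame feature extractor; one vector per frame."""
    return [float(frame_index), float(frame_index % 7)]

indices = sample_frames(num_frames=100, interval=25)  # e.g. [0, 25, 50, 75]
features = [toy_feature(i) for i in indices]          # one vector per sampled frame
```

Key-frame or random sampling would only change `sample_frames`; the one-vector-per-frame (or per-clip) mapping stays the same.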
S220: Retrieve, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database.

The video database contains a massive number of videos; each video corresponds to one or more kinds of feature vectors, and each kind of feature vector corresponds to one feature extraction algorithm.
Optionally, for each of the multiple feature vectors of the target video, feature vectors matching it can be retrieved from the video database, and the videos corresponding to these matching feature vectors are determined to be the candidate videos. Specifically, retrieving candidate videos similar to the target video from the video database based on the multiple feature vectors of the target video includes: obtaining, from the video database, feature vector retrieval results similar to the multiple feature vectors of the target video; and obtaining, from the video database, candidate videos similar to the target video based on those retrieval results.

The feature vector retrieval results similar to the multiple feature vectors of the target video may specifically include: the top several feature vectors matching each feature vector, or the single best-matching feature vector for each feature vector. For example, for each of the target video's feature vectors, the top k matching feature vectors can be obtained from the video database, and the m candidate videos corresponding to these k feature vectors are determined, where m is less than or equal to k and m is greater than or equal to 1; when m = k, the k feature vectors come from k different candidate videos, and when m = 1, the k feature vectors come from the same candidate video. Alternatively, the single feature vector best matching each of the target video's feature vectors can be obtained from the video database, and the candidate video corresponding to that best-matching feature vector is determined. In other words, the retrieval result of one feature vector may correspond to multiple matching feature vectors of one candidate video, or to different matching feature vectors of multiple candidate videos.
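A brute-force sketch of the top-k retrieval step above, assuming cosine similarity over a small in-memory index. The patent fixes neither the similarity metric nor the index structure; a production system would use an approximate-nearest-neighbor engine instead of this linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 when either is zero)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def top_k(query_vec, index, k):
    """index: list of (video_id, timestamp, vector). Return the k best hits."""
    scored = [(cosine(query_vec, v), vid, t) for vid, t, v in index]
    return sorted(scored, reverse=True)[:k]

# tiny hypothetical index: two entries from V1, one from V3
index = [("V1", 0, [1.0, 0.0]), ("V1", 1, [0.9, 0.1]), ("V3", 0, [0.0, 1.0])]
hits = top_k([1.0, 0.05], index, k=2)
candidates = {vid for _, vid, _ in hits}  # here both hits come from V1, so m = 1
```

This also illustrates the m-versus-k point in the text: the k hits may collapse onto fewer than k distinct candidate videos.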
FIG. 3 is a schematic diagram of the video matching method provided by the embodiments of this specification as applied in a practical scenario. In FIG. 3, q1 to qn are the multiple feature vectors of the target video, and V3 and V1 are the vector retrieval results of two candidate videos retrieved from the video database as similar to the target video. On the left of the figure, V3,q1 is the similarity value at the position in candidate video V3 matching feature vector q1 of the target video, V3,q2 is the similarity value at the position in V3 matching q2, and V3,qn is the similarity value at the position in V3 matching qn; on the right of the figure, V1,q1 is the similarity value at the position in candidate video V1 matching q1, and V1,qn is the similarity value at the position in V1 matching qn.
S230: Construct, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos.

It should be understood that the vector retrieval results between the multiple feature vectors of the target video and the multiple feature vectors of the candidate videos contain the matching positions (that is, the similar positions) between the feature vectors of the target video and those of the candidate videos shown in FIG. 3, together with the similarities at the corresponding positions. So that the deep learning detection model can accurately learn the matching video clips and the corresponding similarities between the target video and the candidate videos, the embodiments of this specification can construct the temporal similarity matrix feature map based on these vector retrieval results. Specifically, constructing the temporal similarity matrix feature map between the target video and the candidate videos based on the target video and the candidate videos includes: constructing, based on the vector retrieval results between the multiple feature vectors of the target video and the multiple feature vectors of the candidate videos, a similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of the candidate videos; and constructing, based on that similarity matrix and along the temporal dimension, the temporal similarity matrix feature map between the target video and the candidate videos.
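The construction step can be illustrated as follows: each retrieval hit contributes one similarity score at its (target time, candidate time) cell, and cells without hits stay zero, which is what makes the matrix an image-like feature map. The discretization into integer time indices is an assumption of this sketch.

```python
def build_similarity_map(hits, q_len, c_len):
    """Place each retrieval hit at (query_time, candidate_time).

    hits: list of (q_t, c_t, score) with integer time indices.
    Keeps the maximum score when several hits land on one cell.
    """
    grid = [[0.0] * c_len for _ in range(q_len)]
    for q_t, c_t, score in hits:
        grid[q_t][c_t] = max(grid[q_t][c_t], score)
    return grid

# a matching segment shows up as a high-scoring diagonal run of cells
grid = build_similarity_map([(0, 2, 0.9), (1, 3, 0.8), (2, 4, 0.85)],
                            q_len=4, c_len=6)
```

The diagonal in `grid` is exactly the pattern the detection model is later trained to box in.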
When there is one candidate video, the distribution of the similarity matrix between the multiple feature vectors of the target video and those of the candidate video can be plotted on a two-dimensional feature map along the same temporal dimension. FIG. 4 shows the temporal similarity matrix feature map between the target video and a candidate video drawn in the video matching method provided by the embodiments of this specification. In FIG. 4, the horizontal axis is the temporal axis of the target video and the vertical axis is the temporal axis of the candidate video; the triangular marks correspond to one kind of feature vector of the target and candidate videos, the square marks correspond to another kind, and the value of each mark is the similarity score from the vector retrieval results. In practical applications, to improve the efficiency of video matching, the different kinds of feature vectors can be plotted in the same temporal similarity matrix feature map.

Alternatively, different kinds of feature vectors can be plotted in different temporal similarity matrix feature maps; that is, as shown on the left of the lower half of FIG. 3, the temporal similarity matrix feature map drawn for each kind of feature vector can be used as one channel of the input to the deep learning detection model, so that when the target video has multiple kinds of feature vectors, multiple temporal similarity matrix feature maps serve as the multi-channel input of the deep learning detection model.
Optionally, to help the deep learning detection model determine the similar video clips and corresponding similarities between the target video and the candidate videos accurately and efficiently, the embodiments of this specification can construct the temporal similarity matrix feature map according to the temporal correspondence between the target video and the candidate videos. Specifically, constructing the temporal similarity matrix feature map between the target video and the candidate videos along the temporal dimension, based on the similarity matrix between the multiple feature vectors of the target video and those of the candidate videos, includes: plotting, according to the temporal correspondence between the target video and the candidate videos, the similarity matrix between the multiple feature vectors of the target video and those of the candidate videos on a two-dimensional feature map, to obtain the temporal similarity matrix feature map between the target video and the candidate videos.
Optionally, when there are multiple candidate videos, plotting the similarity matrices between the multiple feature vectors of the target video and the multiple feature vectors of each candidate video on a two-dimensional feature map according to the temporal correspondence between the target video and the candidate videos, to obtain the temporal similarity matrix feature map, includes: plotting, according to the temporal correspondences between the target video and the multiple candidate videos, the similarity matrices between the multiple feature vectors of the target video and the multiple feature vectors of each of the candidate videos on multiple two-dimensional feature maps, to obtain multiple temporal similarity matrix feature maps between the target video and the multiple candidate videos; and stitching these multiple temporal similarity matrix feature maps together to obtain the temporal similarity matrix feature map between the target video and the multiple candidate videos.

That is, when there are multiple candidate videos, the similarity matrices between the multiple feature vectors of the target video and those of each candidate video can be plotted on multiple two-dimensional feature maps, yielding multiple temporal similarity matrix feature maps between the target video and the candidate videos. To improve the learning efficiency of the deep learning detection model, these maps can be stitched into a single temporal similarity matrix feature map. For example, when there are four candidate videos, four temporal similarity matrix feature maps between the target video and the four candidate videos are obtained, and these four maps are stitched into a 2x2 temporal similarity matrix feature map as the input of the deep learning detection model.
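The 2x2 stitching for four candidate videos can be sketched as follows, assuming the four maps have equal dimensions (how unequal sizes would be padded is not covered by the text):

```python
def tile_2x2(maps):
    """Stitch four equally sized 2-D maps into one 2x2 mosaic.

    maps: [top_left, top_right, bottom_left, bottom_right], each a
    list of rows; rows are concatenated side by side, then stacked.
    """
    a, b, c, d = maps
    top = [ra + rb for ra, rb in zip(a, b)]
    bottom = [rc + rd for rc, rd in zip(c, d)]
    return top + bottom

m = [[1.0, 2.0], [3.0, 4.0]]   # one toy 2x2 similarity map
mosaic = tile_2x2([m, m, m, m])  # 4x4 map covering four candidates
```

After stitching, a single forward pass of the detection model covers all four candidates at once, which is the stated efficiency motivation.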
S240: Use the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities.

The deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag; the infringement flag indicates whether the matching video clip infringes. It should be understood that a query video may correspond to one or more candidate videos. When a query video corresponds to multiple candidate videos, the sample video includes the query video and its multiple candidate videos, and the label corresponding to the sample video includes the video clips that the query video matches in each of the corresponding candidate videos and the infringement flags. Specifically, the label corresponding to a sample video includes a video clip and a corresponding label, the label being "infringing" or "non-infringing".

It should be understood that when training the deep learning detection model, the labels of the sample videos used for training are usually discretized labels, that is, "yes" or "no", corresponding in the embodiments of this specification to "infringing" or "non-infringing". When the deep learning detection model performs prediction, it outputs a detection box position [x1, y1, x2, y2], where [x1, x2] corresponds to the matching time segment in the target video and [y1, y2] corresponds to the matching time segment in the candidate video, together with a confidence for [x1, y1, x2, y2] that characterizes the similarity of the matching time segments.
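Decoding the model output described above, a box [x1, y1, x2, y2] plus a confidence, into time segments can be sketched as follows. The seconds-per-cell scale is a hypothetical parameter tying similarity-map cells back to timestamps; the source does not fix this mapping.

```python
def boxes_to_segments(detections, seconds_per_cell=1.0):
    """Map detector boxes on the similarity map to time segments.

    detections: list of ((x1, y1, x2, y2), confidence), where [x1, x2]
    spans the target video axis and [y1, y2] the candidate video axis.
    """
    out = []
    for (x1, y1, x2, y2), conf in detections:
        out.append({
            "target": (x1 * seconds_per_cell, x2 * seconds_per_cell),
            "candidate": (y1 * seconds_per_cell, y2 * seconds_per_cell),
            "similarity": conf,  # box confidence doubles as clip similarity
        })
    return out

# one hypothetical detection on a map sampled at 2 seconds per cell
segments = boxes_to_segments([((10, 40, 25, 55), 0.92)], seconds_per_cell=2.0)
```

Each returned entry is one matched clip pair, which is exactly the form the threshold check in the evidence-storage flow consumes.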
The temporal similarity matrix feature map of each group of sample videos is the temporal similarity matrix feature map between the query video of that group and its corresponding candidate videos; it is obtained in the same way as the temporal similarity matrix feature map described above, which is not repeated here.

Optionally, the deep learning detection model in the embodiments of this specification includes, but is not limited to, the following models: the faster convolutional-neural-network-based region proposal detection model Faster-Rcnn; the masked convolutional-neural-network-based region proposal detection model Mask-Rcnn; the real-time object detection model YOLO; and the single-shot multibox detection model SSD.
The training process of the Faster-Rcnn detection model is: input a test image; feed the whole image into a convolutional neural network for feature extraction; use an RPN to generate a set of anchor boxes, crop and filter them, and then use softmax to judge whether each anchor belongs to the foreground or the background, i.e., is or is not an object, which is a binary classification; at the same time, another branch performs bounding-box regression to refine the anchor boxes, forming more precise proposals (here, "more precise" is relative to the second bounding-box regression of the later fully connected layers); map the proposal windows onto the last convolutional feature map of the convolutional neural network; generate a fixed-size feature map for each RoI through an RoI pooling layer; and jointly train the classification probability and the bounding-box regression using Softmax Loss (for classification probability) and Smooth L1 Loss (for bounding-box regression).
The Mask-Rcnn detection model takes Faster RCNN as its prototype and adds a branch for the segmentation task; that is, for each proposal box of Faster RCNN, an FCN (a fully convolutional network, which converts the fully connected layers of a traditional convolutional neural network into convolutional layers) performs semantic segmentation, and the segmentation task is carried out simultaneously with the localization and classification tasks.
The YOLO (You Only Look Once) detection model is an object detection model with a compact architecture based on CNNs and anchor boxes, and is a real-time object detection technique for common use cases. YOLO divides the image into 13x13 cells, and each cell is responsible for predicting 5 bounding boxes, a bounding box being the rectangle enclosing an object. YOLO also outputs a confidence (i.e., the similarity in the embodiments of this specification) indicating how likely the predicted bounding box actually contains an object. Previous detection systems used classifiers or localizers for detection, applying the model to multiple positions and scales of the image and taking high-scoring regions as detections. YOLO takes a completely different approach: it applies a single neural network to the whole image; the network divides the image into regions, predicts bounding boxes and probabilities for each region, and weights all boxes according to those probabilities.
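The 13x13-cells, 5-boxes-per-cell figures quoted above imply a fixed box budget per image, which a quick check confirms:

```python
# YOLO-style grid accounting as described in the text:
# a 13x13 grid with 5 bounding boxes predicted per cell.
grid_w, grid_h, boxes_per_cell = 13, 13, 5
total_boxes = grid_w * grid_h * boxes_per_cell  # boxes predicted per image
```

This fixed budget is why YOLO needs only one forward pass per image, in contrast to the multi-position, multi-scale scanning of earlier systems.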
Compared with Faster-Rcnn, SSD has no proposal-generation step, which greatly improves detection speed. For detecting objects of different sizes, the traditional approach first converts the image into different sizes (an image pyramid), detects on each separately, and then combines the results (NMS); the SSD algorithm achieves the same effect by combining feature maps from different convolutional layers. The backbone of the algorithm is VGG16, with the last two fully connected layers replaced by convolutional layers and four additional convolutional layers added to construct the network. The outputs (feature maps) of five different convolutional layers are each convolved with two different 3x3 kernels: one outputs the confidence for classification, generating 21 class confidences per default box; the other outputs the localization for regression, generating 4 coordinate values (x, y, w, h) per default box.
Optionally, using the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities includes: using the temporal similarity matrix feature map as the input of the deep learning detection model to output the interval ranges, along the temporal dimension, of the video clips in the candidate videos that match the target video, and the similarities between the matching video clips.

Specifically, the deep learning detection model outputs the detection box positions and confidences on each temporal similarity matrix feature map, achieving infringement localization for the target video. The interval range along the temporal dimension of a video clip in the candidate video that matches the target video may specifically be the detection box position [x1, y1, x2, y2], where [x1, x2] is the time segment in the target video and [y1, y2] is the time segment in the candidate video; the similarity between the matching video clips can specifically be characterized by the confidence. As shown on the right of the lower half of FIG. 3, after the temporal similarity matrix feature map is fed into the deep learning detection model, the detection boxes in the candidate videos that match the target video and the similarities between the matching detection boxes are output.
When locating infringement in a target video, multiple feature vectors of the target video can be acquired; candidate videos similar to the target video are retrieved from the video database based on these feature vectors; a temporal similarity matrix feature map between the target video and the candidate videos is then constructed; and finally, the temporal similarity matrix feature map is used as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video matches in the corresponding candidate videos and an infringement flag. By using a deep learning detection model, the method provided by the embodiments of this specification can, in terms of infringement-localization efficiency, detect any number of infringing clips of a potentially infringing video, and combining vector retrieval with the detection model greatly improves the detection efficiency for infringing videos; it also reduces the cost of manual review.
FIG. 5 is a schematic structural diagram of a blockchain-based infringement evidence storage apparatus 500 provided by one or more embodiments of this specification, including: a candidate video retrieval module 510 that retrieves, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database; a feature map construction module 520 that constructs, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; a model output module 530 that uses the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and an evidence upload module 540 that, when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploads to a blockchain infringement evidence that includes a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities.

Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.

The apparatus 500 can implement the methods of the method embodiments of FIG. 1 and FIG. 2; for details, refer to the blockchain-based infringement evidence storage method and the video matching method of the embodiments shown in FIG. 1 and FIG. 2, which are not repeated here.
FIG. 6 is a schematic structural diagram of a video matching apparatus 600 provided by one or more embodiments of this specification, including: a feature vector acquisition module 610 that acquires multiple feature vectors of a target video; a candidate video retrieval module 620 that retrieves, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; a feature map construction module 630 that constructs, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; and a model output module 640 that uses the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
Optionally, in one implementation, the candidate video retrieval module 620 is configured to: obtain, from the video database, feature vector retrieval results similar to the multiple feature vectors of the target video; and obtain, from the video database, candidate videos similar to the target video based on those retrieval results.

Optionally, in one implementation, the feature map construction module 630 is configured to: construct, based on the vector retrieval results between the multiple feature vectors of the target video and the multiple feature vectors of each of the candidate videos, a similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of the candidate videos; and construct, based on that similarity matrix and along the temporal dimension, the temporal similarity matrix feature map between the target video and the candidate videos.

Optionally, in one implementation, the feature map construction module 630 is configured to: plot, according to the temporal correspondence between the target video and the candidate videos, the similarity matrix between the multiple feature vectors of the target video and those of the candidate videos on a two-dimensional feature map, to obtain the temporal similarity matrix feature map between the target video and the candidate videos.

Optionally, in one implementation, when there are multiple candidate videos, the feature map construction module 630 is configured to: plot, according to the temporal correspondences between the target video and the multiple candidate videos, the similarity matrices between the multiple feature vectors of the target video and the multiple feature vectors of each of the candidate videos on multiple two-dimensional feature maps, to obtain multiple temporal similarity matrix feature maps between the target video and the multiple candidate videos; and stitch those maps together to obtain the temporal similarity matrix feature map between the target video and the multiple candidate videos.

Optionally, in one implementation, the model output module 640 is configured to: use the temporal similarity matrix feature map as the input of the deep learning detection model to output the interval ranges, along the temporal dimension, of the video clips in the candidate videos that match the target video, and the similarities between the matching video clips.

Optionally, in one implementation, the deep learning detection model includes at least one of: the faster convolutional-neural-network-based region proposal detection model Faster-Rcnn; the masked convolutional-neural-network-based region proposal detection model Mask-Rcnn; the real-time object detection model YOLO; and the single-shot multibox detection model SSD.
The video matching apparatus 600 can implement the methods of the method embodiments of FIGS. 2 to 4; for details, refer to the video matching method of the embodiments shown in FIGS. 2 to 4, which is not repeated here.
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of this specification. Referring to FIG. 7, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required by other services.

The processor, the network interface, and the memory can be interconnected through the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is used in FIG. 7, but this does not mean there is only one bus or one type of bus.
The memory is used to store a program. Specifically, the program may include program code, the program code including computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile memory into the internal memory and runs it, forming a blockchain-based infringement evidence storage apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations: retrieving, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and, when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploading to a blockchain infringement evidence that includes a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities.

Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.

The blockchain-based infringement evidence storage method disclosed in the embodiment shown in FIG. 1 of this specification can be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by a hardware integrated logic circuit in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in one or more embodiments of this specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in one or more embodiments of this specification may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, or other storage media mature in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

The electronic device can also execute the blockchain-based infringement evidence storage method of FIG. 1, which is not repeated in this specification.
Of course, besides software implementations, the electronic device of this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to the logic units and can also be hardware or a logic device.

The embodiments of this specification also provide a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by a portable electronic device including multiple application programs, cause the portable electronic device to execute the method of the embodiment shown in FIG. 4, and are specifically configured to perform the following operations: retrieving, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities; and, when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploading to a blockchain infringement evidence that includes a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities.

Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of this specification. Referring to FIG. 8, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required by other services.

The processor, the network interface, and the memory can be interconnected through the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is used in FIG. 8, but this does not mean there is only one bus or one type of bus.

The memory is used to store a program. Specifically, the program may include program code, the program code including computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile memory into the internal memory and runs it, forming a video matching apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations: acquiring multiple feature vectors of a target video; retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; and using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
With the electronic device provided by the embodiments of this specification, when locating infringement in a target video, multiple feature vectors of the target video can be acquired; candidate videos similar to the target video are retrieved from the video database based on these feature vectors; a temporal similarity matrix feature map between the target video and the candidate videos is constructed; and finally, the temporal similarity matrix feature map is used as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. On the one hand, in terms of infringement-localization efficiency, this can detect the multiple features of any number of infringing clips of a potentially infringing video, and combining vector retrieval with the detection model greatly improves the detection efficiency for infringing videos; on the other hand, it also reduces the cost of manual review.

The video matching method disclosed in the embodiments shown in FIGS. 2 to 4 of this specification can be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by a hardware integrated logic circuit in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in one or more embodiments of this specification. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in one or more embodiments of this specification may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, or other storage media mature in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

The electronic device can also execute the video matching methods of FIGS. 2 to 4, which are not repeated in this specification.

Of course, besides software implementations, the electronic device of this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to the logic units and can also be hardware or a logic device.

The embodiments of this specification also provide a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by a portable electronic device including multiple application programs, cause the portable electronic device to execute the methods of the embodiments shown in FIGS. 2 to 4, and are specifically configured to perform the following operations: acquiring multiple feature vectors of a target video; retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database; constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos; and using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities. Here, the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, where a sample video includes a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video includes the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
In short, the above are only preferred embodiments of this specification and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of one or more embodiments of this specification shall be included in the protection scope of one or more embodiments of this specification.

The systems, apparatuses, modules, or units described in the above embodiments can specifically be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Computer-readable media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. Information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.

The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, refer to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are basically similar to the method embodiments, their description is relatively simple, and the relevant parts can be found in the description of the method embodiments.

Claims (14)

  1. A blockchain-based infringement evidence storage method, comprising:
    retrieving, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database;
    constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos;
    using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities;
    when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploading to a blockchain infringement evidence that comprises a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities;
    wherein the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, a sample video comprises a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video comprises the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
  2. A video matching method, comprising:
    acquiring multiple feature vectors of a target video;
    retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database;
    constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos;
    using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities;
    wherein the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, a sample video comprises a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video comprises the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
  3. The method according to claim 2, wherein retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from the video database comprises:
    obtaining, from the video database, feature vector retrieval results similar to the multiple feature vectors of the target video;
    obtaining, from the video database, candidate videos similar to the target video based on the feature vector retrieval results similar to the multiple feature vectors of the target video.
  4. The method according to claim 3, wherein constructing, based on the target video and the candidate videos, the temporal similarity matrix feature map between the target video and the candidate videos comprises:
    constructing, based on the vector retrieval results between the multiple feature vectors of the target video and the multiple feature vectors of each of the candidate videos, a similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of the candidate videos;
    constructing, based on the similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of the candidate videos and along the temporal dimension, the temporal similarity matrix feature map between the target video and the candidate videos.
  5. The method according to claim 4, wherein constructing, based on the similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of each of the candidate videos and along the temporal dimension, the temporal similarity matrix feature map between the target video and the candidate videos comprises:
    plotting, according to the temporal correspondence between the target video and the candidate videos, the similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of the candidate videos on a two-dimensional feature map, to obtain the temporal similarity matrix feature map between the target video and the candidate videos.
  6. The method according to claim 5, wherein, when there are multiple candidate videos, plotting, according to the temporal correspondence between the target video and the candidate videos, the similarity matrix between the multiple feature vectors of the target video and the multiple feature vectors of each of the candidate videos on a two-dimensional feature map, to obtain the temporal similarity matrix feature map between the target video and the candidate videos, comprises:
    plotting, according to the temporal correspondences between the target video and the multiple candidate videos, the similarity matrices between the multiple feature vectors of the target video and the multiple feature vectors of each of the multiple candidate videos on multiple two-dimensional feature maps, to obtain multiple temporal similarity matrix feature maps between the target video and the multiple candidate videos;
    stitching the multiple temporal similarity matrix feature maps between the target video and the multiple candidate videos, to obtain the temporal similarity matrix feature map between the target video and the multiple candidate videos.
  7. The method according to claim 2 or 6, wherein using the temporal similarity matrix feature map as the input of the deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities comprises:
    using the temporal similarity matrix feature map as the input of the deep learning detection model to output the interval ranges, along the temporal dimension, of the video clips in the candidate videos that match the target video, and the similarities between the matching video clips.
  8. The method according to claim 2, wherein the deep learning detection model comprises at least one of:
    the faster convolutional-neural-network-based region proposal detection model Faster-Rcnn;
    the masked convolutional-neural-network-based region proposal detection model Mask-Rcnn;
    the real-time object detection model YOLO;
    the single-shot multibox detection model SSD.
  9. A blockchain-based infringement evidence storage apparatus, comprising:
    a candidate video retrieval module that retrieves, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database;
    a feature map construction module that constructs, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos;
    a model output module that uses the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities;
    an evidence upload module that, when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploads to a blockchain infringement evidence that comprises a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities;
    wherein the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, a sample video comprises a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video comprises the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
  10. A video matching apparatus, comprising:
    a feature vector acquisition module that acquires multiple feature vectors of a target video;
    a candidate video retrieval module that retrieves, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database;
    a feature map construction module that constructs, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos;
    a model output module that uses the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities;
    wherein the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, a sample video comprises a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video comprises the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
  11. An electronic device, comprising:
    a processor; and
    a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations:
    retrieving, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database;
    constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos;
    using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities;
    when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploading to a blockchain infringement evidence that comprises a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities;
    wherein the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, a sample video comprises a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video comprises the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
  12. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including multiple application programs, cause the electronic device to perform the following operations:
    retrieving, based on multiple feature vectors of a target video, candidate videos similar to the target video from a video database;
    constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos;
    using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities;
    when the similarity corresponding to a video clip in the candidate videos that matches the target video is greater than or equal to a preset similarity threshold, uploading to a blockchain infringement evidence that comprises a digest of the target video, the video clips in the candidate videos that match the target video, and the corresponding similarities;
    wherein the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, a sample video comprises a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video comprises the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
  13. An electronic device, comprising:
    a processor; and
    a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the following operations:
    acquiring multiple feature vectors of a target video;
    retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database;
    constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos;
    using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities;
    wherein the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, a sample video comprises a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video comprises the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
  14. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including multiple application programs, cause the electronic device to perform the following operations:
    acquiring multiple feature vectors of a target video;
    retrieving, based on the multiple feature vectors of the target video, candidate videos similar to the target video from a video database;
    constructing, based on the target video and the candidate videos, a temporal similarity matrix feature map between the target video and the candidate videos;
    using the temporal similarity matrix feature map as the input of a deep learning detection model to output the video clips in the candidate videos that match the target video and the corresponding similarities;
    wherein the deep learning detection model is trained based on temporal similarity matrix feature maps of multiple groups of sample videos and corresponding labels, a sample video comprises a query video and the candidate videos corresponding to the query video, and the label corresponding to a sample video comprises the video clips that the query video in the sample video matches in the corresponding candidate videos and an infringement flag.
PCT/CN2021/105214 2020-08-14 2021-07-08 Video matching method, blockchain-based infringement evidence storage method and apparatus WO2022033252A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/149,552 US11954152B2 (en) 2020-08-14 2023-01-03 Video matching methods and apparatuses, and blockchain-based infringement evidence storage methods and apparatuses

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010816354.9 2020-08-14
CN202010816354.9A CN111737522B (zh) Video matching method, blockchain-based infringement evidence storage method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/149,552 Continuation US11954152B2 (en) 2020-08-14 2023-01-03 Video matching methods and apparatuses, and blockchain-based infringement evidence storage methods and apparatuses

Publications (1)

Publication Number Publication Date
WO2022033252A1 true WO2022033252A1 (zh) 2022-02-17

Family

ID=72658482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/105214 WO2022033252A1 (zh) 2020-08-14 2021-07-08 Video matching method, blockchain-based infringement evidence storage method and apparatus

Country Status (4)

Country Link
US (1) US11954152B2 (zh)
CN (1) CN111737522B (zh)
TW (1) TW202207154A (zh)
WO (1) WO2022033252A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707022A (zh) * 2022-05-31 2022-07-05 浙江大学 Video question answering dataset annotation method and apparatus, storage medium, and electronic device
CN114710713A (zh) * 2022-03-31 2022-07-05 慧之安信息技术股份有限公司 Deep-learning-based automatic video summary generation method
CN116188815A (zh) * 2022-12-12 2023-05-30 北京数美时代科技有限公司 Video similarity detection method and system, storage medium, and electronic device

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737522B (zh) * 2020-08-14 2021-03-02 支付宝(杭州)信息技术有限公司 Video matching method, blockchain-based infringement evidence storage method and apparatus
CN112347478B (zh) * 2020-10-13 2021-08-24 北京天融信网络安全技术有限公司 Malware detection method and apparatus
CN112507875A (zh) * 2020-12-10 2021-03-16 上海连尚网络科技有限公司 Method and device for detecting video duplication degree
CN113038195B (zh) * 2021-03-17 2023-04-11 北京市商汤科技开发有限公司 Video processing method, apparatus and system, medium, and computer device
CN113255484B (zh) * 2021-05-12 2023-10-03 北京百度网讯科技有限公司 Video matching method, video processing method and apparatus, electronic device, and medium
CN113360709B (zh) * 2021-05-28 2023-02-17 维沃移动通信(杭州)有限公司 Short-video infringement risk detection method and apparatus, and electronic device
CN113378902B (zh) * 2021-05-31 2024-02-23 深圳神目信息技术有限公司 Video plagiarism detection method based on optimized video features
CN113283351B (zh) * 2021-05-31 2024-02-06 深圳神目信息技术有限公司 Video plagiarism detection method using a CNN-optimized similarity matrix
CN113763211A (zh) * 2021-09-23 2021-12-07 支付宝(杭州)信息技术有限公司 Blockchain-based infringement detection method and apparatus, and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737135A (zh) * 2012-07-10 2012-10-17 北京大学 Video copy detection method and system based on a deformation-sensitive soft cascade model
CN106778464A (zh) * 2016-11-09 2017-05-31 深圳市深网视界科技有限公司 Deep-learning-based pedestrian re-identification method and device
US20170289409A1 (en) * 2016-03-30 2017-10-05 Nec Laboratories America, Inc. Large margin high-order deep learning with auxiliary tasks for video-based anomaly detection
CN110851761A (zh) * 2020-01-15 2020-02-28 支付宝(杭州)信息技术有限公司 Blockchain-based infringement detection method, apparatus, device, and storage medium
CN110958319A (zh) * 2019-12-05 2020-04-03 腾讯科技(深圳)有限公司 Blockchain-based infringement evidence storage management method and apparatus
CN111737522A (zh) * 2020-08-14 2020-10-02 支付宝(杭州)信息技术有限公司 Video matching method, blockchain-based infringement evidence storage method and apparatus

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8019763B2 (en) * 2006-02-27 2011-09-13 Microsoft Corporation Propagating relevance from labeled documents to unlabeled documents
US20150170333A1 (en) * 2011-08-31 2015-06-18 Google Inc. Grouping And Presenting Images
CN107180056B (zh) 2016-03-11 2020-11-06 阿里巴巴集团控股有限公司 Method and apparatus for matching segments in a video
CN106778686A (zh) 2017-01-12 2017-05-31 深圳职业技术学院 Copy video detection method and system based on deep learning and graph theory
CN106991373A (zh) 2017-03-02 2017-07-28 中国人民解放军国防科学技术大学 Copy video detection method based on deep learning and graph theory
CN108647245B (zh) * 2018-04-13 2023-04-18 腾讯科技(深圳)有限公司 Multimedia resource matching method and apparatus, storage medium, and electronic apparatus
CN108763295B (zh) 2018-04-18 2021-04-30 复旦大学 Deep-learning-based video near-duplicate retrieval algorithm
CN113434592A (zh) 2018-10-31 2021-09-24 创新先进技术有限公司 Blockchain-based data evidence storage method and apparatus, and electronic device
CN110569702B (zh) * 2019-02-14 2021-05-14 创新先进技术有限公司 Video stream processing method and apparatus
US20220027407A1 (en) * 2020-07-27 2022-01-27 Audible Magic Corporation Dynamic identification of unknown media


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU ZHILI: "The Research on Content-Based Video Copy Detection Algorithm", CHINESE MASTER’S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY, 15 April 2011 (2011-04-15), XP055900291 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114710713A (zh) * 2022-03-31 2022-07-05 慧之安信息技术股份有限公司 Deep-learning-based automatic video summary generation method
CN114710713B (zh) * 2022-03-31 2023-08-01 慧之安信息技术股份有限公司 Deep-learning-based automatic video summary generation method
CN114707022A (zh) * 2022-05-31 2022-07-05 浙江大学 Video question answering dataset annotation method and apparatus, storage medium, and electronic device
CN114707022B (зh) * 2022-05-31 2022-09-06 浙江大学 Video question answering dataset annotation method and apparatus, storage medium, and electronic device
CN116188815A (zh) * 2022-12-12 2023-05-30 北京数美时代科技有限公司 Video similarity detection method and system, storage medium, and electronic device

Also Published As

Publication number Publication date
US20230177084A1 (en) 2023-06-08
US11954152B2 (en) 2024-04-09
TW202207154A (zh) 2022-02-16
CN111737522A (zh) 2020-10-02
CN111737522B (zh) 2021-03-02

Similar Documents

Publication Publication Date Title
WO2022033252A1 (zh) Video matching method, blockchain-based infringement evidence storage method and apparatus
Li et al. DLA-MatchNet for few-shot remote sensing image scene classification
US11461392B2 (en) Providing relevant cover frame in response to a video search query
US11949964B2 (en) Generating action tags for digital videos
US10685236B2 (en) Multi-model techniques to generate video metadata
CN109522435B (zh) Image retrieval method and apparatus
US20200167598A1 (en) User identity determining method, apparatus, and device
CN108664526B (zh) Retrieval method and device
CN108881947B (zh) Infringement detection method and apparatus for a live stream
WO2020114100A1 (zh) Information processing method and apparatus, and computer storage medium
CN111182364B (zh) Short video copyright detection method and system
CN110941989A (zh) Image verification and video verification methods, apparatuses, devices, and storage media
EP4209959A1 (en) Target identification method and apparatus, and electronic device
CN115115825B (zh) Object detection method and apparatus in images, computer device, and storage medium
CN108268598A (zh) Analysis system and analysis method based on video image data
CN110619349A (зh) Plant image classification method and apparatus
CN111738173B (zh) Video segment detection method and apparatus, electronic device, and storage medium
CN111008294B (zh) Traffic image processing and image retrieval method and apparatus
CN104077555A (zh) Method and apparatus for identifying bad cases in image search
CN112101154A (зh) Video classification method and apparatus, computer device, and storage medium
WO2022237065A1 (zh) Classification model training method, video classification method, and related devices
CN112784691B (zh) Object detection model training method, object detection method, and apparatus
Zhou et al. A practical spatial re-ranking method for instance search from videos
CN110275990B (zh) Method and apparatus for generating keys and values for KV storage
CN113283468A (zh) Three-dimensional model retrieval method and apparatus based on a three-dimensional shape knowledge graph

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21855305

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21855305

Country of ref document: EP

Kind code of ref document: A1