CN115238127A - Video consistency comparison method based on labels - Google Patents

Video consistency comparison method based on labels

Info

Publication number
CN115238127A
CN115238127A (application CN202210812641.1A)
Authority
CN
China
Prior art keywords
video
label
time
consistency
hash value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210812641.1A
Other languages
Chinese (zh)
Inventor
吴奕刚
王伟明
孙伟涛
孙彦龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Arcvideo Technology Co ltd
Original Assignee
Hangzhou Arcvideo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Arcvideo Technology Co ltd filed Critical Hangzhou Arcvideo Technology Co ltd
Priority to CN202210812641.1A
Publication of CN115238127A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to video analysis technology and discloses a label-based video consistency comparison method, which comprises label extraction, in which the sound in a video is extracted to form labels, and label comparison, in which the hash values of the extracted labels are compared, thereby determining the consistency of the labeled videos. The invention processes only the sound data and never the video pictures, which greatly improves processing speed and concurrency; ASR and keyword extraction are mature technologies, so the technical difficulty is low; only hash values and time information need to be recorded, so the storage requirement is low; comparison is performed on hash values, which is fast and performs well; and updating the comparison algorithm does not require re-extracting the videos, making it easy to adjust and upgrade the algorithm in light of the results and the business.

Description

Video consistency comparison method based on labels
Technical Field
The invention relates to video analysis technology, and in particular to a label-based video consistency comparison method.
Background
Before the internet era, people consumed video and audio through traditional media such as radio and television, and could not do so freely whenever they wished. With the advent of the internet era, digital technology has greatly reduced the cost of publishing, copying, storing and distributing text, images, audio and video, so that information can spread, be shared and be used almost without restriction. The copyright protection system, however, has suffered an unprecedented impact: digital content spread on the internet, especially video and audio, is beset by widespread piracy and infringement. As this phenomenon draws the attention of ever more authors and distributors, internet copyright protection has become a hot topic of the internet era, and internet video and audio copyright protection technology has become a focus of research.
Protecting internet video and audio copyright requires both technical means and legal measures to prevent and combat infringement such as illegal copying, plagiarism, misappropriation, tampering and unauthorized distribution.
The existing mainstream internet video and audio copyright protection technologies are watermark-based, as in the prior art: either a visible watermark is overlaid on the video, or a hidden watermark is embedded into the audio, pictures or video as digital data that cannot be seen under normal conditions. A visible watermark degrades the viewing experience, while a hidden watermark suffers from poor robustness, requires dedicated detection equipment, and is complex to upgrade.
Another prior-art approach extracts features of the pictures or sounds in a video and converts them into information strings that form a digital gene; consistency is then checked by matching the target video against a digital gene library. At present this approach suffers from high technical difficulty, and the genes must be extracted anew whenever the technology is upgraded.
For example, the prior art CN202210454687.0 has high technical difficulty and its technical upgrades are complex.
Disclosure of Invention
The invention provides a label-based video consistency comparison method, aiming to solve the problems of high technical difficulty and complex technical upgrades in prior-art video consistency comparison methods.
The above technical problems are solved by the following technical scheme:
the label-based video consistency comparison method comprises the following steps:
label extraction, namely extracting the sound in the video to form labels;
label comparison, namely comparing the hash values of the extracted labels, thereby determining the consistency of the labeled videos.
Preferably, the label extraction method comprises the following steps (the record these steps produce is sketched after the list):
step 1, video acquisition, namely obtaining the video to be processed by upload or by local scanning;
step 2, video decoding, namely decoding the video with a decoder to obtain decoded data, which comprises the sound data and the video time point information corresponding to it;
step 3, sound recognition, namely running automatic speech recognition on the decoded sound data to output the recognized text result as structured data with time offset information; the time offset information comprises a start time and an end time, namely the absolute start time offset ST and the absolute end time offset ET of the text relative to the video;
step 4, text segmentation, namely splitting the recognized text result into segments;
step 5, keyword extraction, namely extracting at most N keywords from each segment with the TF-IDF algorithm;
step 6, string merging, namely concatenating the extracted keywords into a single string;
step 7, paragraph hash calculation, namely computing the paragraph hash value of the merged string with the MD5 hash algorithm;
step 8, record formation, namely forming a record of the paragraph hash value together with the segment's start time offset ST and end time offset ET, and storing the record in the database;
step 9, repetition, namely repeating steps 4 to 7 until the database holds the multiple time and hash value records that constitute the video's label extraction result.
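A minimal sketch of the record produced by steps 3 to 8, assuming Python as the implementation language and integer millisecond offsets (the patent fixes neither a time unit nor a storage schema):

```python
from dataclasses import dataclass

@dataclass
class LabelRecord:
    """One label extraction result as stored in step 8."""
    md5_hex: str  # paragraph hash value from step 7 (32 hex characters)
    st: int       # absolute start time offset ST of the segment, assumed in ms
    et: int       # absolute end time offset ET of the segment, assumed in ms
```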
Preferably, the label comparison method comprises:
S1, label result acquisition, namely querying the database for the multiple pieces of label result information of the first video and the second video to be compared; each piece of label result information comprises a time offset and a hash value;
S2, label result sorting, namely sorting each video's label result information in descending order of the start time in the time information;
S3, matching result consistency judgment, namely comparing the hash values of the first video with the hash values of the second video to determine the consistency of the matching results.
Preferably, in step 4 the recognized text is segmented at periods, question marks and exclamation marks.
Preferably, the matching result consistency judgment comprises:
searching the hash value sequences of the first video and the second video for the start and end positions of every exactly matching subsequence;
obtaining the start time at the start position and the end time at the end position;
and calculating the duration covered by the matching result from that start time and end time; if this duration exceeds the duration threshold, the matching result is valid, and all valid matching results are the consistent positions.
Preferably, the duration threshold of a matching result is not less than 5 seconds.
To solve the above technical problem, the invention further provides a storage medium that implements the label-based video consistency comparison method.
To solve the above technical problem, the invention further provides an electronic device that implements the label-based video consistency comparison method.
Owing to the above technical scheme, the invention has the following significant technical effects:
the invention processes only the sound data and never the video pictures, which greatly improves processing speed and concurrency;
the invention uses ASR and keyword extraction, both mature technologies, so the technical difficulty is low;
the invention records only hash values and time information, so the storage requirement is low; comparison is performed on hash values, which is fast and performs well;
updating the comparison algorithm does not require re-extracting the videos, making it easy to adjust and upgrade the algorithm in light of the results and the business.
Drawings
FIG. 1 is the overall flow chart of the invention;
FIG. 2 is the flow chart of label extraction according to the invention;
FIG. 3 is the flow chart of label comparison according to the invention.
Detailed Description
The invention is described in further detail below with reference to the figures and embodiments.
Example 1
The label-based video consistency comparison method comprises the following steps:
label extraction, namely extracting the sound in the video to form labels;
label comparison, namely comparing the hash values of the extracted labels, thereby determining the consistency of the labeled videos.
The label extraction method comprises the following steps (a minimal code sketch of steps 4 to 8 follows the list):
step 1, video acquisition, namely obtaining the video to be processed by upload or by local scanning;
step 2, video decoding, namely decoding the video with a decoder to obtain decoded data, which comprises the sound data and the video time point information corresponding to it;
step 3, sound recognition, namely running automatic speech recognition on the decoded sound data to output the recognized text result as structured data with time offset information; the time offset information comprises a start time and an end time, namely the absolute start time offset ST and the absolute end time offset ET of the text relative to the video;
step 4, text segmentation, namely splitting the recognized text result into segments;
step 5, keyword extraction, namely extracting at most N keywords from each segment with the TF-IDF algorithm;
step 6, string merging, namely concatenating the extracted keywords into a single string;
step 7, paragraph hash calculation, namely computing the paragraph hash value of the merged string with the MD5 hash algorithm;
step 8, record formation, namely forming a record of the paragraph hash value together with the segment's start time offset ST and end time offset ET, and storing the record in the database;
step 9, repetition, namely repeating steps 4 to 7 until the database holds the multiple time and hash value records that constitute the video's label extraction result.
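A minimal sketch of steps 4 to 8 under stated assumptions: the patent names no TF-IDF implementation, so scikit-learn's TfidfVectorizer stands in; `sentences` is assumed to be segment-level ASR output already carrying the ST/ET offsets from step 3; and TfidfVectorizer's default tokenizer expects space-separated words, so Chinese text would first need a word segmenter such as jieba (not shown).

```python
import hashlib
import re

from sklearn.feature_extraction.text import TfidfVectorizer

N = 5  # at most N keywords per segment; the patent leaves N open

def split_segments(text: str) -> list[str]:
    """Step 4: split a recognized text at periods, question marks and
    exclamation marks (ASCII and full-width forms)."""
    return [s.strip() for s in re.split(r"[.?!。？！]", text) if s.strip()]

def extract_records(sentences: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Steps 5 to 8 over segment-level ASR output.

    `sentences` holds (text, st, et) tuples, one per segment, where st/et
    are the absolute offsets ST/ET from step 3; step 4 is shown separately
    above because the patent does not say how sub-sentence offsets are
    assigned. Returns one (md5_hex, st, et) record per segment."""
    texts = [t for t, _, _ in sentences]
    vectorizer = TfidfVectorizer()            # step 5: TF-IDF over all segments
    tfidf = vectorizer.fit_transform(texts)
    vocab = vectorizer.get_feature_names_out()
    records = []
    for i, (_, st, et) in enumerate(sentences):
        row = tfidf.getrow(i).toarray().ravel()
        top = [j for j in row.argsort()[::-1][:N] if row[j] > 0]
        merged = "".join(vocab[j] for j in top)                   # step 6: concatenate keywords
        digest = hashlib.md5(merged.encode("utf-8")).hexdigest()  # step 7: MD5 paragraph hash
        records.append((digest, st, et))                          # step 8: (hash, ST, ET) record
    return records
```

Hashing only the top-N keywords, rather than the full transcript, keeps the hash stable when ASR makes small mistakes on low-weight filler words: two copies of the same video still produce identical paragraph hashes as long as the salient keywords match.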
The label comparison method comprises the following steps (a minimal query sketch follows the list):
S1, label result acquisition, namely querying the database for the multiple pieces of label result information of the first video and the second video to be compared; each piece of label result information comprises a time offset and a hash value;
S2, label result sorting, namely sorting each video's label result information in descending order of the start time in the time information;
S3, matching result consistency judgment, namely comparing the hash values of the first video with the hash values of the second video to determine the consistency of the matching results.
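A minimal sketch of S1 and S2, assuming sqlite3 as a stand-in for the database and a hypothetical labels(video_id, hash, st, et) table (the patent names neither a storage engine nor a schema):

```python
import sqlite3

def load_labels(conn: sqlite3.Connection, video_id: str) -> list[tuple[str, int, int]]:
    """S1: fetch the (hash, st, et) label records of one video.
    S2: order them by start time; the patent specifies descending order,
    and any order applied identically to both videos works for matching."""
    cur = conn.execute(
        "SELECT hash, st, et FROM labels WHERE video_id = ? ORDER BY st DESC",
        (video_id,),
    )
    return cur.fetchall()
```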
In step 4, the recognized text is segmented at periods, question marks and exclamation marks.
The matching result consistency judgment comprises the following steps (a sketch follows below):
searching the hash value sequences of the first video and the second video for the start and end positions of every exactly matching subsequence;
obtaining the start time at the start position and the end time at the end position;
and calculating the duration covered by the matching result from that start time and end time; if this duration exceeds the duration threshold, the matching result is valid, and all valid matching results are the consistent positions.
The duration threshold of a matching result is not less than 5 seconds.
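A minimal sketch of the matching judgment of S3, assuming millisecond offsets and plain (hash, st, et) tuples as returned by the query sketch above; the naive scan below is an assumption, since the patent does not fix a subsequence-search algorithm:

```python
MIN_DURATION_MS = 5000  # threshold "not less than 5 seconds", assuming ms offsets

def consistent_spans(a, b):
    """Find every exactly matching run between the two hash sequences and
    keep the runs whose covered duration meets the threshold.

    a, b: lists of (hash, st, et) records, sorted the same way for both
    videos. Returns one (a_st, a_et, b_st, b_et) tuple per valid match."""
    spans = []
    for i in range(len(a)):
        for j in range(len(b)):
            if i and j and a[i - 1][0] == b[j - 1][0]:
                continue  # interior of a run already found at an earlier (i, j)
            k = 0  # length of the exactly matching run starting at (i, j)
            while (i + k < len(a) and j + k < len(b)
                   and a[i + k][0] == b[j + k][0]):
                k += 1
            if k == 0:
                continue
            # duration covered by the match, independent of the sort direction
            a_st = min(a[i][1], a[i + k - 1][1])
            a_et = max(a[i][2], a[i + k - 1][2])
            if a_et - a_st >= MIN_DURATION_MS:
                b_st = min(b[j][1], b[j + k - 1][1])
                b_et = max(b[j][2], b[j + k - 1][2])
                spans.append((a_st, a_et, b_st, b_et))
    return spans
```

As a hypothetical usage, `consistent_spans(load_labels(conn, "A"), load_labels(conn, "B"))` returns the consistent positions of videos A and B. The duration threshold filters out coincidental matches: two unrelated videos that happen to share a single common sentence would produce one matching record covering far less than 5 seconds, whereas only sustained runs of identical paragraph hashes are reported as consistent positions.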
Example 2
On the basis of embodiment 1, the duration threshold of the matching result in this embodiment is 5 seconds.
Example 3
On the basis of embodiment 1, this embodiment is a storage medium that implements the method of embodiment 1.
Example 4
On the basis of embodiment 1, this embodiment is an electronic device that implements the method of embodiment 1.

Claims (8)

1. A label-based video consistency comparison method, comprising the following steps:
label extraction, namely extracting the sound in the video to form labels;
label comparison, namely comparing the hash values of the extracted labels, thereby determining the consistency of the labeled videos.
2. The label-based video consistency comparison method according to claim 1, wherein the label extraction method comprises:
step 1, video acquisition, namely obtaining the video to be processed by upload or by local scanning;
step 2, video decoding, namely decoding the video with a decoder to obtain decoded data, which comprises the sound data and the video time point information corresponding to it;
step 3, sound recognition, namely running automatic speech recognition on the decoded sound data to output the recognized text result as structured data with time offset information; the time offset information comprises a start time and an end time, namely the absolute start time offset ST and the absolute end time offset ET of the text relative to the video;
step 4, text segmentation, namely splitting the recognized text result into segments;
step 5, keyword extraction, namely extracting at most N keywords from each segment with the TF-IDF algorithm;
step 6, string merging, namely concatenating the extracted keywords into a single string;
step 7, paragraph hash calculation, namely computing the paragraph hash value of the merged string with the MD5 hash algorithm;
step 8, record formation, namely forming a record of the paragraph hash value together with the segment's start time offset ST and end time offset ET, and storing the record in the database;
step 9, repetition, namely repeating steps 4 to 7 until the database holds the multiple time and hash value records that constitute the video's label extraction result.
3. The label-based video consistency comparison method according to claim 1, wherein the label comparison method comprises:
S1, label result acquisition, namely querying the database for the multiple pieces of label result information of the first video and the second video to be compared; each piece of label result information comprises a time offset and a hash value;
S2, label result sorting, namely sorting each video's label result information in descending order of the start time in the time information;
S3, matching result consistency judgment, namely comparing the hash values of the first video with the hash values of the second video to determine the consistency of the matching results.
4. The label-based video consistency comparison method according to claim 2, wherein in step 4 the recognized text is segmented at periods, question marks and exclamation marks.
5. The label-based video consistency comparison method according to claim 3, wherein the matching result consistency judgment comprises:
searching the hash value sequences of the first video and the second video for the start and end positions of every exactly matching subsequence;
obtaining the start time at the start position and the end time at the end position;
and calculating the duration covered by the matching result from that start time and end time; if this duration exceeds the duration threshold, the matching result is valid, and all valid matching results are the consistent positions.
6. The label-based video consistency comparison method according to claim 5, wherein the duration threshold of the matching result is not less than 5 seconds.
7. A storage medium implementing the label-based video consistency comparison method according to any one of claims 1 to 6.
8. An electronic device implementing the label-based video consistency comparison method according to any one of claims 1 to 6.
CN202210812641.1A 2022-07-11 2022-07-11 Video consistency comparison method based on labels Pending CN115238127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210812641.1A CN115238127A (en) 2022-07-11 2022-07-11 Video consistency comparison method based on labels

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210812641.1A CN115238127A (en) 2022-07-11 2022-07-11 Video consistency comparison method based on labels

Publications (1)

Publication Number Publication Date
CN115238127A 2022-10-25

Family

ID=83670507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210812641.1A Pending CN115238127A (en) 2022-07-11 2022-07-11 Video consistency comparison method based on labels

Country Status (1)

Country Link
CN (1) CN115238127A (en)

Similar Documents

Publication Publication Date Title
US9226047B2 (en) Systems and methods for performing semantic analysis of media objects
EP2321964B1 (en) Method and apparatus for detecting near-duplicate videos using perceptual video signatures
Cano et al. Audio fingerprinting: concepts and applications
KR101171536B1 (en) Temporal segment based extraction and robust matching of video fingerprints
US8503523B2 (en) Forming a representation of a video item and use thereof
WO2008097051A1 (en) Method for searching specific person included in digital data, and method and apparatus for producing copyright report for the specific person
US20100063978A1 (en) Apparatus and method for inserting/extracting nonblind watermark using features of digital media data
US20020059208A1 (en) Information providing apparatus and method, and recording medium
US20170185675A1 (en) Fingerprinting and matching of content of a multi-media file
WO2017067400A1 (en) Video file identification method and device
JP2008166914A (en) Method and apparatus for synchronizing data stream of content with meta data
WO2003096337A2 (en) Watermark embedding and retrieval
US20070201764A1 (en) Apparatus and method for detecting key caption from moving picture to provide customized broadcast service
US9367744B2 (en) Systems and methods of fingerprinting and identifying media contents
US20070220265A1 (en) Searching for a scaling factor for watermark detection
US20080256576A1 (en) Method and Apparatus for Detecting Content Item Boundaries
CN111274450A (en) Video identification method
CN115238127A (en) Video consistency comparison method based on labels
CN109101964B (en) Method, device and storage medium for determining head and tail areas in multimedia file
KR100930529B1 (en) Harmful video screening system and method through video identification
Duong et al. Movie synchronization by audio landmark matching
US20060092327A1 (en) Story segmentation method for video
JP2002014973A (en) Video retrieving system and method, and recording medium with video retrieving program recorded thereon
GB2617681A (en) Non-fingerprint-based automatic content recognition
Klein et al. Identifying Source Videos for Video Clips Based on Video Fingerprints and Embeddings

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination