CN115238127A - Video consistency comparison method based on labels - Google Patents

Video consistency comparison method based on labels

Info

Publication number
CN115238127A
CN115238127A (application CN202210812641.1A)
Authority
CN
China
Prior art keywords
video
label
time
consistency
hash value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210812641.1A
Other languages
Chinese (zh)
Inventor
吴奕刚
王伟明
孙伟涛
孙彦龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Arcvideo Technology Co ltd
Original Assignee
Hangzhou Arcvideo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Arcvideo Technology Co ltd filed Critical Hangzhou Arcvideo Technology Co ltd
Priority to CN202210812641.1A
Publication of CN115238127A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to video analysis technology and discloses a label-based video consistency comparison method, which comprises label extraction, in which the sound in a video is extracted to form labels, and label comparison, in which the hash values of the extracted labels are compared, thereby determining the consistency of the labeled videos. The invention processes only the sound data and never the video pictures, which greatly improves processing speed and concurrency; ASR and keyword extraction are mature technologies, so the technical difficulty is low; only hash values and time information need to be recorded, so the storage requirement is low; comparison is performed on hash values, which is fast and performs well; and updating the comparison algorithm does not require re-extracting the videos, making it easy to adjust and upgrade the algorithm in light of the results and the business.

Description

Video consistency comparison method based on labels
Technical Field
The invention relates to video analysis technology, and in particular to a label-based video consistency comparison method.
Background
Before the internet era, people consumed video and audio through traditional media such as radio and television, and could not do so freely whenever they wished. With the advent of the internet era, digital technology has greatly reduced the cost of publishing, copying, storing and distributing text, images, audio and video, so that information can spread, be shared and be used almost without restriction. The copyright protection system, however, has suffered an unprecedented impact: digital content spread on the internet, especially video and audio, is beset by widespread piracy and infringement. As this phenomenon draws the attention of ever more authors and distributors, internet copyright protection has become a hot topic of the internet era, and internet video and audio copyright protection technology has become a focus of research.
Protecting internet video and audio copyright requires both technical means and legal measures to prevent and combat infringement such as illegal copying, plagiarism, misappropriation, tampering and unauthorized distribution.
The existing mainstream internet video and audio copyright protection technologies are watermark-based, as in the prior art: either a visible watermark is overlaid on the video, or a hidden watermark is embedded into the audio, pictures or video as digital data that cannot be seen under normal conditions. A visible watermark degrades the viewing experience, while a hidden watermark suffers from poor robustness, requires dedicated detection equipment, and is complex to upgrade.
Another prior-art approach extracts features of the pictures or sounds in a video and converts them into information strings that form a digital gene; consistency is then checked by matching the target video against a digital gene library. At present this approach suffers from high technical difficulty, and the genes must be extracted anew whenever the technology is upgraded.
For example, the prior art CN202210454687.0 has high technical difficulty and its technical upgrades are complex.
Disclosure of Invention
The invention provides a label-based video consistency comparison method, aiming to solve the problems of high technical difficulty and complex technical upgrades in prior-art video consistency comparison methods.
The above technical problems are solved by the following technical scheme:
the label-based video consistency comparison method comprises the following steps:
label extraction, namely extracting the sound in the video to form labels;
label comparison, namely comparing the hash values of the extracted labels, thereby determining the consistency of the labeled videos.
Preferably, the label extraction method comprises the following steps (the record these steps produce is sketched after the list):
step 1, video acquisition, namely obtaining the video to be processed by upload or by local scanning;
step 2, video decoding, namely decoding the video with a decoder to obtain decoded data, which comprises the sound data and the video time point information corresponding to it;
step 3, sound recognition, namely running automatic speech recognition on the decoded sound data to output the recognized text result as structured data with time offset information; the time offset information comprises a start time and an end time, namely the absolute start time offset ST and the absolute end time offset ET of the text relative to the video;
step 4, text segmentation, namely splitting the recognized text result into segments;
step 5, keyword extraction, namely extracting at most N keywords from each segment with the TF-IDF algorithm;
step 6, string merging, namely concatenating the extracted keywords into a single string;
step 7, paragraph hash calculation, namely computing the paragraph hash value of the merged string with the MD5 hash algorithm;
step 8, record formation, namely forming a record of the paragraph hash value together with the segment's start time offset ST and end time offset ET, and storing the record in the database;
step 9, repetition, namely repeating steps 4 to 7 until the database holds the multiple time and hash value records that constitute the video's label extraction result.
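A minimal sketch of the record produced by steps 3 to 8, assuming Python as the implementation language and integer millisecond offsets (the patent fixes neither a time unit nor a storage schema):

```python
from dataclasses import dataclass

@dataclass
class LabelRecord:
    """One label extraction result as stored in step 8."""
    md5_hex: str  # paragraph hash value from step 7 (32 hex characters)
    st: int       # absolute start time offset ST of the segment, assumed in ms
    et: int       # absolute end time offset ET of the segment, assumed in ms
```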
Preferably, the label comparison method comprises:
S1, label result acquisition, namely querying the database for the multiple pieces of label result information of the first video and the second video to be compared; each piece of label result information comprises a time offset and a hash value;
S2, label result sorting, namely sorting each video's label result information in descending order of the start time in the time information;
S3, matching result consistency judgment, namely comparing the hash values of the first video with the hash values of the second video to determine the consistency of the matching results.
Preferably, in step 4 the recognized text is segmented at periods, question marks and exclamation marks.
Preferably, the matching result consistency judgment comprises:
searching the hash value sequences of the first video and the second video for the start and end positions of every exactly matching subsequence;
obtaining the start time at the start position and the end time at the end position;
and calculating the duration covered by the matching result from that start time and end time; if this duration exceeds the duration threshold, the matching result is valid, and all valid matching results are the consistent positions.
Preferably, the duration threshold of a matching result is not less than 5 seconds.
To solve the above technical problem, the invention further provides a storage medium that implements the label-based video consistency comparison method.
To solve the above technical problem, the invention further provides an electronic device that implements the label-based video consistency comparison method.
Owing to the above technical scheme, the invention has the following significant technical effects:
the invention processes only the sound data and never the video pictures, which greatly improves processing speed and concurrency;
the invention uses ASR and keyword extraction, both mature technologies, so the technical difficulty is low;
the invention records only hash values and time information, so the storage requirement is low; comparison is performed on hash values, which is fast and performs well;
updating the comparison algorithm does not require re-extracting the videos, making it easy to adjust and upgrade the algorithm in light of the results and the business.
Drawings
FIG. 1 is the overall flow chart of the invention;
FIG. 2 is the flow chart of label extraction according to the invention;
FIG. 3 is the flow chart of label comparison according to the invention.
Detailed Description
The invention is described in further detail below with reference to the figures and embodiments.
Example 1
The label-based video consistency comparison method comprises the following steps:
label extraction, namely extracting the sound in the video to form labels;
label comparison, namely comparing the hash values of the extracted labels, thereby determining the consistency of the labeled videos.
The label extraction method comprises the following steps (a minimal code sketch of steps 4 to 8 follows the list):
step 1, video acquisition, namely obtaining the video to be processed by upload or by local scanning;
step 2, video decoding, namely decoding the video with a decoder to obtain decoded data, which comprises the sound data and the video time point information corresponding to it;
step 3, sound recognition, namely running automatic speech recognition on the decoded sound data to output the recognized text result as structured data with time offset information; the time offset information comprises a start time and an end time, namely the absolute start time offset ST and the absolute end time offset ET of the text relative to the video;
step 4, text segmentation, namely splitting the recognized text result into segments;
step 5, keyword extraction, namely extracting at most N keywords from each segment with the TF-IDF algorithm;
step 6, string merging, namely concatenating the extracted keywords into a single string;
step 7, paragraph hash calculation, namely computing the paragraph hash value of the merged string with the MD5 hash algorithm;
step 8, record formation, namely forming a record of the paragraph hash value together with the segment's start time offset ST and end time offset ET, and storing the record in the database;
step 9, repetition, namely repeating steps 4 to 7 until the database holds the multiple time and hash value records that constitute the video's label extraction result.
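A minimal sketch of steps 4 to 8 under stated assumptions: the patent names no TF-IDF implementation, so scikit-learn's TfidfVectorizer stands in; `sentences` is assumed to be segment-level ASR output already carrying the ST/ET offsets from step 3; and TfidfVectorizer's default tokenizer expects space-separated words, so Chinese text would first need a word segmenter such as jieba (not shown).

```python
import hashlib
import re

from sklearn.feature_extraction.text import TfidfVectorizer

N = 5  # at most N keywords per segment; the patent leaves N open

def split_segments(text: str) -> list[str]:
    """Step 4: split a recognized text at periods, question marks and
    exclamation marks (ASCII and full-width forms)."""
    return [s.strip() for s in re.split(r"[.?!。？！]", text) if s.strip()]

def extract_records(sentences: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Steps 5 to 8 over segment-level ASR output.

    `sentences` holds (text, st, et) tuples, one per segment, where st/et
    are the absolute offsets ST/ET from step 3; step 4 is shown separately
    above because the patent does not say how sub-sentence offsets are
    assigned. Returns one (md5_hex, st, et) record per segment."""
    texts = [t for t, _, _ in sentences]
    vectorizer = TfidfVectorizer()            # step 5: TF-IDF over all segments
    tfidf = vectorizer.fit_transform(texts)
    vocab = vectorizer.get_feature_names_out()
    records = []
    for i, (_, st, et) in enumerate(sentences):
        row = tfidf.getrow(i).toarray().ravel()
        top = [j for j in row.argsort()[::-1][:N] if row[j] > 0]
        merged = "".join(vocab[j] for j in top)                   # step 6: concatenate keywords
        digest = hashlib.md5(merged.encode("utf-8")).hexdigest()  # step 7: MD5 paragraph hash
        records.append((digest, st, et))                          # step 8: (hash, ST, ET) record
    return records
```

Hashing only the top-N keywords, rather than the full transcript, keeps the hash stable when ASR makes small mistakes on low-weight filler words: two copies of the same video still produce identical paragraph hashes as long as the salient keywords match.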
The label comparison method comprises the following steps (a minimal query sketch follows the list):
S1, label result acquisition, namely querying the database for the multiple pieces of label result information of the first video and the second video to be compared; each piece of label result information comprises a time offset and a hash value;
S2, label result sorting, namely sorting each video's label result information in descending order of the start time in the time information;
S3, matching result consistency judgment, namely comparing the hash values of the first video with the hash values of the second video to determine the consistency of the matching results.
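A minimal sketch of S1 and S2, assuming sqlite3 as a stand-in for the database and a hypothetical labels(video_id, hash, st, et) table (the patent names neither a storage engine nor a schema):

```python
import sqlite3

def load_labels(conn: sqlite3.Connection, video_id: str) -> list[tuple[str, int, int]]:
    """S1: fetch the (hash, st, et) label records of one video.
    S2: order them by start time; the patent specifies descending order,
    and any order applied identically to both videos works for matching."""
    cur = conn.execute(
        "SELECT hash, st, et FROM labels WHERE video_id = ? ORDER BY st DESC",
        (video_id,),
    )
    return cur.fetchall()
```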
In step 4, the recognized text is segmented at periods, question marks and exclamation marks.
The matching result consistency judgment comprises the following steps (a sketch follows below):
searching the hash value sequences of the first video and the second video for the start and end positions of every exactly matching subsequence;
obtaining the start time at the start position and the end time at the end position;
and calculating the duration covered by the matching result from that start time and end time; if this duration exceeds the duration threshold, the matching result is valid, and all valid matching results are the consistent positions.
The duration threshold of a matching result is not less than 5 seconds.
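A minimal sketch of the matching judgment of S3, assuming millisecond offsets and plain (hash, st, et) tuples as returned by the query sketch above; the naive scan below is an assumption, since the patent does not fix a subsequence-search algorithm:

```python
MIN_DURATION_MS = 5000  # threshold "not less than 5 seconds", assuming ms offsets

def consistent_spans(a, b):
    """Find every exactly matching run between the two hash sequences and
    keep the runs whose covered duration meets the threshold.

    a, b: lists of (hash, st, et) records, sorted the same way for both
    videos. Returns one (a_st, a_et, b_st, b_et) tuple per valid match."""
    spans = []
    for i in range(len(a)):
        for j in range(len(b)):
            if i and j and a[i - 1][0] == b[j - 1][0]:
                continue  # interior of a run already found at an earlier (i, j)
            k = 0  # length of the exactly matching run starting at (i, j)
            while (i + k < len(a) and j + k < len(b)
                   and a[i + k][0] == b[j + k][0]):
                k += 1
            if k == 0:
                continue
            # duration covered by the match, independent of the sort direction
            a_st = min(a[i][1], a[i + k - 1][1])
            a_et = max(a[i][2], a[i + k - 1][2])
            if a_et - a_st >= MIN_DURATION_MS:
                b_st = min(b[j][1], b[j + k - 1][1])
                b_et = max(b[j][2], b[j + k - 1][2])
                spans.append((a_st, a_et, b_st, b_et))
    return spans
```

As a hypothetical usage, `consistent_spans(load_labels(conn, "A"), load_labels(conn, "B"))` returns the consistent positions of videos A and B. The duration threshold filters out coincidental matches: two unrelated videos that happen to share a single common sentence would produce one matching record covering far less than 5 seconds, whereas only sustained runs of identical paragraph hashes are reported as consistent positions.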
Example 2
On the basis of embodiment 1, the duration threshold of the matching result in this embodiment is 5 seconds.
Example 3
On the basis of embodiment 1, this embodiment is a storage medium that implements the method of embodiment 1.
Example 4
On the basis of embodiment 1, this embodiment is an electronic device that implements the method of embodiment 1.

Claims (8)

1. A label-based video consistency comparison method, comprising the following steps:
label extraction, namely extracting the sound in the video to form labels;
label comparison, namely comparing the hash values of the extracted labels, thereby determining the consistency of the labeled videos.
2. The label-based video consistency comparison method according to claim 1, wherein the label extraction method comprises:
step 1, video acquisition, namely obtaining the video to be processed by upload or by local scanning;
step 2, video decoding, namely decoding the video with a decoder to obtain decoded data, which comprises the sound data and the video time point information corresponding to it;
step 3, sound recognition, namely running automatic speech recognition on the decoded sound data to output the recognized text result as structured data with time offset information; the time offset information comprises a start time and an end time, namely the absolute start time offset ST and the absolute end time offset ET of the text relative to the video;
step 4, text segmentation, namely splitting the recognized text result into segments;
step 5, keyword extraction, namely extracting at most N keywords from each segment with the TF-IDF algorithm;
step 6, string merging, namely concatenating the extracted keywords into a single string;
step 7, paragraph hash calculation, namely computing the paragraph hash value of the merged string with the MD5 hash algorithm;
step 8, record formation, namely forming a record of the paragraph hash value together with the segment's start time offset ST and end time offset ET, and storing the record in the database;
step 9, repetition, namely repeating steps 4 to 7 until the database holds the multiple time and hash value records that constitute the video's label extraction result.
3. The label-based video consistency comparison method according to claim 1, wherein the label comparison method comprises:
S1, label result acquisition, namely querying the database for the multiple pieces of label result information of the first video and the second video to be compared; each piece of label result information comprises a time offset and a hash value;
S2, label result sorting, namely sorting each video's label result information in descending order of the start time in the time information;
S3, matching result consistency judgment, namely comparing the hash values of the first video with the hash values of the second video to determine the consistency of the matching results.
4. The label-based video consistency comparison method according to claim 2, wherein in step 4 the recognized text is segmented at periods, question marks and exclamation marks.
5. The label-based video consistency comparison method according to claim 3, wherein the matching result consistency judgment comprises:
searching the hash value sequences of the first video and the second video for the start and end positions of every exactly matching subsequence;
obtaining the start time at the start position and the end time at the end position;
and calculating the duration covered by the matching result from that start time and end time; if this duration exceeds the duration threshold, the matching result is valid, and all valid matching results are the consistent positions.
6. The label-based video consistency comparison method according to claim 5, wherein the duration threshold of the matching result is not less than 5 seconds.
7. A storage medium implementing the label-based video consistency comparison method according to any one of claims 1 to 6.
8. An electronic device implementing the label-based video consistency comparison method according to any one of claims 1 to 6.
CN202210812641.1A 2022-07-11 2022-07-11 Video consistency comparison method based on labels Pending CN115238127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210812641.1A CN115238127A (en) 2022-07-11 2022-07-11 Video consistency comparison method based on labels

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210812641.1A CN115238127A (en) 2022-07-11 2022-07-11 Video consistency comparison method based on labels

Publications (1)

Publication Number Publication Date
CN115238127A 2022-10-25

Family

ID=83670507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210812641.1A Pending CN115238127A (en) 2022-07-11 2022-07-11 Video consistency comparison method based on labels

Country Status (1)

Country Link
CN (1) CN115238127A (en)

Similar Documents

Publication Publication Date Title
US9226047B2 (en) Systems and methods for performing semantic analysis of media objects
EP2321964B1 (en) Method and apparatus for detecting near-duplicate videos using perceptual video signatures
Cano et al. Audio fingerprinting: concepts and applications
KR101171536B1 (en) Temporal segment based extraction and robust matching of video fingerprints
US8503523B2 (en) Forming a representation of a video item and use thereof
WO2008097051A1 (en) Method for searching specific person included in digital data, and method and apparatus for producing copyright report for the specific person
US20100063978A1 (en) Apparatus and method for inserting/extracting nonblind watermark using features of digital media data
US20020059208A1 (en) Information providing apparatus and method, and recording medium
US20170185675A1 (en) Fingerprinting and matching of content of a multi-media file
WO2017067400A1 (en) Video file identification method and device
JP2008166914A (en) Method and apparatus for synchronizing data stream of content with meta data
WO2003096337A2 (en) Watermark embedding and retrieval
US20070201764A1 (en) Apparatus and method for detecting key caption from moving picture to provide customized broadcast service
US9367744B2 (en) Systems and methods of fingerprinting and identifying media contents
US20070220265A1 (en) Searching for a scaling factor for watermark detection
US20080256576A1 (en) Method and Apparatus for Detecting Content Item Boundaries
CN111274450A (en) Video identification method
CN115238127A (en) Video consistency comparison method based on labels
CN109101964B (en) Method, device and storage medium for determining head and tail areas in multimedia file
KR100930529B1 (en) Harmful video screening system and method through video identification
Duong et al. Movie synchronization by audio landmark matching
US20060092327A1 (en) Story segmentation method for video
JP2002014973A (en) Video retrieving system and method, and recording medium with video retrieving program recorded thereon
GB2617681A (en) Non-fingerprint-based automatic content recognition
Klein et al. Identifying Source Videos for Video Clips Based on Video Fingerprints and Embeddings

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination