CN116644212A - Video detection method, device, equipment and readable storage medium - Google Patents
- Publication number: CN116644212A (application number CN202310908926.XA)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/7834 — Retrieval of video data characterised by metadata automatically derived from the content, using audio features
- G06F16/71 — Indexing; Data structures therefor; Storage structures (video data)
- G06F16/7844 — Retrieval of video data characterised by metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
- G06V20/40 — Scenes; Scene-specific elements in video content
- G06V20/635 — Overlay text, e.g. embedded captions in a TV program
Abstract
The application discloses a video detection method, a device, equipment and a readable storage medium. After a video to be detected and a video to be compared are determined, the text segment information of the video to be detected is first determined, where each piece of text segment information comprises one piece of text information and the timestamp of that text information. The text segment information of the video to be compared, organized in the same way, is then acquired. Finally, the text segment information of the two videos is compared to determine whether the video to be detected and the video to be compared are repeated. Because the number of text segments in a video is far smaller than its number of image frames, and each piece of text segment information comprises only the text information and its timestamp, comparing text segment information is simpler than comparing image frames; the scheme therefore improves the efficiency of video detection.
Description
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video detection method, apparatus, device, and readable storage medium.
Background
With the rapid development of social networks, video has become one of the dominant content modalities of the mobile internet. Because video features strong user participation and high propagation value, the volume of uploaded video keeps growing; it is therefore necessary to detect an uploaded video to determine whether it duplicates a video that has already been uploaded.
Currently, videos are detected at the picture level using artificial intelligence: whether the detected video and an uploaded video form a repeated video is determined by comparing their image frames. Because a video contains a very large number of image frames, this detection mode is inefficient.
Therefore, how to provide a video detection method that improves the efficiency of video detection is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present application provides a video detection method, apparatus, device and readable storage medium. The specific scheme is as follows:
a method of video detection, the method comprising:
determining a video to be detected and a video to be compared;
determining text segment information of the video to be detected, wherein each text segment information of the video to be detected comprises one text information and a time stamp of the text information;
acquiring text segment information of the video to be compared, wherein each piece of text segment information of the video to be compared comprises one piece of text information and a timestamp of that text information;
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
Optionally, the determining text segment information of the video to be detected includes:
determining each sub-audio fragment in the video to be detected;
determining text segment information corresponding to each sub-audio segment according to each sub-audio segment, wherein the text segment information corresponding to each sub-audio segment comprises text information corresponding to the sub-audio segment and a time stamp of the text information corresponding to the sub-audio segment;
and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
Optionally, the determining the text segment information corresponding to the sub-audio segment includes:
determining a text corresponding to the sub-audio fragment;
determining a starting time stamp corresponding to a first word in a text corresponding to the sub-audio fragment;
Determining text information corresponding to the sub-audio clips based on the text corresponding to the sub-audio clips;
determining a starting time stamp corresponding to a first word in a text corresponding to the sub-audio fragment as a time stamp of text information corresponding to the sub-audio fragment; and combining the text information corresponding to the sub-audio fragments and the time stamps of the text information corresponding to the sub-audio fragments to obtain the text fragment information corresponding to the sub-audio fragments.
Optionally, the determining the text corresponding to the sub-audio clip includes:
performing voice recognition on the sub-audio fragments to obtain first texts corresponding to the sub-audio fragments;
identifying subtitles in the video clips corresponding to the sub audio clips to obtain a second text corresponding to the sub audio clips;
and carrying out alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
Optionally, the determining the start timestamp corresponding to the first word in the text corresponding to the sub-audio clip includes:
determining a starting time stamp of each word in the first text in the sub-audio clip;
and determining the starting time stamp of the first word in the text corresponding to the sub-audio fragment in the sub-audio fragment based on the starting time stamp of each word in the first text in the sub-audio fragment.
Optionally, the comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated includes:
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is the text information which appears in the video to be detected and the video to be compared;
if no target text information exists, determining that the video to be detected and the video to be compared are not repeated;
if so, determining a timestamp deviation corresponding to the target text information based on the timestamp of the target text information in the video to be detected and the timestamp of the target text information in the video to be compared for each target text information; and determining whether the video to be detected and the video to be compared are repeated or not based on the timestamp deviation corresponding to each target text message.
Optionally, the determining whether the video to be detected and the video to be compared are repeated based on the timestamp deviation corresponding to each target text information includes:
determining, for each timestamp deviation, the number of pieces of target text information corresponding to that deviation;
calculating the ratio of the largest such number to the number of pieces of text segment information of the video to be detected;
if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated;
and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
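The decision procedure above (counting the target text information per timestamp deviation and thresholding the ratio) can be sketched as follows. This is a minimal illustration; the function name, the data layout, and the threshold value of 0.5 are assumptions, not taken from the patent.

```python
from collections import Counter

def videos_duplicate(detected_segments, compared_segments, threshold=0.5):
    """Decide whether two videos are repeated from their text segment info.

    detected_segments / compared_segments: lists of (text, timestamp) pairs.
    threshold: preset ratio threshold (illustrative value, not from the patent).
    """
    # Index the compared video's text segment information by text content.
    compared_by_text = {}
    for text, ts in compared_segments:
        compared_by_text.setdefault(text, []).append(ts)

    # Target text information: text appearing in both videos. For each,
    # record the timestamp deviation between the two videos.
    deviations = Counter()
    for text, ts in detected_segments:
        for other_ts in compared_by_text.get(text, []):
            deviations[round(other_ts - ts, 2)] += 1

    if not deviations:
        return False  # no target text information: not repeated

    # Number of target texts sharing the most common deviation, divided by
    # the number of text segments of the video to be detected.
    max_count = max(deviations.values())
    return max_count / len(detected_segments) > threshold
```

For example, if every text segment of the detected video reappears in the compared video shifted by a constant 10 seconds, all target texts share one deviation, the ratio is 1.0, and the videos are judged repeated.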
Optionally, the video to be compared is a video in a preset video library, and if it is determined that the video to be detected and the video to be compared are repeated, the method further includes:
and outputting a storage path of the video to be compared in the video library.
Optionally, if it is determined that the video to be detected is not repeated with each video in the preset video library, the method further includes:
and storing the video to be detected and the text fragment information of the video to be detected into the video library.
A video detection device, the device comprising:
the video determining unit is used for determining a video to be detected and a video to be compared;
a text segment information determining unit, configured to determine text segment information of the video to be detected, where each text segment information of the video to be detected includes one text information, and a timestamp of the text information;
a text segment information obtaining unit, configured to obtain text segment information of the video to be compared, where each piece of text segment information of the video to be compared includes one piece of text information and a timestamp of that text information;
and the comparison unit is used for comparing the text segment information of the video to be detected with the text segment information of the video to be compared and determining whether the video to be detected and the video to be compared are repeated or not.
Optionally, the text segment information determining unit includes:
a sub-audio fragment determining unit, configured to determine each sub-audio fragment in the video to be detected;
a sub-audio fragment processing unit, configured to determine, for each sub-audio fragment, text fragment information corresponding to the sub-audio fragment, where the text fragment information corresponding to each sub-audio fragment includes text information corresponding to the sub-audio fragment, and a timestamp of the text information corresponding to the sub-audio fragment; and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
Optionally, the sub-audio clip processing unit includes:
a text determining unit, configured to determine a text corresponding to the sub-audio segment;
a starting time stamp determining unit, configured to determine a starting time stamp corresponding to a first word in a text corresponding to the sub-audio segment;
the text information determining unit is used for determining text information corresponding to the sub-audio fragments based on the text corresponding to the sub-audio fragments;
the time stamp determining unit is used for determining a starting time stamp corresponding to a first word in the text corresponding to the sub-audio fragment as a time stamp of text information corresponding to the sub-audio fragment; and combining the text information corresponding to the sub-audio fragments and the time stamps of the text information corresponding to the sub-audio fragments to obtain the text fragment information corresponding to the sub-audio fragments.
Optionally, the text determining unit includes:
the voice recognition unit is used for carrying out voice recognition on the sub-audio fragments to obtain a first text corresponding to the sub-audio fragments;
the subtitle identification unit is used for identifying the subtitle in the video segment corresponding to the sub-audio segment to obtain a second text corresponding to the sub-audio segment;
and the alignment and correction processing unit is used for performing alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
Optionally, the start timestamp determining unit is specifically configured to:
determining a starting time stamp of each word in the first text in the sub-audio clip;
and determining the starting time stamp of the first word in the text corresponding to the sub-audio fragment in the sub-audio fragment based on the starting time stamp of each word in the first text in the sub-audio fragment.
Optionally, the comparing unit is specifically configured to:
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is the text information which appears in the video to be detected and the video to be compared;
if no target text information exists, determining that the video to be detected and the video to be compared are not repeated;
if so, determining a timestamp deviation corresponding to the target text information based on the timestamp of the target text information in the video to be detected and the timestamp of the target text information in the video to be compared for each target text information; and determining whether the video to be detected and the video to be compared are repeated or not based on the timestamp deviation corresponding to each target text message.
Optionally, the comparing unit is specifically configured to:
determining, for each timestamp deviation, the number of pieces of target text information corresponding to that deviation;
calculating the ratio of the largest such number to the number of pieces of text segment information of the video to be detected;
if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated;
and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
Optionally, the video to be compared is a video in a preset video library, and if it is determined that the video to be detected and the video to be compared are repeated, the device further includes:
and the output unit is used for outputting the storage path of the video to be compared in the video library.
Optionally, if it is determined that the video to be detected is not repeated with each video in the preset video library, the apparatus further includes:
and the storage unit is used for storing the video to be detected and the text fragment information of the video to be detected into the video library.
A video detection device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the video detection method as described above.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a video detection method as described above.
By means of the above technical solution, the application discloses a video detection method, a device, equipment and a readable storage medium. After a video to be detected and a video to be compared are determined, the text segment information of the video to be detected is first determined, where each piece of text segment information comprises one piece of text information and the timestamp of that text information. The text segment information of the video to be compared, organized in the same way, is then acquired. Finally, the text segment information of the two videos is compared to determine whether the video to be detected and the video to be compared are repeated. Because the number of text segments in a video is far smaller than its number of image frames, and each piece of text segment information comprises only the text information and its timestamp, comparing text segment information is simpler than comparing image frames; the scheme therefore improves the efficiency of video detection.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 is a schematic flow chart of a video detection method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a video detection device according to an embodiment of the present application;
fig. 3 is a block diagram of a hardware structure of a video detection device according to an embodiment of the present application.
Detailed Description
The following clearly and completely describes the embodiments of the present application with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the scope of the application.
Next, the video detection method provided by the present application will be described by the following embodiments.
Referring to fig. 1, fig. 1 is a schematic flow chart of a video detection method according to an embodiment of the present application, where the method may include:
step S101: and determining the video to be detected and the video to be compared.
In the application, the video to be detected and the video to be compared may be of any duration and any format; the application places no limitation on this. In some scenarios, the purpose of video detection is to decide whether the video to be detected may be stored in a preset video library: the video is stored only if it does not repeat any video already in the library, which avoids invalid occupation of the library's storage space. In that case, the video to be compared may be any video stored in the preset video library.
Step S102: and determining the text segment information of the video to be detected, wherein each text segment information of the video to be detected comprises one text information and a time stamp of the text information.
It should be noted that the text information represents the text corresponding to a video segment, and the timestamp of the text information indicates at which moment in the video segment that text first appears.
In the application, the video to be detected can be divided into a plurality of video segments, and text segment information corresponding to each video segment is determined for each video segment, and the specific implementation manner will be described in detail through the following embodiments, which are not described here.
Step S103: and acquiring the text segment information of the video to be compared, wherein each text segment information of the video to be compared comprises a text message and a time stamp of the text message.
In the application, the preset video library may store videos together with their text segment information, and the text segment information of the video to be compared may be obtained from this library. It should be noted that a video identifier may be added to each piece of text segment information, so that text segment information belonging to different videos can be distinguished by their video identifiers.
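One possible representation of the stored text segment information, carrying the video identifier mentioned above. All names and the library layout are illustrative sketches, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TextSegmentInfo:
    video_id: str     # identifier distinguishing which video this belongs to
    text: str         # text information for one sub-audio segment
    timestamp: float  # start time (seconds) of the text within the video

# A video library maps each video identifier to its segment records;
# the stored video file itself would live alongside these entries.
library = {
    "vid-001": [
        TextSegmentInfo("vid-001", "hello world", 3.2),
        TextSegmentInfo("vid-001", "see you soon", 8.7),
    ],
}
```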
Step S104: comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
In the application, by comparing the text segment information of the video to be detected with the text segment information of the video to be compared, the similarity between the video to be detected and the video to be compared can be determined, and whether the video to be detected and the video to be compared are repeated or not can be determined based on the similarity between the video to be detected and the video to be compared, and the specific implementation manner will be described in detail through the following embodiments, which will not be described here.
The embodiment discloses a video detection method. After a video to be detected and a video to be compared are determined, the text segment information of the video to be detected is first determined, where each piece of text segment information comprises one piece of text information and the timestamp of that text information. The text segment information of the video to be compared, organized in the same way, is then acquired. Finally, the text segment information of the two videos is compared to determine whether the video to be detected and the video to be compared are repeated. Because the number of text segments in a video is far smaller than its number of image frames, and each piece of text segment information comprises only the text information and its timestamp, comparing text segment information is simpler than comparing image frames; the scheme therefore improves the efficiency of video detection.
In another embodiment of the present application, a specific implementation manner of determining the text segment information of the video to be detected in step S102 is described, and the manner may include the following steps:
Step S201: and determining each sub-audio fragment in the video to be detected.
In the application, the audio track of the video to be detected may first be extracted; valid speech segments may then be extracted from it using VAD (voice activity detection) technology and used as the sub-audio segments of the video to be detected.
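The patent names VAD only generically. As a rough illustration of splitting an audio track into sub-audio segments, here is a toy energy-based splitter; it is a stand-in for illustration only, and a production system would use a real voice activity detector.

```python
def split_speech_segments(samples, frame_len=160, energy_thresh=0.01):
    """Toy stand-in for VAD: return (start, end) sample ranges of
    consecutive frames whose mean energy exceeds a threshold."""
    segments = []
    in_speech = False
    start = 0
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > energy_thresh and not in_speech:
            in_speech, start = True, i * frame_len  # speech run begins
        elif energy <= energy_thresh and in_speech:
            in_speech = False
            segments.append((start, i * frame_len))  # speech run ends
    if in_speech:
        segments.append((start, n_frames * frame_len))
    return segments
```

A real VAD additionally handles noise floors, hangover smoothing, and minimum segment lengths, none of which the sketch attempts.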
Step S202: and determining text segment information corresponding to each sub-audio segment according to each sub-audio segment, wherein the text segment information corresponding to each sub-audio segment comprises the text information corresponding to the sub-audio segment and a time stamp of the text information corresponding to the sub-audio segment.
It should be noted that, a specific implementation manner of determining the text segment information corresponding to the sub-audio segment will be described in detail through the following embodiments, which will not be described here.
Step S203: and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
In another embodiment of the present application, a specific implementation manner of determining the text segment information corresponding to the sub-audio segment in step S202 is described, where the method may include the following steps:
Step S301: and determining the text corresponding to the sub-audio fragment.
As an implementation manner, the specific implementation manner of determining the text corresponding to the sub-audio clip may include the following steps:
step S3011: and carrying out voice recognition on the sub-audio fragments to obtain a first text corresponding to the sub-audio fragments.
In the application, speech recognition may be performed on the sub-audio segment using ASR (Automatic Speech Recognition) technology to obtain the first text corresponding to the sub-audio segment.
Step S3012: and identifying the subtitles in the video clips corresponding to the sub-audio clips to obtain a second text corresponding to the sub-audio clips.
In the application, subtitle recognition may be performed on the image frames of the video segment corresponding to the sub-audio segment using OCR (Optical Character Recognition) to obtain the second text corresponding to the sub-audio segment.
Step S3013: and carrying out alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
Because the ASR and OCR technologies work on different principles, the first text and the second text obtained for a sub-audio segment may not be fully consistent. In the application, the first text and the second text therefore need to be aligned and corrected to obtain the text corresponding to the sub-audio segment.
As an implementation manner, the first text and the second text may be aligned using a text edit distance algorithm and then corrected based on preset correction rules, and the corrected text serves as the text corresponding to the sub-audio segment. The preset correction rules include, but are not limited to: aligning the sentence-head positions; resolving homophones in favor of the second (subtitle) text; resolving similar-shaped words in favor of the first (speech) text; completing words missing from the first text based on the second text; completing words missing from the second text based on the first text; truncating redundant words at the tail of the first text; and, where the subtitle translates English speech into Chinese, keeping the first text's English.
For ease of understanding, consider the following example (a machine-translated rendering of the original Chinese, in which the homophone and similar-shape confusions arise). Assume the first text is: I turn to the fact that I have not yet sent out that some waiting for the woolen vegetable will be in the grass to assist I in ok; and the second text is: I do not leave until I's certain waiting party is good in the grass and pick up I's assistance.
Aligning the first text and the second text by the text edit distance yields:
I's turn not yet going out to have a waiting for the vegetable to be in the grass to help I's ok-y
I do not leave until I's certain waiting party is good in the grass and pick up I's assistance
After the first text and the second text are aligned, they may be corrected based on the preset correction rules, as follows:
Rule one: align the sentence-head positions.
In the example above, the common starting point of the two aligned texts is identified as the sentence-head position.
Rule two, homophones are based on the second text
As in the above examples "somewhere in dish" and "somewhere in zei" will be subject to "somewhere in zei".
Rule III, the homotypic word is based on the first text
As in the examples above, "give me" and "pick me" will take control of "give me".
Rule four: words missing from the first text are completed based on the second text.
As in the example above, a word dropped by speech recognition is restored from the second text.
Rule five: words missing from the second text are completed based on the first text.
As in the example above, the sentence-final particle after "I have not yet departed" is absent from the subtitle and is restored from the first text.
Rule six: redundant words at the tail of the first text are truncated.
As in the example above, the extra word at the tail of the first text, beyond the end of the aligned second text, is cut off.
Rule seven: where English speech is translated into a Chinese subtitle, the first text prevails.
As in the example above, the English word "ok" from the first text is retained.
After the first text and the second text are corrected based on the preset correction rules, the corrected text reads, roughly: "I have not yet departed, wait a moment; [I will be] in the grass, cover me, ok".
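The alignment-and-correction step can be sketched in Python. This is a minimal illustration, not the patent's implementation: `difflib.SequenceMatcher` stands in for the text edit distance algorithm, analogues of rules two and four are applied to the aligned spans, and the English sentences and the `homophones` table are invented for the sketch.

```python
# Illustrative sketch only: aligns two noisy transcripts of one utterance and
# merges them with analogues of the correction rules. The sentences and the
# homophone table are invented; difflib provides an edit-distance-style
# word alignment.
from difflib import SequenceMatcher

def align_and_correct(first_words, second_words, homophones):
    """Merge an ASR transcript (first) with a subtitle transcript (second)."""
    merged = []
    for op, i1, i2, j1, j2 in SequenceMatcher(a=first_words,
                                              b=second_words).get_opcodes():
        if op == "equal":
            merged.extend(first_words[i1:i2])
        elif op == "replace":
            # Rule two analogue: on a known homophone mismatch, the second
            # text's spelling prevails; otherwise keep the first text.
            for a, b in zip(first_words[i1:i2], second_words[j1:j2]):
                merged.append(b if homophones.get(a) == b else a)
        elif op == "insert":
            # Rule four analogue: words missing from the first text are
            # completed from the second text.
            merged.extend(second_words[j1:j2])
        else:  # "delete": words only in the first text are kept here
            # (cf. rule seven, which keeps English speech such as "ok");
            # rule six would instead truncate extras at the tail.
            merged.extend(first_words[i1:i2])
    return merged

first = "there plan ready ok".split()   # ASR: homophone error, missing "is"
second = "their plan is ready".split()  # subtitle: no trailing "ok"
print(" ".join(align_and_correct(first, second, {"there": "their"})))
# → "their plan is ready ok"
```

In this invented example the homophone "there" is corrected from the subtitle, the missing "is" is completed from the subtitle, and the trailing "ok" present only in the speech is kept.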
Step S302: and determining a starting time stamp corresponding to a first word in the text corresponding to the sub-audio fragment.
As an implementation manner, the implementation manner of determining the start timestamp corresponding to the first word in the text corresponding to the sub-audio segment may be: determining a starting time stamp of each word in the first text in the sub-audio clip; and determining the starting time stamp of the first word in the text corresponding to the sub-audio fragment in the sub-audio fragment based on the starting time stamp of each word in the first text in the sub-audio fragment.
In the present application, the first text may be force-aligned with its corresponding sub-audio segment using the forced alignment technique from speech recognition, thereby determining the start timestamp of each word of the first text within the sub-audio segment. After the text corresponding to the sub-audio segment is determined, the word in the first text that corresponds to the first word of that text is identified, and the start timestamp of that word within the sub-audio segment is taken as the start timestamp corresponding to the first word of the text corresponding to the sub-audio segment.
For ease of understanding, based on the above example, the first word of the text corresponding to the sub-audio clip is the opening "I"; the start timestamp of that "I" in the first text is therefore taken as the start timestamp corresponding to the first word of the text corresponding to the sub-audio clip.
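As a sketch of this step, assume a forced aligner has already produced per-word start timestamps for the first text (the timing values below are invented); picking the timestamp of the corrected text's first word then reduces to a lookup:

```python
# Illustrative sketch: map the first word of the corrected text back to its
# start timestamp in the first text. The word timings are invented; in
# practice they come from force-aligning the first text with the sub-audio.
def first_word_timestamp(corrected_text, word_timings):
    """word_timings: (word, start_seconds) pairs for the first text."""
    first_word = corrected_text.split()[0]
    for word, start in word_timings:
        if word == first_word:
            return start
    # Fallback: if correction replaced the first word, use the segment start.
    return word_timings[0][1]

timings = [("their", 0.42), ("plan", 0.80), ("is", 1.10), ("ready", 1.25)]
print(first_word_timestamp("their plan is ready ok", timings))  # → 0.42
```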
Step S303: and determining text information corresponding to the sub-audio fragments based on the text corresponding to the sub-audio fragments.
As an implementation manner, a hash value of the text corresponding to the sub-audio segment may be calculated as the text information corresponding to the sub-audio segment.
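For instance, assuming an ordinary cryptographic hash is acceptable (the patent does not name a specific hash function), the text information could be computed as:

```python
# Illustrative sketch: derive "text information" as a hash of the segment's
# text. SHA-256 is an arbitrary choice; the embodiment only calls for a hash.
import hashlib

def text_information(segment_text):
    return hashlib.sha256(segment_text.encode("utf-8")).hexdigest()

print(text_information("their plan is ready ok"))
```

Equal texts yield equal hashes, so matching text information across two videos reduces to comparing short fixed-length strings.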
Step S304: and determining a starting time stamp corresponding to a first word in the text corresponding to the sub-audio fragment as a time stamp of the text information corresponding to the sub-audio fragment, and combining the text information corresponding to the sub-audio fragment and the time stamp of the text information corresponding to the sub-audio fragment to obtain the text fragment information corresponding to the sub-audio fragment.
In another embodiment of the present application, a detailed description is given of a specific implementation manner of comparing the text segment information of the video to be detected and the text segment information of the video to be compared in step S104 to determine whether the video to be detected and the video to be compared are repeated, where the method may include the following steps:
Step S401: comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is the text information which appears in the video to be detected and the video to be compared; if not, executing step S402; if so, step S403 is performed;
For ease of understanding, assume the text segment information of the video to be detected includes two pieces: one whose text information is abc with timestamp 10, and another whose text information is bcd with timestamp 20.
The text segment information of the video to be compared includes three pieces: one whose text information is abc with timestamp 60, one whose text information is bcd with timestamp 70, and one whose text information is abc with timestamp 90.
Here, the text information abc and bcd are both target text information.
Step S402: and determining that the video to be detected and the video to be compared are not repeated.
Step S403: determining a timestamp deviation corresponding to the target text information based on the timestamp of the target text information in the video to be detected and the timestamp of the target text information in the video to be compared aiming at each target text information; and determining whether the video to be detected and the video to be compared are repeated or not based on the timestamp deviation corresponding to each target text message.
The determining whether the video to be detected and the video to be compared are repeated based on the timestamp deviation corresponding to each target text information includes the following steps: for each timestamp deviation, determining the number of target text information corresponding to that deviation; calculating the ratio of the maximum such number to the number of pieces of text segment information of the video to be detected; if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated; and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
For ease of understanding, continuing the example of step S401, assume the preset threshold is 80%. The timestamp deviations of the text information abc are 50 and 80, and the timestamp deviation of the text information bcd is 50. The number of target text information corresponding to deviation 50 is therefore 2, and the number corresponding to deviation 80 is 1, so the maximum number is 2. The video to be detected has 2 pieces of text segment information, so the ratio of the maximum number to that count is 100%, which exceeds the preset threshold, and the video to be detected and the video to be compared are determined to be repeated.
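The comparison of steps S401 through S403 can be sketched as follows; this reproduces the worked example above, with (text information, timestamp) pairs standing in for text segment information and the threshold fixed at 80%:

```python
# Illustrative sketch of steps S401–S403: collect timestamp deviations over
# every pairing of identical text information, then flag a duplicate when the
# most frequent deviation covers enough of the detected video's segments.
from collections import Counter

def is_duplicate(detected, compared, threshold=0.8):
    """detected/compared: lists of (text_information, timestamp) pairs."""
    deviations = Counter()
    for text_d, ts_d in detected:
        for text_c, ts_c in compared:
            if text_d == text_c:          # target text information
                deviations[ts_c - ts_d] += 1
    if not deviations:                    # step S402: nothing shared
        return False
    ratio = max(deviations.values()) / len(detected)  # step S403
    return ratio > threshold

detected = [("abc", 10), ("bcd", 20)]
compared = [("abc", 60), ("bcd", 70), ("abc", 90)]
print(is_duplicate(detected, compared))  # → True: deviation 50 occurs twice
```

A shared timestamp deviation across many segments indicates the two videos contain the same content shifted by a constant offset, which is exactly the duplication this method detects.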
In some situations, the purpose of the video detection is to determine whether the video to be detected can be stored in a preset video library, and in such situations, the video to be compared is a video in the preset video library, and if it is determined that the video to be detected and the video to be compared are repeated, a storage path of the video to be compared in the video library is output, so as to prompt a user that similar videos are already stored in the video library. If the video to be detected is determined not to be repeated with each video in the preset video library, the video to be detected and the text fragment information of the video to be detected can be stored in the video library.
The video detection device disclosed in the embodiments of the present application will be described below, and the video detection device described below and the video detection method described above may be referred to correspondingly.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a video detection device according to an embodiment of the present application. As shown in fig. 2, the video detecting apparatus may include:
a video determining unit 11, configured to determine a video to be detected and a video to be compared;
a text segment information determining unit 12, configured to determine text segment information of the video to be detected, where each text segment information of the video to be detected includes one text information, and a timestamp of the text information;
A text segment information obtaining unit 13, configured to obtain text segment information of the video to be compared, where each text segment information of the video to be compared includes a text information and a timestamp of the text information;
and the comparing unit 14 is used for comparing the text segment information of the video to be detected and the text segment information of the video to be compared and determining whether the video to be detected and the video to be compared are repeated.
As an embodiment, the text segment information determining unit includes:
a sub-audio fragment determining unit, configured to determine each sub-audio fragment in the video to be detected;
a sub-audio fragment processing unit, configured to determine, for each sub-audio fragment, text fragment information corresponding to the sub-audio fragment, where the text fragment information corresponding to each sub-audio fragment includes text information corresponding to the sub-audio fragment, and a timestamp of the text information corresponding to the sub-audio fragment; and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
As an embodiment, the sub-audio clip processing unit includes:
A text determining unit, configured to determine a text corresponding to the sub-audio segment;
a starting time stamp determining unit, configured to determine a starting time stamp corresponding to a first word in a text corresponding to the sub-audio segment;
the text information determining unit is used for determining text information corresponding to the sub-audio fragments based on the text corresponding to the sub-audio fragments;
the time stamp determining unit is used for determining a starting time stamp corresponding to a first word in the text corresponding to the sub-audio fragment as a time stamp of text information corresponding to the sub-audio fragment; and combining the text information corresponding to the sub-audio fragments and the time stamps of the text information corresponding to the sub-audio fragments to obtain the text fragment information corresponding to the sub-audio fragments.
As an embodiment, the text determining unit includes:
the voice recognition unit is used for carrying out voice recognition on the sub-audio fragments to obtain a first text corresponding to the sub-audio fragments;
the subtitle identification unit is used for identifying the subtitle in the video segment corresponding to the sub-audio segment to obtain a second text corresponding to the sub-audio segment;
and the alignment and correction processing unit is used for performing alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
As an embodiment, the start time stamp determining unit is specifically configured to:
determining a starting time stamp of each word in the first text in the sub-audio clip;
and determining the starting time stamp of the first word in the text corresponding to the sub-audio fragment in the sub-audio fragment based on the starting time stamp of each word in the first text in the sub-audio fragment.
As an embodiment, the comparing unit is specifically configured to:
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is the text information which appears in the video to be detected and the video to be compared;
if the video to be detected does not exist, determining that the video to be detected and the video to be compared are not repeated;
if so, determining a timestamp deviation corresponding to the target text information based on the timestamp of the target text information in the video to be detected and the timestamp of the target text information in the video to be compared for each target text information; and determining whether the video to be detected and the video to be compared are repeated or not based on the timestamp deviation corresponding to each target text message.
As an embodiment, the comparing unit is specifically configured to:
determining the number of target text messages corresponding to one time stamp deviation;
calculating the ratio of the maximum number to the number of the text segment information of the video to be detected;
if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated;
and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
As an implementation manner, the video to be compared is a video in a preset video library, and if it is determined that the video to be detected and the video to be compared are repeated, the apparatus further includes:
and the output unit is used for outputting the storage path of the video to be compared in the video library.
As an implementation manner, if it is determined that the video to be detected is not repeated with each video in the preset video library, the apparatus further includes:
and the storage unit is used for storing the video to be detected and the text fragment information of the video to be detected into the video library.
Referring to fig. 3, fig. 3 is a block diagram of a hardware structure of a video detection device according to an embodiment of the present application, and referring to fig. 3, the hardware structure of the video detection device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
In the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete the communication with each other through the communication bus 4;
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application;
the memory 3 may comprise high-speed RAM, and may further comprise non-volatile memory, such as at least one magnetic disk memory;
wherein the memory stores a program, the processor is operable to invoke the program stored in the memory, the program operable to:
determining a video to be detected and a video to be compared;
determining text segment information of the video to be detected, wherein each text segment information of the video to be detected comprises one text information and a time stamp of the text information;
acquiring text segment information of the video to be compared, wherein each text segment information of the video to be compared comprises a text message and a time stamp of the text message;
Comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
The embodiment of the present application also provides a readable storage medium storing a program adapted to be executed by a processor, the program being configured to:
determining a video to be detected and a video to be compared;
determining text segment information of the video to be detected, wherein each text segment information of the video to be detected comprises one text information and a time stamp of the text information;
acquiring text segment information of the video to be compared, wherein each text segment information of the video to be compared comprises a text message and a time stamp of the text message;
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In the present specification, each embodiment is described in a progressive manner, with each embodiment focusing on its differences from the others; for identical and similar parts, reference may be made between the embodiments.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (12)
1. A method of video detection, the method comprising:
determining a video to be detected and a video to be compared;
determining text segment information of the video to be detected, wherein each text segment information of the video to be detected comprises one text information and a time stamp of the text information;
acquiring text segment information of the video to be compared, wherein each text segment information of the video to be compared comprises a text message and a time stamp of the text message;
Comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
2. The method of claim 1, wherein the determining text segment information of the video to be detected comprises:
determining each sub-audio fragment in the video to be detected;
determining text segment information corresponding to each sub-audio segment according to each sub-audio segment, wherein the text segment information corresponding to each sub-audio segment comprises text information corresponding to the sub-audio segment and a time stamp of the text information corresponding to the sub-audio segment;
and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
3. The method of claim 2, wherein the determining text segment information corresponding to the sub-audio segment comprises:
determining a text corresponding to the sub-audio fragment;
determining a starting time stamp corresponding to a first word in a text corresponding to the sub-audio fragment;
determining text information corresponding to the sub-audio clips based on the text corresponding to the sub-audio clips;
Determining a starting time stamp corresponding to a first word in a text corresponding to the sub-audio fragment as a time stamp of text information corresponding to the sub-audio fragment; and combining the text information corresponding to the sub-audio fragments and the time stamps of the text information corresponding to the sub-audio fragments to obtain the text fragment information corresponding to the sub-audio fragments.
4. The method of claim 3, wherein the determining the text corresponding to the sub-audio piece comprises:
performing voice recognition on the sub-audio fragments to obtain first texts corresponding to the sub-audio fragments;
identifying subtitles in the video clips corresponding to the sub audio clips to obtain a second text corresponding to the sub audio clips;
and carrying out alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
5. The method of claim 4, wherein determining the start timestamp corresponding to the first word in the text corresponding to the sub-audio piece comprises:
determining a starting time stamp of each word in the first text in the sub-audio clip;
and determining the starting time stamp of the first word in the text corresponding to the sub-audio fragment in the sub-audio fragment based on the starting time stamp of each word in the first text in the sub-audio fragment.
6. The method of claim 1, wherein the comparing the text segment information of the video to be detected with the text segment information of the video to be compared to determine whether the video to be detected and the video to be compared are repeated comprises:
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is the text information which appears in the video to be detected and the video to be compared;
if the video to be detected does not exist, determining that the video to be detected and the video to be compared are not repeated;
if so, determining a timestamp deviation corresponding to the target text information based on the timestamp of the target text information in the video to be detected and the timestamp of the target text information in the video to be compared for each target text information; and determining whether the video to be detected and the video to be compared are repeated or not based on the timestamp deviation corresponding to each target text message.
7. The method of claim 6, wherein determining whether the video to be detected and the video to be compared are repeated based on the timestamp bias corresponding to each target text message comprises:
Determining the number of target text messages corresponding to one time stamp deviation;
calculating the ratio of the maximum number to the number of the text segment information of the video to be detected;
if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated;
and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
8. The method of claim 1, wherein the video to be compared is a video in a preset video library, and if it is determined that the video to be detected is repeated with the video to be compared, the method further comprises:
and outputting a storage path of the video to be compared in the video library.
9. The method of claim 8, wherein if it is determined that none of the videos to be detected and the respective videos in the preset video library are repeated, the method further comprises:
and storing the video to be detected and the text fragment information of the video to be detected into the video library.
10. A video detection apparatus, the apparatus comprising:
the video determining unit is used for determining a video to be detected and a video to be compared;
A text segment information determining unit, configured to determine text segment information of the video to be detected, where each text segment information of the video to be detected includes one text information, and a timestamp of the text information;
a text segment information obtaining unit, configured to obtain text segment information of the video to be compared, where each text segment information of the video to be compared includes a text information and a timestamp of the text information;
and the comparison unit is used for comparing the text segment information of the video to be detected with the text segment information of the video to be compared and determining whether the video to be detected and the video to be compared are repeated or not.
11. A video detection device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the respective steps of the video detection method according to any one of claims 1 to 9.
12. A readable storage medium having stored thereon a computer program, which, when executed by a processor, implements the steps of the video detection method according to any of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310908926.XA CN116644212B (en) | 2023-07-24 | 2023-07-24 | Video detection method, device, equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116644212A true CN116644212A (en) | 2023-08-25 |
CN116644212B CN116644212B (en) | 2023-12-01 |
Family
ID=87640302
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310908926.XA Active CN116644212B (en) | 2023-07-24 | 2023-07-24 | Video detection method, device, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116644212B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201121829D0 (en) * | 2010-12-20 | 2012-02-01 | Vaclik Paul P | A method of making text data associated with video data searchable |
US20150082349A1 (en) * | 2013-09-13 | 2015-03-19 | Arris Enterprises, Inc. | Content Based Video Content Segmentation |
CN104506933A (en) * | 2014-12-23 | 2015-04-08 | 方正宽带网络服务有限公司 | Method and device for verifying sameness of video files |
CN106973305A (en) * | 2017-03-20 | 2017-07-21 | 广东小天才科技有限公司 | The detection method and device of harmful content in a kind of video |
CN109905772A (en) * | 2019-03-12 | 2019-06-18 | 腾讯科技(深圳)有限公司 | Video clip querying method, device, computer equipment and storage medium |
CN110602566A (en) * | 2019-09-06 | 2019-12-20 | Oppo广东移动通信有限公司 | Matching method, terminal and readable storage medium |
CN110874526A (en) * | 2018-12-29 | 2020-03-10 | 北京安天网络安全技术有限公司 | File similarity detection method and device, electronic equipment and storage medium |
CN111143584A (en) * | 2019-12-20 | 2020-05-12 | 三盟科技股份有限公司 | Audio-visual content retrieval method and system |
CN112951275A (en) * | 2021-02-26 | 2021-06-11 | 北京百度网讯科技有限公司 | Voice quality inspection method and device, electronic equipment and medium |
CN113591530A (en) * | 2021-02-24 | 2021-11-02 | 腾讯科技(深圳)有限公司 | Video detection method and device, electronic equipment and storage medium |
CN115361377A (en) * | 2022-08-19 | 2022-11-18 | 中国联合网络通信集团有限公司 | File uploading method, user terminal, network disk server, equipment and medium |
CN116012753A (en) * | 2022-12-21 | 2023-04-25 | 平安银行股份有限公司 | Video processing method, device, computer equipment and computer readable storage medium |
- 2023-07-24: application CN202310908926.XA filed; patent CN116644212B granted (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN116644212B (en) | 2023-12-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4580885B2 (en) | Scene information extraction method, scene extraction method, and extraction apparatus | |
KR101768509B1 (en) | On-line voice translation method and device | |
KR101199747B1 (en) | Word recognition method and recording medium of program recognition word and information process device | |
CN109817210B (en) | Voice writing method, device, terminal and storage medium | |
CN110674396B (en) | Text information processing method and device, electronic equipment and readable storage medium | |
JP5845764B2 (en) | Information processing apparatus and information processing program | |
CN111159546B (en) | Event pushing method, event pushing device, computer readable storage medium and computer equipment | |
CN109145149B (en) | Information alignment method, device, equipment and readable storage medium | |
CN112382295B (en) | Speech recognition method, device, equipment and readable storage medium | |
US20240169972A1 (en) | Synchronization method and apparatus for audio and text, device, and medium | |
WO2022166808A1 (en) | Text restoration method and apparatus, and electronic device | |
CN116644212B (en) | Video detection method, device, equipment and readable storage medium | |
CN111950267B (en) | Text triplet extraction method and device, electronic equipment and storage medium | |
CN113923479A (en) | Audio and video editing method and device | |
CN117336572A (en) | Video abstract generation method, device, computer equipment and storage medium | |
TWI699663B (en) | Segmentation method, segmentation system and non-transitory computer-readable medium | |
JP6358744B2 (en) | Speech recognition error correction device | |
CN116017088A (en) | Video subtitle processing method, device, electronic equipment and storage medium | |
CN114501159A (en) | Subtitle editing method and device, electronic equipment and storage medium | |
CN114373446A (en) | Conference language determination method and device and electronic equipment | |
CN116631447B (en) | Noise extraction method, device, equipment and readable storage medium | |
CN113688625A (en) | Language identification method and device | |
CN113271247B (en) | Information processing method, apparatus, device and storage medium | |
JP7105500B2 (en) | Computer-implemented Automatic Acquisition Method for Element Nouns in Chinese Patent Documents for Patent Documents Without Intercharacter Spaces | |
CN110717091B (en) | Entry data expansion method and device based on face recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||