CN116644212A - Video detection method, device, equipment and readable storage medium - Google Patents

Video detection method, device, equipment and readable storage medium

Info

Publication number
CN116644212A
Authority
CN
China
Prior art keywords
video
text
detected
information
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310908926.XA
Other languages
Chinese (zh)
Other versions
CN116644212B (en)
Inventor
潘青华
丁杰
汪锦想
于振华
胡国平
刘聪
魏思
王士进
刘权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202310908926.XA priority Critical patent/CN116644212B/en
Publication of CN116644212A publication Critical patent/CN116644212A/en
Application granted granted Critical
Publication of CN116644212B publication Critical patent/CN116644212B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635Overlay text, e.g. embedded captions in a TV program

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application discloses a video detection method, apparatus, device and readable storage medium. After a video to be detected and a video to be compared are determined, text segment information of the video to be detected is first determined, where each piece of text segment information comprises one piece of text information and a time stamp of that text information. Text segment information of the video to be compared, of the same form, is then acquired. Finally, the text segment information of the video to be detected is compared with the text segment information of the video to be compared to determine whether the two videos are repeated. Because the number of text segments in a video is far smaller than the number of its image frames, and each piece of text segment information contains only the text information and its time stamp, comparing text segment information is simpler than comparing image frames; the scheme therefore improves video detection efficiency.

Description

Video detection method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video detection method, apparatus, device, and readable storage medium.
Background
With the rapid development of social networks, video has become one of the dominant content modalities of the mobile internet. Because video features strong user engagement and high propagation value, the volume of uploaded video keeps growing; it is therefore necessary to detect a video to determine whether it constitutes a repeat of an already-uploaded video.
Currently, artificial intelligence techniques detect video at the picture level to determine whether the detected video and an uploaded video constitute repeated videos. However, this approach must compare the image frames of the two videos, and a video contains a great many image frames, so the efficiency of video detection is low.
Therefore, how to provide a video detection method that improves the efficiency of video detection is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present application provides a video detection method, apparatus, device and readable storage medium. The specific scheme is as follows:
a method of video detection, the method comprising:
determining a video to be detected and a video to be compared;
determining text segment information of the video to be detected, wherein each piece of text segment information of the video to be detected comprises one piece of text information and a time stamp of the text information;
acquiring text segment information of the video to be compared, wherein each piece of text segment information of the video to be compared comprises one piece of text information and a time stamp of the text information;
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
Optionally, the determining text segment information of the video to be detected includes:
determining each sub-audio fragment in the video to be detected;
determining text segment information corresponding to each sub-audio segment according to each sub-audio segment, wherein the text segment information corresponding to each sub-audio segment comprises text information corresponding to the sub-audio segment and a time stamp of the text information corresponding to the sub-audio segment;
and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
Optionally, the determining the text segment information corresponding to the sub-audio segment includes:
determining a text corresponding to the sub-audio fragment;
determining a starting time stamp corresponding to a first word in a text corresponding to the sub-audio fragment;
determining text information corresponding to the sub-audio fragment based on the text corresponding to the sub-audio fragment;
determining a starting time stamp corresponding to a first word in a text corresponding to the sub-audio fragment as a time stamp of text information corresponding to the sub-audio fragment; and combining the text information corresponding to the sub-audio fragments and the time stamps of the text information corresponding to the sub-audio fragments to obtain the text fragment information corresponding to the sub-audio fragments.
Optionally, the determining the text corresponding to the sub-audio clip includes:
performing voice recognition on the sub-audio fragments to obtain first texts corresponding to the sub-audio fragments;
identifying subtitles in the video clips corresponding to the sub audio clips to obtain a second text corresponding to the sub audio clips;
and carrying out alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
Optionally, the determining the start timestamp corresponding to the first word in the text corresponding to the sub-audio clip includes:
determining a starting time stamp of each word in the first text in the sub-audio clip;
and determining the starting time stamp of the first word in the text corresponding to the sub-audio fragment in the sub-audio fragment based on the starting time stamp of each word in the first text in the sub-audio fragment.
Optionally, the comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated includes:
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is text information that appears in both the video to be detected and the video to be compared;
if no target text information exists, determining that the video to be detected and the video to be compared are not repeated;
if target text information exists, determining, for each piece of target text information, a time stamp deviation corresponding to the target text information based on the time stamp of the target text information in the video to be detected and the time stamp of the target text information in the video to be compared; and determining whether the video to be detected and the video to be compared are repeated based on the time stamp deviation corresponding to each piece of target text information.
Optionally, the determining whether the video to be detected and the video to be compared are repeated based on the time stamp deviation corresponding to each piece of target text information includes:
determining, for each time stamp deviation, the number of pieces of target text information corresponding to that time stamp deviation;
calculating the ratio of the maximum such number to the number of pieces of text segment information of the video to be detected;
if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated;
and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
Optionally, the video to be compared is a video in a preset video library, and if it is determined that the video to be detected and the video to be compared are repeated, the method further includes:
and outputting a storage path of the video to be compared in the video library.
Optionally, if it is determined that the video to be detected does not repeat any video in the preset video library, the method further includes:
and storing the video to be detected and the text fragment information of the video to be detected into the video library.
A video detection device, the device comprising:
the video determining unit is used for determining a video to be detected and a video to be compared;
a text segment information determining unit, configured to determine text segment information of the video to be detected, where each piece of text segment information of the video to be detected includes one piece of text information and a time stamp of the text information;
a text segment information obtaining unit, configured to obtain text segment information of the video to be compared, where each piece of text segment information of the video to be compared includes one piece of text information and a time stamp of the text information;
and the comparison unit is used for comparing the text segment information of the video to be detected with the text segment information of the video to be compared and determining whether the video to be detected and the video to be compared are repeated or not.
Optionally, the text segment information determining unit includes:
a sub-audio fragment determining unit, configured to determine each sub-audio fragment in the video to be detected;
a sub-audio fragment processing unit, configured to determine, for each sub-audio fragment, text fragment information corresponding to the sub-audio fragment, where the text fragment information corresponding to each sub-audio fragment includes text information corresponding to the sub-audio fragment, and a timestamp of the text information corresponding to the sub-audio fragment; and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
Optionally, the sub-audio clip processing unit includes:
a text determining unit, configured to determine a text corresponding to the sub-audio segment;
a starting time stamp determining unit, configured to determine a starting time stamp corresponding to a first word in a text corresponding to the sub-audio segment;
the text information determining unit is used for determining text information corresponding to the sub-audio fragments based on the text corresponding to the sub-audio fragments;
the time stamp determining unit is used for determining a starting time stamp corresponding to a first word in the text corresponding to the sub-audio fragment as a time stamp of text information corresponding to the sub-audio fragment; and combining the text information corresponding to the sub-audio fragments and the time stamps of the text information corresponding to the sub-audio fragments to obtain the text fragment information corresponding to the sub-audio fragments.
Optionally, the text determining unit includes:
the voice recognition unit is used for carrying out voice recognition on the sub-audio fragments to obtain a first text corresponding to the sub-audio fragments;
the subtitle identification unit is used for identifying the subtitle in the video segment corresponding to the sub-audio segment to obtain a second text corresponding to the sub-audio segment;
and the alignment and correction processing unit is used for performing alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
Optionally, the start timestamp determining unit is specifically configured to:
determining a starting time stamp of each word in the first text in the sub-audio clip;
and determining the starting time stamp of the first word in the text corresponding to the sub-audio fragment in the sub-audio fragment based on the starting time stamp of each word in the first text in the sub-audio fragment.
Optionally, the comparing unit is specifically configured to:
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is text information that appears in both the video to be detected and the video to be compared;
if no target text information exists, determining that the video to be detected and the video to be compared are not repeated;
if target text information exists, determining, for each piece of target text information, a time stamp deviation corresponding to the target text information based on the time stamp of the target text information in the video to be detected and the time stamp of the target text information in the video to be compared; and determining whether the video to be detected and the video to be compared are repeated based on the time stamp deviation corresponding to each piece of target text information.
Optionally, the comparing unit is specifically configured to:
determining, for each time stamp deviation, the number of pieces of target text information corresponding to that time stamp deviation;
calculating the ratio of the maximum such number to the number of pieces of text segment information of the video to be detected;
if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated;
and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
Optionally, the video to be compared is a video in a preset video library, and if it is determined that the video to be detected and the video to be compared are repeated, the device further includes:
and the output unit is used for outputting the storage path of the video to be compared in the video library.
Optionally, if it is determined that the video to be detected does not repeat any video in the preset video library, the apparatus further includes:
and the storage unit is used for storing the video to be detected and the text fragment information of the video to be detected into the video library.
A video detection device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the video detection method as described above.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a video detection method as described above.
By means of the above technical solution, the application discloses a video detection method, apparatus, device and readable storage medium. After a video to be detected and a video to be compared are determined, text segment information of the video to be detected is first determined, where each piece of text segment information comprises one piece of text information and a time stamp of that text information; text segment information of the video to be compared, of the same form, is then acquired; finally, the text segment information of the video to be detected is compared with the text segment information of the video to be compared to determine whether the two videos are repeated. Because the number of text segments in a video is far smaller than the number of its image frames, and each piece of text segment information contains only the text information and its time stamp, comparing text segment information is simpler than comparing image frames; the scheme therefore improves video detection efficiency.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 is a schematic flow chart of a video detection method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a video detection device according to an embodiment of the present application;
fig. 3 is a block diagram of a hardware structure of a video detection device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort fall within the protection scope of the present application.
Next, the video detection method provided by the present application will be described by the following embodiments.
Referring to fig. 1, fig. 1 is a schematic flow chart of a video detection method according to an embodiment of the present application, where the method may include:
step S101: and determining the video to be detected and the video to be compared.
In the present application, the video to be detected and the video to be compared may have any duration and any format; the application imposes no limitation on either. In some scenarios, the purpose of video detection is to determine whether the video to be detected may be stored in a preset video library: the video to be detected is stored only if it does not repeat any video already in the library, which avoids wasting the library's storage space on duplicates. In such scenarios, the video to be compared may be any video stored in the preset video library.
Step S102: and determining the text segment information of the video to be detected, wherein each text segment information of the video to be detected comprises one text information and a time stamp of the text information.
It should be noted that the text information may represent the text corresponding to a video segment, and the time stamp of the text information may indicate at which moment within the video segment that text begins to appear.
In the present application, the video to be detected may be divided into several video segments, and text segment information corresponding to each video segment is determined segment by segment; the specific implementation is described in detail in the following embodiments and is not repeated here.
Step S103: and acquiring the text segment information of the video to be compared, wherein each text segment information of the video to be compared comprises a text message and a time stamp of the text message.
In the present application, the preset video library may store videos together with the text segment information of those videos, and the text segment information of the video to be compared may be obtained from the preset video library. It should be noted that a video identifier may be attached to each piece of text segment information of each video, so that text segment information belonging to different videos can be distinguished by their different video identifiers.
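As a concrete illustration only (the application does not prescribe any particular storage structure, and all names below are assumptions), the preset video library can be modeled as a mapping from video identifier to the video's storage path and its list of text segment information:

```python
# Minimal in-memory stand-in for the preset video library (illustrative only).
# Each video's text segment information is stored under its video identifier,
# so segments belonging to different videos are kept apart.
video_library: dict[str, dict] = {}  # video_id -> {"path": ..., "segments": [(text_info, time_stamp), ...]}

def store_video(video_id: str, path: str, segments: list[tuple[str, float]]) -> None:
    video_library[video_id] = {"path": path, "segments": segments}

def segments_of(video_id: str) -> list[tuple[str, float]]:
    return video_library[video_id]["segments"]
```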
Step S104: comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
In the present application, by comparing the text segment information of the video to be detected with the text segment information of the video to be compared, the similarity between the two videos can be determined, and whether they are repeated can be decided based on that similarity; the specific implementation is described in detail in the following embodiments and is not repeated here.
This embodiment discloses a video detection method. After a video to be detected and a video to be compared are determined, text segment information of the video to be detected is first determined, where each piece of text segment information comprises one piece of text information and a time stamp of that text information; text segment information of the video to be compared, of the same form, is then acquired; finally, the text segment information of the video to be detected is compared with the text segment information of the video to be compared to determine whether the two videos are repeated. Because the number of text segments in a video is far smaller than the number of its image frames, and each piece of text segment information contains only the text information and its time stamp, comparing text segment information is simpler than comparing image frames; the scheme therefore improves video detection efficiency.
In another embodiment of the present application, a specific implementation manner of determining the text segment information of the video to be detected in step S102 is described, and the manner may include the following steps:
Step S201: and determining each sub-audio fragment in the video to be detected.
In the present application, the audio information of the video to be detected may be extracted first, and then effective (speech-bearing) audio segments may be extracted from that audio information using voice activity detection (VAD) and used as the sub-audio segments of the video to be detected.
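A minimal sketch of this step follows, assuming the audio has already been extracted from the video (for example with ffmpeg) as 16 kHz mono 16-bit PCM; webrtcvad is one common VAD implementation and is used purely for illustration, not because the application prescribes it:

```python
import webrtcvad  # pip install webrtcvad

def speech_segments(pcm: bytes, sample_rate: int = 16000,
                    frame_ms: int = 30, aggressiveness: int = 2):
    """Split 16-bit mono PCM into (start_sec, end_sec) runs of speech."""
    vad = webrtcvad.Vad(aggressiveness)               # 0 (lenient) .. 3 (strict)
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per 16-bit sample
    segments, start = [], None
    n_frames = len(pcm) // frame_bytes
    for i in range(n_frames):
        frame = pcm[i * frame_bytes:(i + 1) * frame_bytes]
        t = i * frame_ms / 1000.0
        if vad.is_speech(frame, sample_rate):         # frame judged to contain speech
            if start is None:
                start = t
        elif start is not None:                       # a speech run just ended
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_ms / 1000.0))
    return segments
```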
Step S202: and determining text segment information corresponding to each sub-audio segment according to each sub-audio segment, wherein the text segment information corresponding to each sub-audio segment comprises the text information corresponding to the sub-audio segment and a time stamp of the text information corresponding to the sub-audio segment.
It should be noted that the specific implementation of determining the text segment information corresponding to a sub-audio segment is described in detail in the following embodiments and is not repeated here.
Step S203: and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
In another embodiment of the present application, a specific implementation manner of determining the text segment information corresponding to the sub-audio segment in step S202 is described, where the method may include the following steps:
Step S301: and determining the text corresponding to the sub-audio fragment.
As an implementation manner, determining the text corresponding to the sub-audio segment may include the following steps:
Step S3011: Performing speech recognition on the sub-audio segment to obtain the first text corresponding to the sub-audio segment.
In the present application, speech recognition may be performed on the sub-audio segment using automatic speech recognition (ASR) technology to obtain the first text corresponding to the sub-audio segment.
Step S3012: and identifying the subtitles in the video clips corresponding to the sub-audio clips to obtain a second text corresponding to the sub-audio clips.
In the present application, subtitle recognition may be performed on the image frames of the video segment corresponding to the sub-audio segment using optical character recognition (OCR) to obtain the second text corresponding to the sub-audio segment.
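For illustration, the two texts could be obtained with off-the-shelf tools; the choice of openai-whisper for ASR and pytesseract for OCR below is an assumption, since the application does not name specific engines:

```python
import whisper      # pip install openai-whisper
import pytesseract  # pip install pytesseract (requires the Tesseract binary)
from PIL import Image

def first_text(wav_path: str) -> str:
    """Step S3011: speech recognition over the sub-audio segment."""
    model = whisper.load_model("base")
    return model.transcribe(wav_path)["text"].strip()

def second_text(frame_paths: list[str]) -> str:
    """Step S3012: subtitle recognition over sampled frames of the matching video segment."""
    seen, parts = set(), []
    for path in frame_paths:
        # lang="chi_sim" assumes Chinese subtitles and the matching traineddata file.
        line = pytesseract.image_to_string(Image.open(path), lang="chi_sim").strip()
        if line and line not in seen:  # the same caption spans many consecutive frames
            seen.add(line)
            parts.append(line)
    return "".join(parts)
```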
Step S3013: and carrying out alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
Considering the differences between the underlying principles of ASR and OCR, the first text and the second text obtained for a sub-audio segment may not be completely consistent; in the present application, the first text and the second text therefore need to be aligned and corrected to obtain the text corresponding to the sub-audio segment.
As an implementation manner, in the present application, the first text and the second text may be aligned using a text edit distance algorithm, and after alignment the two texts may be corrected based on preset correction rules, with the corrected text used as the text corresponding to the sub-audio segment. The preset correction rules include, but are not limited to: aligning sentence-start positions; resolving homophone discrepancies in favor of the second text; resolving visually similar characters in favor of the first text; completing words missing from the first text based on the second text; completing words missing from the second text based on the first text; truncating redundant words at the tail of the first text; and, where the subtitle translates spoken English into Chinese, keeping the English of the first text.
For ease of understanding, consider a Chinese utterance recognized by both channels, where the two texts disagree in several ways: each side drops a word that the other retains, the first text contains a homophone error while the second text contains a visually similar mis-recognized character, the first text carries a redundant filler word at its tail, and the spoken English word "ok" appears verbatim in the first text but as a Chinese translation in the second.
First, the first text and the second text are aligned using the text edit distance, so that each word of one text is paired with the corresponding word (or a gap) in the other.
After the first text and the second text are aligned, they may be corrected based on the preset correction rules, specifically as follows:
Rule one: sentence-start positions are aligned.
In the example above, alignment identifies the position of the sentence head.
Rule two: homophones follow the second text.
In the example, where the two texts differ by homophones (a typical speech-recognition error), the second text's form prevails.
Rule three: visually similar characters follow the first text.
In the example, where the second text contains a character misread by subtitle recognition, the first text's form prevails.
Rule four: words missing from the first text are completed based on the second text.
In the example, a word dropped by speech recognition is completed from the second text.
Rule five: words missing from the second text are completed based on the first text.
In the example, the sentence-final particle absent from the second text is completed from the first text.
Rule six: redundant words at the tail of the first text are truncated.
In the example, the filler word at the end of the first text is cut off.
Rule seven: where spoken English is translated into a Chinese subtitle, the first text prevails.
In the example, "ok" from the first text is retained.
Correcting the first text and the second text based on the preset correction rules yields the corrected text, which is used as the text corresponding to the sub-audio segment.
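The alignment step described above can be realized with a standard edit-distance dynamic program followed by a backtrace. The sketch below uses characters as alignment units and uniform edit costs — both assumptions, since the application does not fix these details — and produces the aligned pairs to which the correction rules are then applied:

```python
def align(a: str, b: str) -> list[tuple]:
    """Align two texts by minimum edit distance; returns (char_a_or_None, char_b_or_None) pairs."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]  # d[i][j] = edit distance of a[:i] and b[:j]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    pairs, i, j = [], m, n                     # backtrace from the bottom-right corner
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1):
            pairs.append((a[i - 1], b[j - 1])); i -= 1; j -= 1  # match or substitution
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], None)); i -= 1              # present only in the first text
        else:
            pairs.append((None, b[j - 1])); j -= 1              # present only in the second text
    return pairs[::-1]
```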
Step S302: and determining a starting time stamp corresponding to a first word in the text corresponding to the sub-audio fragment.
As an implementation manner, the starting time stamp corresponding to the first word in the text corresponding to the sub-audio segment may be determined as follows: determine the starting time stamp, within the sub-audio segment, of each word in the first text; and determine, based on those starting time stamps, the starting time stamp within the sub-audio segment of the first word in the text corresponding to the sub-audio segment.
In the present application, the first text may be force-aligned with its corresponding sub-audio segment based on the forced alignment technique used in speech recognition, thereby determining the starting time stamp of each word of the first text within the sub-audio segment. After the text corresponding to the sub-audio segment has been determined, the word in the first text that corresponds to the first word of that text is identified, and the starting time stamp of that word within the sub-audio segment is taken as the starting time stamp corresponding to the first word of the text corresponding to the sub-audio segment.
For ease of understanding, based on the above example: the first word of the corrected text also appears at the start of the first text, so the starting time stamp determined for that word in the first text is taken as the starting time stamp corresponding to the first word of the text corresponding to the sub-audio segment.
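Word-level start time stamps are available from many ASR toolkits that perform forced alignment internally; as one hedged illustration (the word_timestamps option of openai-whisper is an assumption about tooling, not part of the application):

```python
import whisper  # pip install openai-whisper

def first_word_start(wav_path: str, first_word: str):
    """Return the start time, in seconds within the sub-audio segment, of `first_word`."""
    model = whisper.load_model("base")
    result = model.transcribe(wav_path, word_timestamps=True)
    for segment in result["segments"]:
        for w in segment.get("words", []):   # word-level timing from forced alignment
            if w["word"].strip() == first_word:
                return w["start"]
    return None                              # word not found in the recognized text
```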
Step S303: and determining text information corresponding to the sub-audio fragments based on the text corresponding to the sub-audio fragments.
As an implementation manner, a hash value of the text corresponding to the sub-audio segment may be calculated and used as the text information corresponding to the sub-audio segment.
Step S304: and determining a starting time stamp corresponding to a first word in the text corresponding to the sub-audio fragment as a time stamp of the text information corresponding to the sub-audio fragment, and combining the text information corresponding to the sub-audio fragment and the time stamp of the text information corresponding to the sub-audio fragment to obtain the text fragment information corresponding to the sub-audio fragment.
In another embodiment of the present application, a detailed description is given of a specific implementation manner of comparing the text segment information of the video to be detected and the text segment information of the video to be compared in step S104 to determine whether the video to be detected and the video to be compared are repeated, where the method may include the following steps:
Step S401: Comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is text information that appears in both the video to be detected and the video to be compared; if not, executing step S402; if so, executing step S403.
for ease of understanding, it is assumed that the text clip information of the video to be detected includes two pieces, one of which is abc, whose corresponding time stamp is 10, and the other of which is bcd, whose corresponding time stamp is 20;
the text segment information of the video to be compared comprises three pieces, wherein one piece of text information is abc, the corresponding time stamp of the one piece of text information is 60, the other piece of text information is bcd, the corresponding time stamp of the other piece of text information is 70, the other piece of text information is abc, and the corresponding time stamp of the other piece of text information is 90;
wherein the text information abc and bcd are target text information.
Step S402: and determining that the video to be detected and the video to be compared are not repeated.
Step S403: Determining, for each piece of target text information, the time stamp deviation corresponding to the target text information based on the time stamp of the target text information in the video to be detected and the time stamp of the target text information in the video to be compared; and determining whether the video to be detected and the video to be compared are repeated based on the time stamp deviation corresponding to each piece of target text information.
The determining whether the video to be detected and the video to be compared are repeated based on the time stamp deviation corresponding to each piece of target text information includes: determining, for each time stamp deviation, the number of pieces of target text information corresponding to that time stamp deviation; calculating the ratio of the maximum such number to the number of pieces of text segment information of the video to be detected; if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated; and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
For ease of understanding, continuing the example of step S401, assume the preset threshold is 80%. The time stamp deviations of the text information abc are 50 (60 − 10) and 80 (90 − 10), and the time stamp deviation of the text information bcd is 50 (70 − 20). The number of pieces of target text information corresponding to the time stamp deviation 50 is therefore 2, and the number corresponding to the deviation 80 is 1, so the maximum number is 2. The video to be detected has 2 pieces of text segment information, so the ratio of the maximum number to the number of pieces of text segment information is 100%; this exceeds the preset threshold, and the video to be detected and the video to be compared are determined to be repeated.
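A minimal sketch of this comparison follows, under the assumption that each video's text segment information is given as a list of (text information, time stamp) pairs as in the sketches above:

```python
from collections import Counter

def is_repeated(detected: list[tuple[str, float]],
                compared: list[tuple[str, float]],
                threshold: float = 0.8) -> bool:
    """Steps S401-S403: decide duplication from shared text information and time stamp deviations."""
    deviations = Counter()
    for text_d, ts_d in detected:
        # Each piece of target text information votes once per distinct deviation it supports.
        offsets = {ts_c - ts_d for text_c, ts_c in compared if text_c == text_d}
        for off in offsets:
            deviations[off] += 1
    if not deviations:
        return False  # no target text information exists: not repeated
    return max(deviations.values()) / len(detected) > threshold
```

With the figures above — detected segments [("abc", 10), ("bcd", 20)] and compared segments [("abc", 60), ("bcd", 70), ("abc", 90)] — the deviation 50 is supported by two pieces of target text information, the ratio is 2/2 = 100%, and the function returns True for a threshold of 80%.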
In some scenarios, the purpose of video detection is to determine whether the video to be detected may be stored in a preset video library. In such scenarios the video to be compared is a video in the preset video library, and if the video to be detected and the video to be compared are determined to be repeated, the storage path of the video to be compared within the video library is output, so as to prompt the user that a similar video is already stored in the library. If the video to be detected is determined not to repeat any video in the preset video library, the video to be detected and its text segment information may be stored into the video library.
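Tying the pieces together, the library-checking workflow described above might look as follows (reusing video_library, store_video and is_repeated from the earlier sketches; all names remain illustrative):

```python
def check_and_store(video_id: str, path: str,
                    segments: list[tuple[str, float]], threshold: float = 0.8):
    """Return the stored path of a repeated library video, or store the new video and return None."""
    for entry in video_library.values():
        if is_repeated(segments, entry["segments"], threshold):
            return entry["path"]      # prompt the user: a similar video is already stored
    store_video(video_id, path, segments)
    return None
```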
The video detection device disclosed in the embodiments of the present application will be described below, and the video detection device described below and the video detection method described above may be referred to correspondingly.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a video detection device according to an embodiment of the present application. As shown in fig. 2, the video detecting apparatus may include:
a video determining unit 11, configured to determine a video to be detected and a video to be compared;
a text segment information determining unit 12, configured to determine text segment information of the video to be detected, where each piece of text segment information of the video to be detected includes one piece of text information and a time stamp of the text information;
a text segment information obtaining unit 13, configured to obtain text segment information of the video to be compared, where each piece of text segment information of the video to be compared includes one piece of text information and a time stamp of the text information;
and the comparing unit 14 is used for comparing the text segment information of the video to be detected and the text segment information of the video to be compared and determining whether the video to be detected and the video to be compared are repeated.
As an embodiment, the text segment information determining unit includes:
a sub-audio fragment determining unit, configured to determine each sub-audio fragment in the video to be detected;
a sub-audio fragment processing unit, configured to determine, for each sub-audio fragment, text fragment information corresponding to the sub-audio fragment, where the text fragment information corresponding to each sub-audio fragment includes text information corresponding to the sub-audio fragment, and a timestamp of the text information corresponding to the sub-audio fragment; and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
As an embodiment, the sub-audio clip processing unit includes:
a text determining unit, configured to determine a text corresponding to the sub-audio segment;
a starting time stamp determining unit, configured to determine a starting time stamp corresponding to a first word in a text corresponding to the sub-audio segment;
the text information determining unit is used for determining text information corresponding to the sub-audio fragments based on the text corresponding to the sub-audio fragments;
the time stamp determining unit is used for determining a starting time stamp corresponding to a first word in the text corresponding to the sub-audio fragment as a time stamp of text information corresponding to the sub-audio fragment; and combining the text information corresponding to the sub-audio fragments and the time stamps of the text information corresponding to the sub-audio fragments to obtain the text fragment information corresponding to the sub-audio fragments.
As an embodiment, the text determining unit includes:
the voice recognition unit is used for carrying out voice recognition on the sub-audio fragments to obtain a first text corresponding to the sub-audio fragments;
the subtitle identification unit is used for identifying the subtitle in the video segment corresponding to the sub-audio segment to obtain a second text corresponding to the sub-audio segment;
and the alignment and correction processing unit is used for performing alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
As an embodiment, the start time stamp determining unit is specifically configured to:
determining a starting time stamp of each word in the first text in the sub-audio clip;
and determining the starting time stamp of the first word in the text corresponding to the sub-audio fragment in the sub-audio fragment based on the starting time stamp of each word in the first text in the sub-audio fragment.
As an embodiment, the comparing unit is specifically configured to:
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is text information that appears in both the video to be detected and the video to be compared;
if no target text information exists, determining that the video to be detected and the video to be compared are not repeated;
if target text information exists, determining, for each piece of target text information, a time stamp deviation corresponding to the target text information based on the time stamp of the target text information in the video to be detected and the time stamp of the target text information in the video to be compared; and determining whether the video to be detected and the video to be compared are repeated based on the time stamp deviation corresponding to each piece of target text information.
As an embodiment, the comparing unit is specifically configured to:
determining, for each time stamp deviation, the number of pieces of target text information corresponding to that time stamp deviation;
calculating the ratio of the maximum such number to the number of pieces of text segment information of the video to be detected;
if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated;
and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
As an implementation manner, the video to be compared is a video in a preset video library, and if it is determined that the video to be detected and the video to be compared are repeated, the apparatus further includes:
and the output unit is used for outputting the storage path of the video to be compared in the video library.
As an implementation manner, if it is determined that the video to be detected does not repeat any video in the preset video library, the apparatus further includes:
and the storage unit is used for storing the video to be detected and the text fragment information of the video to be detected into the video library.
Referring to fig. 3, fig. 3 is a block diagram of a hardware structure of a video detection device according to an embodiment of the present application. As shown in fig. 3, the hardware structure of the video detection device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
In this embodiment of the application, there is at least one of each of the processor 1, the communication interface 2, the memory 3 and the communication bus 4, and the processor 1, the communication interface 2 and the memory 3 communicate with one another through the communication bus 4;
processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present application, or the like;
the memory 3 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory, for example at least one magnetic disk memory;
wherein the memory stores a program, and the processor may invoke the program stored in the memory, the program being configured to:
determining a video to be detected and a video to be compared;
determining text segment information of the video to be detected, wherein each piece of text segment information of the video to be detected comprises one piece of text information and a time stamp of the text information;
acquiring text segment information of the video to be compared, wherein each piece of text segment information of the video to be compared comprises one piece of text information and a time stamp of the text information;
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
The embodiment of the present application also provides a readable storage medium storing a program adapted to be executed by a processor, the program being configured to:
determining a video to be detected and a video to be compared;
determining text segment information of the video to be detected, wherein each piece of text segment information of the video to be detected comprises one piece of text information and a time stamp of the text information;
acquiring text segment information of the video to be compared, wherein each piece of text segment information of the video to be compared comprises one piece of text information and a time stamp of the text information;
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," and any variations thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
In this specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts, the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A method of video detection, the method comprising:
determining a video to be detected and a video to be compared;
determining text segment information of the video to be detected, wherein each piece of text segment information of the video to be detected comprises one piece of text information and a time stamp of the text information;
acquiring text segment information of the video to be compared, wherein each piece of text segment information of the video to be compared comprises one piece of text information and a time stamp of the text information;
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether the video to be detected and the video to be compared are repeated or not.
2. The method of claim 1, wherein the determining text segment information of the video to be detected comprises:
determining each sub-audio fragment in the video to be detected;
determining text segment information corresponding to each sub-audio segment according to each sub-audio segment, wherein the text segment information corresponding to each sub-audio segment comprises text information corresponding to the sub-audio segment and a time stamp of the text information corresponding to the sub-audio segment;
and combining the text segment information corresponding to each sub-audio segment into the text segment information of the video to be detected.
3. The method of claim 2, wherein the determining text segment information corresponding to the sub-audio segment comprises:
determining a text corresponding to the sub-audio fragment;
determining a starting time stamp corresponding to a first word in a text corresponding to the sub-audio fragment;
determining text information corresponding to the sub-audio clips based on the text corresponding to the sub-audio clips;
determining a starting time stamp corresponding to a first word in a text corresponding to the sub-audio fragment as a time stamp of text information corresponding to the sub-audio fragment; and combining the text information corresponding to the sub-audio fragments and the time stamps of the text information corresponding to the sub-audio fragments to obtain the text fragment information corresponding to the sub-audio fragments.
4. The method of claim 3, wherein the determining the text corresponding to the sub-audio piece comprises:
performing voice recognition on the sub-audio fragments to obtain first texts corresponding to the sub-audio fragments;
identifying subtitles in the video clips corresponding to the sub audio clips to obtain a second text corresponding to the sub audio clips;
and carrying out alignment and correction processing on the first text and the second text to obtain the text corresponding to the sub-audio fragment.
5. The method of claim 4, wherein determining the start timestamp corresponding to the first word in the text corresponding to the sub-audio piece comprises:
determining a starting time stamp of each word in the first text in the sub-audio clip;
and determining the starting time stamp of the first word in the text corresponding to the sub-audio fragment in the sub-audio fragment based on the starting time stamp of each word in the first text in the sub-audio fragment.
6. The method of claim 1, wherein the comparing the text segment information of the video to be detected with the text segment information of the video to be compared to determine whether the video to be detected and the video to be compared are repeated comprises:
comparing the text segment information of the video to be detected with the text segment information of the video to be compared, and determining whether target text information exists between the video to be detected and the video to be compared, wherein the target text information is text information that appears in both the video to be detected and the video to be compared;
if no target text information exists, determining that the video to be detected and the video to be compared are not repeated;
if target text information exists, determining, for each piece of target text information, a time stamp deviation corresponding to the target text information based on the time stamp of the target text information in the video to be detected and the time stamp of the target text information in the video to be compared; and determining whether the video to be detected and the video to be compared are repeated based on the time stamp deviation corresponding to each piece of target text information.
7. The method of claim 6, wherein the determining whether the video to be detected and the video to be compared are repeated based on the time stamp deviation corresponding to each piece of target text information comprises:
determining, for each time stamp deviation, the number of pieces of target text information corresponding to that time stamp deviation;
calculating the ratio of the maximum such number to the number of pieces of text segment information of the video to be detected;
if the ratio exceeds a preset threshold, determining that the video to be detected and the video to be compared are repeated;
and if the ratio does not exceed the preset threshold, determining that the video to be detected and the video to be compared are not repeated.
8. The method of claim 1, wherein the video to be compared is a video in a preset video library, and if it is determined that the video to be detected is repeated with the video to be compared, the method further comprises:
and outputting a storage path of the video to be compared in the video library.
9. The method of claim 8, wherein if it is determined that the video to be detected does not repeat any video in the preset video library, the method further comprises:
and storing the video to be detected and the text fragment information of the video to be detected into the video library.
10. A video detection apparatus, the apparatus comprising:
the video determining unit is used for determining a video to be detected and a video to be compared;
a text segment information determining unit, configured to determine text segment information of the video to be detected, where each piece of text segment information of the video to be detected includes one piece of text information and a time stamp of the text information;
a text segment information obtaining unit, configured to obtain text segment information of the video to be compared, where each piece of text segment information of the video to be compared includes one piece of text information and a time stamp of the text information;
and the comparison unit is used for comparing the text segment information of the video to be detected with the text segment information of the video to be compared and determining whether the video to be detected and the video to be compared are repeated or not.
11. A video detection device comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the respective steps of the video detection method according to any one of claims 1 to 9.
12. A readable storage medium having stored thereon a computer program, which, when executed by a processor, implements the steps of the video detection method according to any of claims 1 to 9.
CN202310908926.XA 2023-07-24 2023-07-24 Video detection method, device, equipment and readable storage medium Active CN116644212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310908926.XA CN116644212B (en) 2023-07-24 2023-07-24 Video detection method, device, equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN116644212A true CN116644212A (en) 2023-08-25
CN116644212B CN116644212B (en) 2023-12-01

Family

ID=87640302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310908926.XA Active CN116644212B (en) 2023-07-24 2023-07-24 Video detection method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116644212B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201121829D0 (en) * 2010-12-20 2012-02-01 Vaclik Paul P A method of making text data associated with video data searchable
US20150082349A1 (en) * 2013-09-13 2015-03-19 Arris Enterprises, Inc. Content Based Video Content Segmentation
CN104506933A (en) * 2014-12-23 2015-04-08 方正宽带网络服务有限公司 Method and device for verifying sameness of video files
CN106973305A (en) * 2017-03-20 2017-07-21 广东小天才科技有限公司 The detection method and device of harmful content in a kind of video
CN110874526A (en) * 2018-12-29 2020-03-10 北京安天网络安全技术有限公司 File similarity detection method and device, electronic equipment and storage medium
CN109905772A (en) * 2019-03-12 2019-06-18 腾讯科技(深圳)有限公司 Video clip querying method, device, computer equipment and storage medium
CN110602566A (en) * 2019-09-06 2019-12-20 Oppo广东移动通信有限公司 Matching method, terminal and readable storage medium
CN111143584A (en) * 2019-12-20 2020-05-12 三盟科技股份有限公司 Audio-visual content retrieval method and system
CN113591530A (en) * 2021-02-24 2021-11-02 腾讯科技(深圳)有限公司 Video detection method and device, electronic equipment and storage medium
CN112951275A (en) * 2021-02-26 2021-06-11 北京百度网讯科技有限公司 Voice quality inspection method and device, electronic equipment and medium
CN115361377A (en) * 2022-08-19 2022-11-18 中国联合网络通信集团有限公司 File uploading method, user terminal, network disk server, equipment and medium
CN116012753A (en) * 2022-12-21 2023-04-25 平安银行股份有限公司 Video processing method, device, computer equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN116644212B (en) 2023-12-01

Similar Documents

Publication Publication Date Title
JP4580885B2 (en) Scene information extraction method, scene extraction method, and extraction apparatus
KR101768509B1 (en) On-line voice translation method and device
KR101199747B1 (en) Word recognition method and recording medium of program recognition word and information process device
CN109817210B (en) Voice writing method, device, terminal and storage medium
CN110674396B (en) Text information processing method and device, electronic equipment and readable storage medium
JP5845764B2 (en) Information processing apparatus and information processing program
CN111159546B (en) Event pushing method, event pushing device, computer readable storage medium and computer equipment
CN109145149B (en) Information alignment method, device, equipment and readable storage medium
CN112382295B (en) Speech recognition method, device, equipment and readable storage medium
US20240169972A1 (en) Synchronization method and apparatus for audio and text, device, and medium
WO2022166808A1 (en) Text restoration method and apparatus, and electronic device
CN116644212B (en) Video detection method, device, equipment and readable storage medium
CN111950267B (en) Text triplet extraction method and device, electronic equipment and storage medium
CN113923479A (en) Audio and video editing method and device
CN117336572A (en) Video abstract generation method, device, computer equipment and storage medium
TWI699663B (en) Segmentation method, segmentation system and non-transitory computer-readable medium
JP6358744B2 (en) Speech recognition error correction device
CN116017088A (en) Video subtitle processing method, device, electronic equipment and storage medium
CN114501159A (en) Subtitle editing method and device, electronic equipment and storage medium
CN114373446A (en) Conference language determination method and device and electronic equipment
CN116631447B (en) Noise extraction method, device, equipment and readable storage medium
CN113688625A (en) Language identification method and device
CN113271247B (en) Information processing method, apparatus, device and storage medium
JP7105500B2 (en) Computer-implemented Automatic Acquisition Method for Element Nouns in Chinese Patent Documents for Patent Documents Without Intercharacter Spaces
CN110717091B (en) Entry data expansion method and device based on face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant