CN114143613A - Video subtitle time alignment method, system and storage medium


Info

Publication number
CN114143613A
Authority
CN
China
Prior art keywords
paragraph
ocr recognition
recognition result
sentence
substring
Prior art date
Legal status
Granted
Application number
CN202111470116.8A
Other languages
Chinese (zh)
Other versions
CN114143613B (en)
Inventor
程梓益
Current Assignee
Beijing Moviebook Science and Technology Co., Ltd.
Original Assignee
Beijing Moviebook Science and Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Beijing Moviebook Science and Technology Co., Ltd.
Priority to CN202111470116.8A
Publication of CN114143613A
Application granted
Publication of CN114143613B
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The application discloses a video subtitle time alignment method, system, and storage medium. The method first obtains an original video with subtitles and a description text whose content corresponds to the subtitles in the original video; crops the subtitle region of the original video at a preset frame-sampling interval to obtain a subtitle region image set; inputs the subtitle region image set into an OCR model for recognition to obtain time-stamped OCR recognition results; matches the OCR recognition results with each paragraph of the description text through a common substring algorithm to determine the first sentence and the last sentence of each paragraph among the OCR recognition results; and determines the duration of each paragraph of the description text in the original video according to the timestamps corresponding to its first sentence and last sentence. The technical scheme provided by the embodiments of the application thus improves the accuracy of time matching between video subtitles and the description text.

Description

Video subtitle time alignment method, system and storage medium
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to a method, a system, and a storage medium for video subtitle time alignment.
Background
With the continuous development of internet and multimedia technology, video has become a popular information carrier. To better present video content, subtitles corresponding to the video are generally displayed while a user watches it, and a description text corresponding to the subtitles often exists as well; this description text is typically divided into several, or even more than ten, paragraphs.
In the prior art, when paragraphs of the description text are time-matched with video subtitles, a common approach is to use OCR to recognize the characters of the current frame, record the current time, and then match the result against the corresponding text. However, this approach cannot complete the task automatically because of wrongly written characters, uncommon words, and interference from the video background.
Disclosure of Invention
Based on this, the embodiments of the present application provide a method, a system, and a storage medium for video subtitle time alignment, which can improve the accuracy of time matching between a video subtitle and a description text.
In a first aspect, a method for video subtitle time alignment is provided, the method including:
acquiring an original video with subtitles and a description text, wherein the content of the description text corresponds to the content of the subtitles in the original video;
cropping the subtitle region of the original video at a preset frame-sampling interval to obtain a subtitle region image set, wherein each image in the set carries its corresponding timestamp in the original video;
inputting the subtitle region image set into an OCR model for recognition to obtain time-stamped OCR recognition results;
matching the OCR recognition results with each paragraph of the description text through a common substring algorithm, and determining the first sentence and the last sentence of each paragraph among the OCR recognition results;
and determining the duration of each paragraph of the description text in the original video according to the timestamps corresponding to the first sentence and the last sentence of each paragraph.
Optionally, matching the OCR recognition results with each paragraph of the description text through a common substring algorithm and determining the first sentence of each paragraph among the OCR recognition results includes:
comparing the OCR recognition result with the target paragraph character by character to find all continuous common substrings, and selecting the first substring from them, wherein the first substring denotes the first continuous common substring;
when the first substring falls within the starting character range of the target paragraph, performing a character comparison between the OCR recognition result corresponding to the first substring and the characters in the starting character range;
when the start position of the substring obtained by the character comparison is smaller than the first-sentence threshold, taking the currently compared OCR recognition result as matching the first sentence of the target paragraph;
and traversing each paragraph of the description text to determine the first sentence of each paragraph among the OCR recognition results.
Optionally, after comparing the OCR recognition result with the target paragraph character by character to find all continuous common substrings and selecting the first substring, the method further includes:
when the first substring falls within the ending character range of the target paragraph, taking the timestamp of the OCR recognition result corresponding to the first substring as the start time of the paragraph following the target paragraph.
Optionally, matching the OCR recognition results with each paragraph of the description text through a common substring algorithm and determining the last sentence of each paragraph among the OCR recognition results includes:
comparing the OCR recognition result with the target paragraph character by character to find all continuous common substrings, and selecting the tail substring from them, wherein the tail substring denotes the last continuous common substring;
when the tail substring falls within the ending character range of the target paragraph, performing a character comparison between the OCR recognition result corresponding to the tail substring and the characters in the ending character range;
when the distance between the end position of the substring obtained by the character comparison and the end of the paragraph is smaller than the last-sentence threshold, taking the currently compared OCR recognition result as matching the last sentence of the target paragraph;
and traversing each paragraph of the description text to determine the last sentence of each paragraph among the OCR recognition results.
Optionally, after comparing the OCR recognition result with the target paragraph character by character to find all continuous common substrings and selecting the tail substring, the method further includes:
when the tail substring falls within the starting character range of the target paragraph, taking the timestamp of the OCR recognition result corresponding to the tail substring as the end time of the paragraph preceding the target paragraph.
Optionally, inputting the subtitle region image set into an OCR model for recognition to obtain time-stamped OCR recognition results includes:
checking the OCR recognition results, and keeping for matching and storage only those results that contain Chinese and whose confidence is greater than a preset threshold.
Optionally, determining the duration of the description text in the original video according to the timestamps corresponding to its first sentence and last sentence further includes:
when the timestamp corresponding to the first sentence overlaps the timestamp corresponding to the last sentence, taking the duration obtained after merging the time ranges as the output result.
Optionally, the description text includes wrongly written words and/or uncommon words.
In a second aspect, a video subtitle time alignment system is provided, the system including:
an acquisition module, configured to acquire an original video with subtitles and a description text, wherein the content of the description text corresponds to the content of the subtitles in the original video;
a cropping module, configured to crop the subtitle region of the original video at a preset frame-sampling interval to obtain a subtitle region image set, wherein each image in the set carries its corresponding timestamp in the original video;
a recognition module, configured to input the subtitle region image set into an OCR model for recognition to obtain time-stamped OCR recognition results;
a matching module, configured to match the OCR recognition results with each paragraph of the description text through a common substring algorithm to determine the first sentence and the last sentence of each paragraph among the OCR recognition results;
and a determination module, configured to determine the duration of each paragraph of the description text in the original video according to the timestamps corresponding to the first sentence and the last sentence of each paragraph.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the video subtitle time alignment method according to any implementation of the first aspect.
According to the technical scheme provided by the embodiments of the application, an original video with subtitles and a description text are obtained, wherein the content of the description text corresponds to the content of the subtitles in the original video; the subtitle region of the original video is cropped at a preset frame-sampling interval to obtain a subtitle region image set; the subtitle region image set is input into an OCR model for recognition to obtain time-stamped OCR recognition results; the OCR recognition results are matched with each paragraph of the description text through a common substring algorithm to determine the first sentence and the last sentence of each paragraph; and the duration of each paragraph of the description text in the original video is determined according to the timestamps corresponding to its first sentence and last sentence. The technical scheme thus overcomes the time-matching problems caused by wrongly written characters, uncommon words, and video background interference, and improves the accuracy of time matching between video subtitles and the description text.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely exemplary, and those of ordinary skill in the art can derive other embodiments from them without inventive effort.
Fig. 1 is a flowchart illustrating steps of a video subtitle time alignment method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of an original video with subtitles according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a descriptive text provided by an embodiment of the present application;
fig. 4 is a schematic diagram of a subtitle region image including wrongly written characters according to an embodiment of the present application;
FIG. 5 is a flowchart of steps provided in an alternative embodiment of the present application;
fig. 6 is a block diagram of a video subtitle time alignment system according to an embodiment of the present application.
Detailed Description
The present invention is described below in terms of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It should be understood that the described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
To facilitate understanding of the present embodiment, a detailed description will be first given of a video subtitle time alignment method disclosed in the embodiments of the present application.
First, the application scenario of the embodiments of the present application is introduced: given a video and a description text whose content corresponds to the subtitles in the video, where the text contains wrongly written characters and has been divided into paragraphs of about 200 characters each, the goal is to automatically mark the duration of each paragraph of text in the video.
Referring to fig. 1, a flowchart of a video subtitle time alignment method provided by an embodiment of the present application is shown, where the method may include the following steps:
step 101, obtaining an original video with subtitles and a description text.
Wherein the content of the descriptive text corresponds to the content of the subtitles in the original video.
In the embodiment of the present application, an original video with subtitles and a description text are obtained, as shown in fig. 2 and fig. 3; the text content corresponds to the subtitles of the video, where LF denotes a line break displayed by the text editor. As shown in fig. 4, the text contains many wrongly written characters; for example, the character rendered as "mark" in the text should correspond to the character rendered as "peak" in the subtitles. Because of such wrongly written characters, matching with existing text-matching methods may fail.
Step 102, cropping the subtitle region of the original video at a preset frame-sampling interval to obtain a subtitle region image set.
Step 103, inputting the subtitle region image set into an OCR model for recognition to obtain time-stamped OCR recognition results.
Each image in the subtitle region image set carries its corresponding timestamp in the original video, and the preset frame-sampling interval may be one second.
In the embodiment of the application, a complete original video with subtitles is input; one frame is taken every second, the subtitle region of each frame is cropped and fed into OCR recognition, each OCR output is checked for containing Chinese with a confidence greater than 0.99, and all historical OCR results are saved for de-duplication.
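As a concrete illustration, the following minimal Python sketch samples one frame per second with OpenCV, crops an assumed subtitle band (the bottom 15% of the frame is an assumption, not a value fixed by this application), and filters the OCR output; run_ocr is a hypothetical stand-in for any OCR engine returning a (text, confidence) pair.

import re
import cv2  # OpenCV for video decoding

def contains_chinese(text):
    # True if the text contains at least one CJK character
    return re.search(r'[\u4e00-\u9fff]', text) is not None

def extract_subtitle_texts(video_path, run_ocr, conf_threshold=0.99):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if metadata is missing
    step = max(1, int(round(fps)))           # one sampled frame per second
    results, seen = [], set()
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            h = frame.shape[0]
            band = frame[int(h * 0.85):, :]  # assumed subtitle band: bottom 15%
            text, conf = run_ocr(band)
            timestamp = frame_idx / fps
            # keep only Chinese text above the confidence threshold, and
            # de-duplicate against all historical OCR results
            if text and conf > conf_threshold and contains_chinese(text) and text not in seen:
                seen.add(text)
                results.append((timestamp, text))
        frame_idx += 1
    cap.release()
    return results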
Step 104, matching the OCR recognition results with each paragraph of the description text through a common substring algorithm, and determining the first sentence and the last sentence of each paragraph among the OCR recognition results.
In the embodiment of the application, each OCR recognition result is matched against the current text paragraph; it must be checked whether the OCR result appears in the text, and its specific position must be determined.
The principle of the common substring algorithm is as follows: given a string A and a string B, each character of A is compared in turn with the characters of B to find all continuous common substrings. For example, for input A = 'ACBCDC' and B = 'ACGSBCDEF', the output is 'AC' and 'BCD'.
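To make this concrete, the following Python sketch finds all maximal continuous common substrings via a table of common suffix lengths, ordered by their position in A; keeping only substrings of at least two characters is an assumption made here so that the output matches the example above.

def common_substrings(a, b, min_len=2):
    found = []
    n, m = len(a), len(b)
    # dp[i][j] = length of the common suffix of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            run = dp[i][j]
            # keep only maximal runs: the match must not extend any further right
            if run >= min_len and (i == n or j == m or a[i] != b[j]):
                found.append((i - run, a[i - run:i]))
    found.sort()                      # order by start position in A
    out, seen = [], set()             # de-duplicate while preserving order
    for pos, s in found:
        if (pos, s) not in seen:
            seen.add((pos, s))
            out.append(s)
    return out

print(common_substrings('ACBCDC', 'ACGSBCDEF'))  # ['AC', 'BCD']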
In the embodiment of the application, the OCR recognition result is compared character by character with the target paragraph to find all continuous common substrings, and the first substring, i.e., the first continuous common substring, is selected. When the first substring falls within the starting character range of the target paragraph, a character comparison is performed between the corresponding OCR recognition result and the characters in the starting character range; when the start position of the resulting substring is smaller than the first-sentence threshold, the currently compared OCR recognition result is taken as matching the first sentence of the target paragraph. Each paragraph of the description text is traversed, and the first sentence of each paragraph is determined among the OCR recognition results (a sketch of this check is given below).
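A minimal sketch of the first-sentence check, reusing the common_substrings helper above; the 25-character starting range and the threshold of 4 are taken from the alternative embodiment described later, and taking "first" to mean first by position in the OCR text is an assumption.

def matches_first_sentence(ocr_text, paragraph, window=25, threshold=4):
    # step 1: the first common substring must fall in the paragraph's start window
    subs = common_substrings(ocr_text, paragraph)
    if not subs:
        return False
    pos = paragraph.find(subs[0])
    if pos < 0 or pos >= window:
        return False
    # step 2: re-compare against the start window only; the first common
    # substring must begin within `threshold` characters of the window start
    head = paragraph[:window]
    head_subs = common_substrings(ocr_text, head)
    return bool(head_subs) and head.find(head_subs[0]) < threshold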
In an optional embodiment, after the OCR recognition result is compared with the target paragraph to find all continuous common substrings and the first substring is selected, if the first substring falls within the ending character range of the target paragraph, the timestamp of the corresponding OCR recognition result is taken as the start time of the paragraph following the target paragraph.
In the embodiment of the application, the OCR recognition result is compared character by character with the target paragraph to find all continuous common substrings, and the tail substring, i.e., the last continuous common substring, is selected. When the tail substring falls within the ending character range of the target paragraph, a character comparison is performed between the corresponding OCR recognition result and the characters in the ending character range; when the distance between the end position of the resulting substring and the end of the paragraph is smaller than the last-sentence threshold, the currently compared OCR recognition result is taken as matching the last sentence of the target paragraph. Each paragraph of the description text is traversed, and the last sentence of each paragraph is determined among the OCR recognition results.
In an optional embodiment, after the OCR recognition result is compared with the target paragraph to find all continuous common substrings and the tail substring is selected, if the tail substring falls within the starting character range of the target paragraph, the timestamp of the corresponding OCR recognition result is taken as the end time of the paragraph preceding the target paragraph. A sketch of the symmetric last-sentence check follows.
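This sketch mirrors the first-sentence check; the 25-character ending range and the threshold of 4 are again taken from the alternative embodiment described later.

def matches_last_sentence(ocr_text, paragraph, window=25, threshold=4):
    # step 1: the last common substring must fall in the paragraph's end window
    subs = common_substrings(ocr_text, paragraph)
    if not subs:
        return False
    pos = paragraph.rfind(subs[-1])
    if pos < 0 or pos < len(paragraph) - window:
        return False
    # step 2: re-compare against the end window only; the last common substring
    # must end within `threshold` characters of the paragraph's end
    tail = paragraph[-window:]
    tail_subs = common_substrings(ocr_text, tail)
    if not tail_subs:
        return False
    end = tail.rfind(tail_subs[-1]) + len(tail_subs[-1])
    return len(tail) - end < threshold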
Step 105, determining the duration of each paragraph of the description text in the original video according to the timestamps corresponding to the first sentence and the last sentence of each paragraph.
When the timestamp corresponding to the first sentence overlaps the timestamp corresponding to the last sentence, the duration obtained after merging the time ranges is taken as the output result.
As shown in fig. 5, the flow of a video subtitle matching method based on the common substring algorithm according to an alternative embodiment of the present application is given below, where the starting character range and the ending character range are both set to 25 characters, and the first-sentence threshold and the last-sentence threshold are both set to 4 characters:
(1) Input the complete video, take one frame per second, crop the subtitle region of each frame, and feed it into OCR recognition. Check whether each OCR output contains Chinese with a confidence greater than 0.99, and save all historical OCR results for de-duplication.
(2) Match the OCR recognition result against the current text paragraph: check whether the OCR result appears in the text and determine its specific position. A common substring algorithm is used here for matching.
(3) From the output of the common substring algorithm, take only the first and the last substring, and look up the positions of these two common substrings in the text. If neither falls within the first 25 characters or the last 25 characters of the text, the OCR result is considered useless and is discarded.
(4) If the first substring falls within the first 25 characters of the text, the OCR result is considered potentially useful. Compute the common substrings of the OCR result and those 25 characters, take the first common substring, and if its start position within the 25 characters is less than 4, the OCR result is considered to match the first sentence of the text.
(5) Similarly to (4), if the last substring falls within the last 25 characters of the text, the OCR result is considered potentially useful. Compute the common substrings of the OCR result and those 25 characters, take the last common substring, and if the distance from its end position to the end of the text is less than 4, the OCR result is considered to match the last sentence of the text.
(6) On the basis of (4) and (5), if the first sentence of a text paragraph is matched, record the current time as its start time; if its last sentence is matched, read the next paragraph. If the first sentence of the next paragraph is matched, the previous paragraph is considered finished, and the current time is recorded as both the end time of the previous paragraph and the start time of the current one.
(7) Finally, as post-processing, merge completely repeated contents and merge the multiple time ranges corresponding to the same text to obtain the final output result (a sketch of this driver loop and post-processing follows the list).
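Under the same assumptions as the earlier sketches, steps (6) and (7) might be combined into a driver loop like the following, which reuses the matchers defined above and merges duplicate or overlapping time ranges in post-processing; all names here are illustrative rather than taken from the patent.

def merge_ranges(ranges):
    # merge duplicate and overlapping (start, end) ranges into one duration each
    merged = []
    for start, end in sorted(set(ranges)):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def align_paragraphs(ocr_results, paragraphs):
    # ocr_results: list of (timestamp, text); paragraphs: ordered description text
    durations = {}                       # paragraph index -> list of (start, end)
    idx, start = 0, None
    for ts, text in ocr_results:
        if idx >= len(paragraphs):
            break
        if start is None and matches_first_sentence(text, paragraphs[idx]):
            start = ts                   # first sentence matched: paragraph begins
        next_starts = (idx + 1 < len(paragraphs)
                       and matches_first_sentence(text, paragraphs[idx + 1]))
        if start is not None and (matches_last_sentence(text, paragraphs[idx]) or next_starts):
            # last sentence matched, or the next paragraph has begun: close this one
            durations.setdefault(idx, []).append((start, ts))
            idx += 1
            start = ts if next_starts else None
    return {i: merge_ranges(rs) for i, rs in durations.items()}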
In conclusion, the method accomplishes the video subtitle time alignment task based on a common substring algorithm; it has high robustness and handles well the cases where OCR is disturbed by the video background, wrongly written characters, and uncommon words.
Referring to fig. 6, a block diagram of a video subtitle time alignment system 200 according to an embodiment of the present application is shown. As shown in fig. 6, the system 200 may include: an acquisition module 201, a cropping module 202, a recognition module 203, a matching module 204, and a determination module 205.
The acquisition module 201 is configured to acquire an original video with subtitles and a description text, where the content of the description text corresponds to the content of the subtitles in the original video;
the cropping module 202 is configured to crop the subtitle region of the original video at a preset frame-sampling interval to obtain a subtitle region image set, where each image in the set carries its corresponding timestamp in the original video;
the recognition module 203 is configured to input the subtitle region image set into an OCR model for recognition to obtain time-stamped OCR recognition results;
the matching module 204 is configured to match the OCR recognition results with each paragraph of the description text through a common substring algorithm to determine the first sentence and the last sentence of each paragraph among the OCR recognition results;
and the determination module 205 is configured to determine the duration of each paragraph of the description text in the original video according to the timestamps corresponding to the first sentence and the last sentence of each paragraph.
For specific limitations of the video subtitle time alignment system, reference may be made to the above limitations of the video subtitle time alignment method, which are not repeated here. The modules in the video subtitle time alignment system may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in hardware in, or independent of, a processor of the computer device, or stored in software in a memory of the computer device, so that the processor can invoke them and execute the operations corresponding to the modules.
In one embodiment of the present application, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the above video subtitle time alignment method.
The implementation principle and technical effect of the computer-readable storage medium provided by this embodiment are similar to those of the above-described method embodiment, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, any combination that contains no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the claims. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method for video subtitle time alignment, the method comprising:
acquiring an original video with subtitles and a description text, wherein the content of the description text corresponds to the content of the subtitles in the original video;
cropping the subtitle region of the original video at a preset frame-sampling interval to obtain a subtitle region image set, wherein each image in the set carries its corresponding timestamp in the original video;
inputting the subtitle region image set into an OCR model for recognition to obtain time-stamped OCR recognition results;
matching the OCR recognition results with each paragraph of the description text through a common substring algorithm, and determining the first sentence and the last sentence of each paragraph among the OCR recognition results;
and determining the duration of each paragraph of the description text in the original video according to the timestamps corresponding to the first sentence and the last sentence of each paragraph.
2. The method of claim 1, wherein matching the OCR recognition results with each paragraph of the description text through a common substring algorithm and determining the first sentence of each paragraph among the OCR recognition results comprises:
comparing the OCR recognition result with the target paragraph character by character to find all continuous common substrings, and selecting the first substring from them, wherein the first substring denotes the first continuous common substring;
when the first substring falls within the starting character range of the target paragraph, performing a character comparison between the OCR recognition result corresponding to the first substring and the characters in the starting character range;
when the start position of the substring obtained by the character comparison is smaller than the first-sentence threshold, taking the currently compared OCR recognition result as matching the first sentence of the target paragraph;
and traversing each paragraph of the description text to determine the first sentence of each paragraph among the OCR recognition results.
3. The method of claim 2, wherein after comparing the OCR recognition result with the target paragraph character by character to find all continuous common substrings and selecting the first substring, the method further comprises:
when the first substring falls within the ending character range of the target paragraph, taking the timestamp of the OCR recognition result corresponding to the first substring as the start time of the paragraph following the target paragraph.
4. The method of claim 1, wherein matching the OCR recognition results with each paragraph of the description text through a common substring algorithm and determining the last sentence of each paragraph among the OCR recognition results comprises:
comparing the OCR recognition result with the target paragraph character by character to find all continuous common substrings, and selecting the tail substring from them, wherein the tail substring denotes the last continuous common substring;
when the tail substring falls within the ending character range of the target paragraph, performing a character comparison between the OCR recognition result corresponding to the tail substring and the characters in the ending character range;
when the distance between the end position of the substring obtained by the character comparison and the end of the paragraph is smaller than the last-sentence threshold, taking the currently compared OCR recognition result as matching the last sentence of the target paragraph;
and traversing each paragraph of the description text to determine the last sentence of each paragraph among the OCR recognition results.
5. The method of claim 4, wherein after comparing the OCR recognition result with the target paragraph character by character to find all continuous common substrings and selecting the tail substring, the method further comprises:
when the tail substring falls within the starting character range of the target paragraph, taking the timestamp of the OCR recognition result corresponding to the tail substring as the end time of the paragraph preceding the target paragraph.
6. The method of claim 1, wherein inputting the subtitle region image set into an OCR model for recognition to obtain time-stamped OCR recognition results comprises:
checking the OCR recognition results, and keeping for matching and storage only those results that contain Chinese and whose confidence is greater than a preset threshold.
7. The method of claim 1, wherein determining the duration of the description text in the original video according to the timestamps corresponding to its first sentence and last sentence further comprises:
when the timestamp corresponding to the first sentence overlaps the timestamp corresponding to the last sentence, taking the duration obtained after merging the time ranges as the output result.
8. The method according to claim 1, wherein the description text includes wrongly written words and/or uncommon words.
9. A video subtitle time alignment system, the system comprising:
an acquisition module, configured to acquire an original video with subtitles and a description text, wherein the content of the description text corresponds to the content of the subtitles in the original video;
a cropping module, configured to crop the subtitle region of the original video at a preset frame-sampling interval to obtain a subtitle region image set, wherein each image in the set carries its corresponding timestamp in the original video;
a recognition module, configured to input the subtitle region image set into an OCR model for recognition to obtain time-stamped OCR recognition results;
a matching module, configured to match the OCR recognition results with each paragraph of the description text through a common substring algorithm to determine the first sentence and the last sentence of each paragraph among the OCR recognition results;
and a determination module, configured to determine the duration of each paragraph of the description text in the original video according to the timestamps corresponding to the first sentence and the last sentence of each paragraph.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the video subtitle time alignment method according to any one of claims 1 to 8.
CN202111470116.8A 2021-12-03 2021-12-03 Video subtitle time alignment method, system and storage medium Active CN114143613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111470116.8A CN114143613B (en) 2021-12-03 2021-12-03 Video subtitle time alignment method, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111470116.8A CN114143613B (en) 2021-12-03 2021-12-03 Video subtitle time alignment method, system and storage medium

Publications (2)

Publication Number Publication Date
CN114143613A (en) 2022-03-04
CN114143613B CN114143613B (en) 2023-07-21

Family

ID=80387606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111470116.8A Active CN114143613B (en) 2021-12-03 2021-12-03 Video subtitle time alignment method, system and storage medium

Country Status (1)

Country Link
CN (1) CN114143613B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833638A (en) * 2012-07-26 2012-12-19 北京数视宇通技术有限公司 Automatic video segmentation and annotation method and system based on caption information
CN104978961A (en) * 2015-05-25 2015-10-14 腾讯科技(深圳)有限公司 Audio processing method, device and terminal
CN104980790A (en) * 2015-06-30 2015-10-14 北京奇艺世纪科技有限公司 Voice subtitle generating method and apparatus, and playing method and apparatus
CN108268539A (en) * 2016-12-31 2018-07-10 上海交通大学 Video matching system based on text analyzing
CN109803173A (en) * 2017-11-16 2019-05-24 腾讯科技(深圳)有限公司 A kind of video transcoding method, device and storage equipment
CN108683924A (en) * 2018-05-30 2018-10-19 北京奇艺世纪科技有限公司 A kind of method and apparatus of video processing
CN113052169A (en) * 2021-03-15 2021-06-29 北京小米移动软件有限公司 Video subtitle recognition method, device, medium, and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xu-Cheng Yin, Ze-Yu Zuo, et al.: "Text Detection, Tracking and Recognition in Video: A Comprehensive Survey", IEEE Transactions on Image Processing, p. 2752 *
Li Juan: "Text Detection and Extraction Algorithms for Images and Video", China Excellent Master's Theses Full-text Database (Electronic Journal) *

Also Published As

Publication number Publication date
CN114143613B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN112995754B (en) Subtitle quality detection method and device, computer equipment and storage medium
CN113052169A (en) Video subtitle recognition method, device, medium, and electronic device
US11776248B2 (en) Systems and methods for automated document image orientation correction
CN111813998B (en) Video data processing method, device, equipment and storage medium
CN108989875B (en) Method and device for generating bullet screen file
CN112995749A (en) Method, device and equipment for processing video subtitles and storage medium
JP2008077454A (en) Title extraction device, image reading device, title extraction method, and title extraction program
CN111222409A (en) Vehicle brand labeling method, device and system
US20220189174A1 (en) A method and system for matching clips with videos via media analysis
CN109919017B (en) Face recognition optimization method, device, computer equipment and storage medium
CN111368061B (en) Short text filtering method, device, medium and computer equipment
CN109656474B (en) Data storage method and device, computer equipment and storage medium
CN114143613B (en) Video subtitle time alignment method, system and storage medium
CN109657210B (en) Text accuracy rate calculation method and device based on semantic analysis and computer equipment
CN115686455A (en) Application development method, device and equipment based on spreadsheet and storage medium
CN114298060A (en) Subtitle translation quality detection method, device, equipment and medium
CN114222193B (en) Video subtitle time alignment model training method and system
CN113449655A (en) Method and device for recognizing cover image, storage medium and recognition equipment
CN109525890B (en) MV subtitle transplanting method and device based on subtitle recognition
CN112463791A (en) Nuclear power station document data acquisition method and device, computer equipment and storage medium
CN108133214B (en) Information search method based on picture correction and mobile terminal
CN112417847A (en) News content safety monitoring method, system, device and storage medium
CN109710904B (en) Text accuracy rate calculation method and device based on semantic analysis and computer equipment
CN110717091B (en) Entry data expansion method and device based on face recognition
CN102591852A (en) Automatic typesetting method and automatic typesetting system for patent images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A video subtitle time alignment method, system, and storage medium

Effective date of registration: 20231113

Granted publication date: 20230721

Pledgee: Shanghai Pudong Development Bank Co.,Ltd. Xuhui sub branch

Pledgor: Beijing Moviebook Science and Technology Co., Ltd.; Beijing Qingmou Management Consulting Co., Ltd.; Shanghai Yingpu Technology Co., Ltd.

Registration number: Y2023310000727
