CN112911373A - Method, device and equipment for generating video subtitles and storage medium - Google Patents
Method, device and equipment for generating video subtitles and storage medium
- Publication number
- CN112911373A (application number CN202110132044.XA)
- Authority
- CN
- China
- Prior art keywords
- subtitle
- video
- generating
- style
- topic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Studio Circuits (AREA)
Abstract
The invention relates to a method, an apparatus, a device and a storage medium for generating video subtitles, wherein the method comprises the following steps: in response to a monitored subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in a video; extracting the subtitle background from the subtitle picture; inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style; and superimposing the target-style subtitles on the subtitle background and splicing the result into the video for display. Subtitles are thereby displayed in real time and dynamically in the style a user requires, so the same video can suit different users and its adaptability is improved.
Description
Technical Field
The invention relates to the technical field of video playing, and in particular to a method, an apparatus, a device and a storage medium for generating video subtitles.
Background
As an important medium for conveying information, video plays an important role in everyday life. Most videos are configured with subtitles, which are displayed while the video plays.
In the prior art, subtitles are usually displayed in a fixed form. Users who take no interest in those subtitles may stop watching the video, or may rate it poorly, which lowers its play rate. How to personalize video subtitles and thereby improve the adaptability of videos is therefore a technical problem urgently awaiting a solution from those skilled in the art.
Disclosure of Invention
The invention provides a method, an apparatus, a device and a storage medium for generating video subtitles, which solve the technical problem of low video adaptability caused by the inability to personalize video subtitles.
The technical scheme for solving the above technical problem is as follows:
a method for generating video subtitles comprises the following steps:
in response to a monitored subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in a video;
extracting the subtitle background from the subtitle picture;
inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style;
and superimposing the target-style subtitles on the subtitle background, and splicing the result into the video for display.
Further, in the above method for generating video subtitles, inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style comprises:
encoding the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombining the subtitle vector with a preset topic-word feature vector to obtain a recombined vector;
and inputting the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitles.
Further, in the above method for generating video subtitles, the topic-word feature vector is set in one of the following ways:
extracting the topic-word feature vector from preset topic words and setting it; or
extracting the topic-word feature vector from user-defined topic words and setting it, wherein the user-defined topic words are obtained by re-editing the preset topic words, or are created by the user in a self-creation mode.
Further, in the above method for generating video subtitles, the subtitle position in the video is obtained as follows:
if the video is an external-subtitle video, extracting the subtitle file from the video and parsing it to obtain the subtitle position;
and if the video is an embedded-subtitle video, taking a preset position of the embedded-subtitle video as the subtitle position, or obtaining the subtitle position with a pre-trained text detection model.
Further, in the above method for generating video subtitles, the subtitle content of the video is obtained as follows:
if the video is an external-subtitle video, extracting the subtitle file from the video and parsing it to obtain the subtitle content;
and if the video is an embedded-subtitle video, obtaining the subtitle content with a pre-trained text detection model.
The present invention also provides an apparatus for generating video subtitles, comprising:
a capture module, configured to respond to a monitored subtitle regeneration instruction and capture a subtitle picture according to the subtitle position in a video;
an extraction module, configured to extract the subtitle background from the subtitle picture;
a subtitle regeneration module, configured to input the subtitle content of the video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style;
and a splicing module, configured to superimpose the target-style subtitles on the subtitle background and splice the result into the video for display.
Further, in the above apparatus for generating video subtitles, the subtitle regeneration module is specifically configured to:
encode the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombine the subtitle vector with a preset topic-word feature vector to obtain a recombined vector;
and input the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitles.
Further, in the above apparatus for generating video subtitles, the topic-word feature vector is set in one of the following ways:
extracting the topic-word feature vector from preset topic words and setting it; or
extracting the topic-word feature vector from user-defined topic words and setting it, wherein the user-defined topic words are obtained by re-editing the preset topic words, or are created by the user in a self-creation mode.
The present invention also provides a device for generating video subtitles, comprising: a processor and a memory;
the processor is configured to execute a video subtitle generation program stored in the memory, so as to implement any of the methods for generating video subtitles described above.
The present invention also provides a storage medium storing one or more programs which, when executed, implement any of the methods for generating video subtitles described above.
The invention has the following beneficial effects:
by capturing a subtitle picture according to the subtitle position in a video in response to a monitored subtitle regeneration instruction, extracting the subtitle background from the subtitle picture, inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain target-style subtitles, and superimposing the target-style subtitles on the subtitle background before splicing the result into the video for display, subtitles are displayed in real time and dynamically in the style a user requires, so the same video can suit different users and its adaptability is improved.
Drawings
Fig. 1 is a flowchart of a method for generating video subtitles according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an apparatus for generating video subtitles according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for generating video subtitles according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a flowchart of a method for generating video subtitles according to an embodiment of the present invention. As shown in Fig. 1, the method of this embodiment may specifically include the following steps:
100. In response to a monitored subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in a video.
In a specific implementation, when a user watching a video finds that its subtitles do not meet his or her needs, the user can input a subtitle regeneration instruction; upon receiving this instruction, the playing end responds and captures a subtitle picture according to the subtitle position in the video.
In practical applications, a video may be an external-subtitle video or an embedded-subtitle video. Accordingly, in this embodiment the subtitle position in the video is obtained as follows:
if the video is an external-subtitle video, the subtitle file is extracted from the video and parsed to obtain the subtitle position;
if the video is an embedded-subtitle video, the subtitles are fused with the frames and no subtitle file can be extracted. However, the subtitle position in an embedded-subtitle video is usually fixed, so a preset position can be taken as the subtitle position. If the position is not fixed, a pre-trained text detection model can be used to detect text in the frames and thereby locate the subtitles. A sketch of both branches follows.
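The following Python sketch is illustrative only: the ASS `\pos` parsing and the detector interface are assumptions of this illustration, since the patent names no concrete subtitle format or detection model.

```python
# Illustrative sketch of the two position branches; not the patent's code.
import re

def position_from_ass(subtitle_path):
    """External-subtitle branch: read \\pos(x,y) overrides from an ASS file
    (SRT files carry no position, so ASS is assumed here)."""
    positions = []
    with open(subtitle_path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("Dialogue:"):
                m = re.search(r"\\pos\((\d+(?:\.\d+)?),(\d+(?:\.\d+)?)\)", line)
                if m:
                    positions.append((float(m.group(1)), float(m.group(2))))
    return positions

def position_from_frame(frame, detector, preset_box=None):
    """Embedded-subtitle branch: use the preset position when the subtitles
    are fixed, otherwise ask a pre-trained text detector (assumed interface:
    detector(frame) -> list of (x, y, w, h) boxes)."""
    if preset_box is not None:
        return preset_box
    boxes = detector(frame)
    # Heuristic: the subtitle usually sits in the lowest text box of the frame.
    return max(boxes, key=lambda b: b[1]) if boxes else None
```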
After the subtitle position is obtained, the subtitle picture can be captured at that position using ordinary image-cropping techniques, for example as sketched below.
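A minimal crop with OpenCV; the box layout `(x, y, w, h)`, the box values and the file name are assumptions of this illustration.

```python
# Crop the subtitle picture out of a decoded frame (sketch; box is assumed).
import cv2

def crop_subtitle(frame, box):
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

cap = cv2.VideoCapture("input.mp4")        # hypothetical video file
ok, frame = cap.read()
if ok:
    subtitle_picture = crop_subtitle(frame, (0, 620, 1280, 100))
cap.release()
```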
101. Extracting the subtitle background from the captured subtitle picture.
In this embodiment, the subtitle text and the background in the picture are separated, yielding a picture with the subtitle text removed, that is, the subtitle background. One plausible separation is sketched below.
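The patent does not fix a separation method; one plausible sketch masks the bright subtitle strokes and inpaints them away, leaving the background. The threshold value is an assumption that presumes light text on darker content.

```python
# Separate text from background by masking and inpainting (illustrative only).
import cv2
import numpy as np

def extract_background(subtitle_picture):
    gray = cv2.cvtColor(subtitle_picture, cv2.COLOR_BGR2GRAY)
    # Assumed: light subtitle strokes on darker video content.
    _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=2)
    # Fill the masked strokes from their surroundings, recovering the background.
    return cv2.inpaint(subtitle_picture, mask, 3, cv2.INPAINT_TELEA)
```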
102. Inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style.
In a specific implementation, the subtitle content of the video is obtained as follows:
if the video is an external-subtitle video, the subtitle file is extracted from the video and parsed to obtain the subtitle content;
and if the video is an embedded-subtitle video, the subtitle content is obtained with a pre-trained text detection model, for example based on Optical Character Recognition (OCR). Both branches are sketched below.
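Both content branches admit a short sketch. SRT is assumed for the external file (any parseable format would do), and pytesseract stands in for the OCR step, which the patent names only generically.

```python
# Illustrative content extraction for both video types.
import pytesseract   # assumed OCR binding; the patent only says "OCR"

def content_from_srt(path):
    """External branch: SRT cues are blank-line separated; the text lines
    follow the index line and the timestamp line."""
    with open(path, encoding="utf-8") as f:
        blocks = f.read().strip().split("\n\n")
    return ["\n".join(b.splitlines()[2:]) for b in blocks if b.strip()]

def content_from_picture(subtitle_picture):
    """Embedded branch: OCR the cropped subtitle picture."""
    return pytesseract.image_to_string(subtitle_picture, lang="chi_sim+eng").strip()
```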
In a specific implementation, the multi-style subtitle generation model of this embodiment is trained in advance on the basis of a generative adversarial network. After the subtitle content is obtained, the encoder of the model encodes it into a subtitle vector; the subtitle vector is recombined with a preset topic-word feature vector into a recombined vector; and the recombined vector is fed into the generative adversarial network corresponding to the model to obtain the target-style subtitles. The subtitles presented to the user are thus more personalized, giving viewers a more distinctive experience. For example, in a children's video, cartoon-style subtitles can be generated, enhancing the effect of the video.
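A hedged PyTorch sketch of this generation step follows: encode the subtitle content into a subtitle vector, concatenate (recombine) it with the topic-word feature vector, and decode through the generator half of an adversarial network. Every layer size and both toy inputs are invented for illustration; the patent specifies no architecture.

```python
# Sketch of encode -> recombine -> generate; all sizes are assumptions.
import torch
import torch.nn as nn

class SubtitleEncoder(nn.Module):
    def __init__(self, vocab_size=8000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, token_ids):
        _, h = self.rnn(self.embed(token_ids))
        return h[-1]                                  # subtitle vector (B, 256)

class StyleGenerator(nn.Module):
    """Generator half of the adversarial network; emits a styled glyph image."""
    def __init__(self, in_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 64 * 64), nn.Tanh())

    def forward(self, recombined):
        return self.net(recombined).view(-1, 1, 64, 64)

encoder, generator = SubtitleEncoder(), StyleGenerator()
token_ids = torch.randint(0, 8000, (1, 12))    # toy subtitle tokens
topic_vector = torch.randn(1, 256)             # stands in for the preset topic-word feature vector
recombined = torch.cat([encoder(token_ids), topic_vector], dim=1)
styled = generator(recombined)                 # (1, 1, 64, 64) styled subtitle image
```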
In some embodiments, the topic-word feature vector is extracted from preset topic words and then set.
In some embodiments, to further meet the needs of different users, the topic-word feature vector is extracted from user-defined topic words and then set. Specifically, a preset topic word may fall short of a user's needs in only some aspects of its style, so a small amount of re-editing of the preset topic words is often enough to satisfy the user.
In some embodiments, the user can also create the user-defined topic words personally: the user triggers a self-creation instruction and creates them in the self-creation mode. For example, a user may want his or her own drawing to serve as the subtitle style; in the self-creation mode, the user uploads the drawing as a user-defined topic word, the topic-word feature vector is extracted from it, and the vector is set.
103. Superimposing the target-style subtitles on the subtitle background, and splicing the result into the video for display.
After the target-style subtitles are obtained, they are superimposed on the subtitle background to form an image containing the personalized subtitles, and this image is spliced back into the video for display (see the sketch below). The subtitles are thus displayed in real time and dynamically in the style the user requires, reducing the chance that a user stops watching because the subtitles hold no interest, or that poor ratings lower the video's play rate.
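A sketch of the final compositing step, under the assumption that the generator emits a grayscale glyph image whose intensity can serve as an alpha mask; the fill color and the box layout are likewise assumptions.

```python
# Overlay the styled subtitle on the recovered background, then splice the
# composed region back into the frame (illustrative compositing).
import cv2
import numpy as np

def overlay_and_splice(frame, background, styled_gray, box, color=(255, 255, 255)):
    """background: HxWx3 crop from extract_background; styled_gray: glyph image."""
    x, y, w, h = box
    styled_gray = cv2.resize(styled_gray, (w, h))
    alpha = (styled_gray.astype(np.float32) / 255.0)[..., None]   # HxWx1 mask
    fill = np.full((h, w, 3), color, np.float32)
    composed = alpha * fill + (1.0 - alpha) * background.astype(np.float32)
    frame[y:y + h, x:x + w] = composed.astype(np.uint8)
    return frame
```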
It should be noted that in this embodiment the target-style subtitles are preferably generated at the video playing end. This reduces network transmission overhead, avoids to some degree the influence of network bandwidth and latency between the playing end and the remote end, and improves the stability of subtitle generation.
In the method for generating video subtitles of this embodiment, a subtitle picture is captured according to the subtitle position in a video in response to a monitored subtitle regeneration instruction; the subtitle background is extracted from the subtitle picture; the subtitle content of the video is input into a pre-trained multi-style subtitle generation model to obtain target-style subtitles; and the target-style subtitles are superimposed on the subtitle background and spliced into the video for display. Subtitles are thereby displayed in real time and dynamically in the style a user requires, so the same video can suit different users and its adaptability is improved.
In a specific implementation, when a user watches a video, the user's identity can be recorded through a camera, a fingerprint recognition component or the like, and the target styles the user has used can be associated with that identity. Subtitle libraries for different users are thus built up, so that the styles a user favors can be retrieved directly from his or her library the next time that user watches a video. A minimal sketch of such a library follows.
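The per-user library can be as simple as a mapping from a recognized identity to that user's preferred styles; this sketch assumes identity recognition happens elsewhere and shows only the bookkeeping.

```python
# Minimal per-user style library (illustrative; identity capture not shown).
from collections import defaultdict

style_library = defaultdict(list)

def remember_style(user_id, style_name):
    """Associate a target style with the recognized user."""
    if style_name not in style_library[user_id]:
        style_library[user_id].append(style_name)

def preferred_styles(user_id):
    """Styles to offer first the next time this user watches a video."""
    return list(style_library[user_id])
```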
Fig. 2 is a schematic structural diagram of an apparatus for generating video subtitles according to an embodiment of the present invention. As shown in Fig. 2, the apparatus of this embodiment may specifically include a capture module 20, an extraction module 21, a subtitle regeneration module 22 and a splicing module 23.
The capture module 20 is configured to respond to a monitored subtitle regeneration instruction and capture a subtitle picture according to the subtitle position in a video.
In this embodiment, the subtitle position in the video is obtained as follows:
if the video is an external-subtitle video, the subtitle file is extracted from the video and parsed to obtain the subtitle position;
and if the video is an embedded-subtitle video, a preset position of the embedded-subtitle video is taken as the subtitle position, or the subtitle position is obtained with a pre-trained text detection model.
The extraction module 21 is configured to extract the subtitle background from the subtitle picture.
The subtitle regeneration module 22 is configured to input the subtitle content of the video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style.
In this embodiment, the subtitle content of the video is obtained as follows:
if the video is an external-subtitle video, the subtitle file is extracted from the video and parsed to obtain the subtitle content;
and if the video is an embedded-subtitle video, the subtitle content is obtained with a pre-trained text detection model.
In a specific implementation, the encoder of the multi-style subtitle generation model encodes the subtitle content into a subtitle vector, and the subtitle vector is recombined with a preset topic-word feature vector into a recombined vector; the recombined vector is then input into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitles.
In this embodiment, the topic-word feature vector is set in one of the following ways:
extracting the topic-word feature vector from preset topic words and setting it; or
extracting the topic-word feature vector from user-defined topic words and setting it, wherein the user-defined topic words are obtained by re-editing the preset topic words, or are created by the user in a self-creation mode.
The splicing module 23 is configured to superimpose the target-style subtitles on the subtitle background and splice the result into the video for display.
The apparatus for generating video subtitles of this embodiment captures a subtitle picture according to the subtitle position in a video in response to a monitored subtitle regeneration instruction; extracts the subtitle background from the subtitle picture; inputs the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain target-style subtitles; and superimposes the target-style subtitles on the subtitle background before splicing the result into the video for display. Subtitles are thereby displayed in real time and dynamically in the style a user requires, so the same video can suit different users and its adaptability is improved.
Fig. 3 is a schematic structural diagram of a device for generating video subtitles according to an embodiment of the present invention. As shown in Fig. 3, the device of this embodiment may include: a processor 1010 and a memory 1020. Those skilled in the art will appreciate that the device may also include an input/output interface 1030, a communication interface 1040 and a bus 1050, with the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 communicatively coupled to one another within the device via the bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute the relevant programs to implement the technical solutions provided in the embodiments of the present specification.
The memory 1020 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
The input/output interface 1030 is used to connect an input/output module for information input and output. The input/output module may be configured as a component within the device (not shown in the figure) or may be external to the device to provide the corresponding function. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors and the like; output devices may include a display, a speaker, a vibrator, an indicator light and the like.
The communication interface 1040 is used to connect a communication module (not shown in the figure) to implement communication interaction between this device and other devices. The communication module may communicate in a wired manner (e.g., USB or network cable) or wirelessly (e.g., mobile network, Wi-Fi or Bluetooth).
It should be noted that although the figure shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, the device may include other components necessary for normal operation in a specific implementation. In addition, those skilled in the art will appreciate that the above device may also contain only the components necessary to implement the embodiments of the present specification, rather than all the components shown in the figure.
The present invention also provides a storage medium storing one or more programs which, when executed, implement the method for generating video subtitles of the above embodiments.
Computer-readable media of the present embodiments, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the invention, features of the above embodiments or of different embodiments may be combined, steps may be implemented in any order, and many other variations of the different aspects of the invention exist as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power and ground connections to integrated circuit (IC) chips and other components may or may not be shown within the provided figures, for simplicity of illustration and discussion and so as not to obscure the invention. Furthermore, devices may be shown in block-diagram form to avoid obscuring the invention, also in view of the fact that the specifics of implementing such block-diagram devices depend heavily on the platform on which the invention is to be implemented (i.e., such specifics should be well within the purview of one skilled in the art). Where specific details (e.g., circuits) are set forth to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variations of, these specific details. Accordingly, the description is to be regarded as illustrative rather than restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method for generating video subtitles, comprising:
in response to a monitored subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in a video;
extracting the subtitle background from the subtitle picture;
inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style;
and superimposing the target-style subtitles on the subtitle background, and splicing the result into the video for display.
2. The method for generating video subtitles according to claim 1, wherein inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style comprises:
encoding the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombining the subtitle vector with a preset topic-word feature vector to obtain a recombined vector;
and inputting the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitles.
3. The method according to claim 2, wherein the topic-word feature vector is set in one of the following ways:
extracting the topic-word feature vector from preset topic words and setting it; or
extracting the topic-word feature vector from user-defined topic words and setting it, wherein the user-defined topic words are obtained by re-editing the preset topic words, or are created by the user in a self-creation mode.
4. The method for generating video subtitles according to claim 1, wherein the subtitle position in the video is obtained as follows:
if the video is an external-subtitle video, extracting the subtitle file from the video and parsing it to obtain the subtitle position;
and if the video is an embedded-subtitle video, taking a preset position of the embedded-subtitle video as the subtitle position, or obtaining the subtitle position with a pre-trained text detection model.
5. The method for generating video subtitles according to claim 1, wherein the subtitle content of the video is obtained as follows:
if the video is an external-subtitle video, extracting the subtitle file from the video and parsing it to obtain the subtitle content;
and if the video is an embedded-subtitle video, obtaining the subtitle content with a pre-trained text detection model.
6. An apparatus for generating video subtitles, comprising:
a capture module, configured to respond to a monitored subtitle regeneration instruction and capture a subtitle picture according to the subtitle position in a video;
an extraction module, configured to extract the subtitle background from the subtitle picture;
a subtitle regeneration module, configured to input the subtitle content of the video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style;
and a splicing module, configured to superimpose the target-style subtitles on the subtitle background and splice the result into the video for display.
7. The apparatus for generating video subtitles according to claim 6, wherein the subtitle regeneration module is specifically configured to:
encode the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombine the subtitle vector with a preset topic-word feature vector to obtain a recombined vector;
and input the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitles.
8. The apparatus for generating video subtitles according to claim 7, wherein the topic-word feature vector is set in one of the following ways:
extracting the topic-word feature vector from preset topic words and setting it; or
extracting the topic-word feature vector from user-defined topic words and setting it, wherein the user-defined topic words are obtained by re-editing the preset topic words, or are created by the user in a self-creation mode.
9. A device for generating video subtitles, comprising: a processor and a memory;
wherein the processor is configured to execute a video subtitle generation program stored in the memory to implement the method for generating video subtitles according to any one of claims 1-5.
10. A storage medium storing one or more programs which, when executed, implement the method for generating video subtitles according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110132044.XA CN112911373B (en) | 2021-01-31 | 2021-01-31 | Video subtitle generating method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110132044.XA CN112911373B (en) | 2021-01-31 | 2021-01-31 | Video subtitle generating method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112911373A (en) | 2021-06-04
CN112911373B (en) | 2023-05-26
Family
ID=76121994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110132044.XA Active CN112911373B (en) | 2021-01-31 | 2021-01-31 | Video subtitle generating method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112911373B (en) |
- 2021-01-31: application CN202110132044.XA filed in China; granted as CN112911373B (status: active)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104715497A (en) * | 2014-12-30 | 2015-06-17 | 上海孩子国科教设备有限公司 | Data replacement method and system |
CN105871681A (en) * | 2015-12-14 | 2016-08-17 | 乐视网信息技术(北京)股份有限公司 | Subtitle adding method and device |
US20190340469A1 (en) * | 2017-03-20 | 2019-11-07 | Intel Corporation | Topic-guided model for image captioning system |
CN110866377A (en) * | 2018-08-08 | 2020-03-06 | 北京优酷科技有限公司 | Text content conversion method and device |
CN110458918A (en) * | 2019-08-16 | 2019-11-15 | 北京百度网讯科技有限公司 | Method and apparatus for output information |
CN111402367A (en) * | 2020-03-27 | 2020-07-10 | 维沃移动通信有限公司 | Image processing method and electronic equipment |
CN111639474A (en) * | 2020-05-26 | 2020-09-08 | 维沃移动通信有限公司 | Document style reconstruction method and device and electronic equipment |
CN112084841A (en) * | 2020-07-27 | 2020-12-15 | 齐鲁工业大学 | Cross-modal image multi-style subtitle generation method and system |
CN112055245A (en) * | 2020-09-11 | 2020-12-08 | 海信视像科技股份有限公司 | Color subtitle realization method and display device |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115952255A (en) * | 2022-11-21 | 2023-04-11 | 北京邮电大学 | Multi-modal signal content analysis method and device, electronic equipment and storage medium |
CN115952255B (en) * | 2022-11-21 | 2023-12-05 | 北京邮电大学 | Multi-mode signal content analysis method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112911373B (en) | 2023-05-26 |
Similar Documents
Publication | Title
---|---
CN109803180B (en) | Video preview generation method and device, computer equipment and storage medium
CN110557678B (en) | Video processing method, device and equipment
CN107801096B (en) | Video playing control method and device, terminal equipment and storage medium
CN110177295B (en) | Subtitle out-of-range processing method and device and electronic equipment
KR101916874B1 (en) | Apparatus, method for auto generating a title of video contents, and computer readable recording medium
CN111343496A (en) | Video processing method and device
CN108521612B (en) | Video abstract generation method, device, server and storage medium
CN110876079B (en) | Video processing method, device and equipment
US20230291978A1 (en) | Subtitle processing method and apparatus of multimedia file, electronic device, and computer-readable storage medium
CN106303303A (en) | Method and device for translating subtitles of media file and electronic equipment
US10897658B1 (en) | Techniques for annotating media content
US20170147170A1 (en) | Method for generating a user interface presenting a plurality of videos
CN106408623A (en) | Character presentation method, device and terminal
CN114598893A (en) | Text video implementation method and system, electronic equipment and storage medium
CN112422844A (en) | Method, device and equipment for adding special effect in video and readable storage medium
CN112911373A (en) | Method, device and equipment for generating video subtitles and storage medium
US20170193668A1 (en) | Intelligent Equipment-Based Motion Sensing Control Method, Electronic Device and Intelligent Equipment
CN107197339B (en) | Display control method and device of film bullet screen and head-mounted display equipment
US20200057890A1 (en) | Method and device for determining inter-cut time range in media item
CN112511897A (en) | Video cover setting method, device, equipment and storage medium
CN111918074A (en) | Live video fault early warning method and related equipment
US20160142456A1 (en) | Method and Device for Acquiring Media File
CN112908337B (en) | Method, device, equipment and storage medium for displaying voice recognition text
CN114268847A (en) | Video playing method and device, electronic equipment and storage medium
KR101403159B1 (en) | Apparatus and method for providing additional information about object
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |