CN112911373A - Method, device and equipment for generating video subtitles and storage medium - Google Patents

Method, device and equipment for generating video subtitles and storage medium

Info

Publication number
CN112911373A
CN112911373A
Authority
CN
China
Prior art keywords
subtitle
video
generating
style
topic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110132044.XA
Other languages
Chinese (zh)
Other versions
CN112911373B (en)
Inventor
张晋
刘青松
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd, Xiamen Yunzhixin Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202110132044.XA priority Critical patent/CN112911373B/en
Publication of CN112911373A publication Critical patent/CN112911373A/en
Application granted granted Critical
Publication of CN112911373B publication Critical patent/CN112911373B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431: Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312: Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/47: End-user applications
    • H04N21/488: Data services, e.g. news ticker
    • H04N21/4884: Data services, e.g. news ticker, for displaying subtitles
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)

Abstract

The invention relates to a method, a device, equipment and a storage medium for generating video subtitles. The method comprises the following steps: in response to a monitored subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in a video; extracting the subtitle background from the subtitle picture; inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain subtitles in a target style; and superimposing the target-style subtitles on the subtitle background and splicing the result into the video for display. Subtitles can thus be displayed in real time and dynamically in the style a user requires, so that the video suits different users and its adaptability is improved.

Description

Method, device and equipment for generating video subtitles and storage medium
Technical Field
The invention relates to the technical field of video playback, and in particular to a method, a device, equipment and a storage medium for generating video subtitles.
Background
As an important medium for conveying information, video plays a significant role in everyday life. Most videos are configured with subtitles, which are displayed in the picture while the video plays.
In the prior art, subtitles are usually displayed in a video in a fixed form. Some users may stop watching a video because its fixed subtitles hold no interest for them, or may rate the video poorly, which in turn lowers its play rate. How to personalize video subtitles and thereby improve the adaptability of videos is therefore a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for generating video subtitles, which solve the technical problem of low video adaptability caused by the inability to personalize video subtitles.
The technical solution adopted by the invention is as follows:
a method for generating video subtitles comprises the following steps:
in response to a monitored subtitle regeneration instruction, intercepting a subtitle picture according to a subtitle position in a video;
extracting a subtitle background from the subtitle picture;
inputting subtitle content in a video into a pre-trained multi-style subtitle generation model for processing to obtain subtitles in a target style;
and overlapping the caption with the target style and the caption background, and splicing the overlapped caption and the caption background into the video for displaying.
Further, in the above method, inputting the subtitle content of the video into the pre-trained multi-style subtitle generation model to obtain subtitles in a target style comprises:
encoding the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombining the subtitle vector with a preset topic word feature vector to obtain a recombined vector;
and inputting the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitles.
Further, in the above method, the topic word feature vector is set in one of the following ways:
extracting the topic word feature vector from preset topic words and setting it;
or extracting the topic word feature vector from user-defined topic words and setting it, where the user-defined topic words are obtained either by re-editing the preset topic words or by the user creating them in a self-creation mode.
Further, in the above method, the subtitle position in the video is obtained as follows:
if the video is a plug-in (external) subtitle video, extracting the subtitle file from it and parsing the file to obtain the subtitle position;
and if the video is an embedded subtitle video, taking the preset position of the embedded subtitles as the subtitle position, or obtaining the subtitle position with a pre-trained text detection model.
Further, in the above method, the subtitle content of the video is obtained as follows:
if the video is a plug-in subtitle video, extracting the subtitle file from it and parsing the file to obtain the subtitle content;
and if the video is an embedded subtitle video, obtaining the subtitle content with a pre-trained text detection model.
The present invention also provides a device for generating video subtitles, comprising:
a capture module, configured to respond to a monitored subtitle regeneration instruction and capture a subtitle picture according to the subtitle position in a video;
an extraction module, configured to extract the subtitle background from the subtitle picture;
a subtitle regeneration module, configured to input the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain subtitles in a target style;
and a splicing module, configured to superimpose the target-style subtitles on the subtitle background and splice the result into the video for display.
Further, in the above device, the subtitle regeneration module is specifically configured to:
encode the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombine the subtitle vector with a preset topic word feature vector to obtain a recombined vector;
and input the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitles.
Further, in the above device, the topic word feature vector is set in one of the following ways:
extracting the topic word feature vector from preset topic words and setting it;
or extracting the topic word feature vector from user-defined topic words and setting it, where the user-defined topic words are obtained either by re-editing the preset topic words or by the user creating them in a self-creation mode.
The present invention also provides equipment for generating video subtitles, comprising a processor and a memory;
the processor is configured to execute a video subtitle generation program stored in the memory, so as to implement any of the methods for generating video subtitles described above.
The present invention also provides a storage medium storing one or more programs which, when executed, implement any of the methods for generating video subtitles described above.
The invention has the following beneficial effects:
by responding to a monitored subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in the video, extracting the subtitle background from the subtitle picture, inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain subtitles in a target style, and superimposing the target-style subtitles on the subtitle background and splicing the result into the video for display, subtitles are displayed in real time and dynamically in the style the user requires, so that the video suits different users and its adaptability is improved.
Drawings
Fig. 1 is a flowchart of a method for generating video subtitles according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a device for generating video subtitles according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of equipment for generating video subtitles according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a flowchart of a method for generating video subtitles according to an embodiment of the present invention. As shown in Fig. 1, the method of this embodiment may specifically include the following steps:
100. In response to a monitored subtitle regeneration instruction, capture a subtitle picture according to the subtitle position in the video.
In a specific implementation, if the subtitles of a video a user is watching do not meet the user's requirements, the user can issue a subtitle regeneration instruction; upon receiving this instruction, the player responds by capturing a subtitle picture according to the subtitle position in the video.
In practice, a video may be a plug-in (external) subtitle video or an embedded subtitle video, so in this embodiment the subtitle position is obtained as follows:
if the video is a plug-in subtitle video, the subtitle file is extracted from it and parsed to obtain the subtitle position;
if the video is an embedded subtitle video, the subtitles are burned into the picture and no position can be extracted from a separate file; however, embedded subtitles usually occupy a fixed position, so the preset position of the embedded subtitles can be used as the subtitle position. If the embedded subtitles are not at a fixed position, a pre-trained text detection model can be used to detect text in the video and thereby obtain the subtitle position.
Once the subtitle position is known, the subtitle picture can be cropped from the frame at that position using an image-capture technique.
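For illustration, the following Python sketch crops the subtitle picture from a decoded frame. It is a minimal sketch rather than the patented implementation: the use of OpenCV, the input file name, and the fixed bottom-strip region are all assumptions made for the example.

```python
import cv2  # OpenCV; assumed here as the decoding/cropping library

def crop_subtitle_picture(frame, region):
    """Crop the subtitle picture from a video frame.

    frame:  a BGR image (numpy array) decoded from the video
    region: (x, y, w, h) subtitle position, e.g. parsed from a
            subtitle file or predicted by a text detection model
    """
    x, y, w, h = region
    return frame[y:y + h, x:x + w].copy()

# Example: read one frame and crop a hypothetical bottom-strip region.
cap = cv2.VideoCapture("input.mp4")  # hypothetical file name
ok, frame = cap.read()
if ok:
    height, width = frame.shape[:2]
    # Assumed fixed position: the bottom 15% of the frame.
    region = (0, int(height * 0.85), width, int(height * 0.15))
    subtitle_picture = crop_subtitle_picture(frame, region)
    cv2.imwrite("subtitle_picture.png", subtitle_picture)
cap.release()
```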
101. Extract the subtitle background from the captured subtitle picture.
In this embodiment, the subtitle text and the background within the picture can be separated, yielding a background image from which the subtitle text has been removed.
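The embodiment does not prescribe a particular separation algorithm. One plausible way to obtain the background with the text removed is mask-based inpainting, sketched below; the bright-text threshold used to build the mask is a simplifying assumption, and a real system would more likely derive the mask from a text segmentation model.

```python
import cv2
import numpy as np

def extract_subtitle_background(subtitle_picture):
    """Separate subtitle text from its background by inpainting.

    Assumes light subtitle strokes on a darker background, which is
    a simplification made for this sketch.
    """
    gray = cv2.cvtColor(subtitle_picture, cv2.COLOR_BGR2GRAY)
    # Rough text mask: bright pixels are treated as subtitle strokes.
    _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    # Dilate so the mask also covers anti-aliased stroke edges.
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=2)
    # Fill the masked strokes from surrounding background pixels.
    return cv2.inpaint(subtitle_picture, mask, 3, cv2.INPAINT_TELEA)
```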
102. Input the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain subtitles in a target style.
In a specific implementation, the subtitle content of the video is obtained as follows:
if the video is a plug-in subtitle video, the subtitle file is extracted from it and parsed to obtain the subtitle content;
and if the video is an embedded subtitle video, the subtitle content is obtained with a pre-trained text detection model; for example, it may be recognised with Optical Character Recognition (OCR) techniques. A sketch of both branches follows.
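A hedged sketch of the two acquisition branches: parsing an external SRT file for the plug-in case, and Tesseract OCR (via pytesseract) for the embedded case. The SRT format and the OCR backend are assumptions; the text only specifies that a subtitle file is parsed and that OCR techniques may be used.

```python
import pytesseract  # assumed OCR backend; requires a Tesseract install
from PIL import Image

def parse_srt_content(srt_path):
    """Plug-in branch: parse subtitle text out of an external .srt file."""
    with open(srt_path, encoding="utf-8") as f:
        blocks = f.read().strip().split("\n\n")
    lines = []
    for block in blocks:
        parts = block.splitlines()
        # parts[0] is the cue index, parts[1] the timing line.
        if len(parts) >= 3 and "-->" in parts[1]:
            lines.append(" ".join(parts[2:]))
    return lines

def ocr_subtitle_content(subtitle_picture_path):
    """Embedded branch: recognise text in the cropped subtitle picture."""
    image = Image.open(subtitle_picture_path)
    # 'chi_sim+eng' assumes mixed Chinese/English subtitles.
    return pytesseract.image_to_string(image, lang="chi_sim+eng").strip()
```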
In a specific implementation, the multi-style subtitle generation model of this embodiment can be trained in advance on the basis of a generative adversarial network. After the subtitle content is obtained, the encoder of the model encodes it into a subtitle vector; the subtitle vector is recombined with a preset topic word feature vector into a recombined vector; and the recombined vector is fed into the generative adversarial network corresponding to the model to produce the target-style subtitles. Subtitles in the video are thus presented to the user in a more personalized way, giving viewers a more distinctive experience. For example, cartoon-style subtitles can be generated for a children's video, enhancing the effect of the video.
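The description fixes only the data flow: encoder output recombined with a topic word feature vector, then fed to the generative adversarial network. The PyTorch sketch below follows that flow under stated assumptions: recombination is modelled as concatenation, all layer sizes and the 64x256 output image are invented for illustration, and the discriminator used during training is omitted.

```python
import torch
import torch.nn as nn

class MultiStyleSubtitleGenerator(nn.Module):
    """Sketch of the generation path described above: encode the subtitle
    content, recombine it with a topic word feature vector, and decode a
    target-style subtitle image with a GAN generator. All dimensions and
    layers are assumptions for illustration."""

    def __init__(self, vocab_size=8000, text_dim=256, topic_dim=64):
        super().__init__()
        # Encoder: turns subtitle content into a subtitle vector.
        self.embed = nn.Embedding(vocab_size, text_dim)
        self.encoder = nn.GRU(text_dim, text_dim, batch_first=True)
        # Generator of the adversarial network: maps the recombined
        # vector to a small grayscale subtitle image (64x256 here).
        self.generator = nn.Sequential(
            nn.Linear(text_dim + topic_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, 64 * 256),
            nn.Tanh(),
        )

    def forward(self, token_ids, topic_vector):
        _, hidden = self.encoder(self.embed(token_ids))
        subtitle_vector = hidden[-1]                 # (batch, text_dim)
        # "Recombination" modelled as concatenation (an assumption).
        recombined = torch.cat([subtitle_vector, topic_vector], dim=1)
        return self.generator(recombined).view(-1, 1, 64, 256)

# Usage: one batch of token ids plus a cartoon-style topic vector.
model = MultiStyleSubtitleGenerator()
tokens = torch.randint(0, 8000, (1, 12))  # a 12-token subtitle line
topic = torch.randn(1, 64)                # preset topic word feature vector
styled = model(tokens, topic)             # (1, 1, 64, 256) styled image
```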
In some embodiments, the topic word feature vector may be extracted from preset topic words and then set.
In some embodiments, to further satisfy different users, the topic word feature vector can be extracted from user-defined topic words and then set. In particular, a preset topic word may fail to meet a user's requirements in only some respects, so the user need only make small adjustments to it.
In some embodiments, the user may also create custom topic words: the user triggers a self-creation instruction and composes the custom topic words in a self-creation mode. For example, a user who wants their own drawing used as the subtitle style uploads the drawing in self-creation mode as a user-defined topic word; the topic word feature vector is then extracted from it and set.
103. Superimpose the target-style subtitles on the subtitle background, and splice the result into the video for display.
After the target-style subtitles are obtained, they can be superimposed on the subtitle background to form an image containing the personalized subtitles, and this image is spliced into the video for display. The subtitles are thus displayed in real time and dynamically in the style the user requires, which reduces the chance that users stop watching because the subtitles do not interest them, or rate the video poorly and so lower its play rate.
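One possible form of the superposition and splicing in step 103, again as an OpenCV sketch under assumptions: the generator output is treated as a grayscale stroke-intensity map and alpha-blended onto the extracted background, and the composite is written back into the frame at the original subtitle position.

```python
import cv2
import numpy as np

def splice_styled_subtitle(frame, styled_gray, background, region):
    """Superimpose a target-style subtitle on the subtitle background
    and splice the composite back into the video frame.

    styled_gray: grayscale styled-subtitle image from the generator
    background:  inpainted subtitle background of the same region
    region:      (x, y, w, h) original subtitle position
    """
    x, y, w, h = region
    styled = cv2.resize(styled_gray, (w, h))
    alpha = (styled.astype(np.float32) / 255.0)[..., None]  # per-pixel mask
    white_text = np.full_like(background, 255)
    # Blend: strokes come from the styled image, the rest from background.
    composite = alpha * white_text + (1.0 - alpha) * background
    frame[y:y + h, x:x + w] = composite.astype(np.uint8)
    return frame
```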
It should be noted that in this embodiment the target-style subtitles are preferably generated at the video playback end. This reduces network transmission overhead, avoids to a certain extent the influence of network bandwidth and latency between the playback end and a remote end, and improves the stability of target-style subtitle generation.
In summary, the method of this embodiment responds to a monitored subtitle regeneration instruction by capturing a subtitle picture according to the subtitle position in the video, extracts the subtitle background from the picture, inputs the subtitle content into a pre-trained multi-style subtitle generation model to obtain target-style subtitles, and superimposes and splices the target-style subtitles with the subtitle background into the video for display. Subtitles are displayed in real time and dynamically in the style the user requires, so the video suits different users and its adaptability is improved.
In a specific implementation, while a user watches a video, the user's identity can be recorded through a camera, a fingerprint recognition component or the like, and the target-style subtitles the user applies can be associated with that identity. Subtitle libraries are thereby built up for different users, so that the next time a user watches a video, their frequently used subtitles can be retrieved directly from their own library.
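A per-user subtitle library could be as simple as a persisted mapping from a user identity to the styles that user has applied. The sketch below is hypothetical and not part of the disclosure; the identity string and file name are invented.

```python
import json
from collections import defaultdict

class SubtitleStyleLibrary:
    """Minimal per-user library: remembers which target styles each
    identified user has applied, so they can be recalled next time."""

    def __init__(self, path="style_library.json"):
        self.path = path
        self.store = defaultdict(list)

    def record(self, user_id, style_name, topic_vector):
        self.store[user_id].append(
            {"style": style_name, "topic_vector": list(topic_vector)})

    def styles_for(self, user_id):
        return self.store.get(user_id, [])

    def save(self):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.store, f)

library = SubtitleStyleLibrary()
library.record("user-001", "cartoon", [0.1] * 64)  # hypothetical identity
print(library.styles_for("user-001"))
```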
Fig. 2 is a schematic structural diagram of a device for generating video subtitles according to an embodiment of the present invention. As shown in Fig. 2, the device of this embodiment may specifically include a capture module 20, an extraction module 21, a subtitle regeneration module 22 and a splicing module 23.
The capture module 20 is configured to respond to a monitored subtitle regeneration instruction and capture a subtitle picture according to the subtitle position in the video.
In this embodiment, the subtitle position in the video is obtained as follows:
if the video is a plug-in subtitle video, the subtitle file is extracted from it and parsed to obtain the subtitle position;
and if the video is an embedded subtitle video, the preset position of the embedded subtitles is taken as the subtitle position, or the subtitle position is obtained with a pre-trained text detection model.
The extraction module 21 is configured to extract the subtitle background from the subtitle picture.
The subtitle regeneration module 22 is configured to input the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain subtitles in a target style.
In this embodiment, the subtitle content of the video is obtained as follows:
if the video is a plug-in subtitle video, the subtitle file is extracted from it and parsed to obtain the subtitle content;
and if the video is an embedded subtitle video, the subtitle content is obtained with a pre-trained text detection model.
In a specific implementation, the encoder of the multi-style subtitle generation model encodes the subtitle content into a subtitle vector, which is recombined with a preset topic word feature vector into a recombined vector; the recombined vector is then input into the generative adversarial network corresponding to the model to obtain the target-style subtitles.
In this embodiment, the topic word feature vector is set in one of the following ways:
extracting the topic word feature vector from preset topic words and setting it;
or extracting the topic word feature vector from user-defined topic words and setting it, where the user-defined topic words are obtained either by re-editing the preset topic words or by the user creating them in a self-creation mode.
The splicing module 23 is configured to superimpose the target-style subtitles on the subtitle background and splice the result into the video for display.
The device of this embodiment responds to a monitored subtitle regeneration instruction by capturing a subtitle picture according to the subtitle position in the video, extracts the subtitle background from the picture, inputs the subtitle content into a pre-trained multi-style subtitle generation model to obtain target-style subtitles, and superimposes and splices the target-style subtitles with the subtitle background into the video for display, so that subtitles are displayed in real time and dynamically in the style the user requires, the video suits different users, and its adaptability is improved.
Fig. 3 is a schematic structural diagram of equipment for generating video subtitles according to an embodiment of the present invention. As shown in Fig. 3, the equipment of this embodiment may include a processor 1010 and a memory 1020. Those skilled in the art will appreciate that it may also include an input/output interface 1030, a communication interface 1040 and a bus 1050, through which the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 communicate with one another within the equipment.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute the relevant programs so as to implement the technical solutions provided in the embodiments of this specification.
The memory 1020 may be implemented as ROM (Read-Only Memory), RAM (Random-Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solutions provided in the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
The input/output interface 1030 connects an input/output module for inputting and outputting information. The input/output module may be built into the equipment (not shown in the figure) or attached externally to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone and various sensors; output devices may include a display, loudspeaker, vibrator and indicator lights.
The communication interface 1040 connects a communication module (not shown in the figure) to enable the equipment to interact with other devices. The communication module may communicate in a wired manner (e.g. USB or network cable) or wirelessly (e.g. mobile network, WiFi or Bluetooth).
The bus 1050 provides a path for transferring information between the components of the equipment, such as the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040.
It should be noted that although only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050 are shown, in a specific implementation the equipment may also include other components necessary for normal operation. Moreover, those skilled in the art will appreciate that the equipment may include only the components necessary to implement the embodiments of this specification, rather than all of the components shown in the figure.
The present invention also provides a storage medium storing one or more programs which, when executed, implement the method for generating video subtitles of the above embodiments.
Computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the invention, features of the above embodiments or of different embodiments may be combined, steps may be implemented in any order, and many other variations of the different aspects of the invention exist that are not described in detail for the sake of brevity.
In addition, well-known power and ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided figures, for simplicity of illustration and discussion and so as not to obscure the invention. Furthermore, devices may be shown in block-diagram form to avoid obscuring the invention, which also reflects the fact that the details of implementing such block-diagram devices depend heavily on the platform on which the invention is to be implemented (i.e. such details should be well within the purview of one skilled in the art). Where specific details (e.g. circuits) are set forth to describe example embodiments, it should be apparent to one skilled in the art that the invention can be practised without, or with variations of, these specific details. Accordingly, the description is to be regarded as illustrative rather than restrictive.
While the present invention has been described in conjunction with specific embodiments, many alternatives, modifications and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g. dynamic RAM (DRAM)) may use the embodiments discussed.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for generating video subtitles, comprising:
in response to a monitored subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in a video;
extracting the subtitle background from the subtitle picture;
inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain subtitles in a target style;
and superimposing the target-style subtitles on the subtitle background, and splicing the result into the video for display.
2. The method for generating video subtitles according to claim 1, wherein inputting the subtitle content of the video into the pre-trained multi-style subtitle generation model to obtain subtitles in a target style comprises:
encoding the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombining the subtitle vector with a preset topic word feature vector to obtain a recombined vector;
and inputting the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitles.
3. The method according to claim 2, wherein the topic word feature vector is set in one of the following ways:
extracting the topic word feature vector from preset topic words and setting it;
or extracting the topic word feature vector from user-defined topic words and setting it, wherein the user-defined topic words are obtained either by re-editing the preset topic words or by the user creating them in a self-creation mode.
4. The method for generating video subtitles according to claim 1, wherein the subtitle position in the video is obtained as follows:
if the video is a plug-in subtitle video, extracting the subtitle file from the plug-in subtitle video and parsing it to obtain the subtitle position;
and if the video is an embedded subtitle video, taking the preset position of the embedded subtitles as the subtitle position, or obtaining the subtitle position with a pre-trained text detection model.
5. The method for generating video subtitles according to claim 1, wherein the subtitle content of the video is obtained as follows:
if the video is a plug-in subtitle video, extracting the subtitle file from the plug-in subtitle video and parsing it to obtain the subtitle content;
and if the video is an embedded subtitle video, obtaining the subtitle content with a pre-trained text detection model.
6. A device for generating video subtitles, comprising:
a capture module, configured to respond to a monitored subtitle regeneration instruction and capture a subtitle picture according to the subtitle position in a video;
an extraction module, configured to extract the subtitle background from the subtitle picture;
a subtitle regeneration module, configured to input the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain subtitles in a target style;
and a splicing module, configured to superimpose the target-style subtitles on the subtitle background and splice the result into the video for display.
7. The device for generating video subtitles according to claim 6, wherein the subtitle regeneration module is specifically configured to:
encode the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombine the subtitle vector with a preset topic word feature vector to obtain a recombined vector;
and input the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitles.
8. The device for generating video subtitles according to claim 7, wherein the topic word feature vector is set in one of the following ways:
extracting the topic word feature vector from preset topic words and setting it;
or extracting the topic word feature vector from user-defined topic words and setting it, wherein the user-defined topic words are obtained either by re-editing the preset topic words or by the user creating them in a self-creation mode.
9. Equipment for generating video subtitles, comprising a processor and a memory;
wherein the processor is configured to execute a video subtitle generation program stored in the memory to implement the method for generating video subtitles according to any one of claims 1-5.
10. A storage medium storing one or more programs which, when executed, implement the method for generating video subtitles according to any one of claims 1-5.
CN202110132044.XA 2021-01-31 2021-01-31 Video subtitle generating method, device, equipment and storage medium Active CN112911373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110132044.XA CN112911373B (en) 2021-01-31 2021-01-31 Video subtitle generating method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110132044.XA CN112911373B (en) 2021-01-31 2021-01-31 Video subtitle generating method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112911373A true CN112911373A (en) 2021-06-04
CN112911373B CN112911373B (en) 2023-05-26

Family

ID=76121994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110132044.XA Active CN112911373B (en) 2021-01-31 2021-01-31 Video subtitle generating method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112911373B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715497A (en) * 2014-12-30 2015-06-17 上海孩子国科教设备有限公司 Data replacement method and system
CN105871681A (en) * 2015-12-14 2016-08-17 乐视网信息技术(北京)股份有限公司 Subtitle adding method and device
US20190340469A1 (en) * 2017-03-20 2019-11-07 Intel Corporation Topic-guided model for image captioning system
CN110866377A (en) * 2018-08-08 2020-03-06 北京优酷科技有限公司 Text content conversion method and device
CN110458918A (en) * 2019-08-16 2019-11-15 北京百度网讯科技有限公司 Method and apparatus for output information
CN111402367A (en) * 2020-03-27 2020-07-10 维沃移动通信有限公司 Image processing method and electronic equipment
CN111639474A (en) * 2020-05-26 2020-09-08 维沃移动通信有限公司 Document style reconstruction method and device and electronic equipment
CN112084841A (en) * 2020-07-27 2020-12-15 齐鲁工业大学 Cross-modal image multi-style subtitle generation method and system
CN112055245A (en) * 2020-09-11 2020-12-08 海信视像科技股份有限公司 Color subtitle realization method and display device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952255A (en) * 2022-11-21 2023-04-11 北京邮电大学 Multi-modal signal content analysis method and device, electronic equipment and storage medium
CN115952255B (en) * 2022-11-21 2023-12-05 北京邮电大学 Multi-mode signal content analysis method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112911373B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN109803180B (en) Video preview generation method and device, computer equipment and storage medium
CN110557678B (en) Video processing method, device and equipment
CN107801096B (en) Video playing control method and device, terminal equipment and storage medium
CN110177295B (en) Subtitle out-of-range processing method and device and electronic equipment
KR101916874B1 (en) Apparatus, method for auto generating a title of video contents, and computer readable recording medium
CN111343496A (en) Video processing method and device
CN108521612B (en) Video abstract generation method, device, server and storage medium
CN110876079B (en) Video processing method, device and equipment
US20230291978A1 (en) Subtitle processing method and apparatus of multimedia file, electronic device, and computer-readable storage medium
CN106303303A (en) Method and device for translating subtitles of media file and electronic equipment
US10897658B1 (en) Techniques for annotating media content
US20170147170A1 (en) Method for generating a user interface presenting a plurality of videos
CN106408623A (en) Character presentation method, device and terminal
CN114598893A (en) Text video implementation method and system, electronic equipment and storage medium
CN112422844A (en) Method, device and equipment for adding special effect in video and readable storage medium
CN112911373A (en) Method, device and equipment for generating video subtitles and storage medium
US20170193668A1 (en) Intelligent Equipment-Based Motion Sensing Control Method, Electronic Device and Intelligent Equipment
CN107197339B (en) Display control method and device of film bullet screen and head-mounted display equipment
US20200057890A1 (en) Method and device for determining inter-cut time range in media item
CN112511897A (en) Video cover setting method, device, equipment and storage medium
CN111918074A (en) Live video fault early warning method and related equipment
US20160142456A1 (en) Method and Device for Acquiring Media File
CN112908337B (en) Method, device, equipment and storage medium for displaying voice recognition text
CN114268847A (en) Video playing method and device, electronic equipment and storage medium
KR101403159B1 (en) Apparatus and method for providing additional information about object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant