CN112911373B - Video subtitle generating method, device, equipment and storage medium

Video subtitle generating method, device, equipment and storage medium

Info

Publication number
CN112911373B
CN112911373B (application CN202110132044.XA)
Authority
CN
China
Prior art keywords
subtitle
video
caption
user
style
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110132044.XA
Other languages
Chinese (zh)
Other versions
CN112911373A (en)
Inventor
张晋
刘青松
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd, Xiamen Yunzhixin Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202110132044.XA priority Critical patent/CN112911373B/en
Publication of CN112911373A publication Critical patent/CN112911373A/en
Application granted granted Critical
Publication of CN112911373B publication Critical patent/CN112911373B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)

Abstract

The invention relates to a video subtitle generation method, apparatus, device, and storage medium, wherein the method comprises: in response to a detected subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in the video; extracting the subtitle background from the subtitle picture; inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain a subtitle in a target style; and overlaying the target-style subtitle on the subtitle background and splicing the result into the video for display. Subtitles can thus be displayed dynamically, in real time, in the style a user wants, so that the video suits different users and its adaptability is improved.

Description

Video subtitle generating method, device, equipment and storage medium
Technical Field
The present invention relates to the field of video playing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating video subtitles.
Background
Video is an important medium for conveying information and plays a significant role in daily life. Most videos carry subtitles, which are displayed on the video while it plays.
In the prior art, subtitles are usually displayed in a fixed form. Some users, finding a video's subtitles unappealing, may stop watching it or rate it poorly, which lowers the video's play rate. How to personalize video subtitles and thereby improve video adaptability is therefore an urgent technical problem for those skilled in the art.
Disclosure of Invention
The invention provides a video subtitle generation method, apparatus, device, and storage medium, which address the technical problem that video subtitles cannot be personalized, leaving video adaptability low.
The technical solution is as follows:
a method of generating video subtitles, comprising:
in response to a detected subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in the video;
extracting the subtitle background from the subtitle picture;
inputting the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain a subtitle in a target style;
and overlaying the target-style subtitle on the subtitle background and splicing the result into the video for display.
Further, in the above method, inputting the subtitle content of the video into the pre-trained multi-style subtitle generation model to obtain a target-style subtitle comprises:
encoding the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombining the subtitle vector with a preset topic-word feature vector to obtain a recombined vector;
inputting the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitle.
Further, in the above method, the topic-word feature vector is set in one of the following ways:
extracting the topic-word feature vector from preset topic words and setting it; or
extracting the topic-word feature vector from user-defined topic words and setting it, where the user-defined topic words are obtained by re-editing the preset topic words or are created by the user in a self-creation mode.
Further, in the above method, the subtitle position in the video is obtained as follows:
if the video is an external-subtitle video, extracting the subtitle file from it and parsing the file to obtain the subtitle position;
if the video is an embedded-subtitle video, taking a preset position of the video as the subtitle position, or obtaining the subtitle position with a pre-trained text detection model.
Further, in the above method, the subtitle content of the video is obtained as follows:
if the video is an external-subtitle video, extracting the subtitle file from it and parsing the file to obtain the subtitle content;
if the video is an embedded-subtitle video, obtaining the subtitle content with a pre-trained text detection model.
The invention also provides a video subtitle generation apparatus, comprising:
an interception module, configured to capture a subtitle picture according to the subtitle position in the video in response to a detected subtitle regeneration instruction;
an extraction module, configured to extract the subtitle background from the subtitle picture;
a subtitle regeneration module, configured to input the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain a subtitle in a target style;
and a splicing module, configured to overlay the target-style subtitle on the subtitle background and splice the result into the video for display.
Further, in the above apparatus, the subtitle regeneration module is specifically configured to:
encode the subtitle content with the encoder of the multi-style subtitle generation model to obtain a subtitle vector, and recombine the subtitle vector with a preset topic-word feature vector to obtain a recombined vector;
input the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain the target-style subtitle.
Further, in the above apparatus, the topic-word feature vector is set in one of the following ways:
extracting the topic-word feature vector from preset topic words and setting it; or
extracting the topic-word feature vector from user-defined topic words and setting it, where the user-defined topic words are obtained by re-editing the preset topic words or are created by the user in a self-creation mode.
The invention also provides a video subtitle generating device, which comprises: a processor and a memory;
the processor is configured to execute a video subtitle generation program stored in the memory, so as to implement any of the video subtitle generation methods described above.
The present invention also provides a storage medium storing one or more programs which, when executed, implement any of the video subtitle generation methods described above.
The beneficial effects of the invention are as follows:
in response to a detected subtitle regeneration instruction, a subtitle picture is captured according to the subtitle position in the video; the subtitle background is extracted from the subtitle picture; the subtitle content of the video is input into a pre-trained multi-style subtitle generation model to obtain a subtitle in a target style; and the target-style subtitle and the subtitle background are overlaid and spliced into the video for display. Subtitles are thus displayed dynamically, in real time, in the style a user wants, so the video suits different users and its adaptability is improved.
Drawings
Fig. 1 is a flowchart of a method for generating video subtitles according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a video subtitle generating apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a video subtitle generating apparatus according to an embodiment of the present invention.
Detailed Description
The principles and features of the present invention are described below with reference to the drawings. The examples are provided to illustrate the invention and are not to be construed as limiting its scope.
Fig. 1 is a flowchart of a method for generating video subtitles according to an embodiment of the present invention. As shown in Fig. 1, the method may comprise the following steps:
100. In response to a detected subtitle regeneration instruction, capture a subtitle picture according to the subtitle position in the video.
In a specific implementation, when a user watching a video feels that its subtitles do not meet his or her needs, the user can issue a subtitle regeneration instruction; upon receiving the instruction, the system responds by capturing a subtitle picture at the subtitle position in the video.
In practical applications, the video may be an external-subtitle video or an embedded-subtitle video. In this embodiment, the subtitle position in the video is therefore obtained as follows:
if the video is an external-subtitle video, extract the subtitle file from it and parse the file to obtain the subtitle position;
if the video is an embedded-subtitle video, the subtitles are fused into the frames and no subtitle file can be extracted; however, the subtitle position in such a video is usually fixed, so a preset position of the video can be used as the subtitle position. If the position is not fixed, a pre-trained text detection model can detect the text in the video and so obtain the subtitle position.
Once the subtitle position is known, the subtitle picture can be captured from the frame at that position using an image-cropping technique.
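The following is a minimal sketch of this step, assuming OpenCV for frame access and cropping; since the patent does not name a specific text detection model, pytesseract's word bounding boxes stand in for it here, and the video path and bottom-third heuristic are illustrative assumptions.

```python
import cv2
import pytesseract
from pytesseract import Output

def find_subtitle_box(frame):
    """Stand-in for the pre-trained text detection model:
    union of OCR word boxes found in the bottom third of the frame."""
    h, _ = frame.shape[:2]
    data = pytesseract.image_to_data(frame, output_type=Output.DICT)
    boxes = [(data["left"][i], data["top"][i],
              data["width"][i], data["height"][i])
             for i in range(len(data["text"]))
             if data["text"][i].strip() and data["top"][i] > 2 * h // 3]
    if not boxes:
        return None
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[0] + b[2] for b in boxes)
    y1 = max(b[1] + b[3] for b in boxes)
    return x0, y0, x1, y1

cap = cv2.VideoCapture("movie.mp4")          # hypothetical video path
ok, frame = cap.read()
box = find_subtitle_box(frame)
if box:
    x0, y0, x1, y1 = box
    subtitle_picture = frame[y0:y1, x0:x1]   # the captured subtitle picture
```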
101. Extract the subtitle background from the captured subtitle picture.
In this embodiment, the subtitle text and its background can be separated, yielding the background picture with the subtitle text removed.
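The patent does not specify how the separation is performed; one plausible approach, sketched below under the assumption of light subtitle text on a darker background, is to threshold a text mask and remove the text by OpenCV inpainting. The threshold value is illustrative only.

```python
import cv2
import numpy as np

def extract_background(subtitle_picture):
    """Remove the subtitle text from the cropped picture, keeping the background.
    Assumes near-white text; the threshold of 200 is an illustrative choice."""
    gray = cv2.cvtColor(subtitle_picture, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=2)
    # Fill the masked text pixels from the surrounding background
    return cv2.inpaint(subtitle_picture, mask, 3, cv2.INPAINT_TELEA)
```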
102. Input the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain a subtitle in a target style.
In a specific implementation, the subtitle content of the video is obtained as follows:
if the video is an external-subtitle video, extract the subtitle file from it and parse the file to obtain the subtitle content;
if the video is an embedded-subtitle video, obtain the subtitle content with a pre-trained text detection model; for example, the subtitle content may be read with optical character recognition (OCR) technology.
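A minimal sketch of both branches follows: a plain-text SRT parser for the external-subtitle case and pytesseract OCR for the embedded case. The file name is hypothetical, and since the patent's text detection model is unspecified, pytesseract (with the standard "chi_sim" Chinese model, an assumption) stands in for it.

```python
import re
import pytesseract

def parse_srt(path):
    """External-subtitle branch: parse an .srt file into (start, end, text) cues."""
    pattern = re.compile(
        r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n(.*?)(?:\n\n|\Z)",
        re.S)
    with open(path, encoding="utf-8") as f:
        return [(m.group(1), m.group(2), m.group(3).strip())
                for m in pattern.finditer(f.read())]

def ocr_subtitle(frame_region):
    """Embedded-subtitle branch: read the text from the cropped subtitle region."""
    return pytesseract.image_to_string(frame_region, lang="chi_sim").strip()

cues = parse_srt("movie.srt")            # hypothetical subtitle file
# text = ocr_subtitle(subtitle_picture)  # region captured in the earlier step
```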
In a specific implementation, the multi-style subtitle generation model of this embodiment can be trained in advance on the basis of a generative adversarial network. After the subtitle content of the video is obtained, the model's encoder encodes it into a subtitle vector; the subtitle vector is recombined with a preset topic-word feature vector into a recombined vector; and the recombined vector is input into the generative adversarial network corresponding to the model to obtain the target-style subtitle. Subtitles in the video are thus presented to users in a more personalized way, giving viewers a more distinctive experience. For example, a children's video can be given cartoon-style subtitles, enhancing the effect of the video.
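The patent gives only the data flow (encode, recombine with the topic-word feature vector, generate adversarially), so the PyTorch sketch below is purely schematic: every dimension, layer, and module name is an assumption, not the patent's architecture.

```python
import torch
import torch.nn as nn

EMB, TOPIC, NOISE = 256, 64, 32   # illustrative dimensions

class SubtitleEncoder(nn.Module):
    """Encodes subtitle token ids into a single subtitle vector."""
    def __init__(self, vocab_size=8000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, EMB)
        self.rnn = nn.GRU(EMB, EMB, batch_first=True)

    def forward(self, token_ids):
        _, h = self.rnn(self.embed(token_ids))
        return h.squeeze(0)                       # (batch, EMB)

class StyleGenerator(nn.Module):
    """Generator of the adversarial network: recombined vector -> styled subtitle image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB + TOPIC + NOISE, 1024), nn.ReLU(),
            nn.Linear(1024, 64 * 256 * 3), nn.Tanh())  # flat 64x256 RGB strip

    def forward(self, recombined):
        return self.net(recombined).view(-1, 3, 64, 256)

encoder, generator = SubtitleEncoder(), StyleGenerator()
token_ids = torch.randint(0, 8000, (1, 12))   # stand-in tokenized subtitle content
topic_vec = torch.randn(1, TOPIC)             # preset topic-word feature vector
z = torch.randn(1, NOISE)
recombined = torch.cat([encoder(token_ids), topic_vec, z], dim=1)
styled_subtitle = generator(recombined)       # (1, 3, 64, 256) subtitle image tensor
```

In a full system this generator would be trained against a discriminator on styled subtitle samples; only the inference path relevant to the patent's flow is shown.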
In some embodiments, the topic-word feature vector may be extracted from preset topic words and set.
In some embodiments, to further meet the needs of different users, the topic-word feature vector may be extracted from user-defined topic words and set. Specifically, only part of the preset styles may fail to meet a user's needs, so the user only needs to adjust a small portion of the preset topic words; in this embodiment the preset topic words can therefore be re-edited to obtain the user-defined topic words.
In some embodiments, the user may also create the user-defined topic words from scratch: the user triggers a self-creation instruction and, in the self-creation mode, composes the topic words personally. For example, the user may upload his or her own artwork as a user-defined topic word to serve as a subtitle style; the topic-word feature vector is then extracted from it and set.
103. Overlay the target-style subtitle on the subtitle background and splice the result into the video for display.
After the target-style subtitle is obtained, it can be overlaid on the subtitle background to form an image containing the personalized subtitle, which is spliced into the video for display. Subtitles in the video are thus displayed dynamically, in real time, in the style the user wants, reducing the chance that a user stops watching the video, or rates it poorly, because the subtitles are unappealing, which would lower the video's play rate.
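A minimal compositing sketch for this step, assuming the styled subtitle has been rendered as an image, that `background` is the cleaned patch from step 101, and that `box` is the position from step 100; the non-black-pixel mask is an illustrative assumption.

```python
import cv2
import numpy as np

def splice_subtitle(frame, background, styled, box):
    """Overlay the target-style subtitle on the extracted background, then
    paste the composite back into the frame at the original subtitle position."""
    x0, y0, x1, y1 = box
    styled = cv2.resize(styled, (x1 - x0, y1 - y0))
    gray = cv2.cvtColor(styled, cv2.COLOR_BGR2GRAY)
    text_mask = gray > 10                      # illustrative: non-black pixels are text
    composite = background.copy()
    composite[text_mask] = styled[text_mask]   # text pixels over the clean background
    frame[y0:y1, x0:x1] = composite
    return frame
```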
In this embodiment, the target-style subtitle is preferably generated at the video playing end. This reduces network transmission overhead, avoids the bandwidth and latency effects that arise when the playing end interacts with a remote control end, and improves the stability of target-style subtitle generation.
With the video subtitle generation method of this embodiment, a subtitle picture is captured according to the subtitle position in the video in response to a detected subtitle regeneration instruction; the subtitle background is extracted from the subtitle picture; the subtitle content of the video is input into a pre-trained multi-style subtitle generation model to obtain a subtitle in a target style; and the target-style subtitle and the subtitle background are overlaid and spliced into the video for display. Subtitles are displayed dynamically, in real time, in the style the user wants, so the video suits different users and its adaptability is improved.
In a specific implementation, when a user watches a video, the user's identity information can be recorded via a camera, a fingerprint recognition component, or the like, and the target-style subtitle used by the user can be associated with that identity information. Subtitle libraries are thus built for different users, and the next time a user watches a video, his or her frequently used subtitles can be fetched directly from the corresponding library.
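A minimal sketch of such a per-user subtitle library, assuming a simple JSON key-value store keyed by identity; the file name and field names are illustrative assumptions.

```python
import json
from pathlib import Path

LIB_PATH = Path("subtitle_library.json")   # hypothetical per-user style store

def _load():
    return json.loads(LIB_PATH.read_text()) if LIB_PATH.exists() else {}

def save_preference(user_id, style_name):
    """Associate the identity of the viewing user with the target style used."""
    lib = _load()
    lib.setdefault(user_id, []).append(style_name)
    LIB_PATH.write_text(json.dumps(lib))

def frequent_style(user_id):
    """Return the style this user has used most often, if any."""
    styles = _load().get(user_id, [])
    return max(set(styles), key=styles.count) if styles else None
```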
Fig. 2 is a schematic structural diagram of a video subtitle generating apparatus according to an embodiment of the present invention. As shown in Fig. 2, the apparatus may comprise an interception module 20, an extraction module 21, a subtitle regeneration module 22, and a splicing module 23.
The interception module 20 is configured to capture a subtitle picture according to the subtitle position in the video in response to a detected subtitle regeneration instruction.
In this embodiment, the subtitle position in the video is obtained as follows:
if the video is an external-subtitle video, extract the subtitle file from it and parse the file to obtain the subtitle position;
if the video is an embedded-subtitle video, take a preset position of the video as the subtitle position, or obtain the subtitle position with a pre-trained text detection model.
The extraction module 21 is configured to extract the subtitle background from the subtitle picture.
The subtitle regeneration module 22 is configured to input the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain a subtitle in a target style.
In this embodiment, the subtitle content of the video is obtained as follows:
if the video is an external-subtitle video, extract the subtitle file from it and parse the file to obtain the subtitle content;
if the video is an embedded-subtitle video, obtain the subtitle content with a pre-trained text detection model.
In a specific implementation, the encoder of the multi-style subtitle generation model encodes the subtitle content into a subtitle vector, which is recombined with a preset topic-word feature vector into a recombined vector; the recombined vector is input into the generative adversarial network corresponding to the model to obtain the target-style subtitle.
In this embodiment, the topic-word feature vector is set in one of the following ways:
extracting the topic-word feature vector from preset topic words and setting it; or
extracting the topic-word feature vector from user-defined topic words and setting it, where the user-defined topic words are obtained by re-editing the preset topic words or are created by the user in a self-creation mode.
The splicing module 23 is configured to overlay the target-style subtitle on the subtitle background and splice the result into the video for display.
The video subtitle generating apparatus of this embodiment captures a subtitle picture according to the subtitle position in the video in response to a detected subtitle regeneration instruction; extracts the subtitle background from the subtitle picture; inputs the subtitle content of the video into a pre-trained multi-style subtitle generation model to obtain a subtitle in a target style; and overlays the target-style subtitle on the subtitle background and splices the result into the video for display. Subtitles are displayed dynamically, in real time, in the style the user wants, so the video suits different users and its adaptability is improved.
Fig. 3 is a schematic structural diagram of a video subtitle generating device according to an embodiment of the present invention. As shown in Fig. 3, the device of this embodiment may include a processor 1010 and a memory 1020 and, as those skilled in the art will appreciate, may also include an input/output interface 1030, a communication interface 1040, and a bus 1050, through which the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 communicate with one another within the device.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in the memory 1020 and executed by the processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
Communication interface 1040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The present invention also provides a storage medium storing one or more programs which, when executed, implement the video subtitle generating method of the above embodiments.
The computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those of ordinary skill in the art will appreciate that the discussion of any embodiment above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the invention, the technical features of the above embodiments, or of different embodiments, may be combined, and the steps may be implemented in any order; many other variations of the different aspects of the invention exist, which, for brevity, are not described in detail.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the invention. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The present invention is not limited to the above embodiments, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and these modifications and substitutions are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (6)

1. A method for generating video subtitles, comprising:
in response to a detected subtitle regeneration instruction, capturing a subtitle picture according to the subtitle position in the video;
extracting the subtitle background from the subtitle picture;
encoding the subtitle content of the video with the encoder of a multi-style subtitle generation model to obtain a subtitle vector, and recombining the subtitle vector with a preset topic-word feature vector to obtain a recombined vector;
inputting the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain a subtitle in a target style; wherein the topic-word feature vector is set as follows: extracting the topic-word feature vector from user-defined topic words and setting it, the user-defined topic words being obtained by re-editing preset topic words or created by the user in a self-creation mode;
overlaying the target-style subtitle on the subtitle background and splicing the result into the video for display;
and recording the user's identity information when the user watches the video, and associating the target-style subtitle used by the user with the user's identity information.
2. The method for generating video subtitles according to claim 1, wherein the subtitle position in the video is obtained as follows:
if the video is an external-subtitle video, extracting the subtitle file from it and parsing the file to obtain the subtitle position;
if the video is an embedded-subtitle video, taking a preset position of the video as the subtitle position, or obtaining the subtitle position with a pre-trained text detection model.
3. The method for generating video subtitles according to claim 1, wherein the subtitle content of the video is obtained as follows:
if the video is an external-subtitle video, extracting the subtitle file from it and parsing the file to obtain the subtitle content;
if the video is an embedded-subtitle video, obtaining the subtitle content with a pre-trained text detection model.
4. A video subtitle generating apparatus, comprising:
an interception module, configured to capture a subtitle picture according to the subtitle position in the video in response to a detected subtitle regeneration instruction;
an extraction module, configured to extract the subtitle background from the subtitle picture;
a subtitle regeneration module, configured to encode the subtitle content of the video with the encoder of a multi-style subtitle generation model to obtain a subtitle vector, recombine the subtitle vector with a preset topic-word feature vector to obtain a recombined vector, and input the recombined vector into the generative adversarial network corresponding to the multi-style subtitle generation model to obtain a subtitle in a target style; wherein the topic-word feature vector is set as follows: extracting the topic-word feature vector from user-defined topic words and setting it, the user-defined topic words being obtained by re-editing preset topic words or created by the user in a self-creation mode;
a splicing module, configured to overlay the target-style subtitle on the subtitle background and splice the result into the video for display;
and a recording module, configured to record the user's identity information when the user watches the video and associate the target-style subtitle used by the user with the user's identity information.
5. A video subtitle generating apparatus, comprising: a processor and a memory;
the processor is configured to execute a video subtitle generating program stored in the memory, so as to implement the video subtitle generating method of any one of claims 1 to 3.
6. A storage medium storing one or more programs which when executed by a processor implement the method of generating video subtitles of any of claims 1-3.
CN202110132044.XA 2021-01-31 2021-01-31 Video subtitle generating method, device, equipment and storage medium Active CN112911373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110132044.XA CN112911373B (en) 2021-01-31 2021-01-31 Video subtitle generating method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110132044.XA CN112911373B (en) 2021-01-31 2021-01-31 Video subtitle generating method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112911373A CN112911373A (en) 2021-06-04
CN112911373B true CN112911373B (en) 2023-05-26

Family

ID=76121994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110132044.XA Active CN112911373B (en) 2021-01-31 2021-01-31 Video subtitle generating method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112911373B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952255B (en) * 2022-11-21 2023-12-05 北京邮电大学 Multi-mode signal content analysis method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715497A (en) * 2014-12-30 2015-06-17 上海孩子国科教设备有限公司 Data replacement method and system
CN105871681A (en) * 2015-12-14 2016-08-17 乐视网信息技术(北京)股份有限公司 Subtitle adding method and device
CN110458918A (en) * 2019-08-16 2019-11-15 北京百度网讯科技有限公司 Method and apparatus for output information
CN110866377A (en) * 2018-08-08 2020-03-06 北京优酷科技有限公司 Text content conversion method and device
CN111402367A (en) * 2020-03-27 2020-07-10 维沃移动通信有限公司 Image processing method and electronic equipment
CN111639474A (en) * 2020-05-26 2020-09-08 维沃移动通信有限公司 Document style reconstruction method and device and electronic equipment
CN112055245A (en) * 2020-09-11 2020-12-08 海信视像科技股份有限公司 Color subtitle realization method and display device
CN112084841A (en) * 2020-07-27 2020-12-15 齐鲁工业大学 Cross-modal image multi-style subtitle generation method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018170671A1 (en) * 2017-03-20 2018-09-27 Intel Corporation Topic-guided model for image captioning system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715497A (en) * 2014-12-30 2015-06-17 上海孩子国科教设备有限公司 Data replacement method and system
CN105871681A (en) * 2015-12-14 2016-08-17 乐视网信息技术(北京)股份有限公司 Subtitle adding method and device
CN110866377A (en) * 2018-08-08 2020-03-06 北京优酷科技有限公司 Text content conversion method and device
CN110458918A (en) * 2019-08-16 2019-11-15 北京百度网讯科技有限公司 Method and apparatus for output information
CN111402367A (en) * 2020-03-27 2020-07-10 维沃移动通信有限公司 Image processing method and electronic equipment
CN111639474A (en) * 2020-05-26 2020-09-08 维沃移动通信有限公司 Document style reconstruction method and device and electronic equipment
CN112084841A (en) * 2020-07-27 2020-12-15 齐鲁工业大学 Cross-modal image multi-style subtitle generation method and system
CN112055245A (en) * 2020-09-11 2020-12-08 海信视像科技股份有限公司 Color subtitle realization method and display device

Also Published As

Publication number Publication date
CN112911373A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
US20220236787A1 (en) Augmentation modification based on user interaction with augmented reality scene
US9779775B2 (en) Automatic generation of compilation videos from an original video based on metadata associated with the original video
US10904617B1 (en) Synchronizing a client device with media content for scene-specific notifications
KR102131322B1 (en) Computing device, method, computer program for processing video
CN107291222B (en) Interactive processing method, device and system of virtual reality equipment and virtual reality equipment
US9558784B1 (en) Intelligent video navigation techniques
JP7209851B2 (en) Image deformation control method, device and hardware device
CN110177295B (en) Subtitle out-of-range processing method and device and electronic equipment
US20230291978A1 (en) Subtitle processing method and apparatus of multimedia file, electronic device, and computer-readable storage medium
CN110505498A (en) Processing, playback method, device and the computer-readable medium of video
CN112399249A (en) Multimedia file generation method and device, electronic equipment and storage medium
CN108124170A (en) A kind of video broadcasting method, device and terminal device
KR20160013649A (en) Video display method and user terminal for creating subtitles based on ambient noise
CN112911373B (en) Video subtitle generating method, device, equipment and storage medium
CN112422844A (en) Method, device and equipment for adding special effect in video and readable storage medium
US10936878B2 (en) Method and device for determining inter-cut time range in media item
CN113965665A (en) Method and equipment for determining virtual live broadcast image
CN107197339B (en) Display control method and device of film bullet screen and head-mounted display equipment
CN105049910A (en) Video processing method and device
CN111918074A (en) Live video fault early warning method and related equipment
US20220070501A1 (en) Social video platform for generating and experiencing content
US20220279234A1 (en) Live stream display method and apparatus, electronic device, and readable storage medium
CN113411532A (en) Method, device, terminal and storage medium for recording content
KR20140033667A (en) Apparatus and method for video edit based on object
CN112908337B (en) Method, device, equipment and storage medium for displaying voice recognition text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant