CN110149548B - Video dubbing method, electronic device and readable storage medium - Google Patents

Video dubbing method, electronic device and readable storage medium

Info

Publication number
CN110149548B
Authority
CN
China
Prior art keywords
video
library
file
playing
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811122718.2A
Other languages
Chinese (zh)
Other versions
CN110149548A (en)
Inventor
刘玉杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201811122718.2A
Publication of CN110149548A
Application granted
Publication of CN110149548B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4396 Processing of audio elementary streams by muting the audio signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/485 End-user interface for client configuration
    • H04N21/4852 End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a video dubbing method, an electronic device and a computer-readable storage medium. The video dubbing method comprises the following steps: playing a video file; processing the subtitle file associated with the video file to read, from an audio library, a personalized voice file corresponding to the current subtitle vocabulary, wherein the audio library comprises at least one personalized voice file, and each personalized voice file comprises a library vocabulary and a user voice segment corresponding to the library vocabulary; muting the portion of the original audio that is associated with the video file and corresponds to the current subtitle vocabulary; and playing the user voice segment in the personalized voice file corresponding to the current subtitle vocabulary. According to the video dubbing method, the electronic device and the computer-readable storage medium provided by the embodiments of the invention, the electronic device can play audio dubbed by the user while the video is playing, which enhances the interaction between the electronic device and the user during video playback and makes playback more engaging.

Description

Video dubbing method, electronic device and readable storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video dubbing method, an electronic device, and a computer-readable storage medium.
Background
At present, most videos such as television dramas and movies are produced by encapsulating the dubbing of professional voice actors together with multiple captured image frames. Although video dubbed by professional voice actors is highly watchable, it offers poor interactivity with the user and is not very engaging.
Disclosure of Invention
The embodiment of the invention provides a video dubbing method, an electronic device and a computer-readable storage medium.
The video dubbing method of the embodiment of the invention comprises the following steps: playing a video file; processing the subtitle file associated with the video file to read, from an audio library, a personalized voice file corresponding to the current subtitle vocabulary, wherein the audio library comprises at least one personalized voice file, and each personalized voice file comprises a library vocabulary and a user voice segment corresponding to the library vocabulary; muting the portion of the original audio that is associated with the video file and corresponds to the current subtitle vocabulary; and playing the user voice segment in the personalized voice file corresponding to the current subtitle vocabulary.
The video dubbing method of another embodiment of the invention comprises the following steps: reading a video and an audio library, wherein the video comprises a video file, a subtitle file and original audio, and the audio library comprises library vocabularies and user voice segments corresponding to the library vocabularies; searching the audio library for library vocabularies matching the subtitle file, so as to generate personalized audio, together with synchronization association information between the subtitle file and the personalized audio, from the user voice segments corresponding to those library vocabularies; associating the video file, the subtitle file and the personalized audio according to the synchronization association information to form a personalized video; and playing the personalized audio when playing the personalized video.
The electronic device of an embodiment of the present invention includes one or more processors, memory, and one or more programs. Wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the video dubbing method described above.
The computer-readable storage medium of an embodiment of the present invention includes a computer program for use in conjunction with an electronic device, the computer program being executable by a processor to perform the video dubbing method described above.
According to the video dubbing method, the electronic device and the computer-readable storage medium provided by the embodiments of the invention, the electronic device can play audio dubbed by the user while the video is playing, which enhances the interaction between the electronic device and the user during video playback and makes playback more engaging.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flow diagram illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 2 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to some embodiments of the invention.
Fig. 4 is a flow chart illustrating a video dubbing method according to some embodiments of the present invention.
Fig. 5 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 6 is a flow chart illustrating a video dubbing method according to some embodiments of the present invention.
Fig. 7 is a block diagram of a video dubbing apparatus in accordance with certain implementations of the invention.
Fig. 8 is a block diagram of an identification module of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 9 is a flow chart illustrating a video dubbing method according to some embodiments of the present invention.
Fig. 10 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 11 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 12 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 13 is a flow chart illustrating a video dubbing method according to some embodiments of the present invention.
Fig. 14 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 15 is a block diagram of a volume determination module of a video dubbing apparatus in accordance with certain embodiments of the present invention.
Fig. 16 is a flow chart illustrating a video dubbing method according to some embodiments of the present invention.
Fig. 17 is a block diagram of a video dubbing apparatus in accordance with some embodiments of the present invention.
FIG. 18 is a schematic diagram of the connection of an electronic device to a computer-readable storage medium according to some embodiments of the invention.
Fig. 19 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 20 is a block diagram of a video dubbing apparatus in accordance with some embodiments of the present invention.
Fig. 21 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 22 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 23 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 24 is a block diagram of an identification module of a video dubbing apparatus in accordance with certain embodiments of the present invention.
Fig. 25 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 26 is a block diagram of a matching module of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 27 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 28 is a block diagram of a video dubbing apparatus in accordance with some embodiments of the present invention.
Fig. 29 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 30 is a flow diagram illustrating a video dubbing method in accordance with certain implementations of the invention.
Fig. 31 is a block diagram of a video dubbing apparatus in accordance with some embodiments of the present invention.
Fig. 32 is a block diagram of a volume determination module of a video dubbing apparatus in accordance with certain embodiments of the present invention.
Fig. 33 is a flow chart illustrating a video dubbing method in accordance with some embodiments of the present invention.
Fig. 34 is a block diagram of a video dubbing apparatus in accordance with certain implementations of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1 and fig. 3 together, the present invention provides a video dubbing method for an electronic device 100. The video dubbing method comprises the following steps:
S12: playing the video file;
S14: processing the subtitle file associated with the video file to read, from an audio library, a personalized voice file corresponding to the current subtitle vocabulary, wherein the audio library comprises at least one personalized voice file, and each personalized voice file comprises a library vocabulary and a user voice segment corresponding to the library vocabulary;
S16: muting the portion of the original audio that is associated with the video file and corresponds to the current subtitle vocabulary; and
S18: playing the user voice segment in the personalized voice file corresponding to the current subtitle vocabulary.
Referring to fig. 2 and fig. 3, the present invention further provides a video dubbing apparatus 10. The video dubbing apparatus 10 is applicable to the electronic apparatus 100. The video dubbing method according to the embodiment of the present invention can be realized by the video dubbing apparatus 10. The video dubbing apparatus 10 includes a first playback module 12, a first processing module 14, a second processing module 16, and a second playback module 18. Step S12 may be implemented by the first playback module 12. Step S14 may be implemented by the first processing module 14. The step S16 can be implemented by the second processing module 16, and the step S18 can be implemented by the second play module 18.
That is, the first playing module 12 can be used to play video files. The first processing module 14 is configured to process the subtitle file associated with the video file to read a personalized voice file corresponding to the current subtitle vocabulary from an audio library, where the audio library includes at least one personalized voice file, and the personalized voice file includes library vocabularies and user voice snippets corresponding to the library vocabularies. The second processing module 16 may be used to mute the original audio associated with the video file and corresponding to the current subtitle vocabulary. The second playing module 18 may be configured to play the user voice clip in the personalized voice file corresponding to the current subtitle vocabulary.
The electronic device 100 may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a wearable device (such as a smart watch, a smart bracelet, smart glasses, a smart helmet, etc.), and the like.
The electronic device 100 includes a display screen 40, a processor 20, a memory 30, and an electro-acoustic element 50 (e.g., a speaker, an earphone, etc.). The first playing module 12 may be a display 40 of the electronic device 100 for displaying the playing image. The first processing module 14 and the second processing module 16 may be programs stored in the memory 30, which can implement the functions indicated by step S14 and step S16, respectively, and the processor 20 executes the programs to complete step S14 and step S16. The second playing module 18 may be an electroacoustic element 50 of the electronic device 100, and is used for playing audio.
Video is generally composed of three parts: a video file, a subtitle file, and an audio file. The video file consists of multiple playing images, each carrying a time stamp (i.e., a playing time point); played back at a frame rate higher than the human eye can distinguish, these images form a smooth moving picture. The audio file typically contains audio representing what the people in the playing images say, together with time stamps (i.e., playing time points) that are matched against the time stamps of the playing images so that the video file and the audio file can be played synchronously. The subtitle file contains the text corresponding to the audio and the playing time points of that text; these playing time points can be matched against the time stamps of the audio and of the playing images so that the video file, the subtitle file, and the audio file are played synchronously.
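As a rough illustration only (the field names below are ours and do not correspond to any particular container format), the three synchronized parts can be modelled as timestamped records:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    timestamp_ms: int   # playing time point of this playing image
    image: bytes        # encoded picture data

@dataclass
class SubtitleCue:
    start_ms: int       # playing time point at which the text appears
    end_ms: int         # playing time point at which the text disappears
    text: str           # text information corresponding to the audio

@dataclass
class AudioChunk:
    timestamp_ms: int   # playing time point, matched against Frame.timestamp_ms
    samples: bytes      # encoded audio data

@dataclass
class Video:
    frames: List[Frame]            # the video file: playing images with time stamps
    subtitles: List[SubtitleCue]   # the subtitle file
    audio: List[AudioChunk]        # the original audio
```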
Audio files in current videos (e.g., television shows, movies, etc.) are obtained by recording the dubbing of professional voice actors. Although video dubbed by professional voice actors is highly watchable, the user can only hear the professional dubbing when the video is played, so the playback experience is monotonous, interactivity with the user is poor, and the video is not very engaging.
In the video dubbing method according to the embodiment of the present invention, the electronic apparatus 100 collects the user's everyday speech and stores it in the memory 30, recognizes the text information corresponding to that speech, splits the text information into a plurality of library vocabularies, and splits the speech into a plurality of user voice segments based on those library vocabularies, thereby building an audio library in which each library vocabulary corresponds to one user voice segment.
Referring to fig. 4, when the user inputs an instruction to play a video on the electronic device 100, the electronic device 100 first mutes the original audio in the video. It then extracts the subtitle file from the video and splits the subtitles in the subtitle file to obtain a plurality of subtitle vocabularies and the playing time points corresponding to those subtitle vocabularies. Subsequently, the electronic device 100 plays the video file, i.e., plays the multiple frames of playing images at a certain frame rate. At each playing time point during playback of the video file, the electronic device 100 searches the audio library for the personalized voice file, i.e., the user voice segment, corresponding to the subtitle vocabulary to be played at the current playing time point (the current subtitle vocabulary); after finding the user voice segment, the electroacoustic element 50 of the electronic device 100 plays it. At the next playing time point, the electronic device 100 again searches for the user voice segment corresponding to the subtitle vocabulary to be played at that time point and plays the segment it finds, and so on until all the playing images in the video file have been played.
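A minimal sketch of this playback loop, assuming the audio library is simply a directory of MP3 clips named after library vocabularies and that a hypothetical `player` object exposes `mute_original_audio`, `wait_until` and `play_clip`:

```python
import os
from typing import List, Optional, Tuple

AUDIO_LIBRARY_DIR = "audio_library"   # assumed location of the user voice clips

def find_user_clip(subtitle_word: str) -> Optional[str]:
    """Look up the personalized voice file whose file name is the library vocabulary."""
    path = os.path.join(AUDIO_LIBRARY_DIR, subtitle_word + ".mp3")
    return path if os.path.exists(path) else None

def play_dubbed_video(player, subtitle_words: List[Tuple[str, int]]) -> None:
    """subtitle_words: (word, playing time point in ms) pairs from the split subtitle file."""
    player.mute_original_audio()      # silence the original dubbing (step S16)
    for word, start_ms in subtitle_words:
        clip = find_user_clip(word)   # read the personalized voice file (step S14)
        player.wait_until(start_ms)   # stay in sync with the playing time point
        if clip is not None:
            player.play_clip(clip)    # play the user voice segment (step S18)
        # words with no matching clip are simply left silent in this sketch
```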
According to the video dubbing method provided by the embodiment of the invention, the electronic device 100 can play the personalized voice files dubbed by the user while the video plays, which strengthens the interaction between the electronic device 100 and the user during video playback and makes playback more engaging.
Referring to fig. 3, 5 and 6 together, in some embodiments, the audio library may be obtained by the following steps. That is, before playing the video file in step S12, the video dubbing method further includes:
S111: collecting the voice input by the user with the electronic device 100; and
S112: recognizing the voice to obtain library vocabularies and user voice segments.
Wherein, step S112 further comprises:
S1121: performing speech recognition on the voice to obtain text information;
S1122: disassembling the text information to obtain a plurality of library vocabularies;
S1123: splitting the voice according to the plurality of library vocabularies to obtain a plurality of user voice segments respectively corresponding to the plurality of library vocabularies; and
S1124: storing each user voice segment in the audio library, wherein the file name of each user voice segment is the library vocabulary corresponding to that user voice segment.
Referring to fig. 3, 7 and 8, in some embodiments, the video dubbing apparatus 10 includes an acquisition module 111 and an identification module 112. The identification module 112 includes a first identification unit 1121, a first disassembling unit 1122, a second disassembling unit 1123, and a storage unit 1124. Step S111 may be implemented by the acquisition module 111. Step S112 may be implemented by the identification module 112. Step S1121 may be implemented by the first identifying unit 1121. Step S1122 may be implemented by the first disassembling unit 1122. Step S1123 may be implemented by the second disassembling unit 1123. Step S1124 may be implemented by the storage unit 1124.
That is, the collection module 111 can be used to collect the voice recorded by the user using the electronic device 100. Recognition module 112 may be used to speech recognize the speech to obtain a library vocabulary and a user speech snippet. The first recognition unit 1121 may be used for voice recognition of voice to obtain text information. The first parsing unit 1122 can be used to parse the text message to obtain a plurality of library vocabularies. The second parsing unit 1123 may be configured to parse the speech according to the plurality of library vocabularies to obtain a plurality of user speech segments corresponding to the plurality of library vocabularies, respectively. The storage unit 1124 may be configured to store each user speech segment in the audio library, where a file name of each user speech segment is a library vocabulary corresponding to the user speech segment.
The collecting module 111 may be an acoustic-electric element 60, such as a microphone, disposed on the electronic device 100. The recognition module 112 may be a program stored in the memory 30 to implement the function indicated in step S112, and the processor 20 executes the program to complete step S112. The first recognition unit 1121, the first disassembling unit 1122, the second disassembling unit 1123, and the storage unit 1124 may be subroutines stored in the memory 30 under a program corresponding to the recognition module 112. The processor 20 executing the subroutine may implement steps S1121 through S1124. The storage unit 1124 may be the memory 30 in the electronic device 100.
The trigger condition for the electronic device 100 to collect the user's everyday speech may be user input, that is, the user starts the recording function of the electronic device 100 and records a batch of vocabulary in one sitting by speaking continuously. Alternatively, the trigger condition may be set off automatically by the electronic device 100. For example, when the electronic device 100 makes or receives a call (e.g., an ordinary phone call, a voice call on social software, a video call, etc.), the electronic device 100 starts the recording function to record the user's voice during the call. Or, the user sets an active time period for the recording function of the electronic device 100 according to the user's actual situation; for example, if the user sets the active time period to 19:00-21:00 every day, the electronic device 100 starts the recording function at 19:00 and closes it at 21:00 every day. Letting the user set an active time period avoids the high energy consumption that keeping the recording function on continuously would cause. Or, the electronic device 100 starts the recording function once every first predetermined time period and closes it after a second predetermined time period, so that the recording function does not need to stay on continuously and the energy consumption of the electronic device 100 remains low.
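As a small illustration of the time-window trigger, a recording service could gate itself with a check like the one below; the 19:00-21:00 window is just the example from the text:

```python
from datetime import datetime, time
from typing import Optional

RECORDING_WINDOW = (time(19, 0), time(21, 0))   # user-configured active period (example)

def recording_enabled(now: Optional[datetime] = None) -> bool:
    """Return True when the configured recording window is currently active."""
    now = now or datetime.now()
    start, end = RECORDING_WINDOW
    return start <= now.time() <= end
```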
Referring to fig. 9, after the acoustic-electric element 60 on the electronic device 100 has collected the user's voice, the processor 20 first recognizes the voice to obtain text information and then controls the memory 30 to store the recognized text information. The process of forming text information by speech recognition is specifically as follows: first, the processor 20 trims the silence at the beginning and end of the speech to reduce interference with subsequent processing. The processor 20 then divides the speech into multiple frames using a moving window function. Next, the processor 20 extracts acoustic features from each frame of speech, matches the acoustic features of each frame against an acoustic model to determine the state of each frame, combines the states of multiple frames into phonemes, and finally combines the phonemes into words, thereby converting the speech into text information.
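In practice this step can also be delegated to an off-the-shelf recognizer. The sketch below uses the SpeechRecognition package purely as an example; it is not the acoustic-model pipeline described above, just one way to obtain the text information from a recorded utterance:

```python
import speech_recognition as sr   # pip install SpeechRecognition

def speech_to_text(wav_path: str, language: str = "zh-CN") -> str:
    """Transcribe one recorded user utterance to text (example recognizer only)."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)   # read the whole recording
    return recognizer.recognize_google(audio, language=language)
```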
After the processor 20 converts the speech into the text information, the text information may be further decomposed by a word segmentation technique, such as a forward maximum matching method, a reverse maximum matching method, a least segmentation method, a bidirectional maximum matching method, etc., to obtain a plurality of library vocabularies. Then, the processor 20 disassembles the speech into a plurality of user speech segments corresponding to the library vocabularies one by one according to the corresponding relationship between the library vocabularies and the speech. Finally, the user voice segments are stored in the directory of the audio library in the memory 30, and the file name of each user voice segment is the library vocabulary corresponding to the user voice segment.
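Forward maximum matching, the first segmentation technique named above, can be sketched as follows; the vocabulary set in the example is purely illustrative:

```python
def forward_max_match(text: str, vocabulary: set, max_len: int = 4) -> list:
    """Split text by repeatedly taking the longest dictionary match from the left."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                words.append(candidate)   # single characters fall through as a last resort
                i += length
                break
    return words

# Example: segmenting "今天天气真好" ("today's weather is really good")
print(forward_max_match("今天天气真好", {"今天", "天气", "真好"}))  # ['今天', '天气', '真好']
```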
For example, the processor 20 in the electronic device 100 first controls the acoustic-electric element 60 to capture the user saying "today's weather is really good". The processor 20 then converts this speech into the text information "today's weather is really good" through speech recognition. Next, the processor 20 disassembles the text information into a plurality of library vocabularies based on the word segmentation technique, for example the three library vocabularies "today", "weather" and "really good". The processor 20 then splits the speech according to the disassembled library vocabularies to obtain a plurality of user voice segments: the library vocabulary "today" corresponds to the user voice segment "today", the library vocabulary "weather" corresponds to the user voice segment "weather", and the library vocabulary "really good" corresponds to the user voice segment "really good". Finally, the memory 30 stores the three user voice segments "today", "weather" and "really good" in the audio library, each named after its library vocabulary: the user voice segment "today" is stored as "today.mp3", the user voice segment "weather" as "weather.mp3", and the user voice segment "really good" as "really good.mp3". Because each user voice segment takes its library vocabulary as its file name, it can be looked up directly by file name in the subsequent steps.
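Assuming the recognizer also provides a start and end time for each library vocabulary (many recognizers expose word-level timestamps), splitting the recording and storing the clips could look like the sketch below, which uses pydub; the word timings in the example call are illustrative:

```python
import os
from pydub import AudioSegment   # pip install pydub (requires ffmpeg)

def store_clips(recording_path: str, word_times, library_dir: str = "audio_library") -> None:
    """word_times: (library vocabulary, start_ms, end_ms) tuples, e.g. from word-level
    timestamps of the recognizer; each clip is saved under its library vocabulary name."""
    os.makedirs(library_dir, exist_ok=True)
    speech = AudioSegment.from_file(recording_path)
    for word, start_ms, end_ms in word_times:
        clip = speech[start_ms:end_ms]   # cut out the user voice segment
        clip.export(os.path.join(library_dir, word + ".mp3"), format="mp3")

# Illustrative timings for the "today's weather is really good" example above
store_clips("recording.wav",
            [("today", 0, 600), ("weather", 600, 1200), ("really good", 1200, 1900)])
```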
Thus, the electronic device 100 enriches the user voice segments in the audio library by repeatedly collecting, recognizing and disassembling the user voice, and the enriched user voice segments are beneficial to improving the integrity of the video dubbing.
Referring to fig. 10, in some embodiments, the step S14 of processing the subtitle file associated with the video file to read the personalized voice file corresponding to the current subtitle vocabulary from the audio library includes:
S141: extracting the current subtitle in the subtitle file;
S142: splitting the current subtitle to obtain a plurality of current subtitle vocabularies and the playing time points corresponding to the current subtitle vocabularies; and
S143: searching the audio library for the library vocabulary matching the current subtitle vocabulary to obtain the user voice segment corresponding to that library vocabulary;
Step S18, playing the user voice segment in the personalized voice file corresponding to the current subtitle vocabulary, includes:
S181: playing the user voice segment at the playing time point.
Referring to fig. 3 and fig. 11, in some embodiments, the first processing module 14 includes an extracting unit 141, a splitting unit 142, and a searching unit 143. Step S141 may be implemented by the extraction unit 141. Step S142 may be implemented by the splitting unit 142. Step S143 may be implemented by the lookup unit 143. Step S181 may be implemented by the second play module 18.
That is, the extraction unit 141 may be used to extract the current subtitle in the subtitle file. The splitting unit 142 may be configured to split the current subtitle to obtain a plurality of current subtitle vocabularies and a playing time point corresponding to the current subtitle vocabulary. The searching unit 143 may be configured to search the audio library for a library vocabulary that matches the current subtitle vocabulary to obtain a user speech segment corresponding to the library vocabulary. The second playing module 18 may be further configured to play the user voice clip at a playing time point.
The extracting unit 141, the splitting unit 142, and the searching unit 143 may be programs stored in the memory 30 and capable of implementing the functions indicated in step S141, step S142, and step S143, respectively, and the processor 20 may execute the programs to complete step S141, step S142, and step S143.
Specifically, the subtitle file in a video generally contains the text information corresponding to the audio and the playing time points corresponding to that text information. When the video is played, the time stamps of the audio and the playing time points of the subtitles are synchronized against a reference clock, so that the audio file and the subtitle file are played synchronously. In order to play the user voice segments in the audio library as personalized voice files, the processor 20 first extracts a plurality of current subtitles from the subtitle file, where each current subtitle includes the specific text content and the playing time point of that subtitle. The processor 20 then splits the current subtitle into a plurality of current subtitle vocabularies and a plurality of playing time points matched one-to-one with those current subtitle vocabularies.
Taking a subtitle file in the SRT format as an example, the current subtitle corresponding to a certain line, for example the line "trouble you, fish ball coarse noodles" (a literal rendering of the famous line from "My Life as McDull"), is specifically: "00:00:00,000 --> 00:00:04,400 trouble you fish ball coarse noodles". Here, "00:00:00,000 --> 00:00:04,400" is the playing time point, and "trouble you fish ball coarse noodles" is the specific text content of the current subtitle corresponding to that playing time point. The processor 20 splits the current subtitle based on the word segmentation technique to obtain a plurality of current subtitle vocabularies. Specifically, the processor 20 splits the current subtitle "trouble you fish ball coarse noodles" into a plurality of current subtitle vocabularies, each matched with a playing time point, namely: "trouble - 00:00:00,000 --> 00:00:01,000", "you - 00:00:01,000 --> 00:00:02,000", "fish ball - 00:00:02,000 --> 00:00:03,000", "coarse noodles - 00:00:03,000 --> 00:00:04,400".
During the playing of the video file, the processor 20 searches the audio library for the user voice segment matching each current subtitle vocabulary. For example, at 00:00:00,000 it finds the user voice segment "trouble.mp3" corresponding to the current subtitle vocabulary "trouble" and plays it; at 00:00:01,000 it finds the user voice segment "you.mp3" corresponding to the current subtitle vocabulary "you" and plays it; at 00:00:02,000 it finds the user voice segment "fish ball.mp3" corresponding to the current subtitle vocabulary "fish ball" and plays it; and at 00:00:03,000 it finds the user voice segment "coarse noodles.mp3" corresponding to the current subtitle vocabulary "coarse noodles" and plays it.
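For a soft subtitle in SRT format, the cue can be parsed and its duration spread over the segmented vocabularies as in the sketch below; spreading the duration evenly is an assumption (the example above shows slightly different per-word boundaries, and the patent does not prescribe a specific allocation rule):

```python
def parse_srt_time(ts: str) -> int:
    """Convert an SRT timestamp such as '00:00:04,400' to milliseconds."""
    hours, minutes, rest = ts.split(":")
    seconds, millis = rest.split(",")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)

def split_cue(time_line: str, words: list) -> list:
    """time_line: e.g. '00:00:00,000 --> 00:00:04,400'; words: the segmented subtitle
    vocabularies. Returns (word, playing time point in ms) pairs."""
    start_str, end_str = [part.strip() for part in time_line.split("-->")]
    start, end = parse_srt_time(start_str), parse_srt_time(end_str)
    step = (end - start) // len(words)
    return [(word, start + i * step) for i, word in enumerate(words)]

# The cue discussed above, split into four current subtitle vocabularies
print(split_cue("00:00:00,000 --> 00:00:04,400",
                ["trouble", "you", "fish ball", "coarse noodles"]))
```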
Therefore, the voice dubbed by the user can be played when the video is played, the personalized dubbing of the user is realized, and the user experience is better.
In some implementations, the subtitle file may be a hard subtitle or a soft subtitle. Hard subtitles, also called embedded subtitles, are compressed into the same group of data as the multiple frames of playing images in the video file; like watermarks, the current subtitles cannot be separated from the images. Soft subtitles, also called external subtitles, keep the subtitle file and the video file as two separate pieces of data that are then encapsulated into one video; the SRT subtitle file mentioned above is a soft subtitle. When the subtitle file is a soft subtitle, the current subtitles and their corresponding playing time points can be extracted directly. When the subtitle file is a hard subtitle, however, neither the current subtitles nor their playing time points can be extracted directly. In that case the processor 20 can extract the current subtitle by recognizing the text in the playing images of the video file. It can be understood that, with hard subtitles, the playing image directly contains the current subtitle, i.e., the current subtitle is embedded in the playing image; the region of the playing image where the current subtitle is located can therefore be extracted, and the text in that region recognized, so as to extract the current subtitle. Further, since each playing image has a corresponding time stamp, the processor 20 can calculate the playing time point of the current subtitle from the time stamps of the one or more playing images corresponding to it. In this way the current subtitle and its playing time point can be extracted even when the subtitle file is a hard subtitle.
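One way to realize this for hard subtitles is to crop the subtitle region of sampled frames and run OCR on it. The sketch below uses OpenCV and pytesseract as stand-in tools; the bottom-of-frame crop, the sampling interval and the OCR language are all assumptions:

```python
import cv2           # pip install opencv-python
import pytesseract   # pip install pytesseract (requires the tesseract binary)

def extract_hard_subtitles(video_path: str, sample_every_n_frames: int = 10):
    """Yield (timestamp_ms, text) pairs OCR'd from the bottom strip of sampled frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_every_n_frames == 0:
            height = frame.shape[0]
            strip = frame[int(height * 0.85):, :]          # assumed subtitle region: bottom 15%
            text = pytesseract.image_to_string(strip, lang="chi_sim").strip()
            if text:
                yield int(index / fps * 1000), text        # playing time point from frame index
        index += 1
    cap.release()
```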
The video dubbing method of the embodiment of the invention can therefore process both soft subtitles and hard subtitles, and can realize the user's personalized dubbing regardless of whether the subtitle file in the video is a soft subtitle or a hard subtitle, giving a better user experience.
Referring to fig. 12 and 13 together, in some embodiments, the video dubbing method further includes, after the step of playing the user voice clip in the personalized voice file corresponding to the current subtitle vocabulary in step S18:
S19: determining the playing volume of the user voice segment corresponding to a mouth shape according to that mouth shape of the person in the video file.
Wherein, step S19 further includes:
S191: selecting the currently played image associated with the user voice segment from the multiple frames of playing images according to the playing time point;
S192: identifying the mouth shape of the person in the currently played image;
S193: calculating the actual aspect ratio of the mouth shape according to the width and the height of the mouth shape;
S194: calculating a volume amplification factor according to the actual aspect ratio and a preset aspect ratio; and
S195: determining the playing volume of the user voice segment corresponding to the mouth shape of the person in the currently played image according to the volume amplification factor.
Referring to fig. 3, 14 and 15, in some embodiments, the video dubbing apparatus 10 further includes a volume determining module 19. The volume determination module 19 includes a selection unit 191, a second recognition unit 192, a first calculation unit 193, a second calculation unit 194, and a volume determination unit 195. Step S19 may be implemented by the volume determination module 19. Step S191 may be implemented by the selecting unit 191. Step S192 may be implemented by the second identifying unit 192. Step S193 may be implemented by the first calculation unit 193. Step S194 may be implemented by the second calculation unit 194. Step S195 may be implemented by the volume determination unit 195.
That is, the volume determining module 19 may be configured to determine the playing volume of the user voice segment corresponding to a mouth shape according to that mouth shape of the person in the video file. The selecting unit 191 may be configured to select the currently played image associated with the user voice segment from the multiple frames of playing images according to the playing time point. The second recognition unit 192 may be used to recognize the mouth shape of the person in the currently played image. The first calculation unit 193 may be used to calculate the actual aspect ratio of the mouth shape from the width and height of the mouth shape. The second calculating unit 194 may be configured to calculate the volume amplification factor according to the actual aspect ratio and the preset aspect ratio. The volume determination unit 195 may be configured to determine the playing volume of the user voice segment corresponding to the mouth shape of the person in the currently played image according to the volume amplification factor.
The volume determining module 19 may be a program stored in the memory 30 and capable of implementing the function indicated in step S19, and the processor 20 executes the program to complete step S19. The selecting unit 191, the second identifying unit 192, the first calculating unit 193, the second calculating unit 194, and the volume determining unit 195 may be programs stored in the memory 30 that can respectively implement the functions indicated in step S191, step S192, step S193, step S194, and step S195, and the processor 20 may complete step S191 to step S195 by executing the programs.
In particular, dubbing generally needs to take the character's emotional fluctuations into account, and the typical manifestation of emotional fluctuation in dubbing is a change in audio volume. Therefore, when playing the personalized voice files dubbed by the user, the playing volume of the user voice segment in each personalized voice file also needs to change accordingly with the character's emotion. The required playing volume of each user voice segment can be determined by identifying the aspect ratio of the person's mouth shape in the currently played image of the video file associated with that user voice segment. Specifically, each user voice segment has a playing time point, and each playing image also corresponds to a playing time point. For the playing time point of each user voice segment, the processor 20 first finds, based on that playing time point, the one or more frames of playing images corresponding to it among the multiple frames of playing images; these one or more frames are the currently played images associated with the user voice segment at that playing time point. The processor 20 then identifies the person in each currently played image with a face recognition algorithm and further identifies the person's mouth shape. Next, the processor 20 calculates the actual aspect ratio (i.e., the ratio of width to height) of the mouth shape from the width and height of the identified mouth shape. If there are multiple frames of currently played images, the processor 20 may identify multiple mouth shapes; in that case it may take the median, average, or maximum of the widths of the mouth shapes as the final width and, correspondingly, the median, average, or maximum of their heights as the final height (i.e., when the width takes the median, the height also takes the median; when the width takes the average, the height also takes the average; when the width takes the maximum, the height also takes the maximum). Alternatively, the processor 20 first calculates the actual aspect ratio of the mouth shape in each frame of currently played image and then takes the median, average, or maximum of the calculated actual aspect ratios as the final actual aspect ratio. The processor 20 then calculates the ratio of the actual aspect ratio (the ratio of the actual width of the mouth shape to its actual height) to a preset aspect ratio, and this ratio is the volume amplification factor. In this way, the processor 20 may calculate a plurality of volume amplification factors, each derived from the mouth shape of the person in the one or more currently played images associated with a user voice segment, and the processor 20 can then calculate the playing volume of the user voice segment associated with those currently played images from the corresponding volume amplification factor, that is, playing volume = default playing volume × volume amplification factor.
In addition, there may be multiple persons in one currently played image, so the processor 20 may obtain the actual aspect ratios of multiple mouth shapes after processing that image. In this case, the processor 20 first eliminates the actual aspect ratios of the mouth shapes of persons whose mouths are not open in the currently played image. Specifically, a detection threshold may be set: when the actual aspect ratio of a mouth shape is smaller than the detection threshold, the person corresponding to that mouth shape is considered not to be opening the mouth and speaking, and the actual aspect ratio of that mouth shape is eliminated. After the actual aspect ratios of the mouth shapes of persons who are not speaking have been eliminated, if only one actual aspect ratio remains, it is used directly to calculate the volume amplification factor; if a plurality of actual aspect ratios remain, the volume amplification factor is calculated from the median, average, or maximum of those actual aspect ratios. This improves the accuracy of the calculated volume amplification factor.
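Treating the mouth shapes as already detected (width and height in pixels, for example from facial-landmark detection), the volume rule described above can be sketched as follows; the preset aspect ratio, detection threshold and default volume values are illustrative:

```python
from statistics import median
from typing import List, Tuple

PRESET_ASPECT_RATIO = 2.0   # illustrative preset width-to-height ratio
DETECTION_THRESHOLD = 1.0   # illustrative threshold below which a mouth counts as not open
DEFAULT_VOLUME = 1.0        # default playing volume (1.0 = unchanged)

def playback_volume(mouths: List[Tuple[float, float]]) -> float:
    """mouths: (width, height) sizes of the mouth shapes detected in the currently
    played image(s) associated with one user voice segment."""
    ratios = [w / h for w, h in mouths if h > 0]                 # actual aspect ratios
    speaking = [r for r in ratios if r >= DETECTION_THRESHOLD]   # drop mouths that are not open
    if not speaking:
        return DEFAULT_VOLUME
    magnification = median(speaking) / PRESET_ASPECT_RATIO       # volume amplification factor
    return DEFAULT_VOLUME * magnification

# Example call with two detected mouth shapes
print(playback_volume([(40.0, 20.0), (42.0, 8.0)]))
```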
Referring to fig. 16, in some embodiments, before step S12, the video dubbing method further includes:
S113: determining a target dubbing character according to the input of the user;
the step S14 of processing the subtitle file associated with the video file to read the personalized voice file corresponding to the current subtitle vocabulary from the audio library further comprises:
S144: processing the original audio to separate the voice information of the target dubbing character from the voice information of non-target dubbing characters;
the step S141 of extracting the current subtitle in the subtitle file includes:
S1411: extracting, from the subtitle file, the target subtitles corresponding to the voice information of the target dubbing character; and
S1412: extracting the current subtitle from the target subtitles.
Referring to fig. 3 and 17 together, in some embodiments, the video dubbing apparatus 10 further includes a character determination module 113. The first processing module 14 further comprises a dividing unit 144. Step S113 may be implemented by the character determination module 113. Step S144 may be implemented by the dividing unit 144. Both step S1411 and step S1412 may be implemented by the extraction unit 141.
That is, the character determination module 113 may be configured to determine the target dubbing character based on user input. The dividing unit 144 may be used to process the original audio to separate the voice information of the target dubbing character from the voice information of non-target dubbing characters. The extracting unit 141 may be configured to extract, from the subtitle file, the target subtitles corresponding to the voice information of the target dubbing character and to extract the current subtitle from the target subtitles.
The character determination module 113 and the dividing unit 144 may be programs stored in the memory 30 capable of implementing the functions indicated in step S113 and step S144, respectively, and the processor 20 executes the programs to complete step S113 and step S144. The extracting unit 141 may also be a program stored in the memory 30 capable of implementing the functions indicated in step S1411 and step S1412, and the processor 20 executes the program to complete step S1411 and step S1412.
It will be appreciated that in some cases the user may only want to dub one or a few characters in the video. In that case the user first sets the target dubbing character, and the processor 20 recognizes the voice information of the target dubbing character in the original audio by voiceprint recognition and classifies the remaining voice information of the original audio as the voice information of non-target dubbing characters. The processor 20 then filters out of the subtitle file the target subtitles corresponding to the voice information of the target dubbing character and divides the target subtitles into a plurality of current subtitles according to their playing time points. Next, the processor 20 splits each current subtitle to obtain a plurality of current subtitle vocabularies and the playing time points corresponding to them, and looks up in the audio library the user voice segments matching each current subtitle vocabulary as the personalized voice files. When the video is played, the voice played for the target dubbing character is the user voice segments, while the voice played for the non-target dubbing characters is the original audio. In this way the user can selectively dub one or more characters in the video, which makes video dubbing more engaging and further improves the user experience.
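Assuming a speaker-identification step has already been run over the original audio (the identify_speaker callable below is hypothetical and stands in for the voiceprint-recognition step), restricting dubbing to the target character reduces to filtering the subtitle cues:

```python
from typing import Callable, List, Tuple

def filter_target_cues(cues: List[Tuple[int, int, str]],
                       target_character: str,
                       identify_speaker: Callable[[int, int], str]) -> List[Tuple[int, int, str]]:
    """cues: (start_ms, end_ms, text) subtitle entries. Only the cues spoken by the target
    dubbing character are kept for dubbing; the remaining cues keep the original audio."""
    return [cue for cue in cues if identify_speaker(cue[0], cue[1]) == target_character]
```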
In the video dubbing method according to any of the above embodiments, the video dubbed by the user may be stored in the memory 30 automatically by the electronic apparatus 100 or manually by the user. The stored video may contain both the original audio and the personalized audio formed by combining a plurality of personalized voice files, or the user may choose to store only one of the two. When the video contains both the original audio and the personalized audio, no dubbing action needs to be performed during subsequent playback; the original audio or the personalized audio can be played directly according to the user's selection. If the user selects the original audio, the personalized audio is muted; if the user selects the personalized audio, the original audio is muted; and if the user makes no selection, the personalized audio is played by default.
Referring to fig. 3, the present invention further provides an electronic device 100. The electronic device 100 includes one or more processors 20, memory 30, and one or more programs. Where the one or more programs are stored in the memory 30 and configured to be executed by the one or more processors 20. The program comprises instructions for carrying out the video dubbing method of any of the embodiments described above.
For example, referring to fig. 1, the program includes instructions for performing the steps of:
S12: playing the video file;
S14: processing the subtitle file associated with the video file to read, from an audio library, a personalized voice file corresponding to the current subtitle vocabulary, wherein the audio library comprises at least one personalized voice file, and each personalized voice file comprises a library vocabulary and a user voice segment corresponding to the library vocabulary;
S16: muting the portion of the original audio that is associated with the video file and corresponds to the current subtitle vocabulary; and
S18: playing the user voice segment in the personalized voice file corresponding to the current subtitle vocabulary.
For another example, referring to fig. 5, the program further includes instructions for performing the following steps:
S1121: performing speech recognition on the voice to obtain text information;
S1122: disassembling the text information to obtain a plurality of library vocabularies;
S1123: splitting the voice according to the plurality of library vocabularies to obtain a plurality of user voice segments respectively corresponding to the plurality of library vocabularies; and
S1124: storing each user voice segment in the audio library, wherein the file name of each user voice segment is the library vocabulary corresponding to that user voice segment.
Referring to fig. 18, the present invention further provides a computer readable storage medium 200. The computer readable storage medium 200 includes a computer program for use in conjunction with the electronic device 100. The computer program is executable by the processor 20 to perform the video dubbing method according to any of the above embodiments.
For example, referring to fig. 1, the computer program may be executed by the processor 20 to perform the following steps:
S12: playing the video file;
S14: processing the subtitle file associated with the video file to read, from an audio library, a personalized voice file corresponding to the current subtitle vocabulary, wherein the audio library comprises at least one personalized voice file, and each personalized voice file comprises a library vocabulary and a user voice segment corresponding to the library vocabulary;
S16: muting the portion of the original audio that is associated with the video file and corresponds to the current subtitle vocabulary; and
S18: playing the user voice segment in the personalized voice file corresponding to the current subtitle vocabulary.
For another example, referring to fig. 5, the computer program may be further executable by the processor 20 to perform the following steps:
S1121: performing speech recognition on the voice to obtain text information;
S1122: disassembling the text information to obtain a plurality of library vocabularies;
S1123: splitting the voice according to the plurality of library vocabularies to obtain a plurality of user voice segments respectively corresponding to the plurality of library vocabularies; and
S1124: storing each user voice segment in the audio library, wherein the file name of each user voice segment is the library vocabulary corresponding to that user voice segment.
The video dubbing method described in any of the above embodiments dubs the video with user voice segments while the user is playing the video. In some embodiments, dubbing the video with user voice segments may instead be performed as a silent background process: the user inputs a video dubbing instruction on the electronic apparatus 100 and the processor 20 performs the video dubbing operation, but during dubbing the electronic apparatus 100 does not play the video for the user to watch. After the dubbing is complete, the processor 20 generates the video dubbed by the user (i.e., the personalized video described below) and controls the display screen 40 or the electroacoustic element 50 to prompt the user that dubbing is complete. When the user then chooses to play the personalized video, the display screen 40 of the electronic device 100 plays the video file and the subtitle file, and the electroacoustic element 50 of the electronic device 100 plays the audio dubbed by the user (i.e., the personalized audio described below).
Therefore, referring to fig. 3 and fig. 19 together, the present invention further provides a video dubbing method applicable to the electronic apparatus 100. The video dubbing method comprises the following steps:
S23: reading a video and an audio library, wherein the video comprises a video file, a subtitle file and original audio, and the audio library comprises library vocabularies and user voice segments corresponding to the library vocabularies;
S24: searching the audio library for library vocabularies matching the subtitle file, so as to generate personalized audio, together with synchronization association information between the subtitle file and the personalized audio, from the user voice segments corresponding to those library vocabularies;
S25: associating the video file, the subtitle file and the personalized audio according to the synchronization association information to form a personalized video; and
S26: playing the personalized audio when playing the personalized video.
Referring to fig. 3 and 20, the present invention provides a video dubbing apparatus 20. The video dubbing apparatus 20 is used for the electronic apparatus 100. The video dubbing method according to the embodiment of the present invention can be implemented by the video dubbing apparatus 20. The video dubbing apparatus 20 comprises a reading module 23, a matching module 24, an association module 25 and a playing module 26. Step S23 may be implemented by the reading module 23. Step S24 may be implemented by the matching module 24. Step S25 may be implemented by the association module 25. Step S26 may be implemented by the play module 26.
That is, the reading module 23 may be configured to read a video and an audio library, wherein the video includes a video file, a subtitle file, and original audio, and the audio library includes library vocabularies and user voice segments corresponding to the library vocabularies. The matching module 24 may be configured to search the audio library for library vocabularies matching the subtitle file, so as to generate personalized audio, together with synchronization association information between the subtitle file and the personalized audio, from the user voice segments corresponding to those library vocabularies. The association module 25 may be configured to associate the video file, the subtitle file, and the personalized audio according to the synchronization association information to form a personalized video. The playing module 26 may be used to play the personalized audio while playing the personalized video.
Among them, the reading module 23, the matching module 24 and the associating module 25 may be programs stored in the memory 30 that can implement the functions indicated by step S23, step S24 and step S25, respectively. Execution of the program by the processor may complete steps S23 through S25. The playing module 26 may be an electroacoustic element 50 of the electronic device 100 for playing personalized audio.
In the video dubbing method according to the embodiment of the present invention, the electronic device 100 collects various voices of the user in daily life and stores them in the memory 30, recognizes the text information corresponding to the voices, splits the text information into a plurality of library vocabularies, and splits the voices into a plurality of user voice segments based on the library vocabularies, thereby establishing an audio library in which each library vocabulary corresponds to a user voice segment. When the user inputs an instruction for dubbing a video on the electronic device 100, the electronic device 100 identifies the text corresponding to the subtitle file of the video and finds the user voice segments corresponding to that text in the audio library; a new personalized audio is then formed from these user voice segments, and the video file, the subtitle file and the personalized audio together form the personalized video, i.e., the video dubbed by the user.
The synchronous associated information of the subtitle file and the personalized audio is the timestamp information carried in the personalized audio. The process of synchronously associating the video file, the subtitle file and the personalized audio according to the synchronous associated information to form the personalized video is a packaging process in which the video file, the subtitle file and the personalized audio are packaged to obtain the personalized video file. During this packaging, the electronic device 100 may interleave (cross-store) the video file and the personalized audio by means of the synchronous associated information, i.e., the timestamp information of the personalized audio. The personalized video may be packaged into different formats, e.g., the TS format, the MKV format, the MOV format, etc. Different formats have different file structures, and the format of the personalized video can be selected by the user.
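As an illustration only, the packaging step can be sketched with ffmpeg invoked from Python; the patent does not name a muxing tool, and the file names and the choice of the MKV container below are assumptions.

```python
# Hypothetical sketch of the packaging step: mux the video file, the
# user-dubbed (personalized) audio and the subtitle file into one container.
import subprocess

def package_personalized_video(video_path: str, dubbed_audio_path: str,
                               subtitle_path: str, out_path: str) -> None:
    """Mux video, personalized audio and subtitles into out_path (e.g. an .mkv file)."""
    cmd = [
        "ffmpeg", "-y",
        "-i", video_path,          # original video (input 0)
        "-i", dubbed_audio_path,   # personalized audio (input 1)
        "-i", subtitle_path,       # subtitle file (input 2)
        "-map", "0:v",             # keep the video stream
        "-map", "1:a",             # use the user-dubbed audio track
        "-map", "2:s",             # keep the subtitle track
        "-c:v", "copy",            # do not re-encode the video
        "-c:a", "aac",             # encode the dubbed audio
        "-c:s", "srt",             # subtitle codec suitable for MKV
        out_path,
    ]
    subprocess.run(cmd, check=True)

# Example: package_personalized_video("movie.mp4", "personalized.mp3",
#                                      "movie.srt", "personalized.mkv")
```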
The video dubbing method of the embodiment of the invention can dub a video with the user's voice segments to obtain a personalized video and play that personalized video, thereby strengthening the interaction between the electronic device 100 and the user during video playing and making video playback more engaging.
Referring to fig. 21 and fig. 22 together, in some embodiments, the audio library may be obtained by the following steps, that is, the video dubbing method according to the embodiment of the present invention further includes, before reading the video and audio libraries at step S23:
s21: collecting voice input by a user by using the electronic device 100; and
s22: Recognizing the speech to obtain a plurality of library vocabularies and a plurality of user voice segments.
Wherein, step S22 further includes:
s221: Performing speech recognition on the speech to obtain text information;
s222: disassembling the text information to obtain a plurality of library vocabularies;
s223: decomposing the voice according to the plurality of library vocabularies to obtain a plurality of user voice fragments respectively corresponding to the plurality of library vocabularies; and
s224: and storing each user voice fragment in an audio library, wherein the file name of each user voice fragment is a library vocabulary corresponding to the user voice fragment.
Referring to fig. 23 and 24, in some embodiments, the video dubbing apparatus 20 further includes a collection module 21 and a recognition module 22. Step S21 may be implemented by the collection module 21, and step S22 may be implemented by the recognition module 22. The recognition module 22 comprises a first recognition unit 221, a first disassembling unit 222, a second disassembling unit 223 and a storage unit 224. Step S221 may be implemented by the first recognition unit 221, step S222 by the first disassembling unit 222, step S223 by the second disassembling unit 223, and step S224 by the storage unit 224.
That is, the collection module 21 may be used to collect the voice recorded by the user with the electronic device 100. The recognition module 22 may be used to perform speech recognition on the voice to obtain a plurality of library vocabularies and a plurality of user voice segments. The first recognition unit 221 may be used to perform speech recognition on the voice to obtain text information. The first disassembling unit 222 may be used to disassemble the text information to obtain a plurality of library vocabularies. The second disassembling unit 223 may be used to decompose the voice according to the plurality of library vocabularies to obtain a plurality of user voice segments respectively corresponding to the plurality of library vocabularies. The storage unit 224 may be used to store each user voice segment in the audio library, with the file name of each user voice segment being the library vocabulary corresponding to that segment.
The collection module 21 may be an acoustic-electric element, such as a microphone, disposed on the electronic device 100. The recognition module 22 may be a program stored in the memory 30 that implements the functions indicated by step S22; the processor 20 executes the program to complete step S22. The first recognition unit 221, the first disassembling unit 222, the second disassembling unit 223 and the storage unit 224 may be subroutines stored in the memory 30 under the program corresponding to the recognition module 22, and the processor 20 executes these subroutines to implement steps S221 to S224. The storage unit 224 may also correspond to the memory 30 in the electronic device 100.
The manner in which the electronic device 100 collects the user's everyday voice is the same as the collection manner in the video dubbing method that performs dubbing during video playing, and is not repeated here.
Likewise, the manner in which the processor 20 recognizes the collected voice to obtain the library vocabularies and the user voice segments is the same as the recognition manner in that method, and is not repeated here either.
Thus, the electronic device 100 enriches the user voice segments in the audio library by repeatedly collecting, recognizing and disassembling the user voice, and the enriched user voice segments are beneficial to improving the integrity of the video dubbing.
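A minimal sketch of the audio-library construction of steps S221 to S224 follows, assuming an ASR engine that returns word-level timestamps; recognize_words() below is a hypothetical stand-in rather than an API from the patent, and the audio library is modeled as a directory of "<word>.mp3" files as described in step S224.

```python
import os
from pydub import AudioSegment  # third-party; pip install pydub

def recognize_words(wav_path: str):
    """Hypothetical ASR call: returns [(word, start_ms, end_ms), ...]."""
    raise NotImplementedError("plug in any ASR engine with word-level timestamps")

def build_audio_library(wav_path: str, library_dir: str) -> None:
    os.makedirs(library_dir, exist_ok=True)
    speech = AudioSegment.from_file(wav_path)                  # S221: the raw user voice
    for word, start_ms, end_ms in recognize_words(wav_path):   # S222: library vocabularies
        segment = speech[start_ms:end_ms]                      # S223: matching user voice segment
        # S224: the file name of the segment is its library vocabulary
        segment.export(os.path.join(library_dir, f"{word}.mp3"), format="mp3")
```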
Referring to fig. 25, in some embodiments, step S24, i.e., searching the audio library for library vocabularies matching the subtitle file so as to generate the personalized audio and the synchronous associated information of the subtitle file and the personalized audio from the corresponding user voice segments, includes:
s241: extracting a plurality of subtitle fragments in a subtitle file;
s242: splitting each subtitle fragment to obtain a plurality of subtitle vocabularies and a plurality of playing time points corresponding to the subtitle vocabularies respectively;
s243: searching a plurality of user voice fragments matched with a plurality of caption vocabularies in an audio library;
s244: and combining a plurality of user voice fragments corresponding to the plurality of caption vocabularies according to the playing time point sequence of the plurality of caption vocabularies to form personalized audio.
Referring to fig. 26, in some embodiments, the matching module 24 includes an extracting unit 241, a splitting unit 242, a matching unit 243, and a combining unit 244. Step S241 may be implemented by the extracting unit 241. Step S242 may be implemented by the splitting unit 242. Step S243 may be implemented by the matching unit 243. Step S244 may be implemented by the combining unit 244. That is, the extracting unit 241 may be used to extract a plurality of subtitle segments in a subtitle file. The splitting unit 242 may be configured to split each subtitle fragment to obtain a plurality of subtitle vocabularies and a plurality of playing time points corresponding to the plurality of subtitle vocabularies, respectively. The matching unit 243 may be configured to search the audio library for a plurality of user speech segments matching the plurality of caption words. The combining unit 244 may be configured to combine a plurality of user voice fragments corresponding to a plurality of subtitle vocabularies according to the playing time point sequences of the plurality of subtitle vocabularies to form personalized audio.
The extracting unit 241, the splitting unit 242, the matching unit 243, and the combining unit 244 may be programs that are stored in the memory 30 and can respectively implement the functions indicated by step S241, step S242, step S243, and step S244, and the processor 20 may execute the programs to complete step S241 to step S244.
Specifically, to form personalized audio from the user voice segments in the audio library, the processor 20 first extracts a plurality of subtitle segments in the subtitle file together with their playing time points. Taking an SRT subtitle file as an example, the line "trouble you fish ball rough surface" in "McDull Story" is stored in the form "00:00:00,000 --> 00:00:04,400 trouble you fish ball rough surface", where "00:00:00,000 --> 00:00:04,400" is the playing time point and "trouble you fish ball rough surface" is the subtitle segment corresponding to that playing time point. In this way, the subtitle segments and their playing time points can be extracted from the subtitle file.
Next, the processor 20 splits each subtitle segment using a word segmentation technique to obtain a plurality of subtitle words and the playing time points corresponding to them. Continuing with the "McDull Story" video as an example, after extracting the subtitle segment "trouble you fish ball rough surface", the processor 20 splits it into the following subtitle words, each matched with a playing time point: "trouble - 00:00:00,000 --> 00:00:01,000", "you - 00:00:01,000 --> 00:00:02,000", "fish ball - 00:00:02,000 --> 00:00:03,000", "rough surface - 00:00:03,000 --> 00:00:04,400".
The processor 20 then searches the audio library for a user voice segment matching each subtitle word: for example, it finds "trouble.mp3" for the subtitle word "trouble", "you.mp3" for "you", "fish ball.mp3" for "fish ball", and "rough surface.mp3" for "rough surface". The processor 20 then combines these user voice segments in the order of their playing time points to obtain the personalized audio for "trouble you fish ball rough surface". When there are multiple subtitle segments, the processor 20 combines the user voice segments corresponding to all subtitle segments in playing-time order to form the complete personalized audio.
Thus, the personalized audio dubbed by the user can be formed.
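The matching and combining of steps S241 to S244 can be sketched as follows, under stated assumptions: captions are SRT-style entries, word segmentation is done with the jieba library (the patent only says "word segmentation technique"), each word's playing time point is an even share of its caption's time range, and every library vocabulary is stored as "<word>.mp3" as in step S224. All names here are illustrative.

```python
import os
import re
import jieba                      # third-party Chinese word segmentation
from pydub import AudioSegment

SRT_TIME = re.compile(r"(\d+):(\d+):(\d+),(\d+)")

def to_ms(stamp: str) -> int:
    h, m, s, ms = map(int, SRT_TIME.match(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def caption_words(start: str, end: str, text: str):
    """S241/S242: split one caption into (word, play_time_ms) pairs."""
    words = [w for w in jieba.lcut(text) if w.strip()]
    begin, finish = to_ms(start), to_ms(end)
    step = (finish - begin) / max(len(words), 1)     # assumed even split per word
    return [(w, int(begin + i * step)) for i, w in enumerate(words)]

def build_personalized_audio(captions, library_dir: str) -> AudioSegment:
    """S243/S244: look up each word's user voice segment and join them in time order.
    `captions` is a list of (start_stamp, end_stamp, text) tuples."""
    timed = []
    for start, end, text in captions:
        timed.extend(caption_words(start, end, text))
    timed.sort(key=lambda item: item[1])             # playing-time order
    audio = AudioSegment.empty()
    for word, _ in timed:
        path = os.path.join(library_dir, f"{word}.mp3")
        if os.path.exists(path):                     # skip words not yet in the library
            audio += AudioSegment.from_mp3(path)
    return audio
```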
In the video dubbing method according to the embodiment of the present invention, the subtitle file may also be a hard subtitle or a soft subtitle, which is not limited herein.
Referring to fig. 27, in some embodiments, the video dubbing method according to the embodiments of the present invention further includes, after playing the personalized audio while playing the personalized video in step S26:
s27: and when the personalized video is played, the original audio is subjected to mute processing.
Referring to fig. 28, in some embodiments, the video dubbing apparatus 20 further includes a volume adjustment module 27. Step S27 may be implemented by the volume adjustment module 27. That is, the volume adjustment module 27 may be used to mute the original audio while the personalized video is being played. The volume adjusting module 27 may be a program stored in the memory 30 and capable of implementing the function indicated in step S27, and the processor 20 executes the program to complete step S27.
Specifically, the processor 20 may package only the video file, the subtitle file and the personalized audio into the personalized video, or may package the video file, the subtitle file, the personalized audio and the original audio together into the personalized video. When the personalized video contains both the personalized audio and the original audio, two playing modes are available: (1) if the user does not select an audio track, the electronic device 100 plays the personalized audio by default and directly mutes the original audio; (2) the personalized video is played according to the audio track selected by the user: when the user selects the original audio, the electronic device 100 mutes the personalized audio and plays the original audio; when the user selects the personalized audio, the electronic device 100 mutes the original audio and plays the personalized audio.
In this way, multiple playing modes are provided for the user, making playback more engaging.
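As a toy illustration of the two playing modes, the track-selection logic can be sketched as follows; the function name and track objects are illustrative stand-ins, not an API defined by the patent.

```python
def select_audio_tracks(user_choice, personalized_track, original_track):
    """Return (track_to_play, track_to_mute).

    Mode (1): no selection -> play the personalized audio and mute the original.
    Mode (2): honour the user's explicit selection.
    """
    if user_choice == "original":
        return original_track, personalized_track
    return personalized_track, original_track
```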
Referring to fig. 29 and 30 together, in some embodiments, the video dubbing method according to the embodiments of the present invention further includes:
s28: and determining the playing volume of the user voice clip corresponding to the mouth shape according to the mouth shape of the person in the video file.
Further, step S28 includes:
s281: selecting a playing image associated with the user voice clip from the multi-frame playing images according to the playing time point of the user voice clip;
s282: recognizing the mouth shape of a person in the played image;
s283: calculating the actual aspect ratio of the mouth shape according to the width and the height of the mouth shape;
s284: calculating a volume amplification factor according to the actual aspect ratio and the preset aspect ratio; and
s285: and determining the playing volume of the user voice clip corresponding to the mouth shape of the person in the playing image according to the volume magnification factor.
Referring to fig. 31 and 32 together, in some embodiments, the video dubbing apparatus 20 further includes a volume determining module 28. The volume determination module 28 includes an association unit 281, a second recognition unit 282, a first calculation unit 283, a second calculation unit 284, and a volume determination unit 285. Step S28 may be implemented by the volume determination module 28. Step S281 may be implemented by the associating unit 281. Step S282 may be implemented by the second identification unit 282. Step S283 may be implemented by the first calculation unit 283. Step S284 may be implemented by the second calculation unit 284. Step S285 may be implemented by the volume determination unit 285.
That is, the volume determination module 28 may be configured to determine the playing volume of the user voice clip corresponding to the mouth shape according to the mouth shape of the person in the video file. The associating unit 281 may be configured to select a playing image associated with the user voice clip from the plurality of playing images according to the playing time point of the user voice clip. The second recognition unit 282 may be used to recognize the mouth shape of the person in the playing image. The first calculation unit 283 may be used to calculate the actual aspect ratio of the mouth shape from the width and height of the mouth shape. The second calculation unit 284 may be configured to calculate the volume amplification factor from the actual aspect ratio and the preset aspect ratio. The volume determination unit 285 may be configured to determine the playing volume of the user voice clip corresponding to the mouth shape of the person in the playing image according to the volume amplification factor.
The volume determining module 28 may be a program stored in the memory 30 and capable of implementing the function indicated in step S28, and the processor 20 executes the program to complete step S28. The associating unit 281, the second identifying unit 282, the first calculating unit 283, the second calculating unit 284 and the volume determining unit 285 may be programs stored in the memory 30 that can implement the functions indicated by step S281, step S282, step S283, step S284 and step S285, respectively, and the processor 20 may perform the programs to complete steps S281 to S285.
Specifically, based on the playing time point of each user voice clip, the processor 20 first finds the one or more frames of playing images at that playing time point; these are the playing images associated with the user voice clip. The processor 20 then recognizes the mouth shape of the person in the playing image, calculates the actual aspect ratio of the mouth shape from its width and height, calculates the volume amplification factor from the actual aspect ratio and the preset aspect ratio, and finally determines the playing volume of each user voice clip from the volume amplification factor. This calculation of the playing volume is the same as in the video dubbing method that performs dubbing during video playing, and is not repeated here.
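A minimal sketch of steps S281 to S285 follows, under stated assumptions: OpenCV fetches the frame at the clip's playing time point, detect_mouth() is a hypothetical facial-landmark step (the patent does not name a detector), and the volume amplification factor is taken as the actual aspect ratio divided by the preset aspect ratio, which is one simple reading of step S284; none of these names come from the patent.

```python
import cv2  # pip install opencv-python

PRESET_ASPECT_RATIO = 2.0          # assumed width/height of a "neutral" mouth shape

def detect_mouth(frame):
    """Hypothetical landmark step: returns (mouth_width, mouth_height) in pixels."""
    raise NotImplementedError("plug in any facial-landmark model")

def playback_volume_gain(video_path: str, play_time_ms: int) -> float:
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, float(play_time_ms))   # S281: frame at the playing time point
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return 1.0                                         # no frame: leave the volume unchanged
    width, height = detect_mouth(frame)                    # S282: mouth shape of the person
    actual_ratio = width / height                          # S283: actual aspect ratio
    return actual_ratio / PRESET_ASPECT_RATIO              # S284/S285: amplification factor
```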
Referring to fig. 33, in some embodiments, before the step S23 of reading the video and audio libraries, the video dubbing method further includes:
s29: determining a target dubbing role according to the input of a user;
Step S24, i.e., searching the audio library for library vocabularies matching the subtitle file so as to generate the personalized audio and the synchronous associated information of the subtitle file and the personalized audio from the corresponding user voice segments, further includes:
s245: processing the original audio to divide the voice information of the target dubbing role and the voice information of the non-target dubbing role;
The step S241 of extracting a plurality of subtitle segments in the subtitle file includes:
s2411: extracting a target subtitle corresponding to the voice information of the target dubbing role from the subtitle file;
s2412: and extracting caption segments in the target caption.
Referring to fig. 34, in some embodiments, the video dubbing apparatus 20 further comprises a character determination module 29, and the matching module 24 further comprises a dividing unit 245. Step S29 may be implemented by the character determination module 29. Step S245 may be implemented by the dividing unit 245. Steps S2411 and S2412 may be implemented by the extracting unit 241.
That is, the character determination module 29 may be configured to determine a target dubbing character based on user input. The dividing unit 245 may be configured to process the original audio to divide the voice information of the target dubbing character and the voice information of the non-target dubbing character. The extracting unit 241 may be configured to extract a target subtitle corresponding to the voice information of the target dubbing character from the subtitle file and extract a subtitle segment in the target subtitle.
Specifically, when the user wants to dub only one or some of the characters in the video, the user may first set a target dubbing character. The processor 20 recognizes the voice information of the target dubbing character in the original audio by voiceprint recognition and treats the remaining voice information in the original audio as the voice information of non-target dubbing characters. The processor 20 then filters out of the subtitle file the target subtitle corresponding to the voice information of the target dubbing character. Subsequently, the processor 20 finds in the audio library the user voice segments matching the subtitle segments of the target subtitle; the specific implementation is the same as in the video dubbing method that performs dubbing during video playing and is not repeated here. In this way, the user's dubbing audio for the target dubbing character is obtained. The processor 20 then fuses the user's dubbing audio for the target dubbing character with the original audio of the non-target dubbing characters to obtain the personalized audio. When the personalized video is played, the voice of the target dubbing character is the user voice segments, while the voice of the non-target dubbing characters is the original audio. The user can thus selectively dub one or more characters in the video, which makes video dubbing more entertaining and further improves the user experience.
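A sketch of this single-character flow follows, under stated assumptions: target_ranges() is a hypothetical stand-in for the voiceprint-recognition step (the patent specifies voiceprint recognition but no particular library), and the user's dubbing audio is assumed to have been built on the same timeline as the original audio so the two can be spliced by time range; all names are illustrative.

```python
from pydub import AudioSegment  # third-party; pip install pydub

def target_ranges(original_audio_path, character_name):
    """Hypothetical voiceprint/diarization step: returns the target
    character's speech ranges as [(start_ms, end_ms), ...]."""
    raise NotImplementedError("plug in any speaker-recognition backend")

def fuse_audio(original, dubbed, ranges):
    """Keep the original audio for non-target characters and splice in the
    user's dubbing over the target character's time ranges."""
    result = original[:0]                      # empty segment with the same parameters
    cursor = 0
    for start, end in sorted(ranges):
        result += original[cursor:start]       # non-target characters: original audio
        result += dubbed[start:end]            # target character: user voice
        cursor = end
    result += original[cursor:]
    return result
```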
Referring to fig. 3, the present invention further provides an electronic device 100. The electronic device 100 includes one or more processors 20, a memory 30, and one or more programs, where the one or more programs are stored in the memory 30 and configured to be executed by the one or more processors 20. The programs comprise instructions for carrying out the video dubbing method of any of the embodiments described above.
For example, referring to FIG. 19, the program includes instructions for performing the steps of:
s23: Reading a video and an audio library, wherein the video comprises a video file, a subtitle file and original audio, and the audio library comprises library vocabularies and user voice segments corresponding to the library vocabularies;
s24: Searching the audio library for library vocabularies matching the subtitle file, so as to generate personalized audio and synchronous associated information of the subtitle file and the personalized audio by using the user voice segments corresponding to the library vocabularies;
s25: associating the video file, the subtitle file and the personalized audio according to the synchronous associated information to form a personalized video; and
s26: and playing the personalized audio when playing the personalized video.
For another example, referring to FIG. 22, the program includes instructions for performing the steps of:
s221: Performing speech recognition on the speech to obtain text information;
s222: disassembling the text information to obtain a plurality of library vocabularies;
s223: decomposing the voice according to the plurality of library vocabularies to obtain a plurality of user voice fragments respectively corresponding to the plurality of library vocabularies; and
s224: and storing each user voice fragment in an audio library, wherein the file name of each user voice fragment is a library vocabulary corresponding to the user voice fragment.
Referring to fig. 18, the present invention further provides a computer readable storage medium. The computer readable storage medium includes a computer program for use in conjunction with the electronic device 100. The computer program is executable by the processor 20 to perform the video dubbing method according to any of the above embodiments.
For example, referring to fig. 19, the computer program may be executed by the processor 20 to perform the following steps:
s23: Reading a video and an audio library, wherein the video comprises a video file, a subtitle file and original audio, and the audio library comprises library vocabularies and user voice segments corresponding to the library vocabularies;
s24: Searching the audio library for library vocabularies matching the subtitle file, so as to generate personalized audio and synchronous associated information of the subtitle file and the personalized audio by using the user voice segments corresponding to the library vocabularies;
s25: associating the video file, the subtitle file and the personalized audio according to the synchronous associated information to form a personalized video; and
s26: and playing the personalized audio when playing the personalized video.
For another example, referring to fig. 22, the computer program may be executed by the processor 20 to perform the following steps:
s221: Performing speech recognition on the speech to obtain text information;
s222: disassembling the text information to obtain a plurality of library vocabularies;
s223: decomposing the voice according to the plurality of library vocabularies to obtain a plurality of user voice fragments respectively corresponding to the plurality of library vocabularies; and
s224: and storing each user voice fragment in an audio library, wherein the file name of each user voice fragment is a library vocabulary corresponding to the user voice fragment.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, the various embodiments or examples described in this specification, and the features of different embodiments or examples, may be combined by those skilled in the art as long as they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (13)

1. A video dubbing method, comprising:
playing the video file;
processing the subtitle file associated with the video file to read a personalized voice file corresponding to the current subtitle vocabulary from an audio library, wherein the audio library comprises at least one personalized voice file, and the personalized voice file comprises library vocabularies and user voice fragments corresponding to the library vocabularies;
muting the original audio associated with the video file and corresponding to the current caption vocabulary;
playing the user voice fragment in the personalized voice file corresponding to the current caption vocabulary; and
and determining the playing volume of the user voice clip corresponding to the mouth shape according to the mouth shape of the person in the video file.
2. The video dubbing method of claim 1, wherein the step of processing the subtitle file associated with the video file to read a personalized speech file corresponding to a current subtitle vocabulary from an audio library comprises:
extracting the current subtitle in the subtitle file;
splitting the current caption to obtain a plurality of current caption vocabularies and playing time points corresponding to the current caption vocabularies; and
searching the library vocabulary matched with the current caption vocabulary in the audio library to obtain the user voice fragment corresponding to the library vocabulary;
the step of playing the user voice clip in the personalized voice file corresponding to the current caption vocabulary comprises:
and playing the user voice clip at the playing time point.
3. The video dubbing method according to claim 1, wherein the video dubbing method is applied to an electronic device, and the audio library is obtained by:
collecting voice input by a user by using the electronic device; and
and recognizing the voice to obtain the library vocabulary and the user voice fragment.
4. The video dubbing method of claim 3 wherein the step of speech recognizing the speech to obtain the library vocabulary and the user speech segments comprises:
performing speech recognition on the voice to obtain text information;
disassembling the text information to obtain a plurality of library vocabularies;
decomposing the voice according to the plurality of library vocabularies to obtain a plurality of user voice fragments respectively corresponding to the plurality of library vocabularies; and
and storing each user voice fragment in the audio library, wherein the file name of each user voice fragment is the library vocabulary corresponding to the user voice fragment.
5. The video dubbing method of claim 1, wherein the video file comprises a plurality of frames of playing images, and the determining the playing volume of the user voice segment corresponding to the mouth shape according to the mouth shape of the person in the video file comprises:
selecting a current playing image associated with the user voice clip from the plurality of frames of playing images according to a playing time point;
identifying the mouth shape of a person in the currently played image;
calculating the actual aspect ratio of the mouth shape according to the width and the height of the mouth shape;
calculating a volume amplification factor according to the actual aspect ratio and a preset aspect ratio; and
and determining the playing volume of the user voice clip corresponding to the mouth shape of the figure in the current playing image according to the volume amplification factor.
6. A video dubbing method, comprising:
reading a video and audio library, wherein the video comprises a video file, a subtitle file and original audio, and the audio library comprises library vocabularies and user voice segments corresponding to the library vocabularies;
searching the library vocabulary matched with the subtitle file in the audio library so as to generate personalized audio and synchronous associated information of the subtitle file and the personalized audio by using the user voice fragment corresponding to the library vocabulary;
associating the video file, the subtitle file and the personalized audio according to the synchronous associated information to form a personalized video;
playing the personalized audio while playing the personalized video; and
and determining the playing volume of the user voice clip corresponding to the mouth shape according to the mouth shape of the person in the video file.
7. The video dubbing method of claim 6, wherein the video dubbing method is for an electronic device, and the audio library is obtained by:
collecting voice input by a user by using the electronic device; and
and recognizing the voice to obtain the library vocabulary and the user voice fragment.
8. The video dubbing method of claim 7 wherein the step of speech recognizing the speech to obtain the library vocabulary and the user speech segments comprises:
performing speech recognition on the voice to obtain text information;
disassembling the text information to obtain a plurality of library vocabularies;
decomposing the voice according to the plurality of library vocabularies to obtain a plurality of user voice fragments respectively corresponding to the plurality of library vocabularies; and
and storing each user voice fragment in the audio library, wherein the file name of each user voice fragment is the library vocabulary corresponding to the user voice fragment.
9. The video dubbing method of claim 7, wherein the step of searching the audio library for the library vocabulary matching the subtitle file so as to generate the personalized audio and the synchronous associated information of the subtitle file and the personalized audio by using the user voice segment corresponding to the library vocabulary comprises:
extracting a plurality of subtitle fragments in the subtitle file;
splitting each subtitle fragment to obtain a plurality of subtitle vocabularies and a plurality of playing time points corresponding to the subtitle vocabularies respectively;
searching a plurality of user voice fragments matched with a plurality of caption vocabularies in the audio library; and
and combining a plurality of user voice fragments corresponding to the plurality of caption vocabularies according to the playing time point sequence of the plurality of caption vocabularies to form the personalized audio.
10. The video dubbing method of claim 6, wherein during the personalized video playback process, the video dubbing method further comprises:
and when the personalized video is played, carrying out mute processing on the original audio.
11. The video dubbing method of claim 6, wherein the video file comprises a plurality of frames of play images, and the step of determining the play volume of the user voice clip corresponding to the mouth shape according to the mouth shape of the person in the video file comprises:
selecting a playing image associated with the user voice clip from the plurality of frames of playing images according to the playing time point of the user voice clip;
recognizing the mouth shape of a person in the playing image;
calculating the actual aspect ratio of the mouth shape according to the width and the height of the mouth shape;
calculating a volume amplification factor according to the actual aspect ratio and a preset aspect ratio; and
and determining the playing volume of the user voice clip corresponding to the mouth shape of the person in the playing image according to the volume amplification factor.
12. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the video dubbing method of any of claims 1-11.
13. A computer-readable storage medium comprising a computer program for use in conjunction with an electronic device, the computer program being executable by a processor to perform the video dubbing method of any of claims 1 to 11.
CN201811122718.2A 2018-09-26 2018-09-26 Video dubbing method, electronic device and readable storage medium Active CN110149548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811122718.2A CN110149548B (en) 2018-09-26 2018-09-26 Video dubbing method, electronic device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811122718.2A CN110149548B (en) 2018-09-26 2018-09-26 Video dubbing method, electronic device and readable storage medium

Publications (2)

Publication Number Publication Date
CN110149548A CN110149548A (en) 2019-08-20
CN110149548B (en) 2022-06-21

Family

ID=67589301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811122718.2A Active CN110149548B (en) 2018-09-26 2018-09-26 Video dubbing method, electronic device and readable storage medium

Country Status (1)

Country Link
CN (1) CN110149548B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110691204B (en) * 2019-09-09 2021-04-02 苏州臻迪智能科技有限公司 Audio and video processing method and device, electronic equipment and storage medium
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 A kind of audio frequency playing method and system
CN110769167A (en) * 2019-10-30 2020-02-07 合肥名阳信息技术有限公司 Method for video dubbing based on text-to-speech technology
CN111601174A (en) * 2020-04-26 2020-08-28 维沃移动通信有限公司 Subtitle adding method and device
CN112261435B (en) * 2020-11-06 2022-04-08 腾讯科技(深圳)有限公司 Social interaction method, device, system, equipment and storage medium
CN114765703B (en) * 2021-01-13 2023-07-07 北京中关村科金技术有限公司 Method and device for dyeing TTS voice corresponding subtitle and storage medium
CN112837401B (en) * 2021-01-27 2024-04-09 网易(杭州)网络有限公司 Information processing method, device, computer equipment and storage medium
CN113420627A (en) * 2021-06-15 2021-09-21 读书郎教育科技有限公司 System and method capable of generating English dubbing materials
CN113825005B (en) * 2021-09-30 2024-05-24 北京跳悦智能科技有限公司 Face video and audio synchronization method and system based on joint training

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6766299B1 (en) * 1999-12-20 2004-07-20 Thrillionaire Productions, Inc. Speech-controlled animation system
CN104967789A (en) * 2015-06-16 2015-10-07 福建省泉州市气象局 Automatic processing method and system for city window weather dubbing
CN106060424A (en) * 2016-06-14 2016-10-26 徐文波 Video dubbing method and device
CN107396177A (en) * 2017-08-28 2017-11-24 北京小米移动软件有限公司 Video broadcasting method, device and storage medium
CN107451564A (en) * 2017-07-31 2017-12-08 上海爱优威软件开发有限公司 A kind of human face action control method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3298076B2 (en) * 1992-10-20 2002-07-02 ソニー株式会社 Image creation device
JP2003047030A (en) * 2001-07-31 2003-02-14 Shibasoku:Kk Lip sync signal generation apparatus
JP5389594B2 (en) * 2009-09-30 2014-01-15 富士フイルム株式会社 Image file generation method, program thereof, recording medium thereof, and image file generation device
CN102054287B (en) * 2009-11-09 2015-05-06 腾讯科技(深圳)有限公司 Facial animation video generating method and device
CN101930747A (en) * 2010-07-30 2010-12-29 四川微迪数字技术有限公司 Method and device for converting voice into mouth shape image
CN104732593B (en) * 2015-03-27 2018-04-27 厦门幻世网络科技有限公司 A kind of 3D animation editing methods based on mobile terminal
US10839825B2 (en) * 2017-03-03 2020-11-17 The Governing Council Of The University Of Toronto System and method for animated lip synchronization


Also Published As

Publication number Publication date
CN110149548A (en) 2019-08-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant