CN110753263A - Video dubbing method, device, terminal and storage medium - Google Patents


Info

Publication number
CN110753263A
Authority
CN
China
Prior art keywords
video
dubbing
dubbed
audio
target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911038510.7A
Other languages
Chinese (zh)
Inventor
王志峰 (Wang Zhifeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority application: CN201911038510.7A
Publication: CN110753263A
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/439 - Processing of audio elementary streams
    • H04N 21/4394 - Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/4396 - Processing of audio elementary streams by muting the audio signal
    • H04N 21/4398 - Processing of audio elementary streams involving reformatting operations of audio signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

An embodiment of the invention discloses a video dubbing method, apparatus, terminal, and storage medium. The method comprises the following steps: receiving a first dubbing request that comprises a video clip and a target dubbing role; performing a muting operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed; and receiving recording data of the target object, generating a dubbed video from the recording data and the audio and video to be dubbed, and playing the dubbed video. Embodiments of the invention can improve the user's video dubbing experience.

Description

Video dubbing method, device, terminal and storage medium
Technical Field
The present invention relates to the field of electronic devices, in particular to video dubbing, and more particularly to a video dubbing method, a video dubbing apparatus, a terminal, and a computer storage medium.
Background
With the rapid development of the Internet of Things and electronic devices, dubbing videos has gradually become a popular entertainment activity.
Existing video dubbing techniques usually require trained professionals to clip and mute a video to produce the dubbing material, after which the professionals synthesize the material with the user's recorded audio to obtain the dubbed video.
Disclosure of Invention
Embodiments of the invention provide a video dubbing method, a video dubbing apparatus, a terminal, and a storage medium, which help simplify the video dubbing process and improve both the entertainment value of video dubbing and the user experience.
In one aspect, the present invention provides a video dubbing method applied to an electronic device, including:
receiving a first dubbing request, the first dubbing request comprising: video clip and target dubbing role;
performing silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video segments to obtain audio and video to be dubbed;
and receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video.
Wherein, the step of executing the silencing operation on the plurality of video sub-segments corresponding to the target dubbing role in the video segments to obtain the audio and video to be dubbed comprises the following steps:
determining a plurality of roles corresponding to the video clip, and acquiring a mapping relation between the plurality of roles and voiceprint data;
determining target voiceprint data of the target dubbing role according to the mapping relation between the roles and the voiceprint data;
the target voiceprint data is used as the input of a preset feature extraction model to obtain target voiceprint features;
acquiring audio track data corresponding to the video clip, and determining an audio track set to be dubbed in the audio track data according to the target voiceprint characteristics;
and determining the plurality of video sub-segments according to the audio track set to be dubbed and the video segments, and performing silencing operation on the plurality of sub-segments to obtain the audio and video to be dubbed.
Wherein the determining of the set of audio tracks to be dubbed in the audio track data according to the target voiceprint feature comprises:
taking the audio track data as the input of the feature extraction model to obtain an audio track feature set corresponding to the audio track data, wherein the audio track feature set comprises: a plurality of audio track features corresponding to the plurality of characters;
and matching the target voiceprint features with the audio track feature set, determining target audio track features corresponding to the target voiceprint features, and determining an audio track set corresponding to the target audio track features as the audio track set to be dubbed.
Wherein, the determining the plurality of video sub-segments according to the to-be-dubbed audio track set and the video segments, and performing a muting operation on the plurality of sub-segments to obtain the to-be-dubbed audio and video comprises:
acquiring a plurality of audio tracks to be dubbed contained in the audio track set to be dubbed;
determining a plurality of video sub-segments in the video segments according to the plurality of audio tracks to be dubbed, and performing a silencing operation on the plurality of video sub-segments to obtain a plurality of silencing sub-segments;
and updating the video clip according to the plurality of sound attenuation sub-clips to obtain the sound attenuation video, obtaining a plurality of sound attenuation time sets corresponding to the plurality of sound attenuation sub-clips, and marking the sound attenuation video according to the plurality of sound attenuation time sets to obtain the audio and video to be matched.
Wherein any one of the plurality of muting time sets comprises: a muting start time and a muting end time; after the audio and video to be matched is obtained, the method further comprises the following steps:
playing the audio and video to be matched, and monitoring the played time of the audio and video to be matched;
acquiring a plurality of silencing start times corresponding to the plurality of silencing time sets, executing audio acquisition operation when the played time length is detected to be matched with the plurality of silencing start times, and stopping executing the audio acquisition operation when the played time length is detected to be matched with the plurality of silencing end times to obtain a plurality of recording subdata;
and generating the recording data according to the plurality of pieces of recording sub-data.
The generating of dubbing videos according to the recording data and the audio and video to be dubbed comprises the following steps:
acquiring a to-be-dubbed audio track of the to-be-dubbed audio and video and a recording audio track corresponding to the recording data;
updating the audio track to be dubbed according to the recording audio track to obtain a dubbing audio track;
and replacing the audio track to be dubbed in the audio and video to be dubbed according to the dubbed audio track to obtain the dubbed video.
Wherein, after obtaining a plurality of audio tracks to be dubbed contained in the audio track set to be dubbed, the method further comprises:
the plurality of audio tracks to be dubbed are used as the input of a pre-trained audio track recognition model to obtain a plurality of texts to be dubbed corresponding to the plurality of audio tracks to be dubbed;
and establishing a mapping table of the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed and storing the mapping table.
Wherein, before the audio acquisition operation is executed, the method further comprises:
determining a target silencing starting time matched with the played time length, determining a target silencing sound track corresponding to the target silencing starting time, and determining a target audio track to be dubbed corresponding to the target silencing sound track;
determining a target dubbing text corresponding to the target dubbing audio track according to the mapping table of the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed;
and displaying the target dubbing text.
Wherein, after playing the dubbing video, the method further comprises:
when the dubbing video is detected to be played completely, displaying a preset first window, wherein the first window comprises: a video determination request;
if a video determination instruction returned by the target object is received, storing the dubbing video;
and if a refusal instruction returned by the target object is received, receiving a second dubbing request, and executing the video dubbing method according to the second dubbing request.
Wherein, after storing the dubbing video, the method further comprises:
displaying a preset second window, wherein the second window comprises: a video sharing request;
if a video sharing instruction returned by the target object is received, sending the dubbing video to a preset server;
and if a refusal sharing instruction returned by the target object is received, stopping executing the video dubbing method.
On the other hand, an embodiment of the present invention provides a video dubbing method, which is applied to a terminal device, and the method includes:
when receiving a video dubbing function triggering operation of a target object, displaying a video dubbing determination interface;
if video dubbing determining data are extracted from the video dubbing determining interface, displaying a dubbing data interface, and extracting the video dubbing data contained in the dubbing data interface;
playing the audio and video to be dubbed corresponding to the video dubbing data, displaying the dubbing text corresponding to the audio and video to be dubbed, and acquiring the recording data of the target object;
and synthesizing the audio and video to be dubbed and the recording data to obtain the dubbed video.
In another aspect, an embodiment of the present invention provides a video dubbing apparatus applied to an electronic device, where the video dubbing apparatus includes:
a receiving unit, configured to receive a first dubbing request, where the first dubbing request includes: video clip and target dubbing role;
the silencing unit is used for executing silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video segments to obtain audio and video to be dubbed;
and the execution unit is used for receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video.
In the aspect that a plurality of video sub-segments corresponding to the target dubbing role are subjected to a muting operation in the video segment to obtain the audio and video to be dubbed, the muting unit is specifically configured to:
determining a plurality of roles corresponding to the video clip, and acquiring a mapping relation between the plurality of roles and voiceprint data;
determining target voiceprint data of the target dubbing role according to the mapping relation between the roles and the voiceprint data;
the target voiceprint data is used as the input of a preset feature extraction model to obtain target voiceprint features;
acquiring audio track data corresponding to the video clip, and determining an audio track set to be dubbed in the audio track data according to the target voiceprint characteristics;
and determining the plurality of video sub-segments according to the audio track set to be dubbed and the video segments, and performing silencing operation on the plurality of sub-segments to obtain the audio and video to be dubbed.
Wherein, in the aspect of determining the audio track set to be dubbed in the audio track data according to the target voiceprint feature, the sound muting unit is specifically configured to:
taking the audio track data as the input of the feature extraction model to obtain an audio track feature set corresponding to the audio track data, wherein the audio track feature set comprises: a plurality of audio track features corresponding to the plurality of characters;
and matching the target voiceprint features with the audio track feature set, determining target audio track features corresponding to the target voiceprint features, and determining an audio track set corresponding to the target audio track features as the audio track set to be dubbed.
Wherein, in the aspect that the plurality of video sub-segments are determined according to the to-be-dubbed audio track set and the video segment, and the to-be-dubbed audio and video is obtained by performing a muting operation on the plurality of sub-segments, the muting unit is specifically configured to:
acquiring a plurality of audio tracks to be dubbed contained in the audio track set to be dubbed;
determining a plurality of video sub-segments in the video segments according to the plurality of audio tracks to be dubbed, and performing a silencing operation on the plurality of video sub-segments to obtain a plurality of silencing sub-segments;
and updating the video clip according to the plurality of sound attenuation sub-clips to obtain the sound attenuation video, obtaining a plurality of sound attenuation time sets corresponding to the plurality of sound attenuation sub-clips, and marking the sound attenuation video according to the plurality of sound attenuation time sets to obtain the audio and video to be matched.
Wherein any one of the plurality of muting time sets comprises: a muting start time and a muting end time; the sound attenuation unit is further configured to:
playing the audio and video to be matched, and monitoring the played time of the audio and video to be matched;
acquiring a plurality of silencing start times corresponding to the plurality of silencing time sets, executing audio acquisition operation when the played time length is detected to be matched with the plurality of silencing start times, and stopping executing the audio acquisition operation when the played time length is detected to be matched with the plurality of silencing end times to obtain a plurality of recording subdata;
and generating the recording data according to the plurality of pieces of recording sub-data.
In the aspect of generating a dubbing video according to the recording data and the audio/video to be dubbed, the execution unit is specifically configured to:
acquiring a to-be-dubbed audio track of the to-be-dubbed audio and video and a recording audio track corresponding to the recording data;
updating the audio track to be dubbed according to the recording audio track to obtain a dubbing audio track;
and replacing the audio track to be dubbed in the audio and video to be dubbed according to the dubbed audio track to obtain the dubbed video.
Wherein, in terms of after the obtaining of the plurality of to-be-dubbed audio tracks included in the to-be-dubbed audio track set, the muting unit is further configured to:
the plurality of audio tracks to be dubbed are used as the input of a pre-trained audio track recognition model to obtain a plurality of texts to be dubbed corresponding to the plurality of audio tracks to be dubbed;
and establishing a mapping table of the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed and storing the mapping table.
Wherein, in an aspect prior to the performing the audio capture operation, the performing unit is further configured to:
determining a target silencing starting time matched with the played time length, determining a target silencing sound track corresponding to the target silencing starting time, and determining a target audio track to be dubbed corresponding to the target silencing sound track;
determining a target dubbing text corresponding to the target dubbing audio track according to the mapping table of the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed;
and displaying the target dubbing text.
Wherein, in terms of after the dubbing video is played, the execution unit is further configured to:
when the dubbing video is detected to be played completely, displaying a preset first window, wherein the first window comprises: a video determination request;
if a video determination instruction returned by the target object is received, storing the dubbing video;
and if a refusal instruction returned by the target object is received, receiving a second dubbing request, and executing the video dubbing method according to the second dubbing request.
Wherein, in an aspect subsequent to said storing said dubbed video, said execution unit is further to:
displaying a preset second window, wherein the second window comprises: a video sharing request;
if a video sharing instruction returned by the target object is received, sending the dubbing video to a preset server;
and if a refusal sharing instruction returned by the target object is received, stopping executing the video dubbing method.
In another aspect, an embodiment of the present invention provides a terminal, where the terminal includes an input device and an output device, and the terminal further includes:
a processor adapted to implement one or more instructions; and
a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:
receiving a first dubbing request, the first dubbing request comprising: video clip and target dubbing role;
performing silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video segments to obtain audio and video to be dubbed;
and receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video.
In yet another aspect, an embodiment of the present invention provides a computer storage medium, where one or more instructions are stored, and the one or more instructions are adapted to be loaded by a processor and execute the following steps:
receiving a first dubbing request, the first dubbing request comprising: video clip and target dubbing role;
performing silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video segments to obtain audio and video to be dubbed;
and receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video.
An embodiment of the invention receives a first dubbing request comprising a video clip and a target dubbing role; performs a muting operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed; and receives recording data of the target object, generates a dubbed video from the recording data and the audio and video to be dubbed, and plays the dubbed video. The electronic device can thus determine which video sub-segments to mute from the target dubbing role alone, and generate the dubbed video from the target object's recording data and the audio and video to be dubbed. This simplifies the video dubbing process, lowers the technical skill it requires, and improves both the practicality and the entertainment value of video dubbing, helping to improve the user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present invention;
fig. 2 is a schematic view of a scene of a video dubbing method according to an embodiment of the present invention;
fig. 3 is an interaction diagram of a video dubbing method according to an embodiment of the present invention;
fig. 4a is a schematic flowchart of a video dubbing method according to an embodiment of the present invention;
FIG. 4b is a flowchart illustrating a method for obtaining a target voiceprint feature according to an embodiment of the present invention;
fig. 5 is a schematic flow chart of another video dubbing method according to an embodiment of the present invention;
fig. 6 is a flowchart illustrating another video dubbing method according to an embodiment of the present invention;
fig. 7 is a schematic view illustrating an interface display flow of a video dubbing method according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a video dubbing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of the invention and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present invention. The network architecture may include a plurality of servers and terminal devices (as shown in fig. 1: the terminal device 100, the server 101, and the server 102), and the terminal device 100 may exchange data with each server over a network. As shown in fig. 1, when the terminal device 100 executes the video dubbing method, the dubbed video can be shared by sending it to the server 101 and the server 102.
In one embodiment, when the terminal device 100 receives a first dubbing request sent by a target object, a video dubbing operation is performed according to the first dubbing request to obtain a dubbing video, and when a determined sharing instruction sent by the target object is detected, the dubbing video is sent to the server 101 and/or the server 102 according to the determined sharing instruction.
The terminal device may include a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Mobile Internet Device (MID), or a wearable device (e.g., a smart watch or a smart band).
Therefore, the video dubbing method provided by the embodiment of the invention is realized through interaction between the terminal equipment and the server, the dubbing video is generated through the terminal equipment, and the terminal equipment realizes sharing of the dubbing video by sending the dubbing video to the plurality of servers, so that the video dubbing process is simplified, and the user experience is improved.
Referring to fig. 2, fig. 2 is a schematic view of a scene of a video dubbing method according to an embodiment of the present invention, as shown in fig. 2, the scene takes the terminal device 100 in the embodiment corresponding to fig. 1 as an example, and the video dubbing method includes:
when the terminal device 101 plays a video, monitoring an instruction sent by a target object, when it is monitored that the target object sends a first dubbing instruction, obtaining a video clip and a target dubbing role from the first dubbing instruction, wherein the number of dubbing people can be obtained from the first dubbing instruction, determining a target role voiceprint according to the target dubbing role and the video clip, executing a learning operation on the target voiceprint data to obtain a target voiceprint feature corresponding to the target voiceprint data, and performing matching and silencing in the video clip according to the target voiceprint feature to generate an audio and video to be dubbed, wherein the generating the audio and video to be dubbed by performing matching and silencing in the video clip according to the target voiceprint feature comprises: determining a plurality of video sub-segments in a video segment according to the target voiceprint characteristics, performing silencing operation on the plurality of video sub-segments to obtain a plurality of silencing sub-segments, replacing the video segment according to the plurality of silencing sub-segments to obtain an audio and video to be matched, playing the audio and video to be matched and displaying a request for determining the audio and video to be matched, wherein the request is used for requesting a target object to determine the audio and video to be matched, receiving an instruction for determining the audio and video to be matched returned by the target object, starting a recording function to obtain recording data, synthesizing the recording data and the audio and video to be matched to obtain a dubbed video, sending a video determination request to the target object, receiving a second dubbing request if a reject instruction is received, and executing a video dubbing method according to the second dubbing request; if a video determining instruction is received, storing the dubbing video, sending a video sharing request to a target object, and if a refusal sharing instruction is received, stopping executing the video dubbing method; and if a video sharing instruction is received, carrying out multi-channel video sharing operation according to the video sharing instruction, and stopping executing the video dubbing method.
An embodiment of the invention receives a first dubbing request comprising a video clip and a target dubbing role; performs a muting operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed; and receives recording data of the target object, generates a dubbed video from the recording data and the audio and video to be dubbed, and plays the dubbed video. The electronic device can thus determine which video sub-segments to mute from the target dubbing role alone, and generate the dubbed video from the target object's recording data and the audio and video to be dubbed. This simplifies the video dubbing process, lowers the technical skill it requires, and improves both the practicality and the entertainment value of video dubbing, helping to improve the user experience.
Fig. 3 is an interaction diagram of a video dubbing method according to an embodiment of the present invention. The method may comprise the steps of:
step 301, receiving a first dubbing request, where the first dubbing request includes: video clip and target dubbing role;
Optionally, when it is detected that the electronic device is executing a video playing function, instructions sent by the target object are monitored. When it is detected that the target object has started a video clipping function, the played duration of the source video is obtained and clipping begins with that played duration as the video start time; when it is detected that the target object has closed the video clipping function, the current played duration is taken as the video end time, yielding the video clip. At least one role contained in the video clip is then acquired, a role determination request is generated from the at least one role and displayed, and a role determination response returned by the target object is received, the response comprising: a target dubbing role. The first dubbing request is then generated from the video clip and the target dubbing role and returned.
Step 302, performing a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video segments to obtain an audio/video to be dubbed;
optionally, determining a source video corresponding to the video segment, and acquiring video information corresponding to the source video, where the video information includes: a character set, the character set comprising: the method comprises the steps that a plurality of roles are obtained, the mapping relation between the roles and voiceprint data is obtained, and target voiceprint data corresponding to a target dubbing role are determined according to the mapping relation between the roles and the voiceprint data; and acquiring a preset feature extraction model, and taking the target voiceprint data as the input of the feature extraction model to obtain the target voiceprint features. Acquiring audio track data corresponding to the video clip, and determining an audio track set to be dubbed in the audio track data according to the target voiceprint characteristics; and determining the plurality of video sub-segments according to the audio track set to be dubbed and the video segment, and performing silencing operation on the plurality of video sub-segments to obtain the audio and video to be dubbed.
Further, determining the set of audio tracks to be dubbed in the audio track data according to the target voiceprint feature may further include: taking the audio track data as the input of the feature extraction model to obtain an audio track feature set corresponding to the audio track data, the set comprising a plurality of audio track features corresponding to the plurality of roles; matching the target voiceprint feature against the audio track feature set; and, if the matching succeeds, determining the target audio track feature corresponding to the target voiceprint feature and determining the audio track set corresponding to that target audio track feature as the audio track set to be dubbed.
Further, performing the muting operation on the plurality of video sub-segments may include: the method comprises the steps of obtaining a plurality of audio tracks to be dubbed contained in an audio track set to be dubbed, determining a plurality of video sub-segments in the video segments according to the audio tracks to be dubbed, executing a silencing operation on the plurality of video sub-segments to obtain a plurality of silencing sub-segments, updating the video segments according to the plurality of silencing sub-segments to obtain a silencing video, and marking the plurality of silencing sub-segments in the silencing video to obtain the audio and video to be dubbed.
Step 303, receiving recording data of a target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video;
optionally, a recording audio track corresponding to the recording data is obtained, and the audio track corresponding to the audio and video to be dubbed is replaced by the recording audio track, so that the dubbing video is obtained.
Step 304, if a video determination instruction returned by the target object is received, storing the dubbing video;
optionally, before receiving a video determination instruction returned by the target object, displaying a preset first window on a display screen of the electronic device, where the first window includes: a video determination request for inquiring whether the target object re-executes the video dubbing method.
Further, if a video determination instruction returned by the target object is received, storing the dubbing video into a preset database; and if the rejection instruction returned by the target object is received, receiving a second dubbing request sent by the target object, and re-executing the video dubbing method according to the second dubbing request.
Step 305, if a video sharing instruction returned by the target object is received, sending the dubbing video to a preset server;
optionally, before a video sharing instruction returned by the target object is received, a preset second window is displayed on a display screen of the electronic device, where the second window includes: the video sharing method includes the steps that a video sharing request is used for inquiring whether a target object executes video sharing operation or not.
Optionally, sending the dubbing video to the preset server may include: if the video sharing request comprises a target sharing platform, determining a target server corresponding to the target sharing platform, and sending the dubbing video to the target server and a sharing request, wherein the sharing request is used for requesting the target server to store the dubbing video in an associated account of the target object on the target sharing platform; if the video sharing request does not contain a target sharing platform, determining a preset associated sharing platform of the target object, and sending the dubbing video to an associated server corresponding to the associated sharing platform.
Further, if a refusal sharing instruction returned by the target object is received, the video dubbing method is stopped to be executed.
And step 306, storing and sharing the dubbing video.
Fig. 4a is a schematic flow chart of a video dubbing method according to an embodiment of the present invention. The method may comprise the steps of:
step 401, receiving a first dubbing request, where the first dubbing request includes: video clip and target dubbing role;
Optionally, the first dubbing request may further include the number of dubbers; when the number of dubbers is greater than 1, a first target dubbing role corresponding to a first target object and a second target dubbing role corresponding to a second target object are obtained from the first dubbing request.
Optionally, step 401 may further include: starting a dubbing function, receiving a first dubbing request, acquiring a video clip, the number of dubbing people and a target dubbing role in the first dubbing request, and determining a source video corresponding to the video clip.
The dubbing function may be activated in various ways. In one alternative embodiment, a dedicated button determines whether to activate it. In another alternative embodiment, the dubbing function may be activated when a set trigger condition is met, where the trigger condition may be a specific operation used to decide whether to activate the function, the specific operation including but not limited to a specific gesture or a biometric verification such as face recognition, fingerprint recognition, or voiceprint recognition. The embodiments of the present application do not limit how the dubbing function is started.
Step 402, performing a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video segments to obtain an audio/video to be dubbed;
Optionally, the specific implementation of step 402 may be: determining a plurality of roles corresponding to the video clip, and acquiring a mapping relation between the plurality of roles and voiceprint data; determining the target voiceprint data of the target dubbing role according to that mapping relation; taking the target voiceprint data as the input of a preset feature extraction model to obtain the target voiceprint feature; acquiring the audio track data corresponding to the video clip, and determining the audio track set to be dubbed in the audio track data according to the target voiceprint feature; and determining the plurality of video sub-segments according to the audio track set to be dubbed and the video clip, and muting those sub-segments to obtain the audio and video to be dubbed.
In a specific implementation, the feature extraction model may be a deep-learning-based iVector model, as shown in fig. 4b. To extract the target voiceprint feature, DNN features and the target voiceprint data are used as the input of the feature extraction model for training, yielding the frame posterior probabilities and the 0th-order statistics corresponding to the target voiceprint data. The target speech features corresponding to the target voiceprint data are extracted by applying pre-emphasis, framing, windowing, a Fourier transform, filtering, a logarithm operation, and a discrete cosine transform to the target voiceprint data. After the speech features are input into the feature extraction model, the 1st-order statistics corresponding to the speech features are computed from the frame posterior probabilities, and a computation module (an i-vector system) is controlled to combine the 0th-order and 1st-order statistics into the target voiceprint feature (an i-vector) corresponding to the target voiceprint data.
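The transform chain named above (pre-emphasis, framing, windowing, Fourier transform, filtering, logarithm, discrete cosine transform) is the standard MFCC front end. Below is a minimal sketch of that front end in Python, assuming the librosa library, a 16 kHz sample rate, and a 0.97 pre-emphasis coefficient; the patent specifies none of these, so every parameter choice is illustrative only.

```python
import numpy as np
import librosa

def extract_speech_features(wav_path, n_mfcc=20):
    """MFCC-style front end: pre-emphasis here, then framing, windowing,
    FFT, mel filtering, log, and DCT inside librosa.feature.mfcc."""
    y, sr = librosa.load(wav_path, sr=16000)    # sample rate is an assumption
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])  # pre-emphasis (coefficient assumed)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T                               # shape: (frames, n_mfcc)
```

These per-frame features would then be pooled into the 0th- and 1st-order statistics consumed by the i-vector system described above.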
In a specific implementation process, when the target voiceprint data is input to the preset feature extraction model, Probabilistic Linear Discriminant Analysis (PLDA) is used to reduce the interference of channel information with voiceprint feature learning. In voiceprint feature learning, the training speech is assumed to consist of the speech of I speakers, each of whom has J different utterances of their own. Denote the j-th utterance of the i-th speaker as X_ij. Based on factor analysis, the generative model for X_ij is defined as:
X_ij = μ + F·h_i + G·w_ij + ε_ij
This model can be viewed as two parts. The first two terms on the right of the equals sign depend only on the speaker, not on any particular utterance of that speaker; they are called the signal part and describe the differences between speakers. The last two terms describe the differences between different utterances of the same speaker and are called the noise part. The two matrices F and G contain the basis factors of their respective latent variable spaces and can be regarded as the eigenvectors of those spaces: each column of F corresponds to a feature vector of the inter-class (between-speaker) space, and each column of G corresponds to a feature vector of the intra-class (within-speaker) space. The vectors h_i and w_ij are the feature representations in those spaces; for example, h_i can be regarded as the representation of X_ij in the speaker space. In the recognition scoring stage, the more likely it is that the h_i features of two utterances are the same, the more certain it is that the two utterances belong to the same speaker.
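The patent stops at the generative model. For reference, the scoring it alludes to is conventionally computed as a log-likelihood ratio; this formula is the standard PLDA verification rule, not taken from the source:

score(x_1, x_2) = log [ p(x_1, x_2 | H_s) / ( p(x_1 | H_d) · p(x_2 | H_d) ) ]

where H_s is the hypothesis that both utterances share the same speaker factor h_i, and H_d is the hypothesis that their speaker factors differ. A higher score indicates greater confidence that the two utterances come from the same speaker.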
Further, the specific implementation manner of step 402 may also be: taking the audio track data as the input of the feature extraction model to obtain an audio track feature set corresponding to the audio track data, wherein the audio track feature set comprises: a plurality of audio track features corresponding to the plurality of characters; and matching the target voiceprint features with the audio track feature set, determining target audio track features corresponding to the target voiceprint features, and determining an audio track set corresponding to the target audio track features as the audio track set to be dubbed.
In a specific implementation process, as shown in fig. 4b, DNN features and the audio track data are used as the input of the feature extraction model for training, yielding the frame posterior probabilities and the 0th-order statistics corresponding to the audio track data. The speech features corresponding to the audio track data are extracted by applying pre-emphasis, framing, windowing, a Fourier transform, filtering, a logarithm operation, and a discrete cosine transform to the audio track data. After the speech features are input into the feature extraction model, the 1st-order statistics corresponding to the speech features are computed from the frame posterior probabilities, and the computation module (the i-vector system) combines the 0th-order and 1st-order statistics into the audio track feature set (i-vectors) corresponding to the audio track data.
When the target voiceprint feature is matched against the audio track feature set, whether the matching succeeds is determined by computing the similarity between the target voiceprint feature and each of the audio track features in the set. The similarity can be measured as the cosine distance between feature vectors, computed as:
cos θ = (x · y) / (‖x‖ ‖y‖)
where cos θ represents the similarity between the target voiceprint feature x and an audio track feature y in the audio track feature set. The maximum of these similarities is obtained and compared with a preset similarity threshold; if the maximum is greater than the threshold, the target voiceprint feature is determined to have successfully matched the audio track feature corresponding to that maximum.
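A minimal sketch of this matching step in Python with NumPy; the 0.7 threshold is an assumed placeholder, since the patent leaves the similarity threshold unspecified.

```python
import numpy as np

def match_target_track(target_vec, track_vecs, threshold=0.7):
    """Cosine-score the target voiceprint i-vector against each track
    i-vector and return the best-matching index if it clears the threshold."""
    sims = [float(np.dot(target_vec, t) /
                  (np.linalg.norm(target_vec) * np.linalg.norm(t)))
            for t in track_vecs]
    best = int(np.argmax(sims))
    return best if sims[best] > threshold else None  # index into track_vecs
```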
Optionally, the specific implementation of step 402 may further include: acquiring a plurality of audio tracks to be dubbed contained in the audio track set to be dubbed; determining a plurality of video sub-segments in the video segment according to the plurality of audio tracks to be dubbed, and performing a silencing operation on the plurality of video sub-segments to obtain a plurality of silencing sub-segments; and updating the video segment according to the plurality of sound attenuation sub-segments to obtain the sound attenuation video, acquiring a plurality of sound attenuation time sets corresponding to the plurality of sound attenuation sub-segments, and marking the sound attenuation video according to the plurality of sound attenuation time sets to obtain the audio and video to be matched.
In the specific implementation process, suppose a first video sub-segment and a second video sub-segment are determined in the dubbing clip. The first video sub-track of the first video sub-segment and the second video sub-track of the second video sub-segment are obtained, and a muting operation is performed on both to obtain a first muted sub-track and a second muted sub-track. The first video sub-track of the first video sub-segment is replaced by the first muted sub-track to obtain a first muted sub-segment, and likewise the second video sub-track is replaced by the second muted sub-track to obtain a second muted sub-segment. In the video clip, the first and second video sub-segments are replaced by the first and second muted sub-segments to obtain the muted video. The first muting start time and first muting end time of the first muted sub-segment, and the second muting start time and second muting end time of the second muted sub-segment, are then marked in the muted video to obtain the audio and video to be matched.
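As a sketch of the muting-and-reassembly step above, assuming Python with the pydub library (the patent names no tool) and millisecond (start, end) pairs as the muting time sets:

```python
from pydub import AudioSegment

def mute_sub_segments(audio_path, spans_ms, out_path):
    """Replace each (start_ms, end_ms) span with silence and stitch the
    pieces back together, producing the muted track plus the time sets
    used to mark the audio/video to be dubbed."""
    clip = AudioSegment.from_file(audio_path)
    muted, cursor = AudioSegment.empty(), 0
    for start, end in sorted(spans_ms):
        muted += clip[cursor:start] + AudioSegment.silent(duration=end - start)
        cursor = end
    muted += clip[cursor:]
    muted.export(out_path, format="wav")
    return sorted(spans_ms)  # muting time sets: (start, end) per sub-segment
```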
Further, the multiple audio tracks to be dubbed are used as the input of a pre-trained audio track recognition model, and multiple texts to be dubbed corresponding to the multiple audio tracks to be dubbed are obtained; and establishing and storing a mapping table of the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed.
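A sketch of building that mapping table, using the off-the-shelf SpeechRecognition package as a stand-in for the pre-trained audio track recognition model (the patent does not identify the model; `recognize_google` requires network access):

```python
import speech_recognition as sr

def build_dubbing_text_table(track_paths):
    """Map each to-be-dubbed track file to its recognized dubbing text."""
    recognizer, table = sr.Recognizer(), {}
    for path in track_paths:
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source)
        table[path] = recognizer.recognize_google(audio)
    return table  # persisted as the track -> text mapping table
```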
Optionally, the specific implementation of step 402 may further include: playing the audio and video to be matched, and monitoring its played duration; acquiring the plurality of muting start times corresponding to the plurality of muting time sets; performing an audio capture operation when the played duration is detected to match one of the muting start times, and stopping the audio capture operation when the played duration is detected to match one of the muting end times, thereby obtaining a plurality of pieces of recording sub-data; and generating the recording data according to the plurality of pieces of recording sub-data.
In a specific implementation process, when it is detected that the played duration matches any one of the plurality of muting end times (for example, if the played duration is 19:38 and the muting end times are 1:25, 15:20, and 19:38, the played duration matches the end time 19:38), a timer is set, and the audio capture operation is stopped when the timer reaches a preset time. The preset time may be, for example, 1 s, 2 s, or 5 s, and is not limited here.
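The start/stop logic reduces to checking the playback position against the muting windows. A minimal sketch, with `grace_ms` standing in for the preset timer (an assumed parameterization):

```python
def capture_should_run(played_ms, muting_spans_ms, grace_ms=2000):
    """True while playback is inside a muting window, or within the
    post-window grace period during which the timer keeps capture open."""
    return any(start <= played_ms < end + grace_ms
               for start, end in muting_spans_ms)
```

A player callback would poll this on each position update, opening the microphone stream when it turns true and closing it when it turns false, each closed stretch yielding one piece of recording sub-data.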
And 403, receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video.
Optionally, the specific implementation of step 403 may further include: acquiring a to-be-dubbed audio track of the to-be-dubbed audio and video and a recording audio track corresponding to the recording data; updating the audio track to be dubbed according to the recording audio track to obtain a dubbing audio track; and replacing the audio track to be dubbed in the audio and video to be dubbed according to the dubbed audio track to obtain the dubbed video.
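A sketch of this final replacement step, assuming Python with moviepy 1.x (the patent prescribes no muxer); the dubbing track is assumed to have already been produced by overlaying the recording track onto the muted to-be-dubbed track:

```python
from moviepy.editor import VideoFileClip, AudioFileClip

def synthesize_dubbed_video(muted_video_path, dubbing_track_path, out_path):
    """Replace the to-be-dubbed audio track of the muted video with the
    dubbing track to obtain the dubbed video."""
    video = VideoFileClip(muted_video_path)
    dubbed = video.set_audio(AudioFileClip(dubbing_track_path))
    dubbed.write_videofile(out_path, codec="libx264", audio_codec="aac")
```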
Optionally, before receiving the recording data of the target object, determining a target muting start time matched with the played time length, determining a target muting audio track corresponding to the target muting start time, and determining a target audio track to be dubbed corresponding to the target muting audio track; determining a target dubbing text corresponding to the target dubbing audio track according to the mapping table of the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed; the target dubbing text is displayed.
Further, a target dubbing speed is determined according to the target audio track to be dubbed, and the target dubbing speed is displayed.
Optionally, when it is detected that the dubbing video is played completely, displaying a preset first window, where the first window includes: a video determination request; if a video determination instruction returned by the target object is received, storing the dubbing video; and if a refusal instruction returned by the target object is received, receiving a second dubbing request, and executing the video dubbing method according to the second dubbing request.
Optionally, a preset second window is displayed, where the second window includes: a video sharing request; if a video sharing instruction returned by the target object is received, sending the dubbing video to a preset server; and if a refusal sharing instruction returned by the target object is received, stopping executing the video dubbing method.
The embodiment of the invention receives a first dubbing request, wherein the first dubbing request comprises the following steps: video clip and target dubbing role; performing silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video segments to obtain audio and video to be dubbed; and receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video. Therefore, the electronic equipment can determine the video sub-segments to be silenced according to the target dubbing role, and generate the dubbing video according to the recording data of the target object and the audio and video to be dubbed, so that the video dubbing process is simplified, the technical requirement of video dubbing is lowered, the entertainment of video dubbing is improved, and the user experience is improved.
Fig. 5 is a schematic flow chart of another video dubbing method according to an embodiment of the present invention. The method may comprise the steps of:
step 501, receiving a first dubbing request, where the first dubbing request includes: video clip and target dubbing role;
step 502, determining a plurality of roles corresponding to the video clip, and acquiring a mapping relation between the plurality of roles and voiceprint data;
step 503, determining target voiceprint data of the target dubbing character according to the mapping relation between the plurality of characters and the voiceprint data;
step 504, the target voiceprint data is used as the input of a preset feature extraction model, and the target voiceprint features are obtained;
step 505, obtaining audio track data corresponding to the video clip, and determining an audio track set to be dubbed in the audio track data according to the target voiceprint characteristics;
step 506, determining the plurality of video sub-segments according to the audio track set to be dubbed and the video segments, and performing a silencing operation on the plurality of sub-segments to obtain the audio and video to be dubbed;
and 507, receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video.
An embodiment of the present invention may receive a first dubbing request comprising a video clip and a target dubbing role; determine a plurality of roles corresponding to the video clip and acquire the mapping relation between the roles and the voiceprint data; determine the target voiceprint data of the target dubbing role from that mapping relation; take the target voiceprint data as the input of a preset feature extraction model to obtain the target voiceprint feature; acquire the audio track data corresponding to the video clip and determine the audio track set to be dubbed in it according to the target voiceprint feature; determine the plurality of video sub-segments from the audio track set to be dubbed and the video clip, and mute those sub-segments to obtain the audio and video to be dubbed; and receive the recording data of the target object, generate the dubbed video from the recording data and the audio and video to be dubbed, and play it. Muting the video clip through voiceprint recognition makes the video dubbing method more intelligent, simplifies the video muting workflow, improves the practicality of video dubbing, and helps improve the user experience.
Fig. 6 is a schematic flow chart of another video dubbing method according to an embodiment of the present invention. The method may comprise the steps of:
step 601, receiving a first dubbing request, where the first dubbing request includes: a video clip and a target dubbing role;
step 602, performing a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed;
step 603, receiving the recording data of the target object, and acquiring the audio track to be dubbed of the audio and video to be dubbed and the recording audio track corresponding to the recording data;
step 604, updating the audio track to be dubbed according to the recording audio track to obtain a dubbing audio track;
step 605, replacing the audio track to be dubbed in the audio and video to be dubbed with the dubbing audio track to obtain the dubbing video;
step 606, playing the dubbing video.
The embodiment of the present invention may receive a first dubbing request, where the first dubbing request includes: a video clip and a target dubbing role; perform a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed; receive the recording data of the target object, and acquire the audio track to be dubbed of the audio and video to be dubbed and the recording audio track corresponding to the recording data; update the audio track to be dubbed according to the recording audio track to obtain a dubbing audio track; replace the audio track to be dubbed in the audio and video to be dubbed with the dubbing audio track to obtain the dubbing video; and play the dubbing video. Because the dubbing video is obtained by replacing the audio track to be dubbed with a track built from the recording audio track, the video synthesis rate is improved, the practicability of video dubbing is improved, and the user experience is improved.
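As one possible realization of the replacement step (again a sketch under stated assumptions, not the patent's own method), the dubbing track can be substituted for the original audio with ffmpeg stream mapping; file names are hypothetical.

```python
# Hedged sketch: swap the silenced clip's audio track for the dubbing track.
import subprocess

def replace_audio_track(video_in: str, dub_track: str, video_out: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y",
         "-i", video_in,      # input 0: video + audio track to be dubbed
         "-i", dub_track,     # input 1: dubbing audio track
         "-map", "0:v:0",     # keep the video stream from input 0
         "-map", "1:a:0",     # take the audio stream from input 1
         "-c:v", "copy",      # do not re-encode the video
         "-shortest",         # end at the shorter of the two streams
         video_out],
        check=True,
    )

replace_audio_track("clip_to_dub.mp4", "recording.wav", "dubbed_clip.mp4")
```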
Please refer to fig. 7, which is a schematic view of the interface display flow of a video dubbing method according to an embodiment of the present invention. Taking the animation "pig cookies" as an example: first, the terminal device plays "pig cookies", and when playback reaches 1:05 and a video dubbing function trigger operation of the target object is received, a video dubbing determination interface is displayed, where the video dubbing determination interface may include a video dubbing determination request 700a; the video dubbing determination request 700a may also be displayed via a video dubbing determination popup. Next, if video dubbing determination data is extracted from the video dubbing determination interface, a dubbing data interface is displayed and the video dubbing data contained in it is extracted; as shown in fig. 7, the dubbing data interface includes a dubbing data request 700b, and the dubbing data request 700b may include: a dubbing segment, a dubbing role, and a number of dubbers. As shown in fig. 7, the segment duration can be selected in the area corresponding to the dubbing segment, for example the segment spanning 1:05-3:10 is selected as the dubbing video clip; the target dubbing role can be selected in the area corresponding to the dubbing role, for example "role 1"; and the number of dubbers can be selected in the corresponding area, for example 1. Then, the audio and video to be dubbed corresponding to the video dubbing data is played, the dubbing text corresponding to the audio and video to be dubbed is displayed, and the recording data of the target object is collected. As shown in fig. 7, while the audio and video to be dubbed is playing, a "recording" indicator is displayed at the upper left corner of the current interface, and the portion of the dubbing text corresponding to the current playback progress is marked by font size (it may also be marked by font color). For example, when playback reaches 1:10 and the current dubbing text is "Wa! The new red shoe", the portion corresponding to the current playback progress is "Wa! Red", so the font size of "Wa! Red" is enlarged. Finally, the dubbing video is obtained from the audio and video to be dubbed and the recording data.
In the embodiment of the invention, when the terminal device receives a video dubbing function trigger operation of a target object, a video dubbing determination interface is displayed; if video dubbing determination data is extracted from the video dubbing determination interface, a dubbing data interface is displayed and the video dubbing data contained in it is extracted; the audio and video to be dubbed corresponding to the video dubbing data is played, the dubbing text corresponding to the audio and video to be dubbed is displayed, and the recording data of the target object is collected; and the dubbing video synthesized from the audio and video to be dubbed and the recording data is played. Therefore, the embodiment of the invention can simplify the video dubbing process and helps improve the user experience.
Based on the description of the method embodiment and the device embodiment, the embodiment of the invention also provides a terminal. Referring to fig. 8, the terminal includes at least a processor 801, an input device 802, an output device 803, and a computer storage medium 804. The processor 801, the input device 802, the output device 803, and the computer storage medium 804 within the terminal may be connected by a bus or other means.
The computer storage medium 804 may be stored in a memory of the terminal and is used for storing a computer program comprising program instructions, and the processor 801 is used for executing the program instructions stored by the computer storage medium 804. The processor 801 (or CPU) is the computing core and control core of the terminal; it is adapted to implement one or more instructions, and in particular to load and execute one or more instructions so as to implement the corresponding method flow or function. In one embodiment, the processor 801 according to the embodiment of the present invention may be configured to execute a series of video dubbing operations, including: receiving a first dubbing request, the first dubbing request comprising: a video clip and a target dubbing role; performing a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed; receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video; and the like.
The embodiment of the invention also provides a computer storage medium (memory), which is a memory device in the terminal and is used for storing programs and data. It is understood that the computer storage medium here may include a built-in storage medium in the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores the operating system of the terminal. One or more instructions, which may be one or more computer programs (including program code) suitable for loading and execution by the processor 801, are also stored in this storage space. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer storage medium located remotely from the processor.
In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor 801 to implement the corresponding steps of the video dubbing methods described above; in a specific implementation, one or more instructions in the computer storage medium are loaded and executed by the processor 801 to perform the steps of:
receiving a first dubbing request, the first dubbing request comprising: a video clip and a target dubbing role;
performing a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed;
and receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video.
In an embodiment, in terms of performing a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed, the one or more instructions may be further loaded and specifically executed by the processor 801 to: determine a plurality of roles corresponding to the video clip, and acquire a mapping relation between the plurality of roles and voiceprint data; determine target voiceprint data of the target dubbing role according to the mapping relation between the plurality of roles and the voiceprint data; take the target voiceprint data as the input of a preset feature extraction model to obtain target voiceprint features; acquire audio track data corresponding to the video clip, and determine an audio track set to be dubbed in the audio track data according to the target voiceprint features; and determine the plurality of video sub-segments according to the audio track set to be dubbed and the video clip, and perform a silencing operation on the plurality of video sub-segments to obtain the audio and video to be dubbed.
In one embodiment, in terms of determining the audio track set to be dubbed in the audio track data according to the target voiceprint features, the one or more instructions are further loadable and specifically executable by the processor 801 to: take the audio track data as the input of the feature extraction model to obtain an audio track feature set corresponding to the audio track data, where the audio track feature set includes: a plurality of audio track features corresponding to the plurality of roles; and match the target voiceprint features with the audio track feature set, determine the target audio track features corresponding to the target voiceprint features, and determine the audio track set corresponding to the target audio track features as the audio track set to be dubbed.
In one embodiment, in terms of determining the plurality of video sub-segments according to the audio track set to be dubbed and the video clip and performing a silencing operation on the plurality of video sub-segments to obtain the audio and video to be dubbed, the one or more instructions may be further loaded and specifically executed by the processor 801 to: acquire a plurality of audio tracks to be dubbed contained in the audio track set to be dubbed; determine the plurality of video sub-segments in the video clip according to the plurality of audio tracks to be dubbed, and perform a silencing operation on the plurality of video sub-segments to obtain a plurality of silenced sub-segments; and update the video clip according to the plurality of silenced sub-segments to obtain a silenced video, acquire a plurality of silencing time sets corresponding to the plurality of silenced sub-segments, and mark the silenced video according to the plurality of silencing time sets to obtain the audio and video to be dubbed.
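A silencing time set here is just the pair (silencing start time, silencing end time) of one silenced sub-segment. A minimal sketch, assuming each matched track carries its start and end offsets within the clip (the field names are hypothetical):

```python
# Hypothetical sketch: derive the silencing time sets from the matched tracks.
def silencing_time_sets(tracks_to_dub: list[dict]) -> list[tuple[float, float]]:
    """One (start, end) silencing window per to-be-dubbed track, in play order."""
    return sorted((t["start"], t["end"]) for t in tracks_to_dub)
```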
In one embodiment, any one of the plurality of silencing time sets comprises a silencing start time and a silencing end time; after the audio and video to be dubbed is obtained, the one or more instructions may be further loaded and specifically executed by the processor 801 to: play the audio and video to be dubbed, and monitor the played duration of the audio and video to be dubbed; and acquire the plurality of silencing start times corresponding to the plurality of silencing time sets, execute an audio capture operation when the played duration is detected to match one of the silencing start times, and stop executing the audio capture operation when the played duration is detected to match the corresponding silencing end time, so as to obtain a plurality of recording sub-data and generate the recording data from the plurality of recording sub-data.
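The timed capture described above can be sketched as a loop that watches the playback position and records only inside the silencing windows. This is a logic-only illustration: playback_position, start_capture, and stop_capture are hypothetical hooks into the player and an audio-capture API, not names from the patent.

```python
import time

def capture_during_silencing_windows(windows, playback_position,
                                     start_capture, stop_capture):
    """Record one recording sub-datum per (start, end) silencing window."""
    recordings = []
    for mute_start, mute_end in sorted(windows):
        while playback_position() < mute_start:   # wait for the window to open
            time.sleep(0.05)
        start_capture()                           # window opened: start recording
        while playback_position() < mute_end:     # record until the window closes
            time.sleep(0.05)
        recordings.append(stop_capture())         # one sub-datum per window
    return recordings                             # together they form the recording data
```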
In an embodiment, in terms of generating a dubbing video according to the recording data and the audio and video to be dubbed, the one or more instructions may be further loaded and specifically executed by the processor 801 to: acquire the audio track to be dubbed of the audio and video to be dubbed and the recording audio track corresponding to the recording data; update the audio track to be dubbed according to the recording audio track to obtain a dubbing audio track; and replace the audio track to be dubbed in the audio and video to be dubbed with the dubbing audio track to obtain the dubbing video.
In one embodiment, after obtaining the plurality of audio tracks to be dubbed contained in the audio track set to be dubbed, the one or more instructions are further loadable and specifically executable by the processor 801 to: take the plurality of audio tracks to be dubbed as the input of a pre-trained audio track recognition model to obtain a plurality of texts to be dubbed corresponding to the plurality of audio tracks to be dubbed; and establish and store a mapping table between the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed.
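The mapping table itself can be pictured as a plain dictionary from track ID to dubbing text. In the sketch below, transcribe stands in for the pre-trained audio track recognition (speech-to-text) model, which the patent does not name, so it is a hypothetical placeholder.

```python
# Hedged sketch of the track-to-text mapping table.
def build_dub_text_table(tracks: dict[str, bytes], transcribe) -> dict[str, str]:
    """Map each to-be-dubbed track ID to its dubbing text."""
    return {track_id: transcribe(audio) for track_id, audio in tracks.items()}

# Usage: table = build_dub_text_table(tracks, transcribe=some_asr_model)
# At each silencing start time, the table is looked up to display the line to read.
```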
In one embodiment, before the audio capture operation is performed, the one or more instructions are further loadable and specifically executable by the processor 801 to: determine the target silencing start time that matches the played duration, determine the target silenced audio track corresponding to the target silencing start time, and determine the target audio track to be dubbed corresponding to the target silenced audio track; determine the target dubbing text corresponding to the target audio track to be dubbed according to the mapping table between the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed; and display the target dubbing text.
In one embodiment, after the dubbing video is played, the one or more instructions are further loadable and specifically executable by the processor 801 to: when the dubbing video is detected to have finished playing, display a preset first window, where the first window includes a video determination request; if a video determination instruction returned by the target object is received, store the dubbing video; and if a refusal instruction returned by the target object is received, receive a second dubbing request and execute the video dubbing method according to the second dubbing request.
In one embodiment, after the dubbing video is stored, the one or more instructions are further loadable and specifically executable by the processor 801 to: display a preset second window, where the second window includes a video sharing request; if a video sharing instruction returned by the target object is received, send the dubbing video to a preset server; and if a refusal-to-share instruction returned by the target object is received, stop executing the video dubbing method.
Based on the description of the above video dubbing method embodiments, the embodiment of the present invention further discloses a video dubbing apparatus, which may be a computer program (including program code) running in a terminal. The video dubbing apparatus may perform the method of fig. 4a, 4b, 5 or 6. Referring to fig. 9, the video dubbing apparatus may operate as follows:
a receiving unit 901, configured to receive a first dubbing request, where the first dubbing request includes: a video clip and a target dubbing role;
a silencing unit 902, configured to perform a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed;
and an executing unit 903, configured to receive the recording data of the target object, generate a dubbing video according to the recording data and the audio and video to be dubbed, and play the dubbing video.
In an embodiment, in terms of performing a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed, the silencing unit 902 is specifically configured to: determine a plurality of roles corresponding to the video clip, and acquire a mapping relation between the plurality of roles and voiceprint data; determine target voiceprint data of the target dubbing role according to the mapping relation between the plurality of roles and the voiceprint data; take the target voiceprint data as the input of a preset feature extraction model to obtain target voiceprint features; acquire audio track data corresponding to the video clip, and determine an audio track set to be dubbed in the audio track data according to the target voiceprint features; and determine the plurality of video sub-segments according to the audio track set to be dubbed and the video clip, and perform a silencing operation on the plurality of video sub-segments to obtain the audio and video to be dubbed.
In an embodiment, in terms of determining the audio track set to be dubbed in the audio track data according to the target voiceprint features, the silencing unit 902 is specifically configured to: take the audio track data as the input of the feature extraction model to obtain an audio track feature set corresponding to the audio track data, where the audio track feature set includes: a plurality of audio track features corresponding to the plurality of roles; and match the target voiceprint features with the audio track feature set, determine the target audio track features corresponding to the target voiceprint features, and determine the audio track set corresponding to the target audio track features as the audio track set to be dubbed.
In an embodiment, in terms of determining the plurality of video sub-segments according to the audio track set to be dubbed and the video clip and performing a silencing operation on the plurality of video sub-segments to obtain the audio and video to be dubbed, the silencing unit 902 is specifically configured to: acquire a plurality of audio tracks to be dubbed contained in the audio track set to be dubbed; determine the plurality of video sub-segments in the video clip according to the plurality of audio tracks to be dubbed, and perform a silencing operation on the plurality of video sub-segments to obtain a plurality of silenced sub-segments; and update the video clip according to the plurality of silenced sub-segments to obtain a silenced video, acquire a plurality of silencing time sets corresponding to the plurality of silenced sub-segments, and mark the silenced video according to the plurality of silencing time sets to obtain the audio and video to be dubbed.
In one embodiment, any one of the plurality of silencing time sets comprises a silencing start time and a silencing end time; after the audio and video to be dubbed is obtained, the silencing unit 902 is further configured to: play the audio and video to be dubbed, and monitor the played duration of the audio and video to be dubbed; and acquire the plurality of silencing start times corresponding to the plurality of silencing time sets, execute an audio capture operation when the played duration is detected to match one of the silencing start times, and stop executing the audio capture operation when the played duration is detected to match the corresponding silencing end time, so as to obtain a plurality of recording sub-data and generate the recording data from the plurality of recording sub-data.
In an embodiment, in terms of generating a dubbing video according to the recording data and the audio and video to be dubbed, the executing unit 903 is specifically configured to: acquire the audio track to be dubbed of the audio and video to be dubbed and the recording audio track corresponding to the recording data; update the audio track to be dubbed according to the recording audio track to obtain a dubbing audio track; and replace the audio track to be dubbed in the audio and video to be dubbed with the dubbing audio track to obtain the dubbing video.
In one embodiment, after obtaining the plurality of audio tracks to be dubbed contained in the audio track set to be dubbed, the silencing unit 902 is further configured to: take the plurality of audio tracks to be dubbed as the input of a pre-trained audio track recognition model to obtain a plurality of texts to be dubbed corresponding to the plurality of audio tracks to be dubbed; and establish and store a mapping table between the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed.
In an embodiment, before the audio capture operation is performed, the executing unit 903 is further configured to: determine the target silencing start time that matches the played duration, determine the target silenced audio track corresponding to the target silencing start time, and determine the target audio track to be dubbed corresponding to the target silenced audio track; determine the target dubbing text corresponding to the target audio track to be dubbed according to the mapping table between the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed; and display the target dubbing text.
In an embodiment, after the dubbing video is played, the executing unit 903 is further configured to: when the dubbing video is detected to have finished playing, display a preset first window, where the first window includes a video determination request; if a video determination instruction returned by the target object is received, store the dubbing video; and if a refusal instruction returned by the target object is received, receive a second dubbing request and execute the video dubbing method according to the second dubbing request.
In an embodiment, after the dubbing video is stored, the executing unit 903 is further configured to: display a preset second window, where the second window includes a video sharing request; if a video sharing instruction returned by the target object is received, send the dubbing video to a preset server; and if a refusal-to-share instruction returned by the target object is received, stop executing the video dubbing method.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into units is only one kind of logical functional division, and other divisions may be used in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application, in essence, or the part of it contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by related hardware instructed by a program, and the program may be stored in a computer-readable memory, which may include: a flash memory disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing detailed description of the embodiments of the present application has used specific examples to illustrate the principles and implementations of the present application; the above description of the embodiments is only provided to help understand the method and core concept of the present application. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (14)

1. A video dubbing method applied to an electronic device, the method comprising:
receiving a first dubbing request, the first dubbing request comprising: a video clip and a target dubbing role;
performing a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed;
and receiving the recording data of the target object, generating a dubbing video according to the recording data and the audio and video to be dubbed, and playing the dubbing video.
2. The method according to claim 1, wherein the performing of the silencing operation on the plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed comprises:
determining a plurality of roles corresponding to the video clip, and acquiring a mapping relation between the plurality of roles and voiceprint data;
determining target voiceprint data of the target dubbing role according to the mapping relation between the plurality of roles and the voiceprint data;
taking the target voiceprint data as the input of a preset feature extraction model to obtain target voiceprint features;
acquiring audio track data corresponding to the video clip, and determining an audio track set to be dubbed in the audio track data according to the target voiceprint features;
and determining the plurality of video sub-segments according to the audio track set to be dubbed and the video clip, and performing a silencing operation on the plurality of video sub-segments to obtain the audio and video to be dubbed.
3. The method according to claim 2, wherein the determining of the audio track set to be dubbed in the audio track data according to the target voiceprint features comprises:
taking the audio track data as the input of the feature extraction model to obtain an audio track feature set corresponding to the audio track data, wherein the audio track feature set comprises: a plurality of audio track features corresponding to the plurality of roles;
and matching the target voiceprint features with the audio track feature set, determining target audio track features corresponding to the target voiceprint features, and determining an audio track set corresponding to the target audio track features as the audio track set to be dubbed.
4. The method according to claim 2, wherein the determining of the plurality of video sub-segments according to the audio track set to be dubbed and the video clip, and the performing of the silencing operation on the plurality of video sub-segments to obtain the audio and video to be dubbed comprises:
acquiring a plurality of audio tracks to be dubbed contained in the audio track set to be dubbed;
determining the plurality of video sub-segments in the video clip according to the plurality of audio tracks to be dubbed, and performing a silencing operation on the plurality of video sub-segments to obtain a plurality of silenced sub-segments;
and updating the video clip according to the plurality of silenced sub-segments to obtain a silenced video, acquiring a plurality of silencing time sets corresponding to the plurality of silenced sub-segments, and marking the silenced video according to the plurality of silencing time sets to obtain the audio and video to be dubbed.
5. The method of claim 4, wherein any one of the plurality of silencing time sets comprises: a silencing start time and a silencing end time; after the audio and video to be dubbed is obtained, the method further comprises:
playing the audio and video to be dubbed, and monitoring the played duration of the audio and video to be dubbed;
acquiring the plurality of silencing start times corresponding to the plurality of silencing time sets, executing an audio capture operation when the played duration is detected to match one of the silencing start times, and stopping executing the audio capture operation when the played duration is detected to match the corresponding silencing end time, until a plurality of recording sub-data are obtained;
and generating the recording data according to the plurality of recording sub-data.
6. The method according to claim 1, wherein the generating of the dubbing video according to the recording data and the audio and video to be dubbed comprises:
acquiring a to-be-dubbed audio track of the to-be-dubbed audio and video and a recording audio track corresponding to the recording data;
updating the audio track to be dubbed according to the recording audio track to obtain a dubbing audio track;
and replacing the audio track to be dubbed in the audio and video to be dubbed with the dubbing audio track to obtain the dubbing video.
7. The method according to claim 4, wherein after the obtaining of the plurality of audio tracks to be dubbed contained in the audio track set to be dubbed, the method further comprises:
taking the plurality of audio tracks to be dubbed as the input of a pre-trained audio track recognition model to obtain a plurality of texts to be dubbed corresponding to the plurality of audio tracks to be dubbed;
and establishing and storing a mapping table between the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed.
8. The method of claim 5, wherein before the performing of the audio capture operation, the method further comprises:
determining a target silencing start time that matches the played duration, determining a target silenced audio track corresponding to the target silencing start time, and determining a target audio track to be dubbed corresponding to the target silenced audio track;
determining a target dubbing text corresponding to the target audio track to be dubbed according to the mapping table between the plurality of audio tracks to be dubbed and the plurality of texts to be dubbed;
and displaying the target dubbing text.
9. The method of claim 1, wherein after the playing of the dubbing video, the method further comprises:
when the dubbing video is detected to have finished playing, displaying a preset first window, wherein the first window comprises: a video determination request;
if a video determination instruction returned by the target object is received, storing the dubbing video;
and if a refusal instruction returned by the target object is received, receiving a second dubbing request, and executing the video dubbing method according to the second dubbing request.
10. The method of claim 9, wherein after the storing of the dubbing video, the method further comprises:
displaying a preset second window, wherein the second window comprises: a video sharing request;
if a video sharing instruction returned by the target object is received, sending the dubbing video to a preset server;
and if a refusal-to-share instruction returned by the target object is received, stopping executing the video dubbing method.
11. A video dubbing method applied to a terminal device, the method comprising:
when receiving a video dubbing function triggering operation of a target object, displaying a video dubbing determination interface;
if video dubbing determination data is extracted from the video dubbing determination interface, displaying a dubbing data interface, and extracting the video dubbing data contained in the dubbing data interface;
playing the audio and video to be dubbed corresponding to the video dubbing data, displaying the dubbing text corresponding to the audio and video to be dubbed, and acquiring the recording data of the target object;
and obtaining the dubbing video according to the audio and video to be dubbed and the recording data.
12. A video dubbing apparatus applied to an electronic device, the apparatus comprising:
a receiving unit, configured to receive a first dubbing request, where the first dubbing request includes: a video clip and a target dubbing role;
a silencing unit, configured to perform a silencing operation on a plurality of video sub-segments corresponding to the target dubbing role in the video clip to obtain the audio and video to be dubbed;
and an executing unit, configured to receive the recording data of the target object, generate a dubbing video according to the recording data and the audio and video to be dubbed, and play the dubbing video.
13. A terminal comprising an input device and an output device, further comprising:
a processor adapted to implement one or more instructions; and the number of the first and second groups,
a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to execute the video dubbing method of any of claims 1 to 10.
14. A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to perform a video dubbing method as claimed in any one of claims 1 to 10.