CN109841225B - Sound replacement method, electronic device, and storage medium

Sound replacement method, electronic device, and storage medium

Info

Publication number
CN109841225B
CN109841225B
Authority
CN
China
Prior art keywords
audio
person
resource
determining
character
Prior art date
Legal status
Active
Application number
CN201910082625.XA
Other languages
Chinese (zh)
Other versions
CN109841225A (en)
Inventor
许栋刚
邢丽
张延良
王伟
李林
王静
王娜
刘大鹏
张玲玲
Current Assignee
Beijing Yijiesheng Technology Co ltd
Original Assignee
Beijing Yijiesheng Technology Co ltd
Application filed by Beijing Yijiesheng Technology Co ltd
Priority to CN201910082625.XA
Publication of CN109841225A
Application granted
Publication of CN109841225B


Abstract

The invention relates to a sound replacement method, an electronic device and a storage medium. The method determines a first video resource; determines a first person in the first video resource; determines a first audio feature of the first person; determines a second person corresponding to the first person, the second person being different from the first person; determines a second audio feature of the second person; determines a replacement audio feature from the second audio feature and the first audio feature; and adjusts the sound of the first person according to the replacement audio feature. The audio features include pitch, loudness, timbre, speech rate and language style. The method adjusts the voice of the first person in the first video resource according to the audio features of the second person, enabling a character's voice to be changed after the video resource has been produced and improving participation and interactivity.

Description

Sound replacement method, electronic device, and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a sound replacement method, an electronic device, and a storage medium.
Background
At present, in video resources such as movies, television, animations, cartoons and games, character presentation is fixed: once a video resource has been produced, a character's voice can only be the voice recorded during production and cannot be changed.
This unchangeable presentation of character voices reduces the appeal of a video resource and leaves the participation and interactivity between the video resource and the user insufficient.
Disclosure of Invention
Technical problem to be solved
In order to improve interactivity of video resources, the invention provides a sound replacement method, an electronic device and a storage medium.
(II) technical scheme
In order to achieve the purpose, the invention adopts the main technical scheme that:
a sound replacement method includes:
s101, determining a first video resource;
s102, determining a first person in the first video resource;
s103, determining a first audio characteristic of the first person;
s104, determining a second person corresponding to the first person, wherein the second person is different from the first person;
s105, determining a second audio characteristic of the second person;
s106, determining a replacement audio feature according to the second audio feature and the first audio feature;
s107, adjusting the sound of the first person according to the replaced audio features;
the audio features include: pitch, loudness, timbre, speech rate, language style.
Optionally, the S102 includes:
s102-1, determining the total occurrence duration of each character and the total audio duration of each character in the first video resource;
s102-2, determining the ranking value of each person according to the following formula:
C_e = T_e2 / T_e1
wherein e is any person in the first video resource, C_e is the ranking value of person e in the first video resource, T_e1 is the total appearance duration of person e in the first video resource, and T_e2 is the total audio duration of person e in the first video resource;
s102-3, sequencing all the people in the first video resource from large to small according to the sequencing value;
s102-4, determining the preset number of people ranked in the front as first people;
when the number of the first people is 1, the number of the second people is 1;
when the number of the first people is multiple, the number of the second people is the same as that of the first people, each second person corresponds to one unique first person, and the second people are different from the corresponding first people.
Optionally, the S104 includes:
monitoring whether at least one replacement resource is triggered;
when at least one replacement resource is triggered, determining a second person from the triggered replacement resource;
wherein the at least one replacement resource is triggered, comprising:
at least one stored audio is selected; or,
at least one stored second video resource is selected; or,
at least one stored audio is clicked; or,
at least one stored second video resource is clicked; or,
at least one audio is uploaded; or,
at least one second video resource is uploaded; or,
at least one audio is recorded instantly; or,
at least one second video resource is shot instantly;
the second video resource is different from the first video resource.
Optionally, the first video resource is a dynamic image resource containing audio, and the dynamic image is a movie, a television, an animation, a game, a self-portrait video, an advertisement video, or a small video;
the second video resource is a dynamic image resource containing audio, and the dynamic image is a movie, a television, an animation, a game, a self-portrait video, an advertisement video or a small video.
Optionally, the determining the second person from the triggered replacement resource includes:
determining the person selected by the user in the triggered replacement resource as the second person; or,
when the triggered replacement resource is audio, identifying all persons in the triggered replacement resource, calculating the audio duration of each person, calculating the ratio of each person's audio duration to the total audio duration, and determining the second person according to each person's ratio; or,
when the triggered replacement resource is a second video resource, identifying all persons in the triggered replacement resource and determining the second person according to the importance degree of each person.
Optionally, the importance degree of each person is determined by:
for any person i, determining all frames in which person i appears;
determining the importance degree of person i according to the following formula:
W_i = f(n_i, N, T_i1, T_1, T_i2, T_2, s_1, s_2)   [formula image not reproducible in text]
wherein W_i is the importance degree of person i, n_i is the total number of frames in which person i appears, N is the total number of frames of the second video resource, T_i1 is the total appearance duration of person i, T_1 is the total video duration of the triggered replacement resource, T_i2 is the total audio duration of person i, T_2 is the total audio duration of the triggered replacement resource, s_1 is the total effective video duration of the persons of the triggered replacement resource, and s_2 is the total effective audio duration of the persons of the triggered replacement resource.
Optionally, the language style in the first audio feature of the first person is determined by:
s301-1, acquiring all audio of a first person in a first video resource;
s301-2, performing voice recognition on the audio obtained in S301-1, and determining a first sound characteristic;
s301-3, converting the audio obtained in S301-1 into a first text;
s301-4, performing semantic analysis on the first text to determine first word characteristics;
s301-5, taking the first sound characteristic and the first word characteristic as language styles in the first audio characteristic of the first character;
the linguistic style in the second audio feature of the second persona is determined by:
s302-1, acquiring the audio of a second person;
s302-2, performing voice recognition on the audio obtained in the S302-1, and determining a second voice characteristic;
s302-3, converting the audio obtained in the S302-1 into a second text;
s302-4, performing semantic analysis on the second text to determine second word characteristics;
s302-5, taking the second sound feature and the second word feature as the language style in the second audio feature of the second person;
the sound features include: word pronunciation tone, inter-word pauses, sentence pronunciation tone, accent position and pronunciation rhythm;
the accents include: parallel stress, contrast stress, responsiveness stress, progressive stress, turning stress, positive stress, emphatic stress, metaphorical stress, onomatopoeic stress and ironic stress;
the word features include: spoken words, modifiers, word combinations, ellipses.
Optionally, the S106 includes:
s106-1, acquiring a first pitch, a first loudness, a first timbre, a first speech rate and a first language style in the first audio feature;
s106-2, acquiring a second pitch, a second loudness, a second timbre, a second speech rate and a second language style in the second audio feature;
s106-3, determining the average value of the first pitch and the second pitch as the pitch in the replacement audio feature;
s106-4, determining the first loudness as the loudness in the replacement audio feature;
s106-5, determining the second timbre as the timbre in the replacement audio feature;
s106-6, determining the value A_3 of the following formula as the speech rate in the replacement audio feature:
A_3 = f(A_1, A_2, B_1, B_2)   [formula image not reproducible in text]
wherein A_3 is the speech rate in the replacement audio feature, A_1 is the first speech rate, A_2 is the second speech rate, B_1 is the inter-word pause in the first language style, and B_2 is the inter-word pause in the second language style;
s106-7, determining the sum of the word characteristics in the first language style and the word characteristics in the second language style as the word characteristics of the language style in the replacement audio characteristics;
and S106-8, determining the sound features in the second language style as the sound features of the language style in the replacement audio features.
In order to achieve the above purpose, the main technical solution adopted by the present invention further comprises:
an electronic device comprising a memory, a processor, a bus and a computer program stored on the memory and executable on the processor, the processor implementing a method as claimed in any one of the above methods when executing the program.
In order to achieve the above purpose, the main technical solution adopted by the present invention further comprises:
a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in any one of the above methods.
(III) advantageous effects
The invention has the beneficial effects that: a first video resource is determined; a first person in the first video resource is determined; a first audio feature of the first person is determined; a second person corresponding to the first person is determined, the second person being different from the first person; a second audio feature of the second person is determined; a replacement audio feature is determined from the second audio feature and the first audio feature; and the sound of the first person is adjusted according to the replacement audio feature. The audio features include pitch, loudness, timbre, speech rate and language style, so that a character's voice can be changed after the video resource has been produced, improving participation and interactivity.
Drawings
Fig. 1 is a schematic flow chart of a sound replacement method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to improve the interactivity of video resources, this proposal provides a sound replacement method, an electronic device and a storage medium: a first video resource is determined; a first person in the first video resource is determined; a first audio feature of the first person is determined; a second person corresponding to the first person is determined, the second person being different from the first person; a second audio feature of the second person is determined; a replacement audio feature is determined from the second audio feature and the first audio feature; and the sound of the first person is adjusted according to the replacement audio feature. The audio features include pitch, loudness, timbre, speech rate and language style, so that a character's voice can be changed after the video resource has been produced, improving participation and interactivity.
Referring to fig. 1, the implementation flow of the sound replacement method provided in this embodiment is as follows:
s101, determining a first video resource.
The first video resource is a dynamic image resource containing audio.
For example, the moving image is a movie, or a television, or an animation, or a game, or a self-timer video, or an advertisement video, or a small video.
Namely a movie with sound, or a television with sound, or an animation with sound, or a game with sound, or a self-timer video with sound, or an advertisement video with sound, or a small video with sound.
For convenience of description, this embodiment and the following embodiments take animation A with sound as the first video resource. Other forms of the first video resource are not separately illustrated in this embodiment.
S102, determining a first person in the first video resource.
The number of first persons in this step may be one or more. The number of first persons is not limited in this embodiment.
In this step, there are various ways to determine the first person, for example, if the user clicks one person, the person clicked by the user is determined as the first person.
For another example, if the user clicks a plurality of characters, all the characters clicked by the user are determined as the first character.
As another example, the first person is determined by:
s102-1, determining the total occurrence time length of each character and the total audio time length of each character in the first video resource.
S102-2, determining the ranking value of each person according to the following formula:
C_e = T_e2 / T_e1
wherein e is any person in the first video resource, C_e is the ranking value of person e in the first video resource, T_e1 is the total appearance duration of person e in the first video resource, and T_e2 is the total audio duration of person e in the first video resource.
S102-3, sequencing all the people in the first video resource from large to small according to the sequencing value.
S102-4, determining the preset number of people ranked at the top as the first people.
For example, suppose the preset number is 2 and there are 4 persons in animation A: person 1, person 2, person 3 and person 4. Determine the total appearance duration of each person in animation A (T11, T21, T31 and T41 respectively) and the total audio duration of each person in animation A (T12, T22, T32 and T42 respectively). Then determine the ranking values: C1 = T12/T11 for person 1, C2 = T22/T21 for person 2, C3 = T32/T31 for person 3 and C4 = T42/T41 for person 4. If C4 > C2 > C1 = C3, sorting all persons in animation A by ranking value from large to small gives: person 4, person 2, person 1, person 3. The top 2 persons (person 4 and person 2) are each determined as a first person.
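As an illustration of this ranking rule, the following Python sketch selects the first persons; the person names and durations are invented for illustration and are not taken from the patent:

```python
from typing import Dict, List

def select_first_persons(appearance: Dict[str, float],
                         audio: Dict[str, float],
                         preset_number: int) -> List[str]:
    # Ranking value C_e = T_e2 / T_e1: audio duration over appearance duration.
    ranking = {e: audio[e] / appearance[e] for e in appearance if appearance[e] > 0}
    # Sort from large to small and keep the preset number of top-ranked persons.
    return sorted(ranking, key=ranking.get, reverse=True)[:preset_number]

# Illustrative durations (seconds) reproducing the ordering C4 > C2 > C1 = C3:
appearance = {"person1": 100.0, "person2": 80.0, "person3": 50.0, "person4": 60.0}
audio = {"person1": 40.0, "person2": 48.0, "person3": 20.0, "person4": 45.0}
print(select_first_persons(appearance, audio, preset_number=2))
# -> ['person4', 'person2']
```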
S103, determining a first audio characteristic of the first person.
And if the number of the first persons is 1, determining the first audio characteristics of the first persons. If the number of the first persons is 2, the first audio characteristics of each first person are determined.
The 'first' in this step only distinguishes these features from the audio features of the subsequent second person and has no practical meaning.
The audio features include: pitch, loudness, timbre, speech rate, language style.
The pitch is represented by the frequency of the sound wave.
Loudness is represented by the vibration amplitude of the sound wave.
The timbre is represented by the vibration waveform of the sound wave.
The speech rate is expressed in words per minute.
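To make these four measurable features concrete, the following is a rough Python sketch; librosa is an assumed signal-analysis dependency and `transcribe` is a hypothetical ASR helper, since the patent names no specific libraries:

```python
import numpy as np
import librosa

def basic_audio_features(path: str, transcribe) -> dict:
    y, sr = librosa.load(path, sr=None)
    # Pitch: frequency of the sound wave (median F0 over voiced frames).
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
    # Loudness: vibration amplitude, approximated by mean RMS energy.
    loudness = float(librosa.feature.rms(y=y).mean())
    # Timbre: vibration waveform character, approximated by mean MFCCs.
    timbre = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    # Speech rate: words per minute; `transcribe` is a hypothetical ASR callable.
    minutes = len(y) / sr / 60.0
    speech_rate = len(transcribe(path).split()) / minutes
    return {"pitch": float(np.nanmedian(f0)), "loudness": loudness,
            "timbre": timbre, "speech_rate": speech_rate}
```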
The language style is determined by:
s301-1, in the first video resource, all the audios of the first person are obtained.
S301-2, performing voice recognition on the audio obtained in S301-1, and determining a first sound characteristic.
Wherein the sound features include: word pronunciation tone, pause between words, sentence pronunciation tone, accent position, pronunciation rhythm.
The accents include: parallel stress, contrast stress, responsiveness stress, progressive stress, turning stress, positive stress, emphatic stress, metaphorical stress, onomatopoeic stress and ironic stress.
Parallel stress means that parallel words or phrases in a sentence are each marked by linguistic stress. For example: talk about life, talk about ideals, talk about the future.
Contrast stress means that a paragraph or sentence contains words or phrases that are compared or contrasted to make the characteristics of things more prominent and vivid, and the comparative relationship between these words or phrases is marked by linguistic stress. For example: the elephant is large, the mouse is small.
Responsiveness stress means that a corresponding relationship in the context is marked by linguistic stress. For example: the big ones are like black beans, the small ones like millet grains.
Progressive stress means that linguistic stress marks a relationship that develops forward step by step and deepens step by step. For example: first the manager's attitude changes, then the employees' attitudes change.
Turning stress uses linguistic stress to mark a turn of the content in the opposite direction. For example: at first there were no roads in the world, but when enough people walk, a road is made.
Positive stress uses linguistic stress to express an affirmative attitude. For example: this problem really was not done by me.
Emphatic stress uses linguistic stress to express a particular emotion and emphasize a particular meaning, with the aim of drawing the listener's attention to the part the speaker stresses. For example: it was I who went to the classroom.
Metaphorical stress means that a paragraph or sentence contains words or phrases that turn the abstract into the concrete, or deepen or lighten an image, making the language vivid and hard for listeners to forget; such words or phrases are marked by linguistic stress. For example: spring is like a newborn baby, new from head to foot.
Onomatopoeic stress uses linguistic stress to mark sound-imitating words. For example: stressing an onomatopoeic word such as 'whoosh'.
Ironic stress refers to words or phrases in a paragraph or sentence that say the opposite of what is meant, in order to reveal the nature of the matter; such words or phrases are marked by linguistic stress. For example: how 'clever' you are!
S301-3, the audio obtained in S301-1 is converted into a first text.
S301-4, performing semantic analysis on the first text, and determining first word features.
Wherein, the word characteristics include: spoken words, modifiers, word combinations, ellipses.
S301-5, the first sound characteristic and the first word characteristic are taken as the language style in the first audio characteristic of the first character.
The sound features embody the first person's speaking characteristics, and the word features embody the first person's wording characteristics. The combination of the sound features and the word features can accurately describe the language style of the first person.
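The S301 pipeline can be summarized in the following sketch; `recognize_prosody`, `speech_to_text` and `extract_word_features` are hypothetical stand-ins for the unspecified speech-recognition and semantic-analysis components:

```python
def language_style(audio_clips, recognize_prosody, speech_to_text, extract_word_features):
    sound_features = []   # S301-2: stress positions, inter-word pauses, rhythm, ...
    word_features = set() # S301-4: spoken words, modifiers, word combinations, ...
    for clip in audio_clips:                              # S301-1: all audio of the person
        sound_features.append(recognize_prosody(clip))    # speech recognition
        text = speech_to_text(clip)                       # S301-3: audio -> text
        word_features |= set(extract_word_features(text)) # semantic analysis
    # S301-5: the language style is the pair (sound features, word features).
    return {"sound_features": sound_features, "word_features": word_features}
```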
And S104, determining a second person corresponding to the first person.
Wherein the second person is different from the first person.
That is, when the number of the first person is 1, the number of the second person is 1, and the second person is different from the first person. When the number of the first people is multiple, the number of the second people is the same as that of the first people, each second person corresponds to one unique first person, and the second people are different from the corresponding first people.
For example, when the number of first persons is 2 (e.g., A and B), the number of second persons is also 2 (e.g., C and D); each second person corresponds to a unique first person (e.g., C corresponds to A and D corresponds to B), and each second person is different from its corresponding first person (C is different from A, and D is different from B). This embodiment only requires that C differ from A and that D differ from B; it does not limit whether C is the same as B, nor whether A is the same as D.
The specific implementation of this step is as follows: monitoring whether at least one replacement resource is triggered; and when at least one replacement resource is triggered, determining a second person from the triggered replacement resource.
The state of the replacement resource may be a stored replacement resource, an uploaded replacement resource, or an instantly shot replacement resource. In addition, the replacement resource may be either an audio or a second video resource. (The second video resource is also a dynamic image resource containing audio; for example, the dynamic image is a movie, a television, an animation, a game, a self-timer video, an advertisement video or a small video, i.e., a movie with sound, a television with sound, an animation with sound, a game with sound, a self-timer video with sound, an advertisement video with sound or a small video with sound. 'Second' only distinguishes this resource from the first video resource in S101; that is, 'second' and 'first' merely denote resources at different stages and have no other meaning. The first video resource is the resource containing the person being replaced; the second video resource is the resource containing the replacing person.)
Therefore, the at least one replacement resource in this embodiment may be at least one stored audio, or at least one stored second video resource, or at least one uploaded audio, or at least one uploaded second video resource, or at least one instantly recorded audio, or at least one instantly shot second video resource.
Based thereon, it may be determined that at least one replacement resource is triggered when the following events are monitored to occur, including:
at least one stored audio is selected by a user, or at least one stored second video resource is selected by a user, or at least one stored audio is clicked by a user, or at least one stored second video resource is clicked by a user, or at least one audio is uploaded, or at least one second video resource is uploaded, or at least one audio is recorded instantly, or at least one second video resource is shot instantly.
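A minimal sketch of this monitoring step follows; the event model (`event.kind`, `event.resource`) is an assumption, since the patent does not define one:

```python
from enum import Enum, auto

class Trigger(Enum):
    STORED_AUDIO_SELECTED = auto()
    STORED_VIDEO_SELECTED = auto()
    STORED_AUDIO_CLICKED = auto()
    STORED_VIDEO_CLICKED = auto()
    AUDIO_UPLOADED = auto()
    VIDEO_UPLOADED = auto()
    AUDIO_RECORDED = auto()
    VIDEO_SHOT = auto()

def monitor(events):
    """Return the first triggered replacement resource, or None (S104)."""
    for event in events:  # `event.kind` / `event.resource` are assumed fields
        if isinstance(event.kind, Trigger):
            return event.resource
    return None
```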
Furthermore, one implementation of determining the second person from the triggered replacement resource is: determining the person selected by the user in the triggered replacement resource as the second person.
Or, when the triggered replacement resource is audio, determining the second person from the triggered replacement resource may be implemented as: identifying all persons in the triggered replacement resource, calculating the audio duration of each person, calculating the ratio of each person's audio duration to the total audio duration, and determining the second person according to each person's ratio. For example, a preset number of persons with the largest ratios are determined as the second persons.
The preset number here is the same as the preset number when the first person is determined in S102.
For example, if the preset number is 2, the 2 persons with the largest ratios are determined as the second persons.
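A sketch of this audio branch, assuming per-person audio durations (in seconds) in the triggered replacement resource have already been measured:

```python
def second_persons_from_audio(person_audio: dict, preset_number: int) -> list:
    total = sum(person_audio.values())
    # Ratio of each person's audio duration to the total audio duration.
    ratios = {p: d / total for p, d in person_audio.items()}
    # The preset number of persons with the largest ratios become second persons.
    return sorted(ratios, key=ratios.get, reverse=True)[:preset_number]
```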
In addition, when the triggered replacement resource is a second video resource, determining the second person from the triggered replacement resource may be implemented as: identifying all persons in the triggered replacement resource and determining the second person according to the importance degree of each person.
For example, ranking the persons by importance degree from high to low, the preset number of top-ranked persons are determined as the second persons.
The preset number here is the same as the preset number when the first person is determined in S102.
For example, 2 persons having a higher degree of importance are set as the second persons.
The calculation method for the degree of importance includes but is not limited to:
For any person i, determine all frames in which person i appears.
The importance degree of person i is determined according to the following formula:
W_i = f(n_i, N, T_i1, T_1, T_i2, T_2, s_1, s_2)   [formula image not reproducible in text]
wherein W_i is the importance degree of person i, n_i is the total number of frames in which person i appears, N is the total number of frames of the second video resource, T_i1 is the total appearance duration of person i, T_1 is the total video duration of the triggered replacement resource, T_i2 is the total audio duration of person i, T_2 is the total audio duration of the triggered replacement resource, s_1 is the total effective video duration of the persons of the triggered replacement resource, and s_2 is the total effective audio duration of the persons of the triggered replacement resource.
The total effective video duration is the duration in which persons actually appear in the triggered replacement resource; scenery-only shots and the opening and closing credits are outside this duration. The total effective audio duration is the duration of person audio in the triggered replacement resource; scenery-only shots, the opening and closing credits, and periods in which no person speaks are outside this duration.
Take a video with 5 frames in total, a total duration of 3 seconds and an audio duration of 2 seconds as an example. For person i, determine all frames in which person i appears (e.g., frame 1 and frame 3). The importance degree of person i is then computed with n_i = 2 (frames in which person i appears), N = 5 (total frames of the second video resource), T_1 = 3 seconds (total video duration of the triggered replacement resource) and T_2 = 2 seconds (total audio duration of the triggered replacement resource), together with T_i1, T_i2, s_1 and s_2 as defined above.
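Because the granted W_i formula survives only as an image, the sketch below uses an assumed equal-weight combination of the defined ratios; it is not the patented formula, and only the variable definitions come from the text:

```python
def importance(n_i, N, T_i1, T_1, T_i2, T_2, s_1, s_2):
    # Assumed stand-in for the unrecoverable W_i formula (not the granted one).
    frame_share = n_i / N     # share of frames in which person i appears
    video_share = T_i1 / T_1  # share of the resource's total video duration
    audio_share = T_i2 / T_2  # share of the resource's total audio duration
    effective = (s_1 / T_1 + s_2 / T_2) / 2  # effective video/audio content shares
    return (frame_share + video_share + audio_share) / 3 * effective

# Worked example from the text: 5 frames, 3 s of video, 2 s of audio,
# person i present in frames 1 and 3 (n_i = 2, N = 5, T_1 = 3, T_2 = 2);
# T_i1, T_i2, s_1 and s_2 would be measured from the replacement resource.
```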
In addition, when there are a plurality of first persons, the determination method of the corresponding relationship between the second person and the first person is not limited in this embodiment. The person may be designated manually, or the second person ranked first may be associated with the first person ranked first.
S105, determining a second audio characteristic of the second person.
The content of the audio feature here is the same as the audio feature in S103.
The "second" in this step is only to distinguish from the audio feature of the first person in S103, and does not have any practical meaning.
The audio features include: pitch, loudness, timbre, speech rate and language style.
The pitch is represented by the frequency of the sound wave.
Loudness is represented by the vibration amplitude of the sound wave.
The timbre is represented by the vibration waveform of the sound wave.
The speech rate is expressed in words per minute.
The language style is determined by:
s302-1, the audio of the second person is obtained.
S302-2, performing voice recognition on the audio obtained in S302-1, and determining a second sound characteristic.
The sound features include: word pronunciation tone, pause between words, sentence pronunciation tone, accent position, pronunciation rhythm.
The accents include: parallel stress, contrast stress, responsiveness stress, progressive stress, turning stress, positive stress, emphatic stress, metaphorical stress, onomatopoeic stress and ironic stress.
S302-3, converting the audio obtained in S302-1 into a second text.
S302-4, performing semantic analysis on the second text, and determining second word characteristics.
The word characteristics include: spoken words, modifiers, word combinations, ellipses.
And S302-5, taking the second sound characteristic and the second word characteristic as the language style in the second audio characteristic of the second character.
In addition, when there are a plurality of first persons, a second person corresponding to each first person is determined in S104, and a second audio feature of each second person is determined in this step.
And S106, determining a replacement audio characteristic according to the second audio characteristic and the first audio characteristic.
When the number of the first persons is multiple, the number of the second persons is multiple, and the step determines the replacement audio characteristics of each first person according to the first audio characteristics of the first person and the second audio characteristics of the second person corresponding to the first person.
That is, when the first person is p and q, and the second person corresponding to p is p ', and the second person corresponding to q is q ', this step determines the replacement audio feature for p according to the audio feature of p and the audio feature of p ' corresponding to p. And determining the replacement audio characteristic aiming at q according to the audio characteristic of q and the audio characteristic of q' corresponding to q.
The implementation for determining the replacement audio feature from the second audio feature and the first audio feature is as follows:
S106-1, acquiring the first pitch, the first loudness, the first timbre, the first speech rate and the first language style in the first audio feature.
S106-2, acquiring the second pitch, the second loudness, the second timbre, the second speech rate and the second language style in the second audio feature.
S106-3, determining the average of the first pitch and the second pitch as the pitch in the replacement audio feature.
Since pitch is represented by frequency, the frequency representing the pitch in the replacement audio feature = (frequency representing the first pitch + frequency representing the second pitch) / 2.
S106-4, determining the first loudness as the loudness in the replacement audio feature.
S106-5, determining the second timbre as the timbre in the replacement audio feature.
S106-6, determining the value A_3 of the following formula as the speech rate in the replacement audio feature.
A_3 = f(A_1, A_2, B_1, B_2)   [formula image not reproducible in text]
wherein A_3 is the speech rate in the replacement audio feature, A_1 is the first speech rate, A_2 is the second speech rate, B_1 is the inter-word pause in the first language style, and B_2 is the inter-word pause in the second language style.
And S106-7, determining the sum of the word characteristics in the first language style and the word characteristics in the second language style as the word characteristics of the language style in the replacement audio characteristics.
Word features include words such as spoken words, modifiers, word combinations, ellipses, and the like. Combining a word set formed by the word characteristics in the first language style with a word set formed by the word characteristics in the second language style, and determining the combined word set as the word characteristics replacing the language style in the audio characteristics.
And S106-8, determining the sound features in the second language style as the sound features of the language style in the replacement audio features.
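Collecting S106-1 through S106-8 into one sketch; the speech-rate rule is likewise an image in the source, so averaging the two speech rates is an assumed placeholder for A_3:

```python
def replacement_features(first: dict, second: dict) -> dict:
    return {
        "pitch": (first["pitch"] + second["pitch"]) / 2,  # S106-3: average pitch
        "loudness": first["loudness"],                    # S106-4: keep the first loudness
        "timbre": second["timbre"],                       # S106-5: take the second timbre
        # S106-6: assumed placeholder for the unrecoverable A_3 formula.
        "speech_rate": (first["speech_rate"] + second["speech_rate"]) / 2,
        # S106-7: union (sum) of the two word-feature sets.
        "word_features": first["word_features"] | second["word_features"],
        "sound_features": second["sound_features"],       # S106-8: take the second style
    }
```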
S107, adjusting the sound of the first person according to the replaced audio features.
Taking the pitch, loudness, timbre, speech rate and language style in the replacement audio feature as the audio features of the first person, the first person's speech is re-pronounced according to them, so that the voice of a person in the first video resource is replaced with a voice having the replacement audio feature. Because the replacement audio feature is obtained based on the second person, the method provided by this embodiment can replace the voice of a person in the first video with the user's voice, realize changing a character's voice after the video resource has been produced, and improve participation and interactivity.
In addition, to avoid abrupt and uncoordinated sound caused by the replaced pitch, loudness, timbre, speech rate, language style and the like, this embodiment fuses the user's audio features with the original ones instead of using them directly during replacement, forming the final audio features for pronunciation and improving the replacement effect.
It should be noted that "first" and "second" in this embodiment and subsequent embodiments are only serial numbers, and are used to distinguish different characters, audio features, video resources, texts, and the like, and have no other meaning.
The method provided by the invention determines a first video resource; determines a first person in the first video resource; determines a first audio feature of the first person; determines a second person corresponding to the first person, the second person being different from the first person; determines a second audio feature of the second person; determines a replacement audio feature from the second audio feature and the first audio feature; and adjusts the sound of the first person according to the replacement audio feature. The audio features include pitch, loudness, timbre, speech rate and language style, so that a character's voice can be changed after the video resource has been produced, improving participation and interactivity.
Referring to fig. 2, the present embodiment provides an electronic apparatus including: memory 201, processor 202, bus 203, and computer programs stored on memory 201 and executable on processor 202.
The processor 202 implements the following method when executing the program:
s101, determining a first video resource;
s102, determining a first person in a first video resource;
s103, determining a first audio characteristic of a first person;
s104, determining a second person corresponding to the first person, wherein the second person is different from the first person;
s105, determining a second audio characteristic of a second person;
s106, determining a replacement audio characteristic according to the second audio characteristic and the first audio characteristic;
s107, adjusting the sound of the first person according to the replaced audio features;
the audio features include: pitch, loudness, timbre, speech rate, language style.
Optionally, S102 includes:
s102-1, determining the total occurrence duration of each character and the total audio duration of each character in the first video resource;
s102-2, determining the ranking value of each person according to the following formula:
C_e = T_e2 / T_e1
wherein e is any person in the first video resource, C_e is the ranking value of person e in the first video resource, T_e1 is the total appearance duration of person e in the first video resource, and T_e2 is the total audio duration of person e in the first video resource;
s102-3, sequencing all the people in the first video resource from large to small according to the sequencing value;
s102-4, determining the preset number of people ranked in the front as first people;
when the number of the first people is 1, the number of the second people is 1;
when the number of the first people is multiple, the number of the second people is the same as that of the first people, each second person corresponds to one unique first person, and the second people are different from the corresponding first people.
Optionally, S104 includes:
monitoring whether at least one replacement resource is triggered;
when at least one replacement resource is triggered, determining a second person from the triggered replacement resource;
wherein the at least one replacement resource is triggered, comprising:
at least one stored audio is selected; or,
at least one stored second video resource is selected; or,
at least one stored audio is clicked; or,
at least one stored second video resource is clicked; or,
at least one audio is uploaded; or,
at least one second video resource is uploaded; or,
at least one audio is recorded instantly; or,
at least one second video resource is shot instantly;
the second video resource is different from the first video resource.
Optionally, the first video resource is a dynamic image resource containing audio, and the dynamic image is a movie, a television, an animation, a game, a self-portrait video, an advertisement video, or a small video;
the second video resource is a dynamic image resource containing audio, and the dynamic image is a movie, a television, an animation, a game, a self-portrait video, an advertisement video or a small video.
Optionally, determining the second person from the triggered replacement resource includes:
determining the person selected by the user in the triggered replacement resource as the second person; or,
when the triggered replacement resource is audio, identifying all persons in the triggered replacement resource, calculating the audio duration of each person, calculating the ratio of each person's audio duration to the total audio duration, and determining the second person according to each person's ratio; or,
when the triggered replacement resource is a second video resource, identifying all persons in the triggered replacement resource and determining the second person according to the importance degree of each person.
Optionally, the importance degree of each person is determined by:
for any person i, determining all frames in which person i appears;
determining the importance degree of person i according to the following formula:
W_i = f(n_i, N, T_i1, T_1, T_i2, T_2, s_1, s_2)   [formula image not reproducible in text]
wherein W_i is the importance degree of person i, n_i is the total number of frames in which person i appears, N is the total number of frames of the second video resource, T_i1 is the total appearance duration of person i, T_1 is the total video duration of the triggered replacement resource, T_i2 is the total audio duration of person i, T_2 is the total audio duration of the triggered replacement resource, s_1 is the total effective video duration of the persons of the triggered replacement resource, and s_2 is the total effective audio duration of the persons of the triggered replacement resource.
Optionally, the linguistic style in the first audio feature of the first person is determined by:
s301-1, acquiring all audio of a first person in a first video resource;
s301-2, performing voice recognition on the audio obtained in S301-1, and determining a first sound characteristic;
s301-3, converting the audio obtained in S301-1 into a first text;
s301-4, performing semantic analysis on the first text to determine first word characteristics;
s301-5, taking the first sound characteristic and the first word characteristic as the language style in the first audio characteristic of the first character;
the linguistic style in the second audio feature of the second persona is determined by:
s302-1, acquiring the audio of a second person;
s302-2, performing voice recognition on the audio obtained in the S302-1, and determining a second voice characteristic;
s302-3, converting the audio obtained in the S302-1 into a second text;
s302-4, performing semantic analysis on the second text to determine second word characteristics;
s302-5, taking the second sound characteristic and the second word characteristic as the language style in the second audio characteristic of the second character;
the sound features include: word pronunciation tone, inter-word pauses, sentence pronunciation tone, accent position and pronunciation rhythm;
the accents include: parallel stress, contrast stress, responsiveness stress, progressive stress, turning stress, positive stress, emphatic stress, metaphorical stress, onomatopoeic stress and ironic stress;
the word characteristics include: spoken words, modifiers, word combinations, ellipses.
Optionally, S106 includes:
s106-1, acquiring a first pitch, a first loudness, a first timbre, a first speech rate and a first language style in the first audio feature;
s106-2, acquiring a second pitch, a second loudness, a second timbre, a second speech rate and a second language style in the second audio feature;
s106-3, determining the average value of the first pitch and the second pitch as the pitch in the replacement audio feature;
s106-4, determining the first loudness as the loudness in the replacement audio feature;
s106-5, determining the second timbre as the timbre in the replacement audio feature;
s106-6, determining the value A_3 of the following formula as the speech rate in the replacement audio feature:
A_3 = f(A_1, A_2, B_1, B_2)   [formula image not reproducible in text]
wherein A_3 is the speech rate in the replacement audio feature, A_1 is the first speech rate, A_2 is the second speech rate, B_1 is the inter-word pause in the first language style, and B_2 is the inter-word pause in the second language style;
s106-7, determining the sum of the word characteristics in the first language style and the word characteristics in the second language style as the word characteristics of the language style in the replacement audio characteristics;
and S106-8, determining the sound features in the second language style as the sound features of the language style in the replacement audio features.
The electronic device provided by this embodiment determines a first video resource; determines a first person in the first video resource; determines a first audio feature of the first person; determines a second person corresponding to the first person, the second person being different from the first person; determines a second audio feature of the second person; determines a replacement audio feature from the second audio feature and the first audio feature; and adjusts the sound of the first person according to the replacement audio feature. The audio features include pitch, loudness, timbre, speech rate and language style, so that a character's voice can be changed after the video resource has been produced, improving participation and interactivity.
This embodiment provides a computer storage medium storing a computer program which, when executed by a processor, performs the following operations:
s101, determining a first video resource;
s102, determining a first person in a first video resource;
s103, determining a first audio characteristic of a first person;
s104, determining a second person corresponding to the first person, wherein the second person is different from the first person;
s105, determining a second audio characteristic of a second person;
s106, determining a replacement audio characteristic according to the second audio characteristic and the first audio characteristic;
s107, adjusting the sound of the first person according to the replaced audio features;
the audio features include: pitch, loudness, timbre, speech rate, language style.
Optionally, S102 includes:
s102-1, determining the total occurrence duration of each character and the total audio duration of each character in the first video resource;
s102-2, determining the ranking value of each person according to the following formula:
C_e = T_e2 / T_e1
wherein e is any person in the first video resource, C_e is the ranking value of person e in the first video resource, T_e1 is the total appearance duration of person e in the first video resource, and T_e2 is the total audio duration of person e in the first video resource;
s102-3, sequencing all the people in the first video resource from large to small according to the sequencing value;
s102-4, determining the preset number of people ranked in the front as first people;
when the number of the first people is 1, the number of the second people is 1;
when the number of the first people is multiple, the number of the second people is the same as that of the first people, each second person corresponds to one unique first person, and the second people are different from the corresponding first people.
Optionally, S104 includes:
monitoring whether at least one replacement resource is triggered;
when at least one replacement resource is triggered, determining a second person from the triggered replacement resource;
wherein the at least one replacement resource is triggered, comprising:
at least one stored audio is selected; or,
at least one stored second video resource is selected; or,
at least one stored audio is clicked; or,
at least one stored second video resource is clicked; or,
at least one audio is uploaded; or,
at least one second video resource is uploaded; or,
at least one audio is recorded instantly; or,
at least one second video resource is shot instantly;
the second video resource is different from the first video resource.
Optionally, the first video resource is a dynamic image resource containing audio, and the dynamic image is a movie, a television, an animation, a game, a self-portrait video, an advertisement video, or a small video;
the second video resource is a dynamic image resource containing audio, and the dynamic image is a movie, a television, an animation, a game, a self-portrait video, an advertisement video or a small video.
Optionally, determining the second person from the triggered replacement resource includes:
determining the person selected by the user in the triggered replacement resource as the second person; or,
when the triggered replacement resource is audio, identifying all persons in the triggered replacement resource, calculating the audio duration of each person, calculating the ratio of each person's audio duration to the total audio duration, and determining the second person according to each person's ratio; or,
when the triggered replacement resource is a second video resource, identifying all persons in the triggered replacement resource and determining the second person according to the importance degree of each person.
Optionally, the importance degree of each person is determined by:
for any person i, determining all frames in which person i appears;
determining the importance degree of person i according to the following formula:
W_i = f(n_i, N, T_i1, T_1, T_i2, T_2, s_1, s_2)   [formula image not reproducible in text]
wherein W_i is the importance degree of person i, n_i is the total number of frames in which person i appears, N is the total number of frames of the second video resource, T_i1 is the total appearance duration of person i, T_1 is the total video duration of the triggered replacement resource, T_i2 is the total audio duration of person i, T_2 is the total audio duration of the triggered replacement resource, s_1 is the total effective video duration of the persons of the triggered replacement resource, and s_2 is the total effective audio duration of the persons of the triggered replacement resource.
Optionally, the linguistic style in the first audio feature of the first person is determined by:
s301-1, acquiring all audio of a first person in a first video resource;
s301-2, performing voice recognition on the audio obtained in S301-1, and determining a first sound characteristic;
s301-3, converting the audio obtained in S301-1 into a first text;
s301-4, performing semantic analysis on the first text to determine first word characteristics;
s301-5, taking the first sound characteristic and the first word characteristic as the language style in the first audio characteristic of the first character;
the linguistic style in the second audio feature of the second persona is determined by:
s302-1, acquiring the audio of a second person;
s302-2, performing voice recognition on the audio obtained in the S302-1, and determining a second voice characteristic;
s302-3, converting the audio obtained in the S302-1 into a second text;
s302-4, performing semantic analysis on the second text to determine second word characteristics;
s302-5, taking the second sound characteristic and the second word characteristic as the language style in the second audio characteristic of the second character;
the sound features include: word pronunciation tone, inter-word pauses, sentence pronunciation tone, accent position and pronunciation rhythm;
the accents include: parallel stress, contrast stress, responsiveness stress, progressive stress, turning stress, positive stress, emphatic stress, metaphorical stress, onomatopoeic stress and ironic stress;
the word characteristics include: spoken words, modifiers, word combinations, ellipses.
Optionally, S106 includes:
s106-1, acquiring a first pitch, a first loudness, a first timbre, a first speech rate and a first language style in the first audio feature;
s106-2, acquiring a second pitch, a second loudness, a second timbre, a second speech rate and a second language style in the second audio feature;
s106-3, determining the average value of the first pitch and the second pitch as the pitch in the replacement audio feature;
s106-4, determining the first loudness as the loudness in the replacement audio feature;
s106-5, determining the second timbre as the timbre in the replacement audio feature;
s106-6, determining the value A_3 of the following formula as the speech rate in the replacement audio feature:
A_3 = f(A_1, A_2, B_1, B_2)   [formula image not reproducible in text]
wherein A_3 is the speech rate in the replacement audio feature, A_1 is the first speech rate, A_2 is the second speech rate, B_1 is the inter-word pause in the first language style, and B_2 is the inter-word pause in the second language style;
s106-7, determining the sum of the word characteristics in the first language style and the word characteristics in the second language style as the word characteristics of the language style in the replacement audio characteristics;
and S106-8, determining the sound features in the second language style as the sound features of the language style in the replacement audio features.
The computer storage medium provided by this embodiment determines a first video resource; determines a first person in the first video resource; determines a first audio feature of the first person; determines a second person corresponding to the first person, the second person being different from the first person; determines a second audio feature of the second person; determines a replacement audio feature from the second audio feature and the first audio feature; and adjusts the sound of the first person according to the replacement audio feature. The audio features include pitch, loudness, timbre, speech rate and language style, so that a character's voice can be changed after the video resource has been produced, improving participation and interactivity.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Finally, it should be noted that: the above-mentioned embodiments are only used for illustrating the technical solution of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A sound replacement method, characterized in that the method comprises:
S101, determining a first video resource;
S102, determining a first person in the first video resource;
S103, determining a first audio feature of the first person;
S104, determining a second person corresponding to the first person, wherein the second person is different from the first person;
S105, determining a second audio feature of the second person;
S106, determining a replacement audio feature according to the second audio feature and the first audio feature;
S107, adjusting the sound of the first person according to the replacement audio feature;
the audio features include: pitch, loudness, timbre, speech rate, and language style;
wherein the speech rate is the number of words per minute in the audio;
the S102 includes:
S102-1, determining the total appearance duration and the total audio duration of each character in the first video resource;
S102-2, determining the ranking value of each character according to the following formula:
Ce = Te2 / Te1
where e is any character in the first video resource, Ce is the ranking value of the character e in the first video resource, Te1 is the total appearance duration of the character e in the first video resource, and Te2 is the total audio duration of the character e in the first video resource;
S102-3, sorting all the characters in the first video resource in descending order of ranking value;
S102-4, determining a preset number of top-ranked characters as the first persons;
when the number of first persons is 1, the number of second persons is 1;
when there are multiple first persons, the number of second persons is the same as the number of first persons, each second person corresponds to one unique first person, and each second person is different from its corresponding first person.
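For illustration only (not part of the claims): a minimal Python sketch of the ranking in steps S102-1 to S102-4 of claim 1 above, assuming the per-character durations have already been measured; the function and variable names are illustrative.

def select_first_persons(durations, preset_count):
    """durations maps each character e to (Te1, Te2): total appearance
    duration and total audio duration in the first video resource."""
    # S102-2: ranking value Ce = Te2 / Te1
    ranking = {e: te2 / te1 for e, (te1, te2) in durations.items() if te1 > 0}
    # S102-3: sort characters by ranking value, from large to small
    ordered = sorted(ranking, key=ranking.get, reverse=True)
    # S102-4: the preset number of top-ranked characters are the first persons
    return ordered[:preset_count]

# A character who speaks through most of their screen time ranks highest.
print(select_first_persons({"lead": (120.0, 100.0), "extra": (60.0, 5.0)}, 1))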
2. The method of claim 1, wherein the S104 comprises:
monitoring whether at least one replacement resource is triggered;
when at least one replacement resource is triggered, determining the second person from the triggered replacement resource;
wherein the at least one replacement resource is triggered when:
at least one stored audio is selected; or
at least one stored second video resource is selected; or
at least one stored audio is clicked; or
at least one stored second video resource is clicked; or
at least one audio is uploaded; or
at least one second video resource is uploaded; or
at least one audio is recorded in real time; or
at least one second video resource is shot in real time;
the second video resource is different from the first video resource.
3. The method of claim 2, wherein the first video resource is a dynamic image resource containing audio, the dynamic image being a movie, a TV series, an animation, a game, a self-shot video, an advertisement video, or a short video;
the second video resource is a dynamic image resource containing audio, the dynamic image being a movie, a TV series, an animation, a game, a self-shot video, an advertisement video, or a short video.
4. The method of claim 3, wherein determining the second person from the triggered replacement resource comprises:
determining the person selected by the user in the triggered replacement resource as the second person; or
when the triggered replacement resource is audio, identifying all characters in the triggered replacement resource, calculating the audio duration of each character, calculating the ratio of each character's audio duration to the total audio duration, and determining the second person according to each character's ratio; or
when the triggered replacement resource is a second video resource, identifying all characters in the triggered replacement resource and determining the second person according to the importance degree of each character.
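For illustration only (not part of the claims): a minimal Python sketch of the audio branch of claim 4 above. The claim does not state how the ratios select the second person; choosing the largest ratio is one plausible reading, and the names are illustrative.

def second_person_by_audio_ratio(audio_durations):
    """audio_durations maps each identified character to their audio
    duration; each ratio is taken against the total audio duration."""
    total = sum(audio_durations.values())
    ratios = {person: d / total for person, d in audio_durations.items()}
    return max(ratios, key=ratios.get)  # ASSUMPTION: highest ratio wins

print(second_person_by_audio_ratio({"narrator": 50.0, "guest": 30.0}))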
5. The method of claim 4, wherein the importance degree of each character is determined by:
for any character i, determining all frames in which the character i appears;
determining the importance degree of any character i according to the following formula:
[Formula image: Figure FDA0002736418940000031]
where Wi is the importance degree of any character i, ni is the total number of frames in which the character i appears, N is the total number of frames of the second video resource, Ti1 is the total appearance duration of the character i, T1 is the total video duration of the triggered replacement resource, Ti2 is the total audio duration of the character i, T2 is the total audio duration of the triggered replacement resource, s1 is the total effective character-video duration of the triggered replacement resource, and s2 is the total effective character-audio duration of the triggered replacement resource;
the total video duration of the triggered replacement resource is the total duration of the N frames of the second video resource;
the total appearance duration of any character i is the total duration of the ni frames in which the character i appears;
the total audio duration of the triggered replacement resource is the total duration of the audio in the second video resource;
the total audio duration of any character i is the total duration of the audio of the character i in the second video resource;
the total effective character-video duration of the triggered replacement resource is the total duration of the frames in which any character appears in the triggered replacement resource;
the total effective character-audio duration of the triggered replacement resource is the total duration of character audio in the triggered replacement resource.
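For illustration only (not part of the claims): a minimal Python sketch of the quantities named in claim 5 above. The actual combination formula survives only as an image in the original (Figure FDA0002736418940000031), so the equal-weight average of the three recoverable ratios is an assumption, and s1/s2 are omitted because their role in that formula cannot be recovered.

def importance_degree(n_i, big_n, t_i1, t_1, t_i2, t_2):
    """n_i/big_n: share of frames in which character i appears;
    t_i1/t_1: share of the resource's total video duration;
    t_i2/t_2: share of the resource's total audio duration."""
    return (n_i / big_n + t_i1 / t_1 + t_i2 / t_2) / 3  # ASSUMED weighting

# The character with the highest value would be chosen as the second person.
print(importance_degree(n_i=900, big_n=1800, t_i1=36.0, t_1=72.0, t_i2=30.0, t_2=60.0))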
6. The method according to claim 1, wherein the S106 comprises:
S106-1, acquiring a first pitch, a first loudness, a first timbre, a first speech rate, and a first language style from the first audio feature;
a first language style in the first audio feature of the first person is determined by:
S301-1, acquiring all audio of the first person in the first video resource;
S301-2, performing speech recognition on the audio obtained in S301-1 to determine a first sound feature;
S301-3, converting the audio obtained in S301-1 into a first text;
S301-4, performing semantic analysis on the first text to determine a first word feature;
S301-5, taking the first sound feature and the first word feature as the first language style in the first audio feature of the first person;
the sound features include: pronunciation rhythm, inter-word pauses, sentence intonation, and stress position;
the inter-word pause is the pause duration between words;
the stress includes: parallel stress, contrastive stress, echoing stress, progressive stress, transitional stress, affirmative stress, emphatic stress, metaphorical stress, onomatopoeic stress, and ironic stress;
the word features include: habitual spoken words, modifiers, word combinations, and ellipses;
S106-2, acquiring a second pitch, a second loudness, a second timbre, a second speech rate, and a second language style from the second audio feature;
a second language style in the second audio feature of the second person is determined by:
S302-1, acquiring the audio of the second person;
S302-2, performing speech recognition on the audio obtained in S302-1 to determine a second sound feature;
S302-3, converting the audio obtained in S302-1 into a second text;
S302-4, performing semantic analysis on the second text to determine a second word feature;
S302-5, taking the second sound feature and the second word feature as the second language style in the second audio feature of the second person;
S106-3, determining the average value of the first pitch and the second pitch as the pitch in the replacement audio feature;
S106-4, determining the first loudness as the loudness in the replacement audio feature;
S106-5, determining the second timbre as the timbre in the replacement audio feature;
S106-6, determining the value A3 given by the following formula as the speech rate in the replacement audio feature:
[Formula image: Figure FDA0002736418940000051]
where A3 is the speech rate in the replacement audio feature, A1 is the first speech rate, A2 is the second speech rate, B1 is the inter-word pause in the first language style, and B2 is the inter-word pause in the second language style;
S106-7, determining the sum of the word features in the first language style and the word features in the second language style as the word features of the language style in the replacement audio feature;
and S106-8, determining the sound features in the second language style as the sound features of the language style in the replacement audio feature.
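For illustration only (not part of the claims): a minimal Python sketch of the language-style extraction in steps S301-1 to S301-5 (and, symmetrically, S302-1 to S302-5) of claim 6 above. The patent names no concrete speech-recognition or semantic-analysis component, so transcribe, analyze_sound, and analyze_words are hypothetical callables supplied by the caller.

def extract_language_style(audio_clips, transcribe, analyze_sound, analyze_words):
    """audio_clips: all audio segments of one character.
    transcribe(clip) -> str; analyze_sound(clip) -> dict of rhythm,
    inter-word pauses, intonation, stress; analyze_words(text) -> dict of
    habitual spoken words, modifiers, word combinations, ellipses."""
    sound_features = [analyze_sound(clip) for clip in audio_clips]  # S301-2
    text = " ".join(transcribe(clip) for clip in audio_clips)       # S301-3
    word_features = analyze_words(text)                             # S301-4
    # S301-5: the language style is the pair (sound features, word features)
    return sound_features, word_features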
7. An electronic device comprising a memory, a processor, a bus, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the method of any one of claims 1-6.
8. A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN201910082625.XA 2019-01-28 2019-01-28 Sound replacement method, electronic device, and storage medium Active CN109841225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910082625.XA CN109841225B (en) 2019-01-28 2019-01-28 Sound replacement method, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN109841225A CN109841225A (en) 2019-06-04
CN109841225B true CN109841225B (en) 2021-04-30

Family

ID=66884289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910082625.XA Active CN109841225B (en) 2019-01-28 2019-01-28 Sound replacement method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN109841225B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110475157A (en) * 2019-07-19 2019-11-19 平安科技(深圳)有限公司 Multimedia messages methods of exhibiting, device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006079813A1 (en) * 2005-01-27 2006-08-03 Synchro Arts Limited Methods and apparatus for use in sound modification
CN101563698A (en) * 2005-09-16 2009-10-21 富利克索尔股份有限公司 Personalizing a video
WO2013152453A1 (en) * 2012-04-09 2013-10-17 Intel Corporation Communication using interactive avatars
CN107333071A (en) * 2017-06-30 2017-11-07 北京金山安全软件有限公司 Video processing method and device, electronic equipment and storage medium
CN108305636A (en) * 2017-11-06 2018-07-20 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant