CN110085244A - Live-streaming interaction method, apparatus, electronic device, and readable storage medium - Google Patents
Live-streaming interaction method, apparatus, electronic device, and readable storage medium
- Publication number: CN110085244A
- Application number: CN201910368510.7A
- Authority
- CN
- China
- Prior art keywords: style, timbre, vector, content, anchor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/762—Media network packet handling at the source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Abstract
An embodiment of the present application provides a live-streaming interaction method, apparatus, electronic device, and readable storage medium. A content feature map is extracted from the first audio data input by the anchor, and a content feature vector is extracted from it by a preset feature vector extraction network. The content feature vector is then converted by the style transformation model corresponding to the target timbre style, yielding a style transition map with the target timbre style. An inverse feature transform is then applied to the content feature map and the style transition map, producing second audio data with the target timbre style. Finally, an interactive video stream of the anchor's virtual avatar is generated from the second audio data and sent to the client for playback. In this way, for any anchor, and without changing the audio content, the timbre style of the avatar's live broadcast can be converted to the target timbre style for interaction with viewers, which improves the interaction effect during the broadcast and better motivates viewers to interact with the anchor.
Description
Technical field
This application relates to the field of Internet live streaming, and in particular to a live-streaming interaction method, apparatus, electronic device, and readable storage medium.
Background technique
In Internet live streaming, replacing the anchor's real image with a virtual avatar that participates in the interaction is currently a fairly popular form of broadcasting.
In current broadcasts, the avatar's timbre mostly uses the anchor's original timbre style, or a fixed timbre style chosen in advance, to provide the live data stream; it cannot be converted into other timbre styles for interaction with viewers. This fails to meet particular demands of specific anchors or niche audiences, which reduces the interactive effect of the broadcast. For example, viewers may prefer to hear the timbre style of a star they like, or of someone they know. As another example, an anchor may not wish to expose their own timbre style to viewers out of privacy concerns.
Summary of the invention
In view of this, embodiments of the present application provide a live-streaming interaction method, apparatus, electronic device, and readable storage medium to solve the above problems.
According to one aspect of the embodiments of the present application, an electronic device is provided, which may include one or more storage media and one or more processors in communication with the storage media. The storage media store machine-executable instructions executable by the processor. When the electronic device runs, the processor executes the machine-executable instructions to perform the live-streaming interaction method.
According to another aspect of the embodiments of the present application, a live-streaming interaction method is provided, applied to an anchor terminal in which at least one style transformation model is stored, each style transformation model corresponding to one timbre style. The method includes:
according to a received timbre conversion request, extracting an audio feature map from first audio data input by the anchor, the audio feature map including a content feature map, and the timbre conversion request including a target timbre style;
inputting the content feature map into a preset feature vector extraction network, and extracting the content feature vector of the content feature map;
converting the content feature vector using the style transformation model corresponding to the target timbre style, obtaining a style transition map with the target timbre style;
performing an inverse feature transform on the content feature map and the style transition map, obtaining second audio data with the target timbre style;
generating an interactive video stream of the anchor's virtual avatar according to the second audio data, and sending it to the client for playback.
According to another aspect of the embodiments of the present application, a live-streaming interaction apparatus is provided, applied to an anchor terminal in which at least one style transformation model is stored, each style transformation model corresponding to one timbre style. The apparatus includes:
an extraction module, configured to extract an audio feature map from first audio data input by the anchor according to a received timbre conversion request, the audio feature map including a content feature map, and the timbre conversion request including a target timbre style;
an input module, configured to input the content feature map into a preset feature vector extraction network and extract the content feature vector of the content feature map;
a conversion module, configured to convert the content feature vector using the style transformation model corresponding to the target timbre style, obtaining a style transition map with the target timbre style;
an inverse transform module, configured to perform an inverse feature transform on the content feature map and the style transition map, obtaining second audio data with the target timbre style;
a generation and sending module, configured to generate an interactive video stream of the anchor's virtual avatar according to the second audio data, and send it to the client for playback.
According to another aspect of the embodiments of the present application, a readable storage medium is provided, on which machine-executable instructions are stored; when the instructions are executed by a processor, the steps of the above live-streaming interaction method are performed.
Based on any of the above aspects, and compared with the prior art, the embodiments of the present application extract a content feature map from the first audio data input by the anchor, extract a content feature vector through a preset feature vector extraction network, and then convert the content feature vector using the style transformation model corresponding to the target timbre style to obtain a style transition map with the target timbre style. An inverse feature transform is then performed on the content feature map and the style transition map to obtain second audio data with the target timbre style. Finally, an interactive video stream of the anchor's virtual avatar is generated from the second audio data and sent to the client for playback. Thus, for audio content provided by any anchor, and without changing that content, the timbre style of the avatar's live broadcast can be converted to the target timbre style for interaction with viewers, improving the interaction effect during the broadcast and better motivating viewers to interact with the anchor.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the application and are therefore not to be regarded as limiting its scope; for those of ordinary skill in the art, other relevant drawings can be obtained from these drawings without creative effort.
Fig. 1 shows a schematic diagram of the live-streaming system provided by an embodiment of the present application;
Fig. 2 shows the first flowchart of the live-streaming interaction method provided by an embodiment of the present application;
Fig. 3 shows a schematic diagram of an interface for selecting a target timbre style in a live-streaming Internet application provided by an embodiment of the present application;
Fig. 4 shows a schematic diagram of the live interface of the anchor terminal provided by an embodiment of the present application;
Fig. 5 shows the second flowchart of the live-streaming interaction method provided by an embodiment of the present application;
Fig. 6 shows a flowchart of the sub-steps included in step S101 shown in Fig. 5;
Fig. 7 shows a training flowchart of the style transformation model provided by an embodiment of the present application;
Fig. 8 shows a schematic diagram of the electronic device provided by an embodiment of the present application.
Specific embodiment
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. The components of the embodiments, as generally described and illustrated herein and in the drawings, can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments of the application. Based on the embodiments in the application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the application.
It should also be noted that similar labels and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
Referring to Fig. 1, Fig. 1 is an architecture diagram of the live-streaming system 10 provided by an embodiment of the present application. For example, the live-streaming system 10 can be a service platform such as an Internet live-streaming platform. The live-streaming system 10 may include a live-streaming server 200, an anchor terminal 100, and a client 300; the live-streaming server 200 is communicatively connected with the anchor terminal 100 and the client 300 respectively, to provide live-streaming services for both. For example, the live-streaming server 200 can store the correspondence between anchor terminals 100 and live channels; after the client 300 selects a live channel, the live-streaming server 200 can send the live video stream to the clients 300 in the same live channel according to the correspondence between each live channel and the anchor terminal 100.
In some scenarios, the anchor terminal 100 and the client 300 may be used interchangeably. For example, the anchor of the anchor terminal 100 can use it to provide live video services for viewers, or, as a viewer, watch live video provided by other anchors. Likewise, the viewer of the client 300 can use it to watch live video provided by an anchor of interest, or, as an anchor, provide live video services for other viewers. In this embodiment, the anchor terminal 100 and the client 300 may include, but are not limited to, any handheld electronic product based on an intelligent operating system that can perform human-computer interaction with the user through input devices such as a keyboard, virtual keyboard, touchpad, touch screen, and voice-control device, such as a smartphone, tablet computer, or PC. The intelligent operating system includes, but is not limited to, any operating system that enriches device functions by providing various applications to the mobile device, such as Android, iOS, or Windows Phone. An Internet product for providing Internet live-streaming services can be installed on the anchor terminal 100 and the client 300; for example, the Internet product can be an application (APP), web page, or applet related to the Internet live-streaming service used on a computer or smartphone.
In this embodiment, the live-streaming system 10 can also include a video capture device 400 for capturing the anchor's video frames. The video capture device 400 may be directly mounted on or integrated into the anchor terminal 100, or may be independent of, and connected to, the anchor terminal 100.
Referring to Fig. 2, Fig. 2 shows a flowchart of the live-streaming interaction method provided by an embodiment of the present application; the method can be executed by the anchor terminal 100 shown in Fig. 1. It should be understood that, in other embodiments, the order of some steps of the method may be exchanged according to actual needs, or some steps may be omitted or deleted. The detailed steps of the method are described below.
Step S110: according to a received timbre conversion request, extract an audio feature map from the first audio data input by the anchor.
Step S120: input the content feature map into a preset feature vector extraction network, and extract the content feature vector of the content feature map.
Step S130: convert the content feature vector using the style transformation model corresponding to the target timbre style, obtaining a style transition map with the target timbre style.
Step S140: perform an inverse feature transform on the content feature map and the style transition map, obtaining second audio data with the target timbre style.
Step S150: generate an interactive video stream of the anchor's virtual avatar according to the second audio data, and send it to the client 300 for playback.
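Steps S110 through S150 can be sketched as a single pipeline. Everything below is an illustrative stand-in: the function names and the placeholder feature operations are assumptions for readability, not the patent's actual networks or models.

```python
def extract_content_feature_map(audio):
    # S110 stand-in: treat the raw samples as the "content feature map"
    return list(audio)

def extract_content_vector(content_map):
    # S120 stand-in: a single summary statistic instead of a CNN embedding
    return [sum(content_map) / max(len(content_map), 1)]

def inverse_feature_transform(content_map, style_map):
    # S140 stand-in: combine content with a style offset
    return [x + style_map[0] for x in content_map]

def handle_timbre_request(first_audio, target_style, style_models):
    """Convert the anchor's audio to the target timbre style (S110-S150)."""
    content_map = extract_content_feature_map(first_audio)            # S110
    content_vec = extract_content_vector(content_map)                 # S120
    style_map = style_models[target_style](content_vec)               # S130
    second_audio = inverse_feature_transform(content_map, style_map)  # S140
    return {"audio": second_audio, "style": target_style}             # S150 payload
```

A usage sketch: `style_models` maps each timbre-style name to its stored style transformation model, so a timbre conversion request simply selects one entry.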
In this embodiment, for step S110, any anchor can start a broadcast by opening the live-streaming Internet application installed on the anchor terminal 100 and entering the live interface; during the broadcast, data such as the live video stream, live pictures, live audio, and text barrages are generated.
Optionally, the timbre conversion request may include a target timbre style selected by the anchor or by a viewer who has entered the anchor's live room. The target timbre style can be understood as the timbre style that the anchor, or a viewer in the anchor's live room, wishes to hear for the live audio. For example, the anchor may want their output audio to sound like the timbre style of an idol star they like, of a friend they know, or of a speaking accent they like (such as a "Beijing accent" or "Taiwan accent"). As another example, some viewers may wish that the anchor's output audio sounded like the timbre style of a star they like, or of a friend they know. Accordingly, the timbre conversion request can be issued either by the anchor's own anchor terminal 100 or by the client 300 of a viewer in the anchor's live room.
For example, a selection interface for the target timbre style can be provided in the interface of the live-streaming Internet application installed on the anchor terminal 100 or the client 300. The selection interface displays options for multiple different timbre styles; the anchor, or a viewer in the anchor's live room, can select the option corresponding to the desired target timbre style from the displayed options, and the anchor terminal 100 or the client 300 then generates the corresponding timbre conversion request.
As an example only, Fig. 3 shows a schematic interface of the live-streaming Internet application installed on the anchor terminal 100 or the client 300. The interface displays options for different timbre styles, including timbre style A, timbre style B, timbre style C, timbre style D, and so on; the anchor, or a viewer in the anchor's live room, can select the option corresponding to the desired target timbre style from this selection interface. For example, if the anchor likes the timbre style of a friend A they know, and timbre style A is friend A's timbre style, the anchor can choose timbre style A, and the anchor terminal 100 then generates the corresponding timbre conversion request. As another example, if a viewer in the anchor's live room likes the timbre style of the singer Zhang Xueyou (Jacky Cheung), and timbre style B is that singer's timbre style, the viewer can choose timbre style B, and the client 300 then generates the corresponding timbre conversion request.
The first audio data can be audio data pre-recorded by the anchor, or audio data output in real time during the broadcast; this embodiment does not specifically limit this.
The present inventors found that any piece of audio data can be represented by a series of waveform diagrams. Based on this, one exemplary way to extract the audio feature map corresponding to the anchor's first audio data is: cut the first audio data at a preset interval (for example, every 10 seconds) to obtain multiple audio segments; then use the waveform diagram, spectrum diagram, or spectrogram of each audio segment, or an image obtained by image-processing the waveform diagram, spectrum diagram, or spectrogram of each segment, as the audio feature map. By cutting the first audio data, this embodiment avoids stuttering of the anchor terminal 100 caused by processing too large an amount of audio data at once; furthermore, the segments obtained by cutting have a uniform duration, which facilitates subsequent processing.
The audio feature map may include a content feature map and a style feature map. The style feature map can be used to represent style features of the first audio data, such as its timbre style; the content feature map can be used to represent content features of the first audio data, such as volume and speech content.
For step S120, the preset feature vector extraction network can use a convolutional neural network. A convolutional neural network is a feed-forward neural network whose artificial neurons respond to surrounding units within part of their coverage area; it performs outstandingly well on image processing. A convolutional neural network can extract abstract features of an object through multiple convolution layers to complete object recognition. Based on this, the content feature vector of the content feature map can be extracted by a convolutional neural network. Optionally, the preset feature vector extraction network can use a model for extracting vector features of images, such as a VGG (Visual Geometry Group) model or a deep residual network (ResNet) model.
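As a minimal illustration of the convolution-plus-pooling mechanism described above (a toy 1D version, not the actual VGG or ResNet network):

```python
def conv1d(x, kernel):
    """Valid-mode 1D convolution (cross-correlation), the basic CNN building block."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def max_pool(x, size=2):
    """Downsample by keeping the maximum of each non-overlapping window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def content_feature_vector(feature_map, kernels):
    """Stack conv + pooling layers and flatten into a content feature vector."""
    x = feature_map
    for kern in kernels:
        x = max_pool(conv1d(x, kern))
    return x
```

Each stacked layer widens the receptive field, which is how deeper networks arrive at the abstract features the paragraph mentions.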
In this embodiment, for step S130, at least one style transformation model is pre-stored in the anchor terminal 100. Each style transformation model corresponds to one timbre style, and each can be used to convert the content feature map of any anchor into a style transition map with the corresponding target timbre style.
For step S140, since the style transition map replaces the style feature map of the original audio feature map, the content feature map in this step together with the converted style transition map can be understood as an audio feature map with the target timbre style. To generate audio data that viewers can hear, this embodiment further performs an inverse feature transform on the content feature map and the converted style transition map, obtaining second audio data with the target timbre style. In this way, the second audio data combines the content feature map corresponding to the first audio data with the style features of the converted style transition map, thereby achieving the auditory effect of the target timbre style without changing the content of the first audio data.
It is worth noting that although some existing voice changers can alter the voice (for example, to an old man's voice or a child's voice), the converted sound in such schemes is unsatisfactory: it cannot achieve a convincingly lifelike effect, and it still cannot be converted into a desired timbre style. With the technical solution provided by this embodiment, the converted timbre is that of the required target timbre style and has an extremely lifelike effect.
It should further be noted that, since the style transformation model provided by this solution can learn the style feature vector of the corresponding timbre style, it can convert arbitrary content output by any anchor into a style transition map with the corresponding timbre style, without training a separate style transformation model for each anchor, which greatly reduces the training workload. The specific training process of the style transformation model is described in detail below.
For step S150, to make the live interaction more engaging, a virtual avatar can replace the anchor's real image in the live room's display interface to interact with viewers. For example, the avatar can imitate in real time characteristic attributes of the anchor, such as expressions and movements, to represent the anchor in interacting with viewers; that is, viewers can interact with the anchor through the avatar, and such a viewer can be any one of the anchor's many subscribed fans. In addition, the avatar can imitate the anchor's operations or movements related to the broadcast content, such as holding or introducing a certain product.
After the second audio data is generated, an interactive video frame of the anchor's virtual avatar corresponding to each audio frame in the second audio data can be generated in real time. For example, the emotional content or particular keywords of each audio frame in the second audio data can be identified; the avatar is then controlled to perform an interactive action in the corresponding emotional form according to the emotional content, or the interactive expression corresponding to the keyword is looked up and performed according to the particular keyword, and the interactive video frame of the avatar performing the action is recorded.
Then, each audio frame and its corresponding interactive video frame are synthesized to obtain the interactive video stream of the avatar. For example, for each audio frame, the word content contained in the frame can be parsed; the audio frame, its word content, and the corresponding interactive video frame are then synthesized to obtain the interactive video stream of the anchor's virtual avatar. On this basis, the interactive video stream of the avatar can be sent through the live-streaming server 200 to the client 300 for playback.
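The per-frame synthesis described above can be sketched as follows; the frame dictionaries, field names, and the render callback are illustrative assumptions, not the patent's actual data formats.

```python
def synthesize_interactive_stream(audio_frames, render_video_frame):
    """Pair each audio frame with an avatar video frame rendered from its
    parsed word content, producing the interactive video stream."""
    stream = []
    for audio in audio_frames:
        words = audio.get("words", "")     # parsed word content of the frame
        video = render_video_frame(words)  # avatar reacts to the content
        stream.append({"audio": audio["pcm"], "video": video, "words": words})
    return stream
```

In a real system `render_video_frame` would be the emotion/keyword-driven avatar controller; here it is any callable that maps word content to a video frame.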
For example, Fig. 4 shows an example live broadcast interface of the main broadcaster end 100. The live broadcast interface may include a live broadcast interface display box, a main broadcaster video frame display box, a barrage area, a virtual image area, and the word content XXXXX of each audio frame. The live broadcast interface display box is used to display the video stream currently being broadcast on the live platform, or the complete video stream formed after the broadcast is completed; the main broadcaster video frame display box is used to display, in real time, the main broadcaster video frames collected by the video acquisition device; the virtual image area is used to display the virtual image of the main broadcaster and the interactive video frames of the virtual image; and the barrage area is used to display the interaction content between the audience and the main broadcaster (such as AAAAA, BBBBB, CCCCC, DDDDD, EEEEE).
It can be understood that the live broadcast interface shown in Fig. 4 is only illustrative. During an actual broadcast, the live broadcast interface may further include a live information area, and the live information area may include at least one of the following information: the live room title, the main broadcaster's user account, the main broadcaster's avatar, the audience user accounts, the audience avatars, the number of followers of the main broadcaster, the popularity of the main broadcaster, and information about the gifts received by the main broadcaster in the current ranking list.
In this way, the present embodiment can convert the tone color style to the target tone color style for interacting with the audience during the virtual image live broadcast while leaving the audio content unchanged, thereby improving the interaction effect during the live broadcast and motivating the audience to interact with the main broadcaster to a greater extent.
As a possible embodiment, referring to Fig. 5, before the aforementioned step S110, the living broadcast interactive method provided in this embodiment may further include the following steps:
Step S101: obtain, by training in advance on training samples, the style transformation model corresponding to the target tone color style. Specifically, referring to Fig. 6, step S101 may include the following sub-steps:
Sub-step S1011: obtain training samples, the training samples including a first audio sample and a second audio sample of any main broadcaster.
In this embodiment, the first audio sample may be any audio sample with the target tone color style. For example, if the target tone color style is the tone color style of a certain actor A, a large amount of audio data of actor A may be collected as the first audio sample.
In this embodiment, the second audio sample is not specifically limited; the audio data of any main broadcaster or any other user may be collected as the second audio sample.
Referring to Fig. 7, the training process of this embodiment involves a feature extraction network, a feature vector extraction network, and an initial conversion network. The training process of the style transformation model in step S101 is exemplarily described below based on Fig. 7.
Sub-step S1012: extract the reference style feature map of the first audio sample and the content feature map of the second audio sample, respectively.
As shown in Fig. 7, the reference style feature map of the first audio sample and the content feature map of the second audio sample may be extracted by the feature extraction network, in the same manner as described above for extracting the audio feature map from the first audio data input by the main broadcaster.
Sub-step S1013: extract, through the feature vector extraction network, the reference style feature vector corresponding to the reference style feature map and the content feature vector corresponding to the content feature map, respectively.
Sub-step S1014: train the initial conversion model according to the content feature vector and the reference style feature vector to obtain the style transformation model corresponding to the target tone color style, and store it in the main broadcaster end 100.
The detailed training process of sub-step S1014 is exemplarily described below based on Fig. 7.
First, the content feature vector is input into the initial conversion model to generate the reference style transition map of the content feature vector.
Second, the reference style conversion feature vector corresponding to the reference style transition map is extracted through the feature vector extraction network.
Third, the network parameters of the initial conversion model are adjusted according to the content feature vector, the reference style feature vector, and the reference style conversion feature vector.
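The three steps above can be sketched as a single forward pass. The random matrices below are placeholders standing in for the trained initial conversion model and feature vector extraction network; all names and dimensions are illustrative assumptions, not the patent's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                    # illustrative feature dimension

conversion_W = rng.normal(size=(D, D))   # stand-in for the initial conversion model
extract_W = rng.normal(size=(D, D))      # stand-in for the feature vector extraction network

content_vec = rng.normal(size=D)         # content feature vector
ref_style_vec = rng.normal(size=D)       # reference style feature vector

# Step 1: generate the reference style transition map from the content vector.
transition_map = conversion_W @ content_vec
# Step 2: extract the reference style conversion feature vector from that map.
conv_style_vec = extract_W @ transition_map
# Step 3: the two loss terms below compare the conversion feature vector
# against the style vector and the content vector; their gradients are what
# adjust conversion_W during training.
style_term = float(np.sum((ref_style_vec - conv_style_vec) ** 2))
content_term = float(np.sum((conv_style_vec - content_vec) ** 2))
```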
In detail, in this embodiment, a first vector difference value between the reference style feature vector and the reference style conversion feature vector, and a second vector difference value between the reference style conversion feature vector and the content feature vector, may be calculated. Optionally, the first vector difference value and the second vector difference value may be calculated as follows: first, generate the content feature grayscale map corresponding to the content feature vector, the reference style grayscale map corresponding to the reference style feature vector, and the reference style conversion feature grayscale map corresponding to the reference style conversion feature vector.
Then, the pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map is calculated as the first vector difference value. For example, the grayscale difference between the grayscale pixel value of each pixel in the reference style grayscale map and the grayscale pixel value of the pixel at the corresponding position in the reference style conversion feature grayscale map may be calculated, giving the squared difference between each pixel in the reference style grayscale map and the corresponding position in the reference style conversion feature grayscale map. Then, the squared differences corresponding to all pixels are summed, and the resulting pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map is taken as the first vector difference value.
Meanwhile it calculating with reference to the pixel difference value between style converting characteristic grayscale image and content characteristic grayscale image as the
Two vector difference values.For example, can calculate with reference to the gray-scale pixel values of the pixel in style converting characteristic grayscale image and interior
Hold the gray scale difference value between the gray-scale pixel values of the pixel of signature grey scale figure corresponding position, and calculates with reference to style converting characteristic
Squared difference value in each pixel in grayscale image and content characteristic grayscale image between the pixel of corresponding position.Then,
It sums, is obtained with reference to style converting characteristic grayscale image and content characteristic to the corresponding squared difference value of all pixels point
Pixel difference value between grayscale image is as secondary vector difference value.
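The two pixel difference values described above reduce to a sum of squared per-pixel grayscale differences. A direct NumPy rendering, with small illustrative grayscale maps (the patent derives these maps from the feature vectors):

```python
import numpy as np

def pixel_difference_value(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences between corresponding grayscale pixels."""
    assert a.shape == b.shape
    return float(np.sum((a.astype(float) - b.astype(float)) ** 2))

# Illustrative 2x2 grayscale maps.
ref_style = np.array([[10, 20], [30, 40]])       # reference style grayscale map
ref_style_conv = np.array([[12, 20], [30, 37]])  # reference style conversion feature grayscale map
content = np.array([[11, 21], [30, 40]])         # content feature grayscale map

first_vector_difference = pixel_difference_value(ref_style, ref_style_conv)
second_vector_difference = pixel_difference_value(ref_style_conv, content)
```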
It is worth noting that, in the actual training stage, those skilled in the art may also add loss functions other than the aforementioned first vector difference value and second vector difference value; this application does not limit this in detail.
On the aforementioned basis, backpropagation training may be carried out according to the first vector difference value and the second vector difference value, and the gradients of the network parameters of the initial conversion model calculated. Then, according to the calculated gradients, the network parameters of the initial conversion model are updated using the stochastic gradient descent method and training continues; when the initial conversion model meets the training termination condition, the trained style transformation model corresponding to the target tone color style is output.
The stochastic gradient descent method proceeds along the direction of gradient descent to solve for a minimum (it may also proceed along the gradient ascent direction to solve for a maximum). The direction of gradient descent can generally be obtained by differentiating the function; when an extreme point is reached, the gradient is 0, and the magnitude of the gradient is also 0 at that point. When the gradient descent algorithm is used for optimization, the termination condition of the algorithm iteration is that the magnitude of the gradient vector is close to 0, for which a very small constant threshold may be set.
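The gradient-magnitude stopping rule described above can be demonstrated on a simple quadratic objective f(x) = ||x - target||^2: the loop stops once the gradient vector's magnitude falls below a small constant threshold. This is a minimal sketch, not the patent's training loop; the learning rate and threshold are illustrative.

```python
import numpy as np

def gradient_descent(x0: np.ndarray, target: np.ndarray,
                     lr: float = 0.1, eps: float = 1e-6,
                     max_iters: int = 10000) -> np.ndarray:
    """Minimize ||x - target||^2, stopping when the gradient magnitude ~ 0."""
    x = x0.astype(float)
    for _ in range(max_iters):
        grad = 2.0 * (x - target)           # gradient of the quadratic objective
        if np.linalg.norm(grad) < eps:      # termination: gradient magnitude below threshold
            break
        x -= lr * grad
    return x
```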
The above training termination condition may include at least one of the following three conditions:
1) the number of training iterations reaches a set number; 2) the first vector difference value and the second vector difference value fall below a set threshold; 3) the first vector difference value and the second vector difference value no longer decline.
In addition, in actual implementation, the training termination condition is not limited to the above examples; those skilled in the art may design training termination conditions different from the above examples according to actual needs.
Based on the style transformation model corresponding to the target tone color style obtained in the above steps, the content feature map corresponding to the audio data of any main broadcaster can be converted into a style feature transition map with the target tone color style. In this way, without changing the audio content of the audio data of any main broadcaster, the tone color style during the virtual image live broadcast is converted to the target tone color style for interacting with the audience, thereby improving the interaction effect during the live broadcast and motivating the audience to interact with the main broadcaster to a greater extent. Moreover, the style transformation model corresponding to the target tone color style can be used for any audio content output by any main broadcaster, so there is no longer any need to train a separate style transformation model for each main broadcaster, which greatly reduces the amount of training.
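The inference path of steps S110-S140 can be sketched end to end. The linear stand-ins below are purely illustrative placeholders for the learned networks, and the inverse feature transform of step S140 is reduced to a simple additive recombination of the content map and the style transition map for the sake of a runnable example; the patent does not specify these internals.

```python
import numpy as np

rng = np.random.default_rng(1)
F, T, D = 6, 4, 5                           # illustrative map and vector sizes

extract_vec = rng.normal(size=(D, F * T))   # stand-in feature vector extraction network
style_model = rng.normal(size=(F * T, D))   # stand-in style transformation model

def convert(first_audio_map: np.ndarray) -> np.ndarray:
    """Content feature map -> content vector -> style transition map -> output map."""
    content_map = first_audio_map                                # S110: content feature map
    content_vec = extract_vec @ content_map.ravel()              # S120: content feature vector
    transition_map = (style_model @ content_vec).reshape(F, T)   # S130: style transition map
    # S140: inverse feature transform (illustrative additive recombination).
    return content_map + transition_map

out = convert(rng.normal(size=(F, T)))
```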
It is worth noting that the above only describes the training process of the style transformation model corresponding to the foregoing target tone color style. For the training of the style transformation models corresponding to other tone color styles, reference may be made to the related description of the above embodiment, which is not repeated here.
Fig. 8 shows a schematic diagram of the electronic equipment provided by the embodiments of the present application. In this embodiment, the electronic equipment may refer to the main broadcaster end 100 shown in Fig. 1, and includes a storage medium 110, a processor 120, and a living broadcast interactive device 500. In this embodiment, the storage medium 110 and the processor 120 are both located in the main broadcaster end 100 and are arranged separately. It should be understood, however, that the storage medium 110 may also be independent of the main broadcaster end 100 and accessed by the processor 120 through a bus interface. Alternatively, the storage medium 110 may also be integrated into the processor 120, for example as a cache and/or a general register.
As a computer-readable storage medium, the storage medium 110 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the living broadcast interactive method described in any embodiment of this application (for example, the extraction module 510, the input module 520, the conversion module 530, the inverse transform module 540, and the generation sending module 550 included in the living broadcast interactive device 500). The storage medium 110 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the equipment, etc. In addition, the storage medium 110 may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk memory, flash memory device, or other non-volatile solid-state memory component. In some examples, the storage medium 110 may further include memories located remotely relative to the processor 120, and these remote memories may be connected to the equipment through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The functions of the functional modules of the living broadcast interactive device 500 are described in detail below.
The extraction module 510 is configured to extract an audio feature map from the first audio data input by the main broadcaster according to the received tone color conversion request; the audio feature map includes a content feature map, and the tone color conversion request includes the target tone color style into which the tone color style of the first audio data needs to be converted. It can be understood that the extraction module 510 can be used to execute the above step S110; for the detailed implementation of the extraction module 510, reference may be made to the related content of step S110 above.
The input module 520 is configured to input the content feature map into the preset feature vector extraction network to extract the content feature vector of the content feature map. It can be understood that the input module 520 can be used to execute the above step S120; for the detailed implementation of the input module 520, reference may be made to the related content of step S120 above.
The conversion module 530 is configured to convert the content feature vector using the style transformation model corresponding to the target tone color style to obtain a style transition map with the target tone color style. It can be understood that the conversion module 530 can be used to execute the above step S130; for the detailed implementation of the conversion module 530, reference may be made to the related content of step S130 above.
The inverse transform module 540 is configured to perform an inverse feature transform on the content feature map and the style transition map to obtain the second audio data with the target tone color style. It can be understood that the inverse transform module 540 can be used to execute the above step S140; for the detailed implementation of the inverse transform module 540, reference may be made to the related content of step S140 above.
The generation sending module 550 is configured to generate the interactive video stream of the virtual image corresponding to the main broadcaster according to the second audio data, and send it to the client 300 for playing. It can be understood that the generation sending module 550 can be used to execute the above step S150; for the detailed implementation of the generation sending module 550, reference may be made to the related content of step S150 above.
Further, the embodiments of the present application also provide a computer-readable storage medium storing machine-executable instructions which, when executed, implement the living broadcast interactive method provided by the above embodiments.
The above are only various embodiments of this application, but the protection scope of this application is not limited thereto. Any person familiar with the technical field can easily conceive of changes or replacements within the technical scope disclosed in this application, which shall all be covered within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (12)
1. A living broadcast interactive method, applied to a main broadcaster end, wherein at least one style transformation model is stored in the main broadcaster end and each style transformation model corresponds to one tone color style, the method comprising:
extracting, according to a received tone color conversion request, an audio feature map from first audio data input by a main broadcaster, the audio feature map comprising a content feature map and the tone color conversion request comprising a target tone color style;
inputting the content feature map into a preset feature vector extraction network to extract a content feature vector of the content feature map;
converting the content feature vector using the style transformation model corresponding to the target tone color style to obtain a style transition map with the target tone color style;
performing an inverse feature transform on the content feature map and the style transition map to obtain second audio data with the target tone color style; and
generating an interactive video stream of a virtual image corresponding to the main broadcaster according to the second audio data, and sending it to a client for playing.
2. The living broadcast interactive method according to claim 1, wherein the style transformation model is obtained by training a deep-learning neural network using a first audio sample and a second audio sample of any main broadcaster, the first audio sample having the target tone color style.
3. The living broadcast interactive method according to claim 1, wherein, before extracting the audio feature map from the first audio data input by the main broadcaster according to the received tone color conversion request, the method further comprises:
obtaining, by training in advance on training samples, the style transformation model corresponding to the target tone color style, which specifically includes:
obtaining training samples, the training samples including a first audio sample and a second audio sample of any main broadcaster, the first audio sample having the target tone color style;
extracting a reference style feature map of the first audio sample and a content feature map of the second audio sample, respectively;
extracting, through the feature vector extraction network, a reference style feature vector corresponding to the reference style feature map and a content feature vector corresponding to the content feature map, respectively; and
training an initial conversion model according to the content feature vector and the reference style feature vector to obtain the style transformation model corresponding to the target tone color style, and storing it in the main broadcaster end.
4. The living broadcast interactive method according to claim 3, wherein the step of training the initial conversion model according to the content feature vector and the reference style feature vector to obtain the style transformation model corresponding to the target tone color style comprises:
inputting the content feature vector into the initial conversion model to generate a reference style transition map of the content feature vector;
extracting, through the feature vector extraction network, a reference style conversion feature vector corresponding to the reference style transition map; and
adjusting network parameters of the initial conversion model according to the content feature vector, the reference style feature vector, and the reference style conversion feature vector.
5. The living broadcast interactive method according to claim 4, wherein the step of adjusting the network parameters of the initial conversion model according to the content feature vector, the reference style feature vector, and the reference style conversion feature vector comprises:
calculating a first vector difference value between the reference style feature vector and the reference style conversion feature vector, and a second vector difference value between the reference style conversion feature vector and the content feature vector;
carrying out backpropagation training according to the first vector difference value and the second vector difference value, and calculating gradients of the network parameters of the initial conversion model; and
updating, according to the calculated gradients, the network parameters of the initial conversion model using the stochastic gradient descent method and continuing training, and outputting, when the initial conversion model meets a training termination condition, the trained style transformation model corresponding to the target tone color style.
6. The living broadcast interactive method according to claim 5, wherein the step of calculating the first vector difference value between the reference style feature vector and the reference style conversion feature vector and the second vector difference value between the reference style conversion feature vector and the content feature vector comprises:
generating a content feature grayscale map corresponding to the content feature vector, a reference style grayscale map corresponding to the reference style feature vector, and a reference style conversion feature grayscale map corresponding to the reference style conversion feature vector; and
calculating a pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map as the first vector difference value, and calculating a pixel difference value between the reference style conversion feature grayscale map and the content feature grayscale map as the second vector difference value.
7. The living broadcast interactive method according to claim 6, wherein the step of calculating the pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map as the first vector difference value comprises:
calculating a grayscale difference between the grayscale pixel value of each pixel in the reference style grayscale map and the grayscale pixel value of the pixel at the corresponding position in the reference style conversion feature grayscale map, giving a squared difference between each pixel in the reference style grayscale map and the corresponding position in the reference style conversion feature grayscale map; and
summing the squared differences corresponding to all pixels to obtain the pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map;
and wherein the step of calculating the pixel difference value between the reference style conversion feature grayscale map and the content feature grayscale map as the second vector difference value comprises:
calculating a grayscale difference between the grayscale pixel value of each pixel in the reference style conversion feature grayscale map and the grayscale pixel value of the pixel at the corresponding position in the content feature grayscale map, giving a squared difference between each pixel in the reference style conversion feature grayscale map and the pixel at the corresponding position in the content feature grayscale map; and
summing the squared differences corresponding to all pixels to obtain the pixel difference value between the reference style conversion feature grayscale map and the content feature grayscale map.
8. The living broadcast interactive method according to any one of claims 1-7, wherein the step of generating the interactive video stream of the virtual image corresponding to the main broadcaster according to the second audio data and sending it to the client for playing comprises:
for each audio frame in the second audio data, generating an interactive video frame of the virtual image corresponding to the audio frame; and
synthesizing each audio frame and its corresponding interactive video frame to obtain the interactive video stream of the virtual image, and sending the interactive video stream of the virtual image to the client for playing.
9. The living broadcast interactive method according to claim 8, wherein the step of synthesizing each audio frame and its corresponding interactive video frame to obtain the interactive video stream of the virtual image comprises:
for each audio frame, parsing the word content contained in the audio frame; and
synthesizing the audio frame, the word content contained in the audio frame, and the interactive video frame corresponding to the audio frame, thereby obtaining the interactive video stream of the virtual image corresponding to the main broadcaster.
10. A living broadcast interactive device, applied to a main broadcaster end, wherein at least one style transformation model is stored in the main broadcaster end and each style transformation model corresponds to one tone color style, the device comprising:
an extraction module, configured to extract, according to a received tone color conversion request, an audio feature map from first audio data input by a main broadcaster, the audio feature map comprising a content feature map and the tone color conversion request comprising a target tone color style into which the tone color style of the first audio data needs to be converted;
an input module, configured to input the content feature map into a preset feature vector extraction network to extract a content feature vector of the content feature map;
a conversion module, configured to convert the content feature vector using the style transformation model corresponding to the target tone color style to obtain a style transition map with the target tone color style;
an inverse transform module, configured to perform an inverse feature transform on the content feature map and the style transition map to obtain second audio data with the target tone color style; and
a generation sending module, configured to generate an interactive video stream of a virtual image corresponding to the main broadcaster according to the second audio data, and send it to a client for playing.
11. An electronic equipment, comprising one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing machine-executable instructions executable by the processors; when the electronic equipment runs, the processors execute the machine-executable instructions to implement the living broadcast interactive method of any one of claims 1-9.
12. A readable storage medium, storing machine-executable instructions which, when executed, implement the living broadcast interactive method of any one of claims 1-9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910368510.7A CN110085244B (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
CN202011508099.8A CN112562705A (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910368510.7A CN110085244B (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011508099.8A Division CN112562705A (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110085244A true CN110085244A (en) | 2019-08-02 |
CN110085244B CN110085244B (en) | 2020-12-25 |
Family
ID=67418510
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011508099.8A Pending CN112562705A (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
CN201910368510.7A Active CN110085244B (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011508099.8A Pending CN112562705A (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN112562705A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956971A (en) * | 2019-12-03 | 2020-04-03 | 广州酷狗计算机科技有限公司 | Audio processing method, device, terminal and storage medium |
CN112164407A (en) * | 2020-09-22 | 2021-01-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Tone conversion method and device |
CN112672207A (en) * | 2020-12-30 | 2021-04-16 | 广州繁星互娱信息科技有限公司 | Audio data processing method and device, computer equipment and storage medium |
CN113792853A (en) * | 2021-09-09 | 2021-12-14 | 北京百度网讯科技有限公司 | Training method of character generation model, character generation method, device and equipment |
CN113823300A (en) * | 2021-09-18 | 2021-12-21 | 京东方科技集团股份有限公司 | Voice processing method and device, storage medium and electronic equipment |
CN114173142A (en) * | 2021-11-19 | 2022-03-11 | 广州繁星互娱信息科技有限公司 | Object live broadcast display method and device, storage medium and electronic equipment |
CN115412773A (en) * | 2021-05-26 | 2022-11-29 | 武汉斗鱼鱼乐网络科技有限公司 | Method, device and system for processing audio data of live broadcast room |
WO2023102932A1 (en) * | 2021-12-10 | 2023-06-15 | 广州虎牙科技有限公司 | Audio conversion method, electronic device, program product, and storage medium |
CN116993918A (en) * | 2023-08-11 | 2023-11-03 | 无锡芯算智能科技有限公司 | Modeling system and method for anchor image based on deep learning |
WO2024109375A1 (en) * | 2022-11-21 | 2024-05-30 | 腾讯科技(深圳)有限公司 | Method and apparatus for training speech conversion model, device, and medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114051105B (en) * | 2021-11-09 | 2023-03-10 | 北京百度网讯科技有限公司 | Multimedia data processing method and device, electronic equipment and storage medium |
US20230377556A1 (en) * | 2022-05-23 | 2023-11-23 | Lemon Inc. | Voice generation for virtual characters |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102522084A (en) * | 2011-12-22 | 2012-06-27 | 广东威创视讯科技股份有限公司 | Method and system for converting voice data into text files |
US9026446B2 (en) * | 2011-06-10 | 2015-05-05 | Morgan Fiumi | System for generating captions for live video broadcasts |
CN105488135A (en) * | 2015-11-25 | 2016-04-13 | 广州酷狗计算机科技有限公司 | Live content classification method and device |
CN106601263A (en) * | 2016-12-01 | 2017-04-26 | 武汉斗鱼网络科技有限公司 | Method and system used for acquiring sound of sound card and microphone and audio mixing |
CN107731241A (en) * | 2017-09-29 | 2018-02-23 | 广州酷狗计算机科技有限公司 | Handle the method, apparatus and storage medium of audio signal |
CN107886964A (en) * | 2017-09-25 | 2018-04-06 | 惠州市德赛西威汽车电子股份有限公司 | A kind of audio-frequency processing method and its system |
CN108986190A (en) * | 2018-06-21 | 2018-12-11 | 珠海金山网络游戏科技有限公司 | A kind of method and system of the virtual newscaster based on human-like persona non-in three-dimensional animation |
CN109151366A (en) * | 2018-09-27 | 2019-01-04 | 惠州Tcl移动通信有限公司 | A kind of sound processing method of video calling |
CN109218761A (en) * | 2018-08-07 | 2019-01-15 | 邓德雄 | Method and system for switching between live video and video |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3806263B2 (en) * | 1998-07-16 | 2006-08-09 | ヤマハ株式会社 | Musical sound synthesizer and storage medium |
CN106649703B (en) * | 2016-12-20 | 2019-11-19 | 中国科学院深圳先进技术研究院 | Audio data method for visualizing and device |
CN107767879A (en) * | 2017-10-25 | 2018-03-06 | 北京奇虎科技有限公司 | Audio conversion method and device based on tone color |
CN108200446B (en) * | 2018-01-12 | 2021-04-30 | 北京蜜枝科技有限公司 | On-line multimedia interaction system and method of virtual image |
CN108566558B (en) * | 2018-04-24 | 2023-02-28 | 腾讯科技(深圳)有限公司 | Video stream processing method and device, computer equipment and storage medium |
CN109271553A (en) * | 2018-08-31 | 2019-01-25 | 乐蜜有限公司 | A kind of virtual image video broadcasting method, device, electronic equipment and storage medium |
CN113286186B (en) * | 2018-10-11 | 2023-07-18 | 广州虎牙信息科技有限公司 | Image display method, device and storage medium in live broadcast |
2019
- 2019-05-05 CN CN202011508099.8A patent/CN112562705A/en active Pending
- 2019-05-05 CN CN201910368510.7A patent/CN110085244B/en active Active
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956971A (en) * | 2019-12-03 | 2020-04-03 | 广州酷狗计算机科技有限公司 | Audio processing method, device, terminal and storage medium |
CN112164407A (en) * | 2020-09-22 | 2021-01-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Tone conversion method and device |
CN112164407B (en) * | 2020-09-22 | 2024-06-18 | 腾讯音乐娱乐科技(深圳)有限公司 | Tone color conversion method and device |
CN112672207A (en) * | 2020-12-30 | 2021-04-16 | 广州繁星互娱信息科技有限公司 | Audio data processing method and device, computer equipment and storage medium |
CN115412773A (en) * | 2021-05-26 | 2022-11-29 | 武汉斗鱼鱼乐网络科技有限公司 | Method, device and system for processing audio data of live broadcast room |
CN113792853A (en) * | 2021-09-09 | 2021-12-14 | 北京百度网讯科技有限公司 | Training method of character generation model, character generation method, device and equipment |
CN113792853B (en) * | 2021-09-09 | 2023-09-05 | 北京百度网讯科技有限公司 | Training method of character generation model, character generation method, device and equipment |
CN113823300B (en) * | 2021-09-18 | 2024-03-22 | 京东方科技集团股份有限公司 | Voice processing method and device, storage medium and electronic equipment |
CN113823300A (en) * | 2021-09-18 | 2021-12-21 | 京东方科技集团股份有限公司 | Voice processing method and device, storage medium and electronic equipment |
CN114173142A (en) * | 2021-11-19 | 2022-03-11 | 广州繁星互娱信息科技有限公司 | Object live broadcast display method and device, storage medium and electronic equipment |
WO2023102932A1 (en) * | 2021-12-10 | 2023-06-15 | 广州虎牙科技有限公司 | Audio conversion method, electronic device, program product, and storage medium |
WO2024109375A1 (en) * | 2022-11-21 | 2024-05-30 | 腾讯科技(深圳)有限公司 | Method and apparatus for training speech conversion model, device, and medium |
CN116993918A (en) * | 2023-08-11 | 2023-11-03 | 无锡芯算智能科技有限公司 | Modeling system and method for anchor image based on deep learning |
CN116993918B (en) * | 2023-08-11 | 2024-02-13 | 无锡芯算智能科技有限公司 | Modeling system and method for anchor image based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN110085244B (en) | 2020-12-25 |
CN112562705A (en) | 2021-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110085244A (en) | Living broadcast interactive method, apparatus, electronic equipment and readable storage medium storing program for executing | |
CN110062267A (en) | Live data processing method, device, electronic equipment and readable storage medium storing program for executing | |
US10861210B2 (en) | Techniques for providing audio and video effects | |
US9547642B2 (en) | Voice to text to voice processing | |
CN111489424A (en) | Virtual character expression generation method, control method, device and terminal equipment | |
JP2019211747A (en) | Voice concatenative synthesis processing method and apparatus, computer equipment and readable medium | |
WO2021082823A1 (en) | Audio processing method, apparatus, computer device, and storage medium | |
CN109147800A (en) | Answer method and device | |
CN110071938A (en) | Virtual image interactive method, apparatus, electronic equipment and readable storage medium storing program for executing | |
CN113035199B (en) | Audio processing method, device, equipment and readable storage medium | |
CN108766413A (en) | Phoneme synthesizing method and system | |
CN111460094A (en) | Method and device for optimizing audio splicing based on TTS (text to speech) | |
CN112185340B (en) | Speech synthesis method, speech synthesis device, storage medium and electronic equipment | |
US20220308262A1 (en) | Method and apparatus of generating weather forecast video, electronic device, and storage medium | |
EP4345814A1 (en) | Video-generation system | |
CN116366872A (en) | Live broadcast method, device and system based on man and artificial intelligence | |
CN109525787A (en) | Real-time caption translating and network system realization towards live scene | |
CN116013274A (en) | Speech recognition method, device, computer equipment and storage medium | |
CN115690277A (en) | Video generation method, system, device, electronic equipment and computer storage medium | |
KR20220135203A (en) | Automatic recommendation music support system in streaming broadcasting | |
CN111757173B (en) | Commentary generation method and device, intelligent sound box and storage medium | |
CN116561294A (en) | Sign language video generation method and device, computer equipment and storage medium | |
Mayor et al. | Kaleivoicecope: voice transformation from interactive installations to video games | |
CN118138833B (en) | Digital person construction method and device and computer equipment | |
US20240321320A1 (en) | Harmonizing system for optimizing sound in content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |