CN110085244A - Live-streaming interaction method, apparatus, electronic device, and readable storage medium - Google Patents
Live-streaming interaction method, apparatus, electronic device, and readable storage medium
- Publication number: CN110085244A
- Application number: CN201910368510.7A
- Authority
- CN
- China
- Prior art keywords: style, timbre, vector, content, anchor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/762—Media network packet handling at the source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Abstract
An embodiment of the present application provides a live-streaming interaction method, apparatus, electronic device, and readable storage medium. A content feature map is extracted from the first audio data input by the anchor, and a content feature vector is extracted from it by a preset feature vector extraction network. The content feature vector is then converted by the style transformation model corresponding to the target timbre style, yielding a style transition map with the target timbre style. An inverse feature transform is then applied to the content feature map and the style transition map, producing second audio data with the target timbre style. Finally, an interactive video stream of the anchor's virtual avatar is generated from the second audio data and sent to the client for playback. In this way, for any anchor, and without changing the audio content, the timbre style of the avatar's live broadcast can be converted to the target timbre style for interaction with viewers, which improves the interaction effect during the broadcast and better motivates viewers to interact with the anchor.
Description
Technical field
This application relates to the field of Internet live streaming, and in particular to a live-streaming interaction method, apparatus, electronic device, and readable storage medium.
Background technique
In Internet live streaming, replacing the anchor's real image with a virtual avatar that participates in the interaction is currently a fairly popular form of broadcasting.
In current broadcasts, the avatar's timbre mostly uses the anchor's original timbre style, or a fixed timbre style chosen in advance, to provide the live data stream; it cannot be converted into other timbre styles for interaction with viewers. This fails to meet particular demands of specific anchors or niche audiences, which reduces the interactive effect of the broadcast. For example, viewers may prefer to hear the timbre style of a star they like, or of someone they know. As another example, an anchor may not wish to expose their own timbre style to viewers out of privacy concerns.
Summary of the invention
In view of this, embodiments of the present application provide a live-streaming interaction method, apparatus, electronic device, and readable storage medium to solve the above problems.
According to one aspect of the embodiments of the present application, an electronic device is provided, which may include one or more storage media and one or more processors in communication with the storage media. The storage media store machine-executable instructions executable by the processor. When the electronic device runs, the processor executes the machine-executable instructions to perform the live-streaming interaction method.
According to another aspect of the embodiments of the present application, a live-streaming interaction method is provided, applied to an anchor terminal in which at least one style transformation model is stored, each style transformation model corresponding to one timbre style. The method includes:
according to a received timbre conversion request, extracting an audio feature map from first audio data input by the anchor, the audio feature map including a content feature map, and the timbre conversion request including a target timbre style;
inputting the content feature map into a preset feature vector extraction network, and extracting the content feature vector of the content feature map;
converting the content feature vector using the style transformation model corresponding to the target timbre style, obtaining a style transition map with the target timbre style;
performing an inverse feature transform on the content feature map and the style transition map, obtaining second audio data with the target timbre style;
generating an interactive video stream of the anchor's virtual avatar according to the second audio data, and sending it to the client for playback.
According to another aspect of the embodiments of the present application, a live-streaming interaction apparatus is provided, applied to an anchor terminal in which at least one style transformation model is stored, each style transformation model corresponding to one timbre style. The apparatus includes:
an extraction module, configured to extract an audio feature map from first audio data input by the anchor according to a received timbre conversion request, the audio feature map including a content feature map, and the timbre conversion request including a target timbre style;
an input module, configured to input the content feature map into a preset feature vector extraction network and extract the content feature vector of the content feature map;
a conversion module, configured to convert the content feature vector using the style transformation model corresponding to the target timbre style, obtaining a style transition map with the target timbre style;
an inverse transform module, configured to perform an inverse feature transform on the content feature map and the style transition map, obtaining second audio data with the target timbre style;
a generation and sending module, configured to generate an interactive video stream of the anchor's virtual avatar according to the second audio data, and send it to the client for playback.
According to another aspect of the embodiments of the present application, a readable storage medium is provided, on which machine-executable instructions are stored; when the instructions are executed by a processor, the steps of the above live-streaming interaction method are performed.
Based on any of the above aspects, and compared with the prior art, the embodiments of the present application extract a content feature map from the first audio data input by the anchor, extract a content feature vector through a preset feature vector extraction network, and then convert the content feature vector using the style transformation model corresponding to the target timbre style to obtain a style transition map with the target timbre style. An inverse feature transform is then performed on the content feature map and the style transition map to obtain second audio data with the target timbre style. Finally, an interactive video stream of the anchor's virtual avatar is generated from the second audio data and sent to the client for playback. Thus, for audio content provided by any anchor, and without changing that content, the timbre style of the avatar's live broadcast can be converted to the target timbre style for interaction with viewers, improving the interaction effect during the broadcast and better motivating viewers to interact with the anchor.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the application and are therefore not to be regarded as limiting its scope; for those of ordinary skill in the art, other relevant drawings can be obtained from these drawings without creative effort.
Fig. 1 shows a schematic diagram of the live-streaming system provided by an embodiment of the present application;
Fig. 2 shows the first flowchart of the live-streaming interaction method provided by an embodiment of the present application;
Fig. 3 shows a schematic diagram of an interface for selecting a target timbre style in a live-streaming Internet application provided by an embodiment of the present application;
Fig. 4 shows a schematic diagram of the live interface of the anchor terminal provided by an embodiment of the present application;
Fig. 5 shows the second flowchart of the live-streaming interaction method provided by an embodiment of the present application;
Fig. 6 shows a flowchart of the sub-steps included in step S101 shown in Fig. 5;
Fig. 7 shows a training flowchart of the style transformation model provided by an embodiment of the present application;
Fig. 8 shows a schematic diagram of the electronic device provided by an embodiment of the present application.
Specific embodiment
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. The components of the embodiments, as generally described and illustrated herein and in the drawings, can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments of the application. Based on the embodiments in the application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the application.
It should also be noted that similar labels and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
Referring to Fig. 1, Fig. 1 is an architecture diagram of the live-streaming system 10 provided by an embodiment of the present application. For example, the live-streaming system 10 can be a service platform such as an Internet live-streaming platform. The live-streaming system 10 may include a live-streaming server 200, an anchor terminal 100, and a client 300; the live-streaming server 200 is communicatively connected with the anchor terminal 100 and the client 300 respectively, to provide live-streaming services for both. For example, the live-streaming server 200 can store the correspondence between anchor terminals 100 and live channels; after the client 300 selects a live channel, the live-streaming server 200 can send the live video stream to the clients 300 in the same live channel according to the correspondence between each live channel and the anchor terminal 100.
In some scenarios, the anchor terminal 100 and the client 300 may be used interchangeably. For example, the anchor of the anchor terminal 100 can use it to provide live video services for viewers, or, as a viewer, watch live video provided by other anchors. Likewise, the viewer of the client 300 can use it to watch live video provided by an anchor of interest, or, as an anchor, provide live video services for other viewers. In this embodiment, the anchor terminal 100 and the client 300 may include, but are not limited to, any handheld electronic product based on an intelligent operating system that can perform human-computer interaction with the user through input devices such as a keyboard, virtual keyboard, touchpad, touch screen, and voice-control device, such as a smartphone, tablet computer, or PC. The intelligent operating system includes, but is not limited to, any operating system that enriches device functions by providing various applications to the mobile device, such as Android, iOS, or Windows Phone. An Internet product for providing Internet live-streaming services can be installed on the anchor terminal 100 and the client 300; for example, the Internet product can be an application (APP), web page, or applet related to the Internet live-streaming service used on a computer or smartphone.
In this embodiment, the live-streaming system 10 can also include a video capture device 400 for capturing the anchor's video frames. The video capture device 400 may be directly mounted on or integrated into the anchor terminal 100, or may be independent of, and connected to, the anchor terminal 100.
Referring to Fig. 2, Fig. 2 shows a flowchart of the live-streaming interaction method provided by an embodiment of the present application; the method can be executed by the anchor terminal 100 shown in Fig. 1. It should be understood that, in other embodiments, the order of some steps of the method may be exchanged according to actual needs, or some steps may be omitted or deleted. The detailed steps of the method are described below.
Step S110: according to a received timbre conversion request, extract an audio feature map from the first audio data input by the anchor.
Step S120: input the content feature map into a preset feature vector extraction network, and extract the content feature vector of the content feature map.
Step S130: convert the content feature vector using the style transformation model corresponding to the target timbre style, obtaining a style transition map with the target timbre style.
Step S140: perform an inverse feature transform on the content feature map and the style transition map, obtaining second audio data with the target timbre style.
Step S150: generate an interactive video stream of the anchor's virtual avatar according to the second audio data, and send it to the client 300 for playback.
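Steps S110 through S150 can be sketched as a single pipeline. Everything below is an illustrative stand-in: the function names and the placeholder feature operations are assumptions for readability, not the patent's actual networks or models.

```python
def extract_content_feature_map(audio):
    # S110 stand-in: treat the raw samples as the "content feature map"
    return list(audio)

def extract_content_vector(content_map):
    # S120 stand-in: a single summary statistic instead of a CNN embedding
    return [sum(content_map) / max(len(content_map), 1)]

def inverse_feature_transform(content_map, style_map):
    # S140 stand-in: combine content with a style offset
    return [x + style_map[0] for x in content_map]

def handle_timbre_request(first_audio, target_style, style_models):
    """Convert the anchor's audio to the target timbre style (S110-S150)."""
    content_map = extract_content_feature_map(first_audio)            # S110
    content_vec = extract_content_vector(content_map)                 # S120
    style_map = style_models[target_style](content_vec)               # S130
    second_audio = inverse_feature_transform(content_map, style_map)  # S140
    return {"audio": second_audio, "style": target_style}             # S150 payload
```

A usage sketch: `style_models` maps each timbre-style name to its stored style transformation model, so a timbre conversion request simply selects one entry.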
In this embodiment, for step S110, any anchor can start a broadcast by opening the live-streaming Internet application installed on the anchor terminal 100 and entering the live interface; during the broadcast, data such as the live video stream, live pictures, live audio, and text barrages are generated.
Optionally, the timbre conversion request may include a target timbre style selected by the anchor or by a viewer who has entered the anchor's live room. The target timbre style can be understood as the timbre style that the anchor, or a viewer in the anchor's live room, wishes to hear for the live audio. For example, the anchor may want their output audio to sound like the timbre style of an idol star they like, of a friend they know, or of a speaking accent they like (such as a "Beijing accent" or "Taiwan accent"). As another example, some viewers may wish that the anchor's output audio sounded like the timbre style of a star they like, or of a friend they know. Accordingly, the timbre conversion request can be issued either by the anchor's own anchor terminal 100 or by the client 300 of a viewer in the anchor's live room.
For example, a selection interface for the target timbre style can be provided in the interface of the live-streaming Internet application installed on the anchor terminal 100 or the client 300. The selection interface displays options for multiple different timbre styles; the anchor, or a viewer in the anchor's live room, can select the option corresponding to the desired target timbre style from the displayed options, and the anchor terminal 100 or the client 300 then generates the corresponding timbre conversion request.
As an example only, Fig. 3 shows a schematic interface of the live-streaming Internet application installed on the anchor terminal 100 or the client 300. The interface displays options for different timbre styles, including timbre style A, timbre style B, timbre style C, timbre style D, and so on; the anchor, or a viewer in the anchor's live room, can select the option corresponding to the desired target timbre style from this selection interface. For example, if the anchor likes the timbre style of a friend A they know, and timbre style A is friend A's timbre style, the anchor can choose timbre style A, and the anchor terminal 100 then generates the corresponding timbre conversion request. As another example, if a viewer in the anchor's live room likes the timbre style of the singer Zhang Xueyou (Jacky Cheung), and timbre style B is that singer's timbre style, the viewer can choose timbre style B, and the client 300 then generates the corresponding timbre conversion request.
The first audio data can be audio data pre-recorded by the anchor, or audio data output in real time during the broadcast; this embodiment does not specifically limit this.
The present inventors found that any piece of audio data can be represented by a series of waveform diagrams. Based on this, one exemplary way to extract the audio feature map corresponding to the anchor's first audio data is: cut the first audio data at a preset interval (for example, every 10 seconds) to obtain multiple audio segments; then use the waveform diagram, spectrum diagram, or spectrogram of each audio segment, or an image obtained by image-processing the waveform diagram, spectrum diagram, or spectrogram of each segment, as the audio feature map. By cutting the first audio data, this embodiment avoids stuttering of the anchor terminal 100 caused by processing too large an amount of audio data at once; furthermore, the segments obtained by cutting have a uniform duration, which facilitates subsequent processing.
The audio feature map may include a content feature map and a style feature map. The style feature map can be used to represent style features of the first audio data, such as its timbre style; the content feature map can be used to represent content features of the first audio data, such as volume and speech content.
For step S120, the preset feature vector extraction network can use a convolutional neural network. A convolutional neural network is a feed-forward neural network whose artificial neurons respond to surrounding units within part of their coverage area; it performs outstandingly well on image processing. A convolutional neural network can extract abstract features of an object through multiple convolution layers to complete object recognition. Based on this, the content feature vector of the content feature map can be extracted by a convolutional neural network. Optionally, the preset feature vector extraction network can use a model for extracting vector features of images, such as a VGG (Visual Geometry Group) model or a deep residual network (ResNet) model.
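As a minimal illustration of the convolution-plus-pooling mechanism described above (a toy 1D version, not the actual VGG or ResNet network):

```python
def conv1d(x, kernel):
    """Valid-mode 1D convolution (cross-correlation), the basic CNN building block."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def max_pool(x, size=2):
    """Downsample by keeping the maximum of each non-overlapping window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def content_feature_vector(feature_map, kernels):
    """Stack conv + pooling layers and flatten into a content feature vector."""
    x = feature_map
    for kern in kernels:
        x = max_pool(conv1d(x, kern))
    return x
```

Each stacked layer widens the receptive field, which is how deeper networks arrive at the abstract features the paragraph mentions.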
In this embodiment, for step S130, at least one style transformation model is pre-stored in the anchor terminal 100. Each style transformation model corresponds to one timbre style, and each can be used to convert the content feature map of any anchor into a style transition map with the corresponding target timbre style.
For step S140, since the style transition map replaces the style feature map of the original audio feature map, the content feature map in this step together with the converted style transition map can be understood as an audio feature map with the target timbre style. To generate audio data that viewers can hear, this embodiment further performs an inverse feature transform on the content feature map and the converted style transition map, obtaining second audio data with the target timbre style. In this way, the second audio data combines the content feature map corresponding to the first audio data with the style features of the converted style transition map, thereby achieving the auditory effect of the target timbre style without changing the content of the first audio data.
It is worth noting that although some existing voice changers can alter the voice (for example, to an old man's voice or a child's voice), the converted sound in such schemes is unsatisfactory: it cannot achieve a convincingly lifelike effect, and it still cannot be converted into a desired timbre style. With the technical solution provided by this embodiment, the converted timbre is that of the required target timbre style and has an extremely lifelike effect.
It should further be noted that, since the style transformation model provided by this solution can learn the style feature vector of the corresponding timbre style, it can convert arbitrary content output by any anchor into a style transition map with the corresponding timbre style, without training a separate style transformation model for each anchor, which greatly reduces the training workload. The specific training process of the style transformation model is described in detail below.
For step S150, to make the live interaction more engaging, a virtual avatar can replace the anchor's real image in the live room's display interface to interact with viewers. For example, the avatar can imitate in real time characteristic attributes of the anchor, such as expressions and movements, to represent the anchor in interacting with viewers; that is, viewers can interact with the anchor through the avatar, and such a viewer can be any one of the anchor's many subscribed fans. In addition, the avatar can imitate the anchor's operations or movements related to the broadcast content, such as holding or introducing a certain product.
After the second audio data is generated, an interactive video frame of the anchor's virtual avatar corresponding to each audio frame in the second audio data can be generated in real time. For example, the emotional content or particular keywords of each audio frame in the second audio data can be identified; the avatar is then controlled to perform an interactive action in the corresponding emotional form according to the emotional content, or the interactive expression corresponding to the keyword is looked up and performed according to the particular keyword, and the interactive video frame of the avatar performing the action is recorded.
Then, each audio frame and its corresponding interactive video frame are synthesized to obtain the interactive video stream of the avatar. For example, for each audio frame, the word content contained in the frame can be parsed; the audio frame, its word content, and the corresponding interactive video frame are then synthesized to obtain the interactive video stream of the anchor's virtual avatar. On this basis, the interactive video stream of the avatar can be sent through the live-streaming server 200 to the client 300 for playback.
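The per-frame synthesis described above can be sketched as follows; the frame dictionaries, field names, and the render callback are illustrative assumptions, not the patent's actual data formats.

```python
def synthesize_interactive_stream(audio_frames, render_video_frame):
    """Pair each audio frame with an avatar video frame rendered from its
    parsed word content, producing the interactive video stream."""
    stream = []
    for audio in audio_frames:
        words = audio.get("words", "")     # parsed word content of the frame
        video = render_video_frame(words)  # avatar reacts to the content
        stream.append({"audio": audio["pcm"], "video": video, "words": words})
    return stream
```

In a real system `render_video_frame` would be the emotion/keyword-driven avatar controller; here it is any callable that maps word content to a video frame.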
For example, Fig. 4 shows an example live broadcast interface of the main broadcaster end 100. The live broadcast interface may include a live broadcast interface display box, a main broadcaster video frame display box, a barrage area, a virtual image area, and the word content XXXXX of each audio frame. The live broadcast interface display box is used to display the video stream currently being broadcast on the live platform, or the complete video stream formed after the broadcast is completed; the main broadcaster video frame display box is used to display, in real time, the main broadcaster video frames collected by the video acquisition device; the virtual image area is used to display the virtual image of the main broadcaster and the interactive video frames of the virtual image; and the barrage area is used to display the interaction content between the audience and the main broadcaster (such as AAAAA, BBBBB, CCCCC, DDDDD, EEEEE).
It can be understood that the live broadcast interface shown in Fig. 4 is only illustrative. During an actual broadcast, the live broadcast interface may further include a live information area, and the live information area may include at least one of the following information: the live room title, the main broadcaster's user account, the main broadcaster's avatar, the audience user accounts, the audience avatars, the number of followers of the main broadcaster, the popularity of the main broadcaster, and information about the gifts received by the main broadcaster in the current ranking list.
In this way, the present embodiment can convert the tone color style to the target tone color style for interacting with the audience during the virtual image live broadcast while leaving the audio content unchanged, thereby improving the interaction effect during the live broadcast and motivating the audience to interact with the main broadcaster to a greater extent.
As a possible embodiment, referring to Fig. 5, before the aforementioned step S110, the living broadcast interactive method provided in this embodiment may further include the following steps:
Step S101: obtain, by training in advance on training samples, the style transformation model corresponding to the target tone color style. Specifically, referring to Fig. 6, step S101 may include the following sub-steps:
Sub-step S1011: obtain training samples, the training samples including a first audio sample and a second audio sample of any main broadcaster.
In this embodiment, the first audio sample may be any audio sample with the target tone color style. For example, if the target tone color style is the tone color style of a certain actor A, a large amount of audio data of actor A may be collected as the first audio sample.
In this embodiment, the second audio sample is not specifically limited; the audio data of any main broadcaster or any other user may be collected as the second audio sample.
Referring to Fig. 7, the training process of this embodiment involves a feature extraction network, a feature vector extraction network, and an initial conversion network. The training process of the style transformation model in step S101 is exemplarily described below based on Fig. 7.
Sub-step S1012: extract the reference style feature map of the first audio sample and the content feature map of the second audio sample, respectively.
As shown in Fig. 7, the reference style feature map of the first audio sample and the content feature map of the second audio sample may be extracted by the feature extraction network, in the same manner as described above for extracting the audio feature map from the first audio data input by the main broadcaster.
Sub-step S1013: extract, through the feature vector extraction network, the reference style feature vector corresponding to the reference style feature map and the content feature vector corresponding to the content feature map, respectively.
Sub-step S1014: train the initial conversion model according to the content feature vector and the reference style feature vector to obtain the style transformation model corresponding to the target tone color style, and store it in the main broadcaster end 100.
The detailed training process of sub-step S1014 is exemplarily described below based on Fig. 7.
First, the content feature vector is input into the initial conversion model to generate the reference style transition map of the content feature vector.
Second, the reference style conversion feature vector corresponding to the reference style transition map is extracted through the feature vector extraction network.
Third, the network parameters of the initial conversion model are adjusted according to the content feature vector, the reference style feature vector, and the reference style conversion feature vector.
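The three steps above can be sketched as a single forward pass. The random matrices below are placeholders standing in for the trained initial conversion model and feature vector extraction network; all names and dimensions are illustrative assumptions, not the patent's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                    # illustrative feature dimension

conversion_W = rng.normal(size=(D, D))   # stand-in for the initial conversion model
extract_W = rng.normal(size=(D, D))      # stand-in for the feature vector extraction network

content_vec = rng.normal(size=D)         # content feature vector
ref_style_vec = rng.normal(size=D)       # reference style feature vector

# Step 1: generate the reference style transition map from the content vector.
transition_map = conversion_W @ content_vec
# Step 2: extract the reference style conversion feature vector from that map.
conv_style_vec = extract_W @ transition_map
# Step 3: the two loss terms below compare the conversion feature vector
# against the style vector and the content vector; their gradients are what
# adjust conversion_W during training.
style_term = float(np.sum((ref_style_vec - conv_style_vec) ** 2))
content_term = float(np.sum((conv_style_vec - content_vec) ** 2))
```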
In detail, in this embodiment, a first vector difference value between the reference style feature vector and the reference style conversion feature vector, and a second vector difference value between the reference style conversion feature vector and the content feature vector, may be calculated. Optionally, the first vector difference value and the second vector difference value may be calculated as follows: first, generate the content feature grayscale map corresponding to the content feature vector, the reference style grayscale map corresponding to the reference style feature vector, and the reference style conversion feature grayscale map corresponding to the reference style conversion feature vector.
Then, the pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map is calculated as the first vector difference value. For example, the grayscale difference between the grayscale pixel value of each pixel in the reference style grayscale map and the grayscale pixel value of the pixel at the corresponding position in the reference style conversion feature grayscale map may be calculated, giving the squared difference between each pixel in the reference style grayscale map and the corresponding position in the reference style conversion feature grayscale map. Then, the squared differences corresponding to all pixels are summed, and the resulting pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map is taken as the first vector difference value.
Meanwhile it calculating with reference to the pixel difference value between style converting characteristic grayscale image and content characteristic grayscale image as the
Two vector difference values.For example, can calculate with reference to the gray-scale pixel values of the pixel in style converting characteristic grayscale image and interior
Hold the gray scale difference value between the gray-scale pixel values of the pixel of signature grey scale figure corresponding position, and calculates with reference to style converting characteristic
Squared difference value in each pixel in grayscale image and content characteristic grayscale image between the pixel of corresponding position.Then,
It sums, is obtained with reference to style converting characteristic grayscale image and content characteristic to the corresponding squared difference value of all pixels point
Pixel difference value between grayscale image is as secondary vector difference value.
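The two pixel difference values described above reduce to a sum of squared per-pixel grayscale differences. A direct NumPy rendering, with small illustrative grayscale maps (the patent derives these maps from the feature vectors):

```python
import numpy as np

def pixel_difference_value(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences between corresponding grayscale pixels."""
    assert a.shape == b.shape
    return float(np.sum((a.astype(float) - b.astype(float)) ** 2))

# Illustrative 2x2 grayscale maps.
ref_style = np.array([[10, 20], [30, 40]])       # reference style grayscale map
ref_style_conv = np.array([[12, 20], [30, 37]])  # reference style conversion feature grayscale map
content = np.array([[11, 21], [30, 40]])         # content feature grayscale map

first_vector_difference = pixel_difference_value(ref_style, ref_style_conv)
second_vector_difference = pixel_difference_value(ref_style_conv, content)
```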
It is worth noting that, in the actual training stage, those skilled in the art may also add loss functions other than the aforementioned first vector difference value and second vector difference value; this application does not limit this in detail.
On the aforementioned basis, backpropagation training may be carried out according to the first vector difference value and the second vector difference value, and the gradients of the network parameters of the initial conversion model calculated. Then, according to the calculated gradients, the network parameters of the initial conversion model are updated using the stochastic gradient descent method and training continues; when the initial conversion model meets the training termination condition, the trained style transformation model corresponding to the target tone color style is output.
The stochastic gradient descent method proceeds along the direction of gradient descent to solve for a minimum (it may also proceed along the gradient ascent direction to solve for a maximum). The direction of gradient descent can generally be obtained by differentiating the function; when an extreme point is reached, the gradient is 0, and the magnitude of the gradient is also 0 at that point. When the gradient descent algorithm is used for optimization, the termination condition of the algorithm iteration is that the magnitude of the gradient vector is close to 0, for which a very small constant threshold may be set.
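The gradient-magnitude stopping rule described above can be demonstrated on a simple quadratic objective f(x) = ||x - target||^2: the loop stops once the gradient vector's magnitude falls below a small constant threshold. This is a minimal sketch, not the patent's training loop; the learning rate and threshold are illustrative.

```python
import numpy as np

def gradient_descent(x0: np.ndarray, target: np.ndarray,
                     lr: float = 0.1, eps: float = 1e-6,
                     max_iters: int = 10000) -> np.ndarray:
    """Minimize ||x - target||^2, stopping when the gradient magnitude ~ 0."""
    x = x0.astype(float)
    for _ in range(max_iters):
        grad = 2.0 * (x - target)           # gradient of the quadratic objective
        if np.linalg.norm(grad) < eps:      # termination: gradient magnitude below threshold
            break
        x -= lr * grad
    return x
```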
The above training termination condition may include at least one of the following three conditions:
1) the number of training iterations reaches a set number; 2) the first vector difference value and the second vector difference value fall below a set threshold; 3) the first vector difference value and the second vector difference value no longer decline.
In addition, in actual implementation, the training termination condition is not limited to the above examples; those skilled in the art may design training termination conditions different from the above examples according to actual needs.
Based on the style transformation model corresponding to the target tone color style obtained in the above steps, the content feature map corresponding to the audio data of any main broadcaster can be converted into a style feature transition map with the target tone color style. In this way, without changing the audio content of the audio data of any main broadcaster, the tone color style during the virtual image live broadcast is converted to the target tone color style for interacting with the audience, thereby improving the interaction effect during the live broadcast and motivating the audience to interact with the main broadcaster to a greater extent. Moreover, the style transformation model corresponding to the target tone color style can be used for any audio content output by any main broadcaster, so there is no longer any need to train a separate style transformation model for each main broadcaster, which greatly reduces the amount of training.
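The inference path of steps S110-S140 can be sketched end to end. The linear stand-ins below are purely illustrative placeholders for the learned networks, and the inverse feature transform of step S140 is reduced to a simple additive recombination of the content map and the style transition map for the sake of a runnable example; the patent does not specify these internals.

```python
import numpy as np

rng = np.random.default_rng(1)
F, T, D = 6, 4, 5                           # illustrative map and vector sizes

extract_vec = rng.normal(size=(D, F * T))   # stand-in feature vector extraction network
style_model = rng.normal(size=(F * T, D))   # stand-in style transformation model

def convert(first_audio_map: np.ndarray) -> np.ndarray:
    """Content feature map -> content vector -> style transition map -> output map."""
    content_map = first_audio_map                                # S110: content feature map
    content_vec = extract_vec @ content_map.ravel()              # S120: content feature vector
    transition_map = (style_model @ content_vec).reshape(F, T)   # S130: style transition map
    # S140: inverse feature transform (illustrative additive recombination).
    return content_map + transition_map

out = convert(rng.normal(size=(F, T)))
```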
It is worth noting that the above only describes the training process of the style transformation model corresponding to the foregoing target tone color style. For the training of the style transformation models corresponding to other tone color styles, reference may be made to the related description of the above embodiment, which is not repeated here.
Fig. 8 shows a schematic diagram of the electronic equipment provided by the embodiments of the present application. In this embodiment, the electronic equipment may refer to the main broadcaster end 100 shown in Fig. 1, and includes a storage medium 110, a processor 120, and a living broadcast interactive device 500. In this embodiment, the storage medium 110 and the processor 120 are both located in the main broadcaster end 100 and are arranged separately. It should be understood, however, that the storage medium 110 may also be independent of the main broadcaster end 100 and accessed by the processor 120 through a bus interface. Alternatively, the storage medium 110 may also be integrated into the processor 120, for example as a cache and/or a general register.
As a computer-readable storage medium, the storage medium 110 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the living broadcast interactive method described in any embodiment of this application (for example, the extraction module 510, the input module 520, the conversion module 530, the inverse transform module 540, and the generation sending module 550 included in the living broadcast interactive device 500). The storage medium 110 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the equipment, etc. In addition, the storage medium 110 may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk memory, flash memory device, or other non-volatile solid-state memory component. In some examples, the storage medium 110 may further include memories located remotely relative to the processor 120, and these remote memories may be connected to the equipment through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The functions of the functional modules of the living broadcast interactive device 500 are described in detail below.
The extraction module 510 is configured to extract an audio feature map from the first audio data input by the main broadcaster according to the received tone color conversion request; the audio feature map includes a content feature map, and the tone color conversion request includes the target tone color style into which the tone color style of the first audio data needs to be converted. It can be understood that the extraction module 510 can be used to execute the above step S110; for the detailed implementation of the extraction module 510, reference may be made to the related content of step S110 above.
The input module 520 is configured to input the content feature map into the preset feature vector extraction network to extract the content feature vector of the content feature map. It can be understood that the input module 520 can be used to execute the above step S120; for the detailed implementation of the input module 520, reference may be made to the related content of step S120 above.
The conversion module 530 is configured to convert the content feature vector using the style transformation model corresponding to the target tone color style to obtain a style transition map with the target tone color style. It can be understood that the conversion module 530 can be used to execute the above step S130; for the detailed implementation of the conversion module 530, reference may be made to the related content of step S130 above.
The inverse transform module 540 is configured to perform an inverse feature transform on the content feature map and the style transition map to obtain the second audio data with the target tone color style. It can be understood that the inverse transform module 540 can be used to execute the above step S140; for the detailed implementation of the inverse transform module 540, reference may be made to the related content of step S140 above.
The generation sending module 550 is configured to generate the interactive video stream of the virtual image corresponding to the main broadcaster according to the second audio data, and send it to the client 300 for playing. It can be understood that the generation sending module 550 can be used to execute the above step S150; for the detailed implementation of the generation sending module 550, reference may be made to the related content of step S150 above.
Further, the embodiments of the present application also provide a computer-readable storage medium storing machine-executable instructions which, when executed, implement the living broadcast interactive method provided by the above embodiments.
The above are only various embodiments of this application, but the protection scope of this application is not limited thereto. Any person familiar with the technical field can easily conceive of changes or replacements within the technical scope disclosed in this application, which shall all be covered within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (12)
1. A living broadcast interactive method, applied to a main broadcaster end, wherein at least one style transformation model is stored in the main broadcaster end and each style transformation model corresponds to one tone color style, the method comprising:
extracting, according to a received tone color conversion request, an audio feature map from first audio data input by a main broadcaster, the audio feature map comprising a content feature map and the tone color conversion request comprising a target tone color style;
inputting the content feature map into a preset feature vector extraction network to extract a content feature vector of the content feature map;
converting the content feature vector using the style transformation model corresponding to the target tone color style to obtain a style transition map with the target tone color style;
performing an inverse feature transform on the content feature map and the style transition map to obtain second audio data with the target tone color style; and
generating an interactive video stream of a virtual image corresponding to the main broadcaster according to the second audio data, and sending it to a client for playing.
2. The living broadcast interactive method according to claim 1, wherein the style transformation model is obtained by training a deep-learning neural network using a first audio sample and a second audio sample of any main broadcaster, the first audio sample having the target tone color style.
3. The living broadcast interactive method according to claim 1, wherein, before extracting the audio feature map from the first audio data input by the main broadcaster according to the received tone color conversion request, the method further comprises:
obtaining, by training in advance on training samples, the style transformation model corresponding to the target tone color style, which specifically includes:
obtaining training samples, the training samples including a first audio sample and a second audio sample of any main broadcaster, the first audio sample having the target tone color style;
extracting a reference style feature map of the first audio sample and a content feature map of the second audio sample, respectively;
extracting, through the feature vector extraction network, a reference style feature vector corresponding to the reference style feature map and a content feature vector corresponding to the content feature map, respectively; and
training an initial conversion model according to the content feature vector and the reference style feature vector to obtain the style transformation model corresponding to the target tone color style, and storing it in the main broadcaster end.
4. The living broadcast interactive method according to claim 3, wherein the step of training the initial conversion model according to the content feature vector and the reference style feature vector to obtain the style transformation model corresponding to the target tone color style comprises:
inputting the content feature vector into the initial conversion model to generate a reference style transition map of the content feature vector;
extracting, through the feature vector extraction network, a reference style conversion feature vector corresponding to the reference style transition map; and
adjusting network parameters of the initial conversion model according to the content feature vector, the reference style feature vector, and the reference style conversion feature vector.
5. The living broadcast interactive method according to claim 4, wherein the step of adjusting the network parameters of the initial conversion model according to the content feature vector, the reference style feature vector, and the reference style conversion feature vector comprises:
calculating a first vector difference value between the reference style feature vector and the reference style conversion feature vector, and a second vector difference value between the reference style conversion feature vector and the content feature vector;
carrying out backpropagation training according to the first vector difference value and the second vector difference value, and calculating gradients of the network parameters of the initial conversion model; and
updating, according to the calculated gradients, the network parameters of the initial conversion model using the stochastic gradient descent method and continuing training, and outputting, when the initial conversion model meets a training termination condition, the trained style transformation model corresponding to the target tone color style.
6. The living broadcast interactive method according to claim 5, wherein the step of calculating the first vector difference value between the reference style feature vector and the reference style conversion feature vector and the second vector difference value between the reference style conversion feature vector and the content feature vector comprises:
generating a content feature grayscale map corresponding to the content feature vector, a reference style grayscale map corresponding to the reference style feature vector, and a reference style conversion feature grayscale map corresponding to the reference style conversion feature vector; and
calculating a pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map as the first vector difference value, and calculating a pixel difference value between the reference style conversion feature grayscale map and the content feature grayscale map as the second vector difference value.
7. The living broadcast interactive method according to claim 6, wherein the step of calculating the pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map as the first vector difference value comprises:
calculating a grayscale difference between the grayscale pixel value of each pixel in the reference style grayscale map and the grayscale pixel value of the pixel at the corresponding position in the reference style conversion feature grayscale map, giving a squared difference between each pixel in the reference style grayscale map and the corresponding position in the reference style conversion feature grayscale map; and
summing the squared differences corresponding to all pixels to obtain the pixel difference value between the reference style grayscale map and the reference style conversion feature grayscale map;
and wherein the step of calculating the pixel difference value between the reference style conversion feature grayscale map and the content feature grayscale map as the second vector difference value comprises:
calculating a grayscale difference between the grayscale pixel value of each pixel in the reference style conversion feature grayscale map and the grayscale pixel value of the pixel at the corresponding position in the content feature grayscale map, giving a squared difference between each pixel in the reference style conversion feature grayscale map and the pixel at the corresponding position in the content feature grayscale map; and
summing the squared differences corresponding to all pixels to obtain the pixel difference value between the reference style conversion feature grayscale map and the content feature grayscale map.
8. The living broadcast interactive method according to any one of claims 1-7, wherein the step of generating the interactive video stream of the virtual image corresponding to the main broadcaster according to the second audio data and sending it to the client for playing comprises:
for each audio frame in the second audio data, generating an interactive video frame of the virtual image corresponding to the audio frame; and
synthesizing each audio frame and its corresponding interactive video frame to obtain the interactive video stream of the virtual image, and sending the interactive video stream of the virtual image to the client for playing.
9. The living broadcast interactive method according to claim 8, wherein the step of synthesizing each audio frame and its corresponding interactive video frame to obtain the interactive video stream of the virtual image comprises:
for each audio frame, parsing the word content contained in the audio frame; and
synthesizing the audio frame, the word content contained in the audio frame, and the interactive video frame corresponding to the audio frame, thereby obtaining the interactive video stream of the virtual image corresponding to the main broadcaster.
10. A living broadcast interactive device, applied to a main broadcaster end, wherein at least one style transformation model is stored in the main broadcaster end and each style transformation model corresponds to one tone color style, the device comprising:
an extraction module, configured to extract, according to a received tone color conversion request, an audio feature map from first audio data input by a main broadcaster, the audio feature map comprising a content feature map and the tone color conversion request comprising a target tone color style into which the tone color style of the first audio data needs to be converted;
an input module, configured to input the content feature map into a preset feature vector extraction network to extract a content feature vector of the content feature map;
a conversion module, configured to convert the content feature vector using the style transformation model corresponding to the target tone color style to obtain a style transition map with the target tone color style;
an inverse transform module, configured to perform an inverse feature transform on the content feature map and the style transition map to obtain second audio data with the target tone color style; and
a generation sending module, configured to generate an interactive video stream of a virtual image corresponding to the main broadcaster according to the second audio data, and send it to a client for playing.
11. An electronic equipment, comprising one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing machine-executable instructions executable by the processors; when the electronic equipment runs, the processors execute the machine-executable instructions to implement the living broadcast interactive method of any one of claims 1-9.
12. A readable storage medium, storing machine-executable instructions which, when executed, implement the living broadcast interactive method of any one of claims 1-9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910368510.7A CN110085244B (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
CN202011508099.8A CN112562705A (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910368510.7A CN110085244B (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011508099.8A Division CN112562705A (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110085244A true CN110085244A (en) | 2019-08-02 |
CN110085244B CN110085244B (en) | 2020-12-25 |
Family
ID=67418510
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011508099.8A Pending CN112562705A (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
CN201910368510.7A Active CN110085244B (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011508099.8A Pending CN112562705A (en) | 2019-05-05 | 2019-05-05 | Live broadcast interaction method and device, electronic equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN112562705A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956971A (en) * | 2019-12-03 | 2020-04-03 | 广州酷狗计算机科技有限公司 | Audio processing method, device, terminal and storage medium |
CN112164407A (en) * | 2020-09-22 | 2021-01-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Tone conversion method and device |
CN112672207A (en) * | 2020-12-30 | 2021-04-16 | 广州繁星互娱信息科技有限公司 | Audio data processing method and device, computer equipment and storage medium |
CN113792853A (en) * | 2021-09-09 | 2021-12-14 | 北京百度网讯科技有限公司 | Training method of character generation model, character generation method, device and equipment |
CN113823300A (en) * | 2021-09-18 | 2021-12-21 | 京东方科技集团股份有限公司 | Voice processing method and device, storage medium and electronic equipment |
CN114173142A (en) * | 2021-11-19 | 2022-03-11 | 广州繁星互娱信息科技有限公司 | Object live broadcast display method and device, storage medium and electronic equipment |
CN115412773A (en) * | 2021-05-26 | 2022-11-29 | 武汉斗鱼鱼乐网络科技有限公司 | Method, device and system for processing audio data of live broadcast room |
WO2023102932A1 (en) * | 2021-12-10 | 2023-06-15 | 广州虎牙科技有限公司 | Audio conversion method, electronic device, program product, and storage medium |
CN116993918A (en) * | 2023-08-11 | 2023-11-03 | 无锡芯算智能科技有限公司 | Modeling system and method for anchor image based on deep learning |
WO2024109375A1 (en) * | 2022-11-21 | 2024-05-30 | 腾讯科技(深圳)有限公司 | Method and apparatus for training speech conversion model, device, and medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114051105B (en) * | 2021-11-09 | 2023-03-10 | 北京百度网讯科技有限公司 | Multimedia data processing method and device, electronic equipment and storage medium |
US20230377556A1 (en) * | 2022-05-23 | 2023-11-23 | Lemon Inc. | Voice generation for virtual characters |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102522084A (en) * | 2011-12-22 | 2012-06-27 | 广东威创视讯科技股份有限公司 | Method and system for converting voice data into text files |
US9026446B2 (en) * | 2011-06-10 | 2015-05-05 | Morgan Fiumi | System for generating captions for live video broadcasts |
CN105488135A (en) * | 2015-11-25 | 2016-04-13 | 广州酷狗计算机科技有限公司 | Live content classification method and device |
CN106601263A (en) * | 2016-12-01 | 2017-04-26 | 武汉斗鱼网络科技有限公司 | Method and system used for acquiring sound of sound card and microphone and audio mixing |
CN107731241A (en) * | 2017-09-29 | 2018-02-23 | 广州酷狗计算机科技有限公司 | Handle the method, apparatus and storage medium of audio signal |
CN107886964A (en) * | 2017-09-25 | 2018-04-06 | 惠州市德赛西威汽车电子股份有限公司 | A kind of audio-frequency processing method and its system |
CN108986190A (en) * | 2018-06-21 | 2018-12-11 | 珠海金山网络游戏科技有限公司 | A kind of method and system of the virtual newscaster based on human-like persona non-in three-dimensional animation |
CN109151366A (en) * | 2018-09-27 | 2019-01-04 | 惠州Tcl移动通信有限公司 | A kind of sound processing method of video calling |
CN109218761A (en) * | 2018-08-07 | 2019-01-15 | 邓德雄 | Method and system for switching between live video and video |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3806263B2 (en) * | 1998-07-16 | 2006-08-09 | ヤマハ株式会社 | Musical sound synthesizer and storage medium |
CN106649703B (en) * | 2016-12-20 | 2019-11-19 | 中国科学院深圳先进技术研究院 | Audio data method for visualizing and device |
CN107767879A (en) * | 2017-10-25 | 2018-03-06 | 北京奇虎科技有限公司 | Audio conversion method and device based on tone color |
CN108200446B (en) * | 2018-01-12 | 2021-04-30 | 北京蜜枝科技有限公司 | On-line multimedia interaction system and method of virtual image |
CN108566558B (en) * | 2018-04-24 | 2023-02-28 | 腾讯科技(深圳)有限公司 | Video stream processing method and device, computer equipment and storage medium |
CN109271553A (en) * | 2018-08-31 | 2019-01-25 | 乐蜜有限公司 | A kind of virtual image video broadcasting method, device, electronic equipment and storage medium |
CN113286186B (en) * | 2018-10-11 | 2023-07-18 | 广州虎牙信息科技有限公司 | Image display method, device and storage medium in live broadcast |
2019
- 2019-05-05 CN CN202011508099.8A patent/CN112562705A/en active Pending
- 2019-05-05 CN CN201910368510.7A patent/CN110085244B/en active Active
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956971A (en) * | 2019-12-03 | 2020-04-03 | 广州酷狗计算机科技有限公司 | Audio processing method, device, terminal and storage medium |
CN112164407A (en) * | 2020-09-22 | 2021-01-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Tone conversion method and device |
CN112164407B (en) * | 2020-09-22 | 2024-06-18 | 腾讯音乐娱乐科技(深圳)有限公司 | Tone color conversion method and device |
CN112672207A (en) * | 2020-12-30 | 2021-04-16 | 广州繁星互娱信息科技有限公司 | Audio data processing method and device, computer equipment and storage medium |
CN115412773A (en) * | 2021-05-26 | 2022-11-29 | 武汉斗鱼鱼乐网络科技有限公司 | Method, device and system for processing audio data of live broadcast room |
CN113792853A (en) * | 2021-09-09 | 2021-12-14 | 北京百度网讯科技有限公司 | Training method of character generation model, character generation method, device and equipment |
CN113792853B (en) * | 2021-09-09 | 2023-09-05 | 北京百度网讯科技有限公司 | Training method of character generation model, character generation method, device and equipment |
CN113823300B (en) * | 2021-09-18 | 2024-03-22 | 京东方科技集团股份有限公司 | Voice processing method and device, storage medium and electronic equipment |
CN113823300A (en) * | 2021-09-18 | 2021-12-21 | 京东方科技集团股份有限公司 | Voice processing method and device, storage medium and electronic equipment |
CN114173142A (en) * | 2021-11-19 | 2022-03-11 | 广州繁星互娱信息科技有限公司 | Object live broadcast display method and device, storage medium and electronic equipment |
WO2023102932A1 (en) * | 2021-12-10 | 2023-06-15 | 广州虎牙科技有限公司 | Audio conversion method, electronic device, program product, and storage medium |
WO2024109375A1 (en) * | 2022-11-21 | 2024-05-30 | 腾讯科技(深圳)有限公司 | Method and apparatus for training speech conversion model, device, and medium |
CN116993918A (en) * | 2023-08-11 | 2023-11-03 | 无锡芯算智能科技有限公司 | Modeling system and method for anchor image based on deep learning |
CN116993918B (en) * | 2023-08-11 | 2024-02-13 | 无锡芯算智能科技有限公司 | Modeling system and method for anchor image based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN110085244B (en) | 2020-12-25 |
CN112562705A (en) | 2021-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110085244A (en) | Living broadcast interactive method, apparatus, electronic equipment and readable storage medium storing program for executing | |
CN110062267A (en) | Live data processing method, device, electronic equipment and readable storage medium storing program for executing | |
US10861210B2 (en) | Techniques for providing audio and video effects | |
US9547642B2 (en) | Voice to text to voice processing | |
CN111489424A (en) | Virtual character expression generation method, control method, device and terminal equipment | |
JP2019211747A (en) | Voice concatenative synthesis processing method and apparatus, computer equipment and readable medium | |
WO2021082823A1 (en) | Audio processing method, apparatus, computer device, and storage medium | |
CN109147800A (en) | Answer method and device | |
CN110071938A (en) | Virtual image interactive method, apparatus, electronic equipment and readable storage medium storing program for executing | |
CN113035199B (en) | Audio processing method, device, equipment and readable storage medium | |
CN108766413A (en) | Phoneme synthesizing method and system | |
CN111460094A (en) | Method and device for optimizing audio splicing based on TTS (text to speech) | |
CN112185340B (en) | Speech synthesis method, speech synthesis device, storage medium and electronic equipment | |
US20220308262A1 (en) | Method and apparatus of generating weather forecast video, electronic device, and storage medium | |
EP4345814A1 (en) | Video-generation system | |
CN116366872A (en) | Live broadcast method, device and system based on man and artificial intelligence | |
CN109525787A (en) | Real-time caption translating and network system realization towards live scene | |
CN116013274A (en) | Speech recognition method, device, computer equipment and storage medium | |
CN115690277A (en) | Video generation method, system, device, electronic equipment and computer storage medium | |
KR20220135203A (en) | Automatic recommendation music support system in streaming broadcasting | |
CN111757173B (en) | Commentary generation method and device, intelligent sound box and storage medium | |
CN116561294A (en) | Sign language video generation method and device, computer equipment and storage medium | |
Mayor et al. | Kaleivoicecope: voice transformation from interactive installations to video games | |
CN118138833B (en) | Digital person construction method and device and computer equipment | |
US20240321320A1 (en) | Harmonizing system for optimizing sound in content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |