CN110062267A - Live-streaming data processing method and apparatus, electronic device, and readable storage medium - Google Patents
- Publication number
- CN110062267A (application CN201910368522.XA)
- Authority
- CN
- China
- Prior art keywords
- style
- timbre
- live streaming
- network parameter
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB] (common parents of all entries below)
- H04N21/42607—Internal components of the client for processing the incoming bitstream
- H04N21/4312—Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
- H04N21/4398—Processing of audio elementary streams involving reformatting operations of audio signals
- H04N21/44218—Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
- H04N21/4788—Supplemental services communicating with other users, e.g. chatting
- H04N21/4884—Data services for displaying subtitles
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Social Psychology (AREA)
- Computer Networks & Wireless Communication (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Embodiments of the present application provide a live-streaming data processing method and apparatus, an electronic device, and a readable storage medium. First voice data having a target timbre style is processed by a pre-trained network parameter learning model to obtain target network parameters corresponding to the target timbre style, and a style switching network whose parameters have been adjusted to the target network parameters performs style conversion on second voice data input by the anchor. A live interactive data stream for a virtual live-streaming avatar is generated from the resulting third voice data having the target timbre style and sent to the live-streaming receiver terminal for playback. In this way, for any anchor, and without altering the audio content, the timbre heard during the virtual avatar's live broadcast can be converted into an arbitrary timbre style for interacting with viewers, thereby improving the interaction effect during the broadcast and better mobilizing interaction between viewers and the anchor.
Description
Technical field
This application relates to the field of internet live streaming, and in particular to a live-streaming data processing method and apparatus, an electronic device, and a readable storage medium.
Background
In internet live streaming, using a virtual avatar in place of the anchor's real appearance to participate in live interaction is currently a popular form of broadcasting.
In current live broadcasts, the timbre of the virtual avatar is usually either the anchor's original timbre or a single timbre style fixed in advance, and the live data stream cannot be converted into other timbre styles for interacting with viewers. This fails to satisfy particular demands of specific anchors or niche audiences, and therefore degrades the interactive live-streaming experience. For example, viewers may prefer to hear the timbre of a star they like or of someone they know. As another example, an anchor may not want to expose his or her own timbre to viewers out of privacy concerns.
Summary of the invention
In view of this, embodiments of the present application aim to provide a live-streaming data processing method and apparatus, an electronic device, and a readable storage medium to solve the above problems.
According to one aspect of the embodiments of the present application, an electronic device is provided, which may include one or more storage media and one or more processors in communication with the storage media. The one or more storage media store machine-executable instructions executable by the processors. When the electronic device runs, the processors execute the machine-executable instructions to perform the live-streaming data processing method.
According to another aspect of the embodiments, a live-streaming data processing method is provided, applied to a live-streaming provider terminal. The method comprises:
parsing a received timbre conversion request to obtain a target timbre style;
obtaining first voice data having the target timbre style, and inputting the first voice data into a pre-trained network parameter learning model to obtain target network parameters corresponding to the target timbre style;
adjusting the network parameters of a pre-stored style switching network to the target network parameters, and performing style conversion on second voice data input by the anchor with the adjusted style switching network, obtaining third voice data having the target timbre style;
generating a live interactive data stream for a virtual live-streaming avatar from the third voice data, and sending it to a live-streaming receiver terminal for playback.
According to another aspect of the embodiments, a live-streaming data processing apparatus is provided, applied to a live-streaming provider terminal. The apparatus comprises:
a parsing module, for parsing a received timbre conversion request to obtain a target timbre style;
an input module, for obtaining first voice data having the target timbre style and inputting the first voice data into a pre-trained network parameter learning model to obtain target network parameters corresponding to the target timbre style;
a style conversion module, for adjusting the network parameters of a pre-stored style switching network to the target network parameters and performing style conversion on second voice data input by the anchor with the adjusted style switching network, obtaining third voice data having the target timbre style;
a generating and sending module, for generating a live interactive data stream for a virtual live-streaming avatar from the third voice data and sending it to a live-streaming receiver terminal for playback.
According to another aspect of the embodiments, a readable storage medium is provided, on which machine-executable instructions are stored; when the instructions are run by a processor, the steps of the above live-streaming data processing method can be performed.
Based on any of the above aspects, compared with the prior art, the embodiments of the present application process first voice data having a target timbre style with a pre-trained network parameter learning model to obtain the target network parameters corresponding to that style, perform style conversion on second voice data input by the anchor with a style switching network adjusted to those target network parameters, generate a live interactive data stream for a virtual avatar from the resulting third voice data having the target timbre style, and send it to the receiver terminal for playback. Thus, for any anchor, and without altering the audio content, the timbre heard during the virtual avatar's live broadcast can be converted into an arbitrary timbre style for interacting with viewers, improving the interaction effect during the broadcast and better mobilizing interaction between viewers and the anchor.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the application and are therefore not to be taken as limiting its scope; those of ordinary skill in the art may derive other related drawings from them without creative effort.
Fig. 1 is a schematic diagram of the live-streaming system provided by an embodiment of the present application;
Fig. 2 is the first flow diagram of the live-streaming data processing method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of an interface for selecting the target timbre style in a live-streaming internet application provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of the style conversion process provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of the live-streaming interface of the provider terminal provided by an embodiment of the present application;
Fig. 6 is the second flow diagram of the live-streaming data processing method provided by an embodiment of the present application;
Fig. 7 is a flow diagram of the sub-steps included in step S101 shown in Fig. 6;
Fig. 8 is a training flow diagram of the style transformation model provided by an embodiment of the present application;
Fig. 9 is a schematic diagram of the electronic device provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. The components of the embodiments, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations.
Unless indicated to the contrary, ordinal terms such as "first", "second", and "third" in the embodiments of the present application are used to distinguish multiple objects and do not limit the order, timing, position, priority, or importance of those objects.
Referring to Fig. 1, Fig. 1 is an architecture diagram of the live-streaming system 10 provided by embodiments of the present application. For example, the live-streaming system 10 may be a service platform for internet live streaming. The system 10 may include a live-streaming server 200, a live-streaming provider terminal 100, and a live-streaming receiver terminal 300, with the server 200 communicatively connected to both terminals to provide live-streaming services for them. For example, the provider terminal 100 may send a room's live video stream to the server 200, and viewers may pull the stream from the server 200 through the receiver terminal 300 to watch the room's live video. As another example, the server 200 may send a notification message to a viewer's receiver terminal 300 when a room the viewer subscribes to goes live. The live video stream may be the stream currently being broadcast on the platform, or the complete stream formed after the broadcast finishes.
The provider terminal 100 and the receiver terminal 300 may have internet products installed for providing internet live-streaming services, for example applications (apps), web pages, or mini-programs related to internet live streaming and used on a computer or smartphone.
In this embodiment, the system 10 may also include a video acquisition device 400 for capturing the anchor's video frames. The device 400 may be directly mounted on or integrated into the provider terminal 100, or may be independent of the provider terminal 100 and connected to it.
Referring to Fig. 2, Fig. 2 shows a flow diagram of the live-streaming data processing method provided by embodiments of the present application. The method may be executed by the provider terminal 100 shown in Fig. 1. Its detailed steps are described below.
Step S110: parse the received timbre conversion request to obtain the target timbre style.
Step S120: obtain first voice data having the target timbre style, and input the first voice data into the pre-trained network parameter learning model to obtain the target network parameters corresponding to the target timbre style.
Step S130: adjust the network parameters of the pre-stored style switching network to the target network parameters, and perform style conversion on the second voice data input by the anchor with the adjusted style switching network, obtaining third voice data having the target timbre style.
Step S140: generate the live interactive data stream for the virtual avatar from the third voice data, and send it to the receiver terminal 300 for playback.
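As a reading aid, the data flow of steps S110 to S140 can be stubbed out as below. Every function body is a placeholder that only shows what each step consumes and produces; none of the names, field keys, or the "gain" parameter come from the patent, and the real models and stream format are not represented.

```python
# Stubbed data flow of steps S110-S140; every body is a placeholder.
def process_live_data(request: dict, anchor_audio: list) -> dict:
    style = request["target_style"]                  # S110: parse the request
    ref_audio = fetch_reference_audio(style)         # S120: first voice data
    params = learn_network_params(ref_audio)         # S120: target parameters
    converted = convert_style(anchor_audio, params)  # S130: third voice data
    return {"style": style, "audio": converted}      # S140: interactive stream

def fetch_reference_audio(style: str) -> list:
    return [0.0, 0.1]                    # placeholder reference audio

def learn_network_params(ref: list) -> dict:
    return {"gain": 2.0}                 # placeholder "network parameters"

def convert_style(audio: list, params: dict) -> list:
    return [s * params["gain"] for s in audio]

stream = process_live_data({"target_style": "style-A"}, [0.5, 0.25])
```

The point of the stubs is the shape of the pipeline: the request determines the parameters, the parameters determine the conversion, and only the converted audio reaches the receiver terminal.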
In this embodiment, for step S110, after receiving a timbre conversion request, the provider terminal 100 may parse from it the target timbre style selected by the anchor or by a viewer who has entered the room. The target timbre style can be understood as the timbre the anchor, or a viewer in the anchor's room, wishes to hear in the live audio. For example, the anchor may want his or her output audio to sound like the timbre of an admired idol or a known friend, or like a particular speaking accent (such as a "Taiwanese accent" or "Beijing accent"). As another example, some viewers may want the anchor's output audio to sound like the timbre of an idol they like or of someone they know. Accordingly, the timbre conversion request may be issued either by the anchor's provider terminal 100 or by the receiver terminal 300 of a viewer who has entered the anchor's room.
For example, a selection interface for the target timbre style may be provided in the live-streaming internet application installed on the provider terminal 100 or the receiver terminal 300. The interface shows options for multiple different timbre styles; the anchor, or a viewer in the anchor's room, selects from the interface the option corresponding to the desired target timbre style, and the provider terminal 100 or the receiver terminal 300 then generates the corresponding timbre conversion request.
By way of example only, Fig. 3 shows a schematic diagram of the interface of the live-streaming internet application installed on the provider terminal 100 or the receiver terminal 300. Options for different timbre styles are shown in the interface, including timbre style A, timbre style B, timbre style C, and timbre style D; the anchor, or a viewer in the anchor's room, can select the option corresponding to the desired target timbre style. For example, if the anchor likes the timbre of a familiar friend A, and timbre style A is friend A's timbre, the anchor can choose timbre style A, and the provider terminal 100 then generates the corresponding timbre conversion request. As another example, if viewers in the anchor's room like the timbre of a certain singer, and timbre style B is that singer's timbre, a viewer can choose timbre style B, and the receiver terminal 300 then generates the corresponding timbre conversion request.
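The patent does not specify a wire format for the timbre conversion request exchanged between the terminals; a minimal JSON sketch, with every field name assumed for illustration, could look like this:

```python
import json

def make_timbre_request(requester_id: str, room_id: str, style_id: str) -> str:
    """Serialize a timbre-conversion request; field names are hypothetical."""
    return json.dumps({"requester": requester_id,
                       "room": room_id,
                       "target_style": style_id})

def parse_timbre_request(raw: str) -> str:
    """Step S110 as described above: parse the request and recover the
    target timbre style."""
    return json.loads(raw)["target_style"]

req = make_timbre_request("viewer-42", "room-7", "style-B")
target = parse_timbre_request(req)
```

Whether the request comes from the provider terminal or a receiver terminal, the parsing step only needs the target style identifier, which is why step S110 is described purely as "parse the request to obtain the target timbre style".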
Referring to Fig. 4, which shows a schematic diagram of the style conversion process in the embodiments of the present application, the foregoing embodiment is illustrated below with reference to Fig. 4.
For step S120, the provider terminal 100 may locally pre-store audio data corresponding to various timbre styles; after the target timbre style is determined, first voice data having the target timbre style can then be looked up locally. Alternatively, the live-streaming server 200 may provide audio data corresponding to various timbre styles, and the first voice data having the target timbre style may be obtained from the server 200 after the target style is determined.
On this basis, the provider terminal 100 can input the first voice data into the pre-trained network parameter learning model to obtain the target network parameters corresponding to the target timbre style.
The network parameter learning model can learn the style network parameters corresponding to various different timbre styles. For example, it may be obtained by training a deep-learning neural network using first speech samples of at least one timbre style and second speech samples of any anchor, where the at least one timbre style includes the target timbre style. In this way, for input audio data of any timbre style, the model can output the network parameters corresponding to that timbre style, so there is no need to train a separate style switching network for each timbre style, which greatly reduces the amount of training.
As a possible implementation of step S120, a reference style feature map corresponding to the first voice data is first extracted, and the reference style feature map is then input into the network parameter learning model to obtain the target network parameters corresponding to the target timbre style.
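The step just described (extract a reference style representation, feed it to the network parameter learning model, receive the target network parameters) behaves like what the deep-learning literature calls a hypernetwork: one network emits the weights of another. The sketch below illustrates only that data flow with a single linear layer; the dimensions, the random weights, and both function names are assumptions for illustration, not the patent's actual models.

```python
# Hypernetwork-style sketch of the "network parameter learning model":
# a reference style feature vector is mapped to the weights of a small
# style-switching layer. All shapes and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

STYLE_DIM = 16      # size of the reference style feature vector
CONTENT_DIM = 8     # size of one content feature frame

# Fixed (notionally pre-trained) weights of the parameter learning model.
W_hyper = rng.standard_normal((CONTENT_DIM * CONTENT_DIM, STYLE_DIM)) * 0.1

def predict_style_params(style_vec: np.ndarray) -> np.ndarray:
    """Emit a (CONTENT_DIM x CONTENT_DIM) weight matrix for the style
    switching network from one reference style feature vector."""
    flat = W_hyper @ style_vec
    return flat.reshape(CONTENT_DIM, CONTENT_DIM)

def apply_style_network(content: np.ndarray, params: np.ndarray) -> np.ndarray:
    """The style switching network: here a single linear map whose
    parameters came from the hypernetwork, not from per-style training."""
    return content @ params.T

style_vec = rng.standard_normal(STYLE_DIM)    # from the reference audio
params = predict_style_params(style_vec)      # the "target network parameters"
frame = rng.standard_normal(CONTENT_DIM)      # one content feature frame
converted = apply_style_network(frame, params)
```

Because the style switching network's weights come from `predict_style_params` rather than from per-style training, supporting a new timbre only requires a new reference style vector, which matches the training-reduction argument made above.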
The inventors found through research that any segment of audio data (such as the first voice data) can be represented by a continuous waveform diagram. Based on this, one exemplary way of extracting the reference style feature map corresponding to the first voice data is as follows: cut the first voice data at preset intervals (e.g. every 10 seconds) to obtain multiple data segments; then extract the audiogram, spectrogram, or speech spectrogram of each data segment, or the image obtained after applying an image-processing transformation to each segment's audiogram, spectrogram, or speech spectrogram, as the audio feature map. The audio feature map then comprises a content feature map and the aforementioned reference style feature map. The reference style feature map can be used to represent the style features of the first voice data, such as the timbre style; the content feature map can be used to represent the content features of the first voice data, such as the volume and speech content.
In this way, by cutting the first voice data into segments, this embodiment can avoid stutter on the provider terminal 100 caused by processing too much audio data at once; moreover, the segments obtained by cutting have a consistent duration, which facilitates subsequent processing.
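The fixed-interval cutting described above can be sketched as follows. The 10-second default, the zero-padding of the final segment, and the function name are assumptions; the patent only specifies that the segments have a consistent duration so that downstream feature maps stay uniform.

```python
# Sketch of the segmentation step: cut a waveform into fixed-length
# pieces (e.g. 10 s) so each piece yields a feature map of uniform size.
from typing import List

def split_waveform(samples: List[float], sample_rate: int,
                   segment_seconds: float = 10.0) -> List[List[float]]:
    """Cut `samples` into consecutive segments of `segment_seconds` each.

    The final partial segment is zero-padded so that every segment has
    the same length, keeping the downstream feature maps uniform.
    """
    seg_len = int(sample_rate * segment_seconds)
    segments = []
    for start in range(0, len(samples), seg_len):
        seg = samples[start:start + seg_len]
        if len(seg) < seg_len:                       # pad the tail segment
            seg = seg + [0.0] * (seg_len - len(seg))
        segments.append(seg)
    return segments

# A 25-second clip at 8 kHz becomes three uniform 10-second segments.
clip = [0.1] * (25 * 8000)
parts = split_waveform(clip, sample_rate=8000)
```

Streaming the segments one at a time, rather than the whole recording, is what bounds the per-step workload on the provider terminal.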
For step S130, after the network parameter learning model outputs the target network parameters corresponding to the target timbre style, the network parameters of the pre-stored style switching network can be adjusted to those target network parameters. The adjusted style switching network can then convert the timbre style of any anchor's audio data into the target timbre style, without a style switching network having to be trained separately for the target timbre.
After any anchor starts a broadcast by launching the live-streaming internet application installed on the provider terminal 100 and entering a room, data such as the live video stream, live pictures, live audio, and text bullet comments can be generated during the broadcast and sent via the server 200 to the receiver terminals 300 of the viewers in the room. In this process, first, the audio feature map of the second voice data input by the anchor is extracted by the feature extraction network; the audio feature map comprises a content feature map and a style feature map. Then, the adjusted style switching network processes the content feature map to obtain a style conversion feature map having the target timbre style. Finally, a feature inverse transform is performed on the content feature map and the style conversion feature map to obtain the third voice data having the target timbre style.
In detail, since the style conversion feature map replaces the style feature map of the original audio feature map, the content feature map of the original audio feature map together with the converted style conversion feature map can be understood as an audio feature map having the target timbre style. On this basis, to generate audio data the viewers can hear, this embodiment also performs a feature inverse transform on the content feature map and the converted style map, obtaining the third voice data having the target timbre style. The third voice data thus integrates the content feature map of the second voice data with the style features of the converted style map, achieving the auditory effect of the target timbre style while leaving the content of the second voice data unchanged.
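The patent does not disclose the internals of the style switching network or of the feature inverse transform. As a stand-in, the toy below performs the "swap the style, keep the content" step by re-normalizing per-channel statistics of a spectrogram-like feature map toward the target style, a common style-transfer trick (adaptive instance normalization); everything here is an assumption used only to make the swap-then-invert idea concrete.

```python
# Toy rendition of the step S130 swap: treat per-channel statistics of a
# spectrogram-like feature map as "style" and re-normalize the anchor's
# content features to the target statistics (an AdaIN-style operation).
import numpy as np

def adain_style_swap(content_feat: np.ndarray,
                     target_style_feat: np.ndarray,
                     eps: float = 1e-6) -> np.ndarray:
    """Re-scale each channel of `content_feat` (channels x frames) to
    match the per-channel mean/std of `target_style_feat`."""
    c_mu = content_feat.mean(axis=1, keepdims=True)
    c_sd = content_feat.std(axis=1, keepdims=True) + eps
    t_mu = target_style_feat.mean(axis=1, keepdims=True)
    t_sd = target_style_feat.std(axis=1, keepdims=True) + eps
    return (content_feat - c_mu) / c_sd * t_sd + t_mu

rng = np.random.default_rng(1)
anchor_feat = rng.standard_normal((32, 100))           # anchor's feature map
target_feat = rng.standard_normal((32, 100)) * 3 + 5   # target-timbre map

converted = adain_style_swap(anchor_feat, target_feat)
```

In the patent's terms, `converted` would play the role of the audio feature map carrying the target timbre style, and a feature inverse transform (e.g. an inverse spectrogram step) would then turn it back into the audible third voice data.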
It is worth noting that although some existing voice changers can alter a voice (for example into an old man's voice or a child's voice), the converted sound in such schemes is unsatisfactory: it falls short of a lifelike effect and still cannot be converted into a desired timbre style. With the technical solution provided by this embodiment, the converted timbre is the timbre of the desired target timbre style and has an extremely lifelike effect.
For step S140, to make the live interaction more engaging, a virtual avatar can replace the anchor's real appearance in the room's display interface to interact with viewers. The virtual avatar may be a virtual character consistent with the anchor's appearance, posture, and temperament; for example, a two-dimensional or three-dimensional character may be used, and it may also be a cartoon character or a realistic human figure. For example, the virtual avatar can imitate the anchor's expressions, movements, and other characteristic attributes in real time to interact with viewers on the anchor's behalf; that is, viewers, any one of whom may be among the anchor's many subscribed fans, can interact with the anchor through the virtual avatar. Specifically, during the broadcast, the anchor's limb movements, facial expressions, audio data, and so on can be captured and recognized, combined with the virtual avatar for playback, and then forwarded to the server 200, from which the receiver terminals 300 of viewers entering the room pull the live data stream for viewing. In this way, the virtual avatar gives viewers an impression similar to the actual anchor's real movements and voice. For example, viewers may see a cartoon-dinosaur virtual character, yet the movements and voice of this cartoon dinosaur are driven by real-time movement and audio data transmitted from the anchor.
After the live-streaming providing terminal 100 generates the aforementioned third voice data, it can generate a live interactive data stream of the virtual avatar in real time and send it to the live-streaming receiving terminal 300 for playback. For example, the third voice data can be cut into multiple audio data segments according to a set time interval (such as 5 seconds, 10 seconds, etc.), and for each audio data segment, the content parameters of that segment are identified. The content parameters may include a content feature, an emotion feature, and an amplitude feature, where the emotion feature is used to control the emotional state of the virtual avatar and the amplitude feature is used to control the opening and closing of the virtual avatar's mouth shape. For example, if the recognized emotion feature is a parameter corresponding to a happy state, the value of the virtual avatar's emotion attribute can be adjusted to "smile" according to the emotion feature, which in turn controls the avatar's expression, movement, and posture. As another example, if the recognized content feature is "I am very happy", the content of the virtual avatar's action attribute can be adjusted in real time to perform a "clapping" action, while the avatar's expression attribute is adjusted to "smile".
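As an illustration of how a segment's recognized content parameters might drive avatar attributes, the following minimal sketch maps the emotion, text, and amplitude features described above to attribute values. All names and the mapping itself are hypothetical stand-ins, not specified by the patent.

```python
def avatar_attributes(content_params):
    """Derive avatar attribute values from one audio segment's content parameters."""
    attrs = {"emotion": "neutral", "action": "idle", "mouth_open": 0.0}
    # emotion feature controls the avatar's emotional state
    if content_params.get("emotion") == "happy":
        attrs["emotion"] = "smile"
    # content feature can trigger an action attribute in real time
    if "happy" in content_params.get("text", ""):
        attrs["action"] = "clap"
    # amplitude feature controls the mouth-shape opening and closing
    attrs["mouth_open"] = min(1.0, content_params.get("amplitude", 0.0))
    return attrs
```

For the "I am very happy" example in the text, `avatar_attributes({"emotion": "happy", "text": "I am very happy", "amplitude": 0.7})` would yield a smiling, clapping avatar with a partially open mouth.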
In other possible embodiments, the anchor's real-time expressions, movements, and posture can also be collected by the video acquisition device 400 shown in Fig. 1. For example, the position and angle of the anchor's face, the contour of the face, the positions of the facial features, eyeball rotation, eyelids and eyebrows, the motion state of the lips, gestures, and so on can be recognized; this information collected in real time is analyzed, the analysis results are converted into a custom set of control instructions, and these control instructions cause the virtual avatar on the interactive interface to imitate the collected expressions, movements, and posture in real time. For example, when the collected gesture is a hand pointing downward, the value of the virtual avatar's action attribute is adjusted in real time to perform a "sit down" action.
In this way, an interactive video segment of the virtual avatar corresponding to each audio data segment can be generated according to the content feature, emotion feature, and amplitude feature; each audio data segment and its corresponding interactive video segment are then synthesized to obtain the live interactive data stream of the virtual avatar, and the live interactive data stream of the virtual avatar is sent to the live-streaming receiving terminal 300 for playback.
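The segmentation-and-synthesis flow above can be sketched as follows: the third voice data is cut into fixed-interval segments, and each segment is paired with a rendered video segment. `render_segment` is a hypothetical callback standing in for the avatar renderer; the patent does not specify these interfaces.

```python
def split_into_segments(samples, sample_rate, interval_s=5):
    """Cut audio samples into fixed-length segments (the last one may be shorter)."""
    step = sample_rate * interval_s
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def build_interactive_stream(samples, sample_rate, render_segment, interval_s=5):
    """Pair each audio segment with its rendered avatar video segment."""
    segments = split_into_segments(samples, sample_rate, interval_s)
    return [(seg, render_segment(seg)) for seg in segments]
```

With a 5-second interval, 23 seconds of audio would yield four full segments and one 3-second remainder, each synthesized with its own interactive video segment.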
For example, referring to Fig. 5, which shows an example live-streaming interface of the live-streaming providing terminal 100, the live-streaming interface may include a live-streaming interface display frame, an anchor video frame display box, a bullet-screen area, a virtual image area, and the text content XXXXX of each audio frame. The live-streaming interface display frame shows the video stream currently being broadcast on the platform, or the complete video stream formed after the broadcast ends; the anchor video frame display box shows the anchor video frames collected in real time by the video acquisition device; the virtual image area shows the anchor's virtual avatar and the avatar's live interactive data stream; and the bullet-screen area shows the interaction content between the audience and the anchor (such as AAAAA, BBBBB, CCCCC, DDDDD, EEEEE).
On this basis, the live-streaming providing terminal can adjust the characteristic attributes of the virtual avatar according to interaction information received from the live-streaming receiving terminal 300, so that viewers can interact virtually with the avatar. Taking the interface shown in Fig. 5 as an example, a viewer can send interaction information through the live-streaming receiving terminal 300, and the live-streaming providing terminal 100 can adjust the characteristic attributes of the virtual avatar accordingly, thereby completing the interaction between the live-streaming receiving terminal 300 and the virtual avatar displayed on the interactive interface. It should be appreciated that a preset correspondence may exist between the interaction information and the characteristic attributes of the virtual avatar; this preset correspondence can be established through a prior learning process and is not elaborated item by item here.
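A minimal sketch of such a preset correspondence, with a static lookup table standing in for the learned mapping; the message names and attribute values are hypothetical, not from the patent.

```python
# Hypothetical preset correspondence between viewer interaction messages
# and virtual avatar characteristic attributes.
INTERACTION_TO_ATTRIBUTE = {
    "wave": {"action": "wave_back"},
    "dance": {"action": "dance"},
    "sad": {"emotion": "comfort"},
}

def apply_interaction(avatar_state, message):
    """Return a new avatar state with the attributes mapped from the message."""
    new_state = dict(avatar_state)
    new_state.update(INTERACTION_TO_ATTRIBUTE.get(message, {}))
    return new_state
```

Unknown messages leave the avatar state unchanged, so the table can be extended (or replaced by a learned model) without touching the update logic.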
In this way, for any anchor, the present embodiment can convert the timbre style used during the virtual avatar's live streaming to an arbitrary timbre style without changing the audio content, so as to interact with the audience, thereby improving the interaction effect during live streaming and better motivating viewers to interact with the anchor.
As a possible embodiment, referring to Fig. 6, before the aforementioned step S110, the live data processing method provided in this embodiment may further include the following steps:
Step S101: obtain the network parameter learning model trained in advance on training samples. Referring to Fig. 7, step S101 may include the following sub-steps:
Sub-step S1011: obtain training samples, the training samples including first speech samples of at least one timbre style and second speech samples of any anchor.
In the present embodiment, the aforementioned at least one timbre style may include the target timbre style and other timbre styles, and the first speech samples may be any speech samples having the target timbre style or the other timbre styles. For example, if the target timbre style is the timbre style of a certain well-known person A, a large amount of audio data of person A can be collected as one set of first speech samples.
In the present embodiment, the second speech samples are not specifically limited; audio data of any anchor, or of any other user, can be collected as the second speech samples.
Referring to Fig. 8, the training process of the present embodiment involves a feature extraction network, a feature vector extraction network, and an initial conversion network. The training process of the style conversion model in step S101 is illustrated below by way of example with reference to Fig. 8.
Sub-step S1012: extract a corresponding content feature sample map from the second speech samples of any anchor.
As shown in Fig. 8, the content feature map of the second speech samples can be extracted by the feature extraction network, in the same manner as the audio feature map is extracted from the second voice data input by the anchor, as described above.
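The patent does not specify the feature extraction network, so as an illustrative stand-in, the sketch below derives a time-frequency "feature map" from raw audio with a log-magnitude short-time Fourier transform, a common front end for speech feature extraction. All parameter values here are assumptions.

```python
import numpy as np

def feature_map(audio, frame=256, hop=128):
    """Log-magnitude STFT as a stand-in 'content feature map' (freq x time)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(audio) - frame) // hop
    # slice the signal into overlapping windowed frames
    frames = np.stack([audio[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))   # (time, freq) magnitudes
    return np.log1p(spec).T                      # (freq, time) feature map

x = np.sin(2 * np.pi * 440 * np.arange(4096) / 16000)  # 440 Hz test tone
fm = feature_map(x)   # shape (129, 31): 129 frequency bins x 31 frames
```

A real system would feed such maps into the learned feature extraction network rather than use them directly.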
Sub-step S1013: for each timbre style, extract a corresponding style feature sample map from the first speech samples of that timbre style.
As shown in Fig. 8, the style feature sample map of the first speech samples corresponding to each timbre style can be extracted by the feature extraction network, in the same manner described above.
Sub-step S1014: train the meta-learning network according to the content feature sample map and the style feature sample map corresponding to each timbre style, obtain the network parameter learning model, and store it in the live-streaming providing terminal 100.
The detailed training process of sub-step S1014 is illustrated below by way of example with reference to Fig. 8.
First, the style feature sample map corresponding to each timbre style is input into the meta-learning network to obtain the style network parameters of that timbre style.
Second, the network parameters of a preset style conversion network are adjusted according to the style network parameters of each timbre style, and the content feature sample map is input into the adjusted style conversion network to obtain the corresponding style-converted feature sample map.
Third, the network parameters of the meta-learning network are adjusted according to the style feature sample map of each timbre style and the corresponding style-converted feature sample map, to obtain the network parameter learning model.
In detail, as one implementation, a loss function value between the style feature sample map of each timbre style and the corresponding style-converted feature sample map can be calculated, the network parameters of the meta-learning network are updated according to the loss function value, and training is iterated; when the meta-learning network meets a training termination condition, the trained network parameter learning model is output.
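The three-step procedure can be sketched as a toy training loop. Everything below is illustrative, not from the patent: the "feature maps" are flat vectors, the style conversion network is reduced to a single gain applied to the content features, the meta-learning network is a linear map from the style feature to that gain, and the loss is a mean squared error; real architectures, losses, and shapes are unspecified in the source.

```python
import numpy as np

rng = np.random.default_rng(0)
style_feat = rng.normal(size=4)    # style feature sample (flattened toy "map")
content = rng.normal(size=16)      # content feature sample (flattened toy "map")
target = 2.0 * content             # pretend the true style doubles the content
w_meta = np.zeros(4)               # meta-learning network parameters

def loss_and_grad(w):
    gain = w @ style_feat                 # step 1: meta net -> style network parameter
    converted = gain * content            # step 2: adjusted style conversion network
    err = converted - target
    loss = float(np.mean(err ** 2))       # loss vs. the style feature sample
    grad = 2.0 * np.mean(err * content) * style_feat  # step 3: backprop into meta net
    return loss, grad

losses = []
for _ in range(500):                      # iterate until a stop condition would fire
    loss, grad = loss_and_grad(w_meta)
    losses.append(loss)
    w_meta -= 0.05 * grad                 # gradient step on the meta-learning net
```

The loop shows only the parameter flow (style feature, through meta network, to conversion-network parameters, to loss, back to meta network); a practical model would use deep networks and automatic differentiation.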
The above training termination condition may include at least one of the following three conditions:
1) the number of training iterations reaches a set number; 2) the loss function value falls below a set threshold; 3) the loss function value no longer decreases.
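The three conditions, used individually or in combination as described below, can be expressed as a single check over the loss history. The threshold and patience values are illustrative, and a patience window is just one common way to operationalize "the loss no longer decreases".

```python
def should_stop(losses, max_iters=1000, threshold=1e-3, patience=5):
    """Return True when any of the three termination conditions holds."""
    if len(losses) >= max_iters:              # 1) iteration cap reached
        return True
    if losses and losses[-1] < threshold:     # 2) loss below the set threshold
        return True
    # 3) loss no longer decreasing: no value in the last `patience` steps
    #    improved on the value just before that window
    if len(losses) > patience and min(losses[-patience:]) >= losses[-patience - 1]:
        return True
    return False
```

Dropping any of the three branches recovers the single-condition variants discussed in the text.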
In condition 1), a maximum number of iterations can be set in order to save computation; if the number of iterations reaches the set number, iteration in this cycle can be stopped, and the deep learning network finally obtained is used as the timbre conversion model. In condition 2), if the loss function value is below the set threshold, the current timbre conversion model can basically satisfy the requirements, and iteration can be stopped. In condition 3), if the loss function value no longer decreases, an optimal timbre conversion model has been formed, and iteration can be stopped.
It should be noted that the above iteration stopping conditions can be used in combination or individually. For example, iteration may be stopped when the loss function value no longer decreases, or when the number of iterations reaches the set number; alternatively, iteration may be stopped when the loss function value is below the set threshold and no longer decreases. In addition, in actual implementation, the training termination condition is not limited to the above examples; those skilled in the art can design training termination conditions different from the above examples according to actual needs.
Based on the network parameter learning model obtained in the above steps, network parameters corresponding to a timbre style can be output from input audio data of any timbre style, and the style conversion network configured with those parameters can then convert the timbre style during the virtual avatar's live streaming to the corresponding timbre style without changing the audio content of any anchor's audio data, so as to interact with the audience, thereby improving the interaction effect during live streaming and better motivating viewers to interact with the anchor. Moreover, the present embodiment no longer needs to train a separate style conversion model for each anchor or for each timbre style, which greatly reduces the amount of training.
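Conceptually, inference with the trained model reduces to one forward pass: the meta model maps a reference style feature map to network parameters, and those parameters configure the conversion network applied to the anchor's speech. All classes and callables in this sketch are toy stand-ins for the unspecified networks.

```python
class StyleConversionNet:
    """Toy conversion network whose single parameter is set externally."""
    def load_params(self, params):
        self.gain = params

    def __call__(self, content_map):
        return [self.gain * c for c in content_map]

def convert_voice(reference_audio, anchor_audio, meta_model, extract_features, net):
    """Reference audio of the target timbre -> network parameters -> style
    conversion of the anchor's speech, with no per-style retraining."""
    net.load_params(meta_model(extract_features(reference_audio)))  # one forward pass
    return net(extract_features(anchor_audio))
```

Adding a new target timbre only requires a new reference clip, not a new training run, which is the training-cost saving claimed above.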
Fig. 9 shows a schematic diagram of an electronic device provided by an embodiment of the present application. In this embodiment, the electronic device may refer to the live-streaming providing terminal 100 shown in Fig. 1, which includes a storage medium 110, a processor 120, and a live data processing apparatus 500.
The processor 120 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (application-specific integrated circuit, ASIC), or one or more integrated circuits for controlling the execution of programs of the live data processing method provided by the above method embodiments.
The storage medium 110 may be a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The storage medium 110 may exist independently and be connected to the processor 120 through a communication bus, or may be integrated with the processor. The storage medium 110 is used to store application program code for executing the solution of the present application, such as the live data processing apparatus 500 shown in Fig. 9, and execution is controlled by the processor 120. The processor 120 is configured to execute the application program code stored in the storage medium 110, such as the live data processing apparatus 500, to perform the live data processing method of the above method embodiments.
The present application may divide the live data processing apparatus 500 into functional modules according to the above method embodiments. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The above integrated modules may be implemented in the form of hardware or in the form of software function modules. It should be noted that the division of modules in the present application is schematic and is only a division of logical functions; there may be other division manners in actual implementation. For the case where each functional module corresponds to one function, the live data processing apparatus 500 shown in Fig. 9 is a schematic apparatus diagram; the functions of each functional module of the live data processing apparatus 500 are described in detail below.
The parsing module 510 is configured to parse a received timbre conversion request to obtain a target timbre style.
The input module 520 is configured to obtain first voice data having the target timbre style, and input the first voice data into a pre-trained network parameter learning model to obtain target network parameters corresponding to the target timbre style.
The style conversion module 530 is configured to adjust the network parameters of a prestored style conversion network to the target network parameters, and perform style conversion on second voice data input by the anchor according to the adjusted style conversion network, to obtain third voice data having the target timbre style.
The generating and sending module 540 is configured to generate a live interactive data stream of the virtual avatar according to the third voice data, and send it to the live-streaming receiving terminal 300 for playback.
Since the live data processing apparatus 500 provided by the embodiments of the present application is another implementation form of the live data processing method shown in Fig. 2, and the live data processing apparatus 500 can be used to execute the method provided by the embodiment shown in Fig. 2, for the technical effects it can obtain, reference may be made to the above method embodiments, which are not repeated here.
Further, based on the same inventive concept, an embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the above live data processing method.
Specifically, the storage medium may be a general storage medium, such as a removable disk or a hard disk; when the computer program on the storage medium is run, the above live data processing method can be executed.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, the device (such as the electronic device of Fig. 9), and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the computer or the processor of the other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although the present application is described herein in conjunction with various embodiments, in the process of implementing the claimed application, those skilled in the art can, by studying the drawings, the disclosure, and the appended claims, understand and realize other variations of the disclosed embodiments. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill several functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that these measures cannot be combined to produce a good effect.
The above are only various embodiments of the present application, but the protection scope of the present application is not limited thereto; any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in the present application, all of which should be covered within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (12)
1. A live data processing method, characterized in that it is applied to a live-streaming providing terminal, the method comprising:
parsing a received timbre conversion request to obtain a target timbre style;
obtaining first voice data having the target timbre style, and inputting the first voice data into a pre-trained network parameter learning model to obtain target network parameters corresponding to the target timbre style;
adjusting the network parameters of a prestored style conversion network to the target network parameters, and performing style conversion on second voice data input by an anchor according to the adjusted style conversion network, to obtain third voice data having the target timbre style;
generating a live interactive data stream of a virtual avatar according to the third voice data, and sending it to a live-streaming receiving terminal for playback.
2. The live data processing method according to claim 1, characterized in that the step of inputting the first voice data into the pre-trained network parameter learning model to obtain the target network parameters corresponding to the target timbre style comprises:
extracting a reference style feature map corresponding to the first voice data;
inputting the reference style feature map into the network parameter learning model to obtain the target network parameters corresponding to the target timbre style.
3. The live data processing method according to claim 1, characterized in that the step of performing style conversion on the second voice data input by the anchor according to the adjusted style conversion network to obtain the third voice data having the target timbre style comprises:
extracting an audio feature map of the second voice data, the audio feature map comprising a content feature map;
processing the content feature map through the adjusted style conversion network to obtain a style-converted feature map having the target timbre style;
performing an inverse feature transform on the content feature map and the style-converted feature map to obtain the third voice data having the target timbre style.
4. The live data processing method according to any one of claims 1-3, characterized in that the network parameter learning model is obtained by training a deep-learning-based neural network using first speech samples of at least one timbre style and second speech samples of any anchor, wherein the at least one timbre style includes the target timbre style.
5. The live data processing method according to any one of claims 1-3, characterized in that before the target timbre style is obtained from the received timbre conversion request, the method further comprises:
obtaining the network parameter learning model trained in advance on training samples, which specifically comprises:
obtaining training samples, the training samples including first speech samples of at least one timbre style and second speech samples of any anchor, wherein the at least one timbre style includes the target timbre style;
extracting a corresponding content feature sample map from the second speech samples of any anchor;
for each timbre style, extracting a corresponding style feature sample map from the first speech samples of that timbre style;
training a meta-learning network according to the content feature sample map and the style feature sample map corresponding to each timbre style, obtaining the network parameter learning model, and storing it in the live-streaming providing terminal.
6. The live data processing method according to claim 5, characterized in that the step of training the meta-learning network according to the content feature sample map and the style feature sample map corresponding to each timbre style comprises:
inputting the style feature sample map corresponding to each timbre style into the meta-learning network to obtain style network parameters of that timbre style;
adjusting the network parameters of a preset style conversion network according to the style network parameters of each timbre style, and inputting the content feature sample map into the adjusted style conversion network to obtain a corresponding style-converted feature sample map;
adjusting the network parameters of the meta-learning network according to the style feature sample map of each timbre style and the corresponding style-converted feature sample map, to obtain the network parameter learning model.
7. The live data processing method according to claim 6, characterized in that the step of adjusting the network parameters of the meta-learning network according to the style feature sample map of each timbre style and the corresponding style-converted feature sample map comprises:
calculating a loss function value between the style feature sample map of each timbre style and the corresponding style-converted feature sample map;
updating the network parameters of the meta-learning network according to the loss function value and iterating the training, until the meta-learning network meets a training termination condition, and outputting the trained network parameter learning model.
8. The live data processing method according to claim 7, characterized in that the training termination condition comprises at least one of the following conditions:
the loss function value no longer decreases;
the loss function value is below a set value;
the number of training iterations reaches a set number.
9. The live data processing method according to claim 1, characterized in that the step of generating the live interactive data stream of the virtual avatar according to the third voice data and sending it to the live-streaming receiving terminal for playback comprises:
cutting the third voice data into multiple audio data segments according to a set time interval;
for each audio data segment, identifying content parameters of the audio data segment, the content parameters including a content feature, an emotion feature, and an amplitude feature, the emotion feature being used to control the emotional state of the virtual avatar and the amplitude feature being used to control the opening and closing of the virtual avatar's mouth shape;
generating an interactive video segment of the virtual avatar corresponding to the audio data segment according to the content feature, the emotion feature, and the amplitude feature;
synthesizing each audio data segment and its corresponding interactive video segment to obtain the live interactive data stream of the virtual avatar, and sending the live interactive data stream of the virtual avatar to the live-streaming receiving terminal for playback.
10. A live data processing apparatus, characterized in that it is applied to a live-streaming providing terminal, the apparatus comprising:
a parsing module, configured to parse a received timbre conversion request to obtain a target timbre style;
an input module, configured to obtain first voice data having the target timbre style, and input the first voice data into a pre-trained network parameter learning model to obtain target network parameters corresponding to the target timbre style;
a style conversion module, configured to adjust the network parameters of a prestored style conversion network to the target network parameters, and perform style conversion on second voice data input by an anchor according to the adjusted style conversion network, to obtain third voice data having the target timbre style;
a generating and sending module, configured to generate a live interactive data stream of a virtual avatar according to the third voice data, and send it to a live-streaming receiving terminal for playback.
11. An electronic device, characterized in that the electronic device comprises one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing machine-executable instructions executable by the processors; when the electronic device runs, the processors execute the machine-executable instructions to realize the live data processing method according to any one of claims 1-9.
12. A readable storage medium, characterized in that the readable storage medium stores machine-executable instructions, and when the machine-executable instructions are executed, the live data processing method according to any one of claims 1-9 is realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910368522.XA CN110062267A (en) | 2019-05-05 | 2019-05-05 | Live data processing method, device, electronic equipment and readable storage medium storing program for executing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910368522.XA CN110062267A (en) | 2019-05-05 | 2019-05-05 | Live data processing method, device, electronic equipment and readable storage medium storing program for executing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110062267A true CN110062267A (en) | 2019-07-26 |
Family
ID=67322286
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910368522.XA Pending CN110062267A (en) | 2019-05-05 | 2019-05-05 | Live data processing method, device, electronic equipment and readable storage medium storing program for executing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110062267A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312267A (en) * | 2020-02-20 | 2020-06-19 | 广州市百果园信息技术有限公司 | Voice style conversion method, device, equipment and storage medium |
CN111343473A (en) * | 2020-02-25 | 2020-06-26 | 北京达佳互联信息技术有限公司 | Data processing method and device for live application, electronic equipment and storage medium |
CN112017698A (en) * | 2020-10-30 | 2020-12-01 | 北京淇瑀信息科技有限公司 | Method and device for optimizing manual recording adopted by voice robot and electronic equipment |
CN112019874A (en) * | 2020-09-09 | 2020-12-01 | 广州华多网络科技有限公司 | Live wheat-connecting method and related equipment |
CN112164407A (en) * | 2020-09-22 | 2021-01-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Tone conversion method and device |
CN112446938A (en) * | 2020-11-30 | 2021-03-05 | 重庆空间视创科技有限公司 | Multi-mode-based virtual anchor system and method |
CN112672172A (en) * | 2020-11-30 | 2021-04-16 | 北京达佳互联信息技术有限公司 | Audio replacement system, method and device, electronic equipment and storage medium |
WO2021077663A1 (en) * | 2019-10-21 | 2021-04-29 | 南京创维信息技术研究院有限公司 | Method and system for automatically adjusting sound and image modes on basis of scene recognition |
CN112788359A (en) * | 2020-12-30 | 2021-05-11 | 北京达佳互联信息技术有限公司 | Live broadcast processing method and device, electronic equipment and storage medium |
CN112954378A (en) * | 2021-02-05 | 2021-06-11 | 广州方硅信息技术有限公司 | Method and device for playing voice barrage in live broadcast room, electronic equipment and medium |
CN112995530A (en) * | 2019-12-02 | 2021-06-18 | 阿里巴巴集团控股有限公司 | Video generation method, device and equipment |
CN113111791A (en) * | 2021-04-16 | 2021-07-13 | 深圳市格灵人工智能与机器人研究院有限公司 | Image filter conversion network training method and computer readable storage medium |
CN113259701A (en) * | 2021-05-18 | 2021-08-13 | 游艺星际(北京)科技有限公司 | Method and device for generating personalized timbre and electronic equipment |
CN115412773A (en) * | 2021-05-26 | 2022-11-29 | 武汉斗鱼鱼乐网络科技有限公司 | Method, device and system for processing audio data of live broadcast room |
CN115550503A (en) * | 2021-06-30 | 2022-12-30 | 华为技术有限公司 | Method and device for generating multiple sound effects and terminal equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120316882A1 (en) * | 2011-06-10 | 2012-12-13 | Morgan Fiumi | System for generating captions for live video broadcasts |
CN107154069A (en) * | 2017-05-11 | 2017-09-12 | 上海微漫网络科技有限公司 | Data processing method and system based on a virtual character |
CN107248195A (en) * | 2017-05-31 | 2017-10-13 | 珠海金山网络游戏科技有限公司 | Augmented reality anchor method, device and system |
CN107481735A (en) * | 2017-08-28 | 2017-12-15 | 中国移动通信集团公司 | Method, server and computer-readable storage medium for audio voice conversion |
CN109120985A (en) * | 2018-10-11 | 2019-01-01 | 广州虎牙信息科技有限公司 | Image display method, apparatus and storage medium in live streaming |
CN109151366A (en) * | 2018-09-27 | 2019-01-04 | 惠州Tcl移动通信有限公司 | Sound processing method for video calls |
Applications Claiming Priority (1)

2019-05-05 CN CN201910368522.XA patent/CN110062267A/en active Pending
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021077663A1 (en) * | 2019-10-21 | 2021-04-29 | 南京创维信息技术研究院有限公司 | Method and system for automatically adjusting sound and image modes on basis of scene recognition |
CN112995530A (en) * | 2019-12-02 | 2021-06-18 | 阿里巴巴集团控股有限公司 | Video generation method, device and equipment |
CN111312267A (en) * | 2020-02-20 | 2020-06-19 | 广州市百果园信息技术有限公司 | Voice style conversion method, device, equipment and storage medium |
CN111312267B (en) * | 2020-02-20 | 2023-08-11 | 广州市百果园信息技术有限公司 | Voice style conversion method, device, equipment and storage medium |
CN111343473B (en) * | 2020-02-25 | 2022-07-01 | 北京达佳互联信息技术有限公司 | Data processing method and device for live application, electronic equipment and storage medium |
CN111343473A (en) * | 2020-02-25 | 2020-06-26 | 北京达佳互联信息技术有限公司 | Data processing method and device for live application, electronic equipment and storage medium |
CN112019874A (en) * | 2020-09-09 | 2020-12-01 | 广州华多网络科技有限公司 | Live co-hosting (mic-linking) method and related device |
CN113784163B (en) * | 2020-09-09 | 2023-06-20 | 广州方硅信息技术有限公司 | Live co-hosting (mic-linking) method and related device |
CN113784163A (en) * | 2020-09-09 | 2021-12-10 | 广州方硅信息技术有限公司 | Live co-hosting (mic-linking) method and related device |
CN112164407A (en) * | 2020-09-22 | 2021-01-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Tone conversion method and device |
CN112017698B (en) * | 2020-10-30 | 2021-01-29 | 北京淇瑀信息科技有限公司 | Method and device for optimizing manual recording adopted by voice robot and electronic equipment |
CN112017698A (en) * | 2020-10-30 | 2020-12-01 | 北京淇瑀信息科技有限公司 | Method and device for optimizing manual recording adopted by voice robot and electronic equipment |
CN112672172B (en) * | 2020-11-30 | 2023-04-28 | 北京达佳互联信息技术有限公司 | Audio replacing system, method and device, electronic equipment and storage medium |
CN112446938B (en) * | 2020-11-30 | 2023-08-18 | 重庆空间视创科技有限公司 | Multi-mode-based virtual anchor system and method |
CN112672172A (en) * | 2020-11-30 | 2021-04-16 | 北京达佳互联信息技术有限公司 | Audio replacement system, method and device, electronic equipment and storage medium |
CN112446938A (en) * | 2020-11-30 | 2021-03-05 | 重庆空间视创科技有限公司 | Multi-mode-based virtual anchor system and method |
CN112788359A (en) * | 2020-12-30 | 2021-05-11 | 北京达佳互联信息技术有限公司 | Live broadcast processing method and device, electronic equipment and storage medium |
CN112954378A (en) * | 2021-02-05 | 2021-06-11 | 广州方硅信息技术有限公司 | Method and device for playing voice barrage in live broadcast room, electronic equipment and medium |
CN113111791A (en) * | 2021-04-16 | 2021-07-13 | 深圳市格灵人工智能与机器人研究院有限公司 | Image filter conversion network training method and computer readable storage medium |
CN113111791B (en) * | 2021-04-16 | 2024-04-09 | 深圳市格灵人工智能与机器人研究院有限公司 | Image filter conversion network training method and computer readable storage medium |
CN113259701B (en) * | 2021-05-18 | 2023-01-20 | 游艺星际(北京)科技有限公司 | Method and device for generating personalized timbre and electronic equipment |
CN113259701A (en) * | 2021-05-18 | 2021-08-13 | 游艺星际(北京)科技有限公司 | Method and device for generating personalized timbre and electronic equipment |
CN115412773A (en) * | 2021-05-26 | 2022-11-29 | 武汉斗鱼鱼乐网络科技有限公司 | Method, device and system for processing audio data of live broadcast room |
WO2023273440A1 (en) * | 2021-06-30 | 2023-01-05 | 华为技术有限公司 | Method and apparatus for generating plurality of sound effects, and terminal device |
CN115550503A (en) * | 2021-06-30 | 2022-12-30 | 华为技术有限公司 | Method and apparatus for generating multiple sound effects, and terminal device |
CN115550503B (en) * | 2021-06-30 | 2024-04-23 | 华为技术有限公司 | Method and apparatus for generating multiple sound effects, terminal device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110062267A (en) | Live streaming data processing method, apparatus, electronic device and readable storage medium | |
CN110085244A (en) | Live streaming interaction method, apparatus, electronic device and readable storage medium | |
CN106878820B (en) | Live broadcast interaction method and device | |
CN105450642B (en) | Data processing method, related apparatus and system based on online live streaming |
WO2022166709A1 (en) | Virtual video live broadcast processing method and apparatus, and storage medium and electronic device | |
US11113884B2 (en) | Techniques for immersive virtual reality experiences | |
JP2020034895A (en) | Responding method and device | |
CN111010589A (en) | Live broadcast method, device, equipment and storage medium based on artificial intelligence | |
CN106488311B (en) | Sound effect adjusting method and user terminal | |
WO2023011221A1 (en) | Blend shape value output method, storage medium and electronic apparatus | |
EP3826314A1 (en) | Electrical devices control based on media-content context | |
CN109348274A (en) | Live streaming interaction method, apparatus and storage medium |
CN106792013A (en) | Method and television for interactive television broadcast sound |
CN114128299A (en) | Template-based excerpts and presentations for multimedia presentations | |
US11671562B2 (en) | Method for enabling synthetic autopilot video functions and for publishing a synthetic video feed as a virtual camera during a video call | |
US20230039530A1 (en) | Automated generation of haptic effects based on haptics data | |
CN113439447A (en) | Room acoustic simulation using deep learning image analysis | |
CN113704390A (en) | Interaction method and device of virtual objects, computer readable medium and electronic equipment | |
Alexanderson et al. | Animated Lombard speech: Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions | |
Sodoyer et al. | A study of lip movements during spontaneous dialog and its application to voice activity detection | |
CN110337041A (en) | Video playing method, apparatus, computer device and storage medium |
CN109286760A (en) | Entertainment video production method and terminal |
CN108965904A (en) | Volume adjusting method and client for a live streaming room |
US20230353707A1 (en) | Method for enabling synthetic autopilot video functions and for publishing a synthetic video feed as a virtual camera during a video call | |
CN116756285A (en) | Virtual robot interaction method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-07-26