CN109872724A - Virtual image control method, virtual image control device and electronic equipment - Google Patents

Virtual image control method, virtual image control device and electronic equipment

Info

Publication number
CN109872724A
CN109872724A
Authority
CN
China
Prior art keywords
virtual image
voice information
voice segment
parameter
image control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910252003.7A
Other languages
Chinese (zh)
Inventor
王云刚
徐子豪
周志颖
李政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Information Technology Co Ltd
Original Assignee
Guangzhou Huya Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Information Technology Co Ltd filed Critical Guangzhou Huya Information Technology Co Ltd
Priority to CN201910252003.7A priority Critical patent/CN109872724A/en
Publication of CN109872724A publication Critical patent/CN109872724A/en
Priority to US17/598,768 priority patent/US20220101871A1/en
Priority to PCT/CN2020/081626 priority patent/WO2020200081A1/en
Priority to SG11202111403VA priority patent/SG11202111403VA/en
Pending legal-status Critical Current

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The virtual image control method, virtual image control device and electronic equipment provided by the present application relate to the technical field of live streaming. In detail, the application obtains voice information input by an anchor, and performs speech analysis processing on the voice information to obtain a corresponding speech parameter. The speech parameter is then converted into a control parameter according to a preset parameter conversion algorithm, and the mouth shape of the virtual image is controlled according to the control parameter. The above method alleviates the problem in the prior art that the control of a virtual image has low precision.

Description

Virtual image control method, virtual image control device and electronic equipment
Technical field
The present application relates to the technical field of live streaming, and in particular to a virtual image control method, a virtual image control device and an electronic equipment.
Background art
In the prior art, in order to make live streaming more interesting, a virtual image may be displayed in the live streaming picture in place of the real image of the anchor. However, in existing live streaming technologies the control precision of the virtual image is generally low, which leads to a poor experience for users watching the displayed virtual image.
Summary of the invention
In view of this, the purpose of the present application is to provide a virtual image control method, a virtual image control device and an electronic equipment, so as to alleviate the problem in the prior art that the control of a virtual image has low precision.
To achieve the above object, the embodiments of the present application adopt the following technical solutions:
A virtual image control method, applied to a live streaming device, for controlling a virtual image displayed in a live streaming picture, the method comprising:
obtaining voice information input by an anchor;
performing speech analysis processing on the voice information to obtain a corresponding speech parameter;
converting the speech parameter into a control parameter according to a preset parameter conversion algorithm, and controlling a mouth shape of the virtual image according to the control parameter.
In a preferred option of the embodiments of the present application, in the above virtual image control method, the step of performing speech analysis processing on the voice information to obtain a corresponding speech parameter comprises:
segmenting the voice information, and extracting, from each segment of voice information after the segmentation, a voice segment of a set length;
performing speech analysis processing on each extracted voice segment respectively to obtain the speech parameter corresponding to each voice segment.
In a preferred option of the embodiments of the present application, in the above virtual image control method, the step of segmenting the voice information and extracting, from each segment of voice information after the segmentation, a voice segment of the set length is specifically:
extracting, at intervals of the set length, a voice segment of the set length from the voice information.
In a preferred option of the embodiments of the present application, in the above virtual image control method, the step of segmenting the voice information and extracting, from each segment of voice information after the segmentation, a voice segment of the set length is specifically:
segmenting the voice information according to the continuity of the voice information, and extracting, from each segment of voice information after the segmentation, a voice segment of the set length.
In a preferred option of the embodiments of the present application, in the above virtual image control method, the step of performing speech analysis processing on each extracted voice segment to obtain the speech parameter corresponding to each voice segment comprises:
extracting amplitude information of each voice segment;
for each voice segment, calculating the speech parameter corresponding to the voice segment according to the amplitude information of the voice segment.
In a preferred option of the embodiments of the present application, in the above virtual image control method, the step of calculating the speech parameter corresponding to the voice segment according to the amplitude information of the voice segment is specifically:
performing a calculation according to a normalization algorithm based on frame length information of the voice segment and the amplitude information, to obtain the speech parameter corresponding to the voice segment.
In a preferred option of the embodiments of the present application, in the above virtual image control method, the control parameter includes at least one of a lip spacing between the upper and lower lips of the virtual image and a mouth corner angle.
In a preferred option of the embodiments of the present application, in the above virtual image control method, when the control parameter includes the lip spacing, the lip spacing is calculated according to the preset parameter conversion algorithm based on the speech parameter and a preset maximum lip spacing corresponding to the virtual image;
when the control parameter includes the mouth corner angle, the mouth corner angle is calculated according to the preset parameter conversion algorithm based on the speech parameter and a preset maximum mouth corner angle corresponding to the virtual image.
In a preferred option of the embodiments of the present application, in the above virtual image control method, when the control parameter includes the lip spacing, the maximum lip spacing is set according to the lip spacing of the anchor;
when the control parameter includes the mouth corner angle, the maximum mouth corner angle is set according to the mouth corner angle of the anchor.
The embodiments of the present application also provide a virtual image control device, applied to a live streaming device, for controlling a virtual image in live streaming, the device comprising:
a voice obtaining module, configured to obtain voice information input by an anchor;
a speech analysis module, configured to perform speech analysis processing on the voice information to obtain a corresponding speech parameter;
a mouth shape control module, configured to convert the speech parameter into a control parameter according to a preset parameter conversion algorithm, and to control a mouth shape of the virtual image according to the control parameter.
On the basis of the above, the embodiments of the present application also provide an electronic equipment, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when run on the processor, implements the steps of the above virtual image control method.
On the basis of the above, the embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, wherein the program, when executed, implements the steps of the above virtual image control method.
The virtual image control method, virtual image control device and electronic equipment provided by the present application obtain the voice information of the anchor and control the mouth shape of the virtual image based on the voice information and a preset parameter conversion algorithm, so that the voice played in live streaming and the mouth shape of the virtual image have high consistency. This alleviates the problem in the prior art that the voice played during live streaming does not match the mouth shape of the virtual image because the control precision of the virtual image is low, thereby effectively improving the user experience.
To make the above objects, features and advantages of the present application clearer and more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a block diagram of an electronic equipment provided by an embodiment of the present application.
Fig. 2 is a schematic flowchart of a virtual image control method provided by an embodiment of the present application.
Fig. 3 is a schematic flowchart of the sub-steps included in step S130 in Fig. 2.
Fig. 4 is a schematic flowchart of the sub-steps included in step S133 in Fig. 3.
Fig. 5 is a schematic diagram of 20 frames of voice data provided by an embodiment of the present application.
Fig. 6 is a schematic diagram of the lip spacing and the mouth corner angle of a virtual image provided by an embodiment of the present application.
Fig. 7 is a schematic diagram of the interaction of live streaming devices provided by an embodiment of the present application.
Fig. 8 is a schematic diagram of a live streaming interface provided by an embodiment of the present application.
Fig. 9 is a block diagram of the functional modules included in a virtual image control device provided by an embodiment of the present application.
Reference numerals: 10 - electronic equipment; 12 - memory; 14 - processor; 20 - first terminal; 30 - second terminal; 40 - server; 100 - virtual image control device; 110 - voice obtaining module; 130 - speech analysis module; 150 - mouth shape control module.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the accompanying drawings herein, may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the claimed scope of the present application, but merely represents selected embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings. In the description of the present application, the terms "first", "second", "third", "fourth", etc. are only used for distinguishing descriptions and should not be understood as indicating or implying relative importance.
As shown in Fig. 1, an embodiment of the present application provides an electronic equipment 10. The electronic equipment 10 may serve as a live streaming device; for example, it may be a terminal device used by the anchor during live streaming (such as a mobile phone, a tablet computer or a computer), or it may be a background server communicatively connected with the terminal device used by the anchor during live streaming.
In detail, the electronic equipment 10 may include a memory 12, a processor 14 and a virtual image control device 100. The memory 12 and the processor 14 are directly or indirectly electrically connected with each other to realize data transmission or interaction; for example, they may be electrically connected through one or more communication buses or signal lines. The virtual image control device 100 includes at least one software functional module that can be stored in the memory 12 in the form of software or firmware. The processor 14 is configured to execute the executable computer program stored in the memory 12, for example, the software functional modules and computer programs included in the virtual image control device 100, so as to implement the virtual image control method and thereby ensure that the virtual image can be controlled with high precision.
The memory 12 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), etc. The memory 12 is configured to store a program, and the processor 14 executes the program after receiving an execution instruction.
The processor 14 may be an integrated circuit chip with signal processing capability. The processor 14 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), a system on chip (SoC), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may also be any conventional processor.
It can be understood that the structure shown in Fig. 1 is only illustrative. The electronic equipment 10 may also include more or fewer components than those shown in Fig. 1, or have a configuration different from that shown in Fig. 1; for example, it may also include a communication unit for information interaction with other live streaming devices. Each component shown in Fig. 1 may be implemented in hardware, software, or a combination thereof.
With reference to Fig. 2, an embodiment of the present application also provides a virtual image control method that can be applied to the above electronic equipment 10, for controlling a virtual image displayed in a live streaming picture. The method steps defined by the flow of the virtual image control method can be implemented by the electronic equipment 10. The specific flow shown in Fig. 2 is described in detail below.
Step S110: obtaining voice information input by an anchor.
In this embodiment, the electronic equipment 10 may obtain, in real time, the voice information input by the anchor through a voice collection device (such as the built-in microphone of a mobile phone or a connected microphone). In one example, if the electronic equipment 10 is the terminal device used by the anchor, the voice information input by the anchor can be obtained directly through a voice collection device such as a connected microphone or a built-in microphone. In another example, if the electronic equipment 10 is a background server, the terminal device used by the anchor sends the voice information to the background server after obtaining it. A minimal sketch of such voice capture is given below.
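The following sketch is not part of the original disclosure; it merely illustrates how step S110 might capture short buffers of the anchor's voice from a microphone. The choice of the sounddevice library, the sample rate and the buffer duration are illustrative assumptions.

```python
# Hypothetical sketch of step S110: capturing the anchor's voice input.
# The sounddevice library, sample rate and buffer length are assumptions.
import sounddevice as sd

SAMPLE_RATE = 16000   # Hz, assumed capture rate
BUFFER_SECONDS = 0.2  # short buffers for near-real-time processing

def capture_voice_buffer():
    """Record one short mono buffer from the default microphone."""
    frames = int(SAMPLE_RATE * BUFFER_SECONDS)
    buffer = sd.rec(frames, samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()            # block until the buffer is filled
    return buffer[:, 0]  # 1-D array of samples
```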
Step S130: performing speech analysis processing on the voice information to obtain a corresponding speech parameter.
In this embodiment, after the voice information is obtained through step S110, the voice information may be analyzed and processed to obtain the corresponding speech parameter. In order to ensure that the speech parameter obtained by the analysis has high accuracy, the voice information may also be preprocessed before step S130 is executed. The preprocessing is described below.
First, the obtained voice information may be converted into narrowband voice information by a resampling method. Then, the obtained voice information is filtered by a band-pass filter to obtain the voice information whose frequency belongs to the passband of the band-pass filter, wherein the passband of the band-pass filter is determined based on the fundamental frequency and formants of the human voice. Finally, noise filtering processing is performed on the collected audio data of the user by a noise reduction algorithm.
It should be noted that, considering that the fundamental frequency of the human voice generally lies in (90, 600) Hz, a high-pass filter with a cutoff frequency of 60 Hz may be set. Further, from the fundamental frequency and the formants (which may include the first, second and third formants), it can be known that the main frequencies of the human voice are below 3 kHz; therefore, a low-pass filter with a cutoff frequency of 3 kHz may be set. That is, the band-pass filter may be composed of a high-pass filter with a cutoff frequency of 60 Hz and a low-pass filter with a cutoff frequency of 3 kHz, so that voice information whose frequency does not belong to (60, 3000) Hz can be effectively filtered out, which effectively avoids environmental noise interfering with the speech analysis processing. A sketch of this preprocessing is given below.
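The sketch below is an illustrative reading of this preprocessing, not the patented implementation: it resamples to a narrowband rate and applies a 60 Hz to 3 kHz band-pass filter built from Butterworth sections. The use of scipy, the filter order and the 8 kHz target rate are assumptions, and the noise-reduction step is only indicated by a placeholder comment.

```python
# Illustrative preprocessing sketch (resampling + 60 Hz-3 kHz band-pass).
# scipy, the filter order and the 8 kHz narrowband rate are assumptions.
import numpy as np
from scipy.signal import butter, sosfilt, resample_poly

def preprocess_voice(samples: np.ndarray, fs: int = 16000, narrow_fs: int = 8000) -> np.ndarray:
    # Resample the captured audio down to a narrowband rate.
    narrow = resample_poly(samples, up=narrow_fs, down=fs)
    # Band-pass filter: 60 Hz high-pass combined with a 3 kHz low-pass.
    sos = butter(4, [60, 3000], btype="bandpass", fs=narrow_fs, output="sos")
    filtered = sosfilt(sos, narrow)
    # A noise-reduction algorithm would be applied here; omitted in this sketch.
    return filtered
```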
Step S150: converting the speech parameter into a control parameter according to a preset parameter conversion algorithm, and controlling the mouth shape of the virtual image according to the control parameter.
In this embodiment, after the speech parameter is obtained through step S130, the speech parameter may be converted into the corresponding control parameter based on the preset parameter conversion algorithm, and then the mouth shape of the virtual image is controlled based on the control parameter.
Through the above method, the mouth shape of the virtual image can be controlled based on the voice information input by the anchor, so that the voice played in live streaming and the mouth shape of the virtual image have high consistency, which effectively improves the user experience. Moreover, since the mouth shape of the virtual image is determined based on the voice information, that is, different voice information corresponds to different mouth shapes, the variation of the mouth shape also gives the virtual image greater vividness during live streaming, thereby further increasing the interest of the live stream.
Optionally, the specific manner of analyzing and processing the voice information in step S130 is not limited and may be selected according to actual application requirements. For example, in an alternative example, with reference to Fig. 3, step S130 may include step S131 and step S133, which are described below.
Step S131: segmenting the voice information, and extracting, from each segment of voice information after the segmentation, a voice segment of a set length.
In this embodiment, the voice information may be segmented based on a preset rule to obtain at least one segment of voice information. Then, for each segment of voice information, a voice segment of the set length is extracted from that segment, so as to obtain at least one voice segment.
The set length may be a length in time, for example, 1 s, 2 s or 3 s; it may also be a length in another dimension, for example, a corresponding number of words (such as 2 words, 3 words or 4 words).
Step S133: performing speech analysis processing on each extracted voice segment respectively to obtain the speech parameter corresponding to each voice segment.
In this embodiment, after at least one voice segment is obtained through step S131, speech analysis processing may be performed on each voice segment respectively to obtain the speech parameter corresponding to each voice segment. Accordingly, at least one speech parameter can be obtained.
The specific manner of segmenting the voice information in step S131 is not limited and may be selected according to actual application requirements. In an alternative example, step S131 may specifically be: extracting, at intervals of the set length, a voice segment of the set length from the voice information. For example, the obtained voice information may be 1 s long and the set length may be 0.2 s; accordingly, after the segmentation, 5 voice segments each 0.2 s long can be obtained. In another example, the obtained voice information may be 20 words long and the set length may be 5 words; accordingly, after the segmentation, 4 voice segments each 5 words long can be obtained. A minimal sketch of this fixed-interval segmentation is given below.
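The following minimal sketch illustrates the fixed-interval variant of step S131 under the assumption that the voice information is a numpy array of samples and the set length is time-based; the function name and the 0.2 s default are illustrative.

```python
# Hypothetical sketch of fixed-interval segmentation (step S131, first variant).
# numpy arrays and the 0.2 s set length are assumptions for illustration.
import numpy as np

def split_fixed_intervals(samples: np.ndarray, fs: int, set_length_s: float = 0.2):
    """Cut the voice information into consecutive segments of set_length_s seconds."""
    seg_len = int(fs * set_length_s)
    segments = [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
    # Drop a trailing fragment shorter than the set length, if any.
    return [seg for seg in segments if len(seg) == seg_len]
```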
In another alternative example, the segmentation may be performed based on the continuity of the voice information. In detail, step S131 may specifically be: segmenting the voice information according to the continuity of the voice information, and extracting, from each segment of voice information after the segmentation, a voice segment of the set length.
That is, after the voice information is obtained, it may be recognized to determine whether there is a pause in the voice information (one way of determining a pause is to analyze the waveform of the voice information; if there is an interruption in the waveform and the duration of the interruption is greater than a preset duration, it can be determined that there is a pause). For example, if the voice information is "Today's live stream is over, tomorrow we ...", then by recognizing the voice information it can be determined that a pause occurs at the position of the comma, and therefore one obtained segment of voice information is "Today's live stream is over". A voice segment of the set length may then be extracted from this segment of voice information, where the specific size of the set length is not limited and may be selected according to actual application requirements. A sketch of such pause-based segmentation follows.
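The sketch below is an assumed rendering of the pause detection just described: it treats samples whose amplitude falls below a threshold as an interruption and splits the waveform wherever the interruption lasts longer than a preset duration. The threshold value, the 0.3 s minimum pause and all names are illustrative assumptions.

```python
# Hypothetical sketch of pause-based segmentation (step S131, second variant).
# The amplitude threshold and the 0.3 s minimum pause duration are assumptions.
import numpy as np

def split_on_pauses(samples: np.ndarray, fs: int,
                    silence_threshold: float = 0.02, min_pause_s: float = 0.3):
    """Split the voice information at interruptions longer than min_pause_s."""
    silent = np.abs(samples) < silence_threshold
    min_pause = int(fs * min_pause_s)
    segments = []
    seg_start, silence_run = 0, 0
    for i, is_silent in enumerate(silent):
        if is_silent:
            silence_run += 1
        else:
            if silence_run >= min_pause:
                # The pause ends the previous segment; a new segment starts here.
                end = i - silence_run
                if end > seg_start:
                    segments.append(samples[seg_start:end])
                seg_start = i
            silence_run = 0
    if seg_start < len(samples):
        segments.append(samples[seg_start:])  # trailing segment after the last pause
    return segments
```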
For example, in an alternative example, the set length may be smaller than the length of the corresponding segment of voice information (for example, the length of one segment of voice information is 0.8 s and the set length may be 0.6 s; or the length of one segment of voice information is 8 words and the set length may be 6 words), so that the data volume of the obtained voice segments is smaller than the data volume of the voice information input by the anchor. This reduces the amount of data processing or computation when executing step S133 and step S150, thereby effectively ensuring high real-time performance of the live streaming of the virtual image. Moreover, since the amount of data processing is reduced, the performance requirements on the processor 14 can also be lowered, which improves the adaptability of the virtual image control method and facilitates its popularization.
It should be noted that, when the voice is segmented based on continuity, a different set length may be configured for each segment of voice information. In an alternative example, the corresponding set length may be configured based on the length of each segment of voice information. For example, if the length of one segment of voice information is 0.6 s (or 6 words), the configured set length may be 0.3 s (or 3 words); if the length of one segment of voice information is 0.4 s (or 4 words), the configured set length may be 0.2 s (or 2 words). In another example, if the length of one segment of voice information is 0.6 s (or 6 words), the configured set length may be 0.5 s (or 5 words); if the length of one segment of voice information is 0.4 s (or 4 words), the configured set length may be 0.3 s (or 3 words).
Moreover, after the set length is configured, the start position (such as the start time or the start word) or the end position (the end time or the end word) of the set length is not limited and may be configured according to actual application requirements. For example, in an alternative example, a voice segment whose start time or end time is an arbitrary time may be extracted from one segment of voice information.
In another alternative example, a voice segment whose end time is the end time of the segment of voice information may be extracted from that segment. For example, if the start time of one segment of voice information is "15h:40min:10.23s", its end time is "15h:40min:10.99s" and the set length is 0.50 s, then the end time of the extracted voice segment is "15h:40min:10.99s" and its start time is "15h:40min:10.49s".
Through the above arrangement, it can be ensured that, for each segment of voice information, the content near the end time and the corresponding mouth shape are consistent. While the amount of computation is reduced, it is also difficult for viewers to notice that the voice information and the mouth shape do not fully correspond, and the mouth shape of the anchor is reproduced more realistically, effectively ensuring a good viewer experience. For example, in the above example "Today's live stream is over, tomorrow we ...", if the voice and the mouth shape corresponding to "is over" are guaranteed to be consistent, viewers will tend not to notice, or will ignore, any inconsistency between the voice and the mouth shape corresponding to "Today's live stream", so that viewers perceive the voice played during live streaming and the mouth shape of the virtual image as highly consistent.
It should be noted that, when the voice information is segmented based on continuity, if voice information input by the anchor is detected again after a pause has been detected, for example, in the above example "Today's live stream is over, tomorrow we ...", the voice information "tomorrow we ..." is detected again after the pause, a voice segment of a preset length may also be extracted from that voice information in order to further improve the viewer experience. For example, a voice segment of the preset length (such as 0.4 s) may be extracted with the start time of that voice information as its start time, or a voice segment of the preset length (such as 2 words) may be extracted with the first word of that voice information as its first word.
That is, for each obtained segment of voice information, two voice segments, one at the head and one at the tail of that segment, may be obtained respectively, and the mouth shape of the virtual image is controlled based on these two voice segments. For example, in the above example "Today's live stream is over", the two voice segments "Today's" and "is over" may be extracted, so that the content and the mouth shape corresponding to these two voice segments are consistent, which makes viewers perceive the content and the mouth shape corresponding to "Today's live stream is over" as consistent throughout.
It should be noted that, in the above example, if two voice segments are extracted respectively from one segment of voice information, the lengths of the two voice segments may be the same or different. When the lengths differ, the length of the voice segment at the tail may be greater than the length of the voice segment at the head.
Further, the specific manner of performing the speech analysis processing in step S133 is also not limited and may be selected according to actual application requirements. For example, the analysis processing may be performed based on the amplitude information and/or the frequency information in the voice information.
In detail, in an alternative example, the speech analysis processing may be performed based on the amplitude information. With reference to Fig. 4, step S133 may include step S133a and step S133b, which are described below.
Step S133a: extracting the amplitude information of each voice segment.
In this embodiment, after at least one voice segment is obtained through step S131, the amplitude information of each voice segment may first be extracted.
Step S133b: for each voice segment, calculating the speech parameter corresponding to the voice segment according to the amplitude information of the voice segment.
In this embodiment, after the amplitude information of each voice segment is obtained through step S133a, the speech parameter corresponding to each voice segment may be calculated based on the amplitude information. The speech parameter may be any value in the interval (0, 1); that is, the obtained amplitude information may be processed based on a normalization algorithm to obtain the corresponding speech parameter.
In detail, a calculation may be performed according to the normalization algorithm based on the frame length information and the amplitude information of the voice segment, to obtain the speech parameter corresponding to the voice segment.
It should be noted that, depending on the manner in which the voice segments are extracted, the lengths of the obtained voice segments differ, and the manner of calculating the speech parameter of each voice segment may also differ. For example, if a voice segment is long, a speech parameter may be calculated separately for each frame of voice data in the voice segment; if a voice segment is short, it may be treated as one frame of voice data, and a single speech parameter may be calculated based on that frame of voice data and used as the speech parameter corresponding to that voice segment.
That is, for each frame of voice data, a value belonging to the interval (0, 1) may be calculated according to the normalization algorithm based on the frame length information and the amplitude information of that frame, and the value is used as the speech parameter corresponding to that frame of voice data. For example, in the above example "Today's live stream is over, tomorrow we ...", 20 frames of voice data may be extracted, and then normalization calculation is performed on the amplitude of each frame of voice data to obtain 20 values, which serve as the 20 speech parameters corresponding to the 20 frames of voice data (as shown in Fig. 5).
The specific content of the normalization algorithm is not limited and may be selected according to actual application requirements. For example, the sum of squares of the amplitude information at each moment in a frame of voice data may first be calculated; then, the mean of this sum of squares is calculated based on the frame length of that frame of voice data, and a square root operation is performed on the mean to obtain the corresponding speech parameter. A sketch of this calculation is given below.
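A minimal sketch of this example normalization is shown below. It assumes the amplitude information of a frame is an array of float samples in [-1, 1], in which case the sum of squares averaged over the frame length and square-rooted is the root-mean-square of the frame and stays within (0, 1) for non-silent, non-clipped frames. Names and the framing helper are illustrative.

```python
# Hypothetical sketch of the normalization algorithm (step S133b).
# Assumes float samples in [-1, 1]; the per-frame value is a root-mean-square.
import numpy as np

def speech_parameter(frame: np.ndarray) -> float:
    """Sum of squared amplitudes, averaged over the frame length, then square-rooted."""
    frame_length = len(frame)
    sum_of_squares = float(np.sum(frame.astype(np.float64) ** 2))
    return float(np.sqrt(sum_of_squares / frame_length))

def speech_parameters(segment: np.ndarray, frame_length: int):
    """One speech parameter per frame of voice data in a voice segment."""
    return [speech_parameter(segment[i:i + frame_length])
            for i in range(0, len(segment) - frame_length + 1, frame_length)]
```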
Optionally, the specific manner of converting the speech parameter into the control parameter in step S150 is not limited and may be selected according to actual application requirements; that is, the specific content of the parameter conversion algorithm is not limited. For example, the specific content of the parameter conversion algorithm may differ depending on the specific content of the control parameter.
The control parameter may include, but is not limited to, at least one of a lip spacing between the upper and lower lips of the virtual image and a mouth corner angle.
In detail, when the control parameter includes the lip spacing, the lip spacing may be calculated according to the preset parameter conversion algorithm based on the speech parameter and a preset maximum lip spacing corresponding to the virtual image. When the control parameter includes the mouth corner angle, the mouth corner angle may be calculated according to the preset parameter conversion algorithm based on the speech parameter and a preset maximum mouth corner angle corresponding to the virtual image.
For example, if the maximum lip spacing is 5 cm and the normalized speech parameter is 0.5, the control parameter may include 0.5 × 5 = 2.5 cm; that is, the lip spacing between the upper and lower lips of the virtual image can be controlled to be 2.5 cm at this time (such as h shown in Fig. 6). Similarly, if the maximum mouth corner angle is 120° and the normalized speech parameter is 0.5, the control parameter may include 0.5 × 120 = 60°; that is, the mouth corner angle of the virtual image can be controlled to be 60° at this time (such as A shown in Fig. 6). A sketch of this conversion is given below.
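The following sketch shows this conversion under the assumption that it is a simple scaling of the (0, 1) speech parameter by per-virtual-image maxima, matching the 0.5 × 5 cm and 0.5 × 120° examples above; the dataclass and field names are illustrative, not part of the original disclosure.

```python
# Hypothetical sketch of step S150: converting a normalized speech parameter
# into mouth-shape control parameters by scaling per-virtual-image maxima.
from dataclasses import dataclass

@dataclass
class MouthControl:
    lip_spacing_cm: float      # spacing between the upper and lower lips
    mouth_corner_deg: float    # mouth corner angle

def to_control_parameter(speech_parameter: float,
                         max_lip_spacing_cm: float = 5.0,
                         max_mouth_corner_deg: float = 120.0) -> MouthControl:
    """Scale the (0, 1) speech parameter by the maxima configured for this virtual image."""
    return MouthControl(
        lip_spacing_cm=speech_parameter * max_lip_spacing_cm,
        mouth_corner_deg=speech_parameter * max_mouth_corner_deg,
    )

# Example from the description: a speech parameter of 0.5 gives 2.5 cm and 60 degrees.
print(to_control_parameter(0.5))
```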
Optionally, when the control parameter includes the lip spacing, the specific value of the maximum lip spacing is not limited and may be configured according to actual application requirements. In an alternative example, the maximum lip spacing may be set based on the lip spacing of the anchor.
For example, for anchor A, if it is determined by testing that the maximum lip spacing of the anchor is 5 cm, the maximum lip spacing of the virtual image corresponding to that anchor may be set to 5 cm; for anchor B, if it is determined by testing that the maximum lip spacing of the anchor is 6 cm, the maximum lip spacing of the virtual image corresponding to that anchor may be set to 6 cm.
Similarly, when the control parameter includes the mouth corner angle, the specific value of the maximum mouth corner angle is not limited and may be configured according to actual application requirements. In an alternative example, the maximum mouth corner angle may be set based on the mouth corner angle of the anchor.
For example, for anchor A, if it is determined by testing that the maximum mouth corner angle of the anchor is 120°, the maximum mouth corner angle of the virtual image corresponding to that anchor may be set to 120°; for anchor B, if it is determined by testing that the maximum mouth corner angle of the anchor is 135°, the maximum mouth corner angle of the virtual image corresponding to that anchor may be set to 135°.
Through the above arrangement, the mouth shape of the virtual image and the actual mouth shape of the corresponding anchor can have high consistency, so that a more realistic image display is realized in live streaming. Moreover, since different anchors generally have different maximum lip spacings and maximum mouth corner angles, the maximum lip spacings and maximum mouth corner angles of the virtual images corresponding to different anchors may also differ. Therefore, when viewers watch the live streams of the virtual images corresponding to different anchors, they can see different mouth shapes (different maximum lip spacings and/or maximum mouth corner angles), which gives the virtual images greater vividness, increases the interest of live streaming, and can further improve the viewer experience.
It should be noted that the electronic equipment 10 to which the above virtual image control method is applied may be either the terminal device on the anchor side or the background server on the server side. For example, as shown in Fig. 7, this embodiment provides a live broadcast system. The live broadcast system may include a first terminal 20, a second terminal 30, and a server 40 communicatively connected with the first terminal 20 and the second terminal 30 respectively.
In detail, the first terminal 20 may serve as the terminal device on the anchor side, and the second terminal 30 may serve as the terminal device on the viewer side. The first terminal 20 may also be connected with a microphone. When the anchor performs live streaming, the first terminal 20 collects the voice information input by the anchor through the microphone. On the one hand, the first terminal 20 may execute the aforementioned virtual image control method to control the virtual image, so as to generate a corresponding video stream and send it to the second terminal 30 through the server 40, so that the second terminal 30 can play the video stream, thereby realizing live streaming based on the virtual image (as shown in Fig. 8).
On the other hand, the first terminal 20 may also send the voice information to the server 40. The server 40 may execute the above virtual image control method to control the virtual image, so as to generate a corresponding video stream, which the server 40 then sends to the first terminal 20 and the second terminal 30 respectively, so that the first terminal 20 and the second terminal 30 can play the video stream, thereby realizing live streaming based on the virtual image.
With reference to Fig. 9, an embodiment of the present application also provides a virtual image control device 100 that can be applied to the above electronic equipment 10, for controlling a virtual image displayed in a live streaming picture. The virtual image control device 100 may include a voice obtaining module 110, a speech analysis module 130 and a mouth shape control module 150.
The voice obtaining module 110 is configured to obtain the voice information input by the anchor. In this embodiment, the voice obtaining module 110 may be configured to execute step S110 shown in Fig. 2; for relevant content of the voice obtaining module 110, reference may be made to the foregoing description of step S110.
The speech analysis module 130 is configured to perform speech analysis processing on the voice information to obtain the corresponding speech parameter. In this embodiment, the speech analysis module 130 may be configured to execute step S130 shown in Fig. 2; for relevant content of the speech analysis module 130, reference may be made to the foregoing description of step S130.
The mouth shape control module 150 is configured to convert the speech parameter into the control parameter according to the preset parameter conversion algorithm, and to control the mouth shape of the virtual image according to the control parameter. In this embodiment, the mouth shape control module 150 may be configured to execute step S150 shown in Fig. 2; for relevant content of the mouth shape control module 150, reference may be made to the foregoing description of step S150.
In detail, in this embodiment, the speech analysis module 130 may include an information segmentation sub-module and an information analysis sub-module.
The information segmentation sub-module is configured to segment the voice information, and to extract, from each segment of voice information after the segmentation, a voice segment of the set length. In this embodiment, the information segmentation sub-module may be configured to execute step S131 shown in Fig. 3; for relevant content of the information segmentation sub-module, reference may be made to the foregoing description of step S131.
The information analysis sub-module is configured to perform speech analysis processing on each extracted voice segment respectively to obtain the speech parameter corresponding to each voice segment. In this embodiment, the information analysis sub-module may be configured to execute step S133 shown in Fig. 3; for relevant content of the information analysis sub-module, reference may be made to the foregoing description of step S133.
Depending on the manner in which the voice information is segmented, the specific function of the information segmentation sub-module may differ. For example, in an alternative example, the information segmentation sub-module is specifically configured to extract, at intervals of the set length, a voice segment of the set length from the voice information. In another alternative example, the information segmentation sub-module is specifically configured to segment the voice information according to the continuity of the voice information and to extract, from each segment of voice information after the segmentation, a voice segment of the set length.
It should be noted that, for the above two different functions of the information segmentation sub-module, reference may be made to the corresponding explanations of the virtual image control method above, which will not be repeated here.
In the embodiments of the present application, corresponding to the virtual image control method shown in Fig. 2 to Fig. 8, a computer-readable storage medium is also provided. A computer program is stored in the computer-readable storage medium, and the computer program, when run, executes each step of the above virtual image control method.
For the steps executed when the aforementioned computer program runs, reference may be made to the above explanation of the virtual image control method, which will not be repeated here.
In summary, the virtual image control method, virtual image control device and electronic equipment provided by the present application obtain the voice information of the anchor, and control the mouth shape of the virtual image based on the voice information and the preset parameter conversion algorithm, so that the voice played in live streaming and the mouth shape of the virtual image have high consistency. This alleviates the problem in the prior art that the voice played during live streaming does not match the mouth shape of the virtual image because the control precision of the virtual image is low, thereby effectively improving the user experience.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may also be implemented in other ways. The device and method embodiments described above are only illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the architecture, functionality and operation of possible implementations of devices, methods and computer program products according to multiple embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings; for example, two consecutive blocks may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in each embodiment of the present application may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer equipment (which may be a personal computer, an electronic equipment, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
The above are only preferred embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the protection scope of the present application.

Claims (12)

1. A virtual image control method, characterized in that it is applied to a live streaming device for controlling a virtual image displayed in a live streaming picture, the method comprising:
obtaining voice information input by an anchor;
performing speech analysis processing on the voice information to obtain a corresponding speech parameter;
converting the speech parameter into a control parameter according to a preset parameter conversion algorithm, and controlling a mouth shape of the virtual image according to the control parameter.
2. The virtual image control method according to claim 1, characterized in that the step of performing speech analysis processing on the voice information to obtain a corresponding speech parameter comprises:
segmenting the voice information, and extracting, from each segment of voice information after the segmentation, a voice segment of a set length;
performing speech analysis processing on each extracted voice segment respectively to obtain the speech parameter corresponding to each voice segment.
3. The virtual image control method according to claim 2, characterized in that the step of segmenting the voice information and extracting, from each segment of voice information after the segmentation, a voice segment of the set length is specifically:
extracting, at intervals of the set length, a voice segment of the set length from the voice information.
4. The virtual image control method according to claim 2, characterized in that the step of segmenting the voice information and extracting, from each segment of voice information after the segmentation, a voice segment of the set length is specifically:
segmenting the voice information according to the continuity of the voice information, and extracting, from each segment of voice information after the segmentation, a voice segment of the set length.
5. The virtual image control method according to claim 2, characterized in that the step of performing speech analysis processing on each extracted voice segment to obtain the speech parameter corresponding to each voice segment comprises:
extracting amplitude information of each voice segment;
for each voice segment, calculating the speech parameter corresponding to the voice segment according to the amplitude information of the voice segment.
6. The virtual image control method according to claim 5, characterized in that the step of calculating the speech parameter corresponding to the voice segment according to the amplitude information of the voice segment is specifically:
performing a calculation according to a normalization algorithm based on frame length information of the voice segment and the amplitude information, to obtain the speech parameter corresponding to the voice segment.
7. The virtual image control method according to any one of claims 1 to 6, characterized in that the control parameter comprises at least one of a lip spacing between the upper and lower lips of the virtual image and a mouth corner angle.
8. The virtual image control method according to claim 7, characterized in that, when the control parameter comprises the lip spacing, the lip spacing is calculated according to the preset parameter conversion algorithm based on the speech parameter and a preset maximum lip spacing corresponding to the virtual image;
when the control parameter comprises the mouth corner angle, the mouth corner angle is calculated according to the preset parameter conversion algorithm based on the speech parameter and a preset maximum mouth corner angle corresponding to the virtual image.
9. The virtual image control method according to claim 8, characterized in that, when the control parameter comprises the lip spacing, the maximum lip spacing is set according to a lip spacing of the anchor obtained in advance;
when the control parameter comprises the mouth corner angle, the maximum mouth corner angle is set according to a mouth corner angle of the anchor obtained in advance.
10. A virtual image control device, characterized in that it is applied to a live streaming device for controlling a virtual image in live streaming, the device comprising:
a voice obtaining module, configured to obtain voice information input by an anchor;
a speech analysis module, configured to perform speech analysis processing on the voice information to obtain a corresponding speech parameter;
a mouth shape control module, configured to convert the speech parameter into a control parameter according to a preset parameter conversion algorithm, and to control a mouth shape of the virtual image according to the control parameter.
11. An electronic equipment, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when run on the processor, implements the steps of the virtual image control method according to any one of claims 1 to 9.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed, implements the steps of the virtual image control method according to any one of claims 1 to 9.
CN201910252003.7A 2019-03-29 2019-03-29 Virtual image control method, virtual image control device and electronic equipment Pending CN109872724A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201910252003.7A CN109872724A (en) 2019-03-29 2019-03-29 Virtual image control method, virtual image control device and electronic equipment
US17/598,768 US20220101871A1 (en) 2019-03-29 2020-03-27 Live streaming control method and apparatus, live streaming device, and storage medium
PCT/CN2020/081626 WO2020200081A1 (en) 2019-03-29 2020-03-27 Live streaming control method and apparatus, live streaming device, and storage medium
SG11202111403VA SG11202111403VA (en) 2019-03-29 2020-03-27 Live streaming control method and apparatus, live streaming device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910252003.7A CN109872724A (en) 2019-03-29 2019-03-29 Virtual image control method, virtual image control device and electronic equipment

Publications (1)

Publication Number Publication Date
CN109872724A true CN109872724A (en) 2019-06-11

Family

ID=66921695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910252003.7A Pending CN109872724A (en) 2019-03-29 2019-03-29 Virtual image control method, virtual image control device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109872724A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060204060A1 (en) * 2002-12-21 2006-09-14 Microsoft Corporation System and method for real time lip synchronization
CN106485774A (en) * 2016-12-30 2017-03-08 当家移动绿色互联网技术集团有限公司 Expression based on voice Real Time Drive person model and the method for attitude
CN108989705A (en) * 2018-08-31 2018-12-11 百度在线网络技术(北京)有限公司 A kind of video creating method of virtual image, device and terminal
CN109271553A (en) * 2018-08-31 2019-01-25 乐蜜有限公司 A kind of virtual image video broadcasting method, device, electronic equipment and storage medium
CN109326151A (en) * 2018-11-01 2019-02-12 北京智能优学科技有限公司 Implementation method, client and server based on semantics-driven virtual image

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020200081A1 (en) * 2019-03-29 2020-10-08 广州虎牙信息科技有限公司 Live streaming control method and apparatus, live streaming device, and storage medium
CN110277099A (en) * 2019-06-13 2019-09-24 北京百度网讯科技有限公司 Voice-based nozzle type generation method and device
CN111147873A (en) * 2019-12-19 2020-05-12 武汉西山艺创文化有限公司 Virtual image live broadcasting method and system based on 5G communication
CN113256821A (en) * 2021-06-02 2021-08-13 北京世纪好未来教育科技有限公司 Three-dimensional virtual image lip shape generation method and device and electronic equipment
CN113314145A (en) * 2021-06-09 2021-08-27 广州虎牙信息科技有限公司 Sample generation method, model training method, mouth shape driving device, mouth shape driving equipment and mouth shape driving medium
CN117275485A (en) * 2023-11-22 2023-12-22 翌东寰球(深圳)数字科技有限公司 Audio and video generation method, device, equipment and storage medium
CN117275485B (en) * 2023-11-22 2024-03-12 翌东寰球(深圳)数字科技有限公司 Audio and video generation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109872724A (en) Virtual image control method, virtual image control device and electronic equipment
US11392642B2 (en) Image processing method, storage medium, and computer device
CN103096134B (en) A kind of data processing method and equipment based on net cast and game
CN109348274B (en) Live broadcast interaction method and device and storage medium
US20220392224A1 (en) Data processing method and apparatus, device, and readable storage medium
CN109922355A (en) Virtual image live broadcasting method, virtual image live broadcast device and electronic equipment
CN110430425A (en) A kind of video fluency determines method, apparatus, electronic equipment and medium
WO2020151491A1 (en) Image deformation control method and device and hardware device
US11869524B2 (en) Audio processing method and apparatus, computer device, and storage medium
CN102655585B (en) Video conference system and time delay testing method, device and system thereof
CN103871419B (en) Information processing method and electronic equipment
CN110689885B (en) Machine synthesized voice recognition method, device, storage medium and electronic equipment
CN105979283A (en) Video transcoding method and device
CN112866809A (en) Video processing method and device, electronic equipment and readable storage medium
CN111554324A (en) Intelligent language fluency identification method and device, electronic equipment and storage medium
CN112738418A (en) Video acquisition method and device and electronic equipment
CN113315979A (en) Data processing method and device, electronic equipment and storage medium
CN115209064A (en) Video synthesis method, device, equipment and storage medium
CN114299982A (en) Method and device for processing audio signal and electronic equipment
CN111726684B (en) Audio and video processing method and device and storage medium
CN110069641A (en) Image processing method, device and electronic equipment
CN116684394A (en) Media content processing method, apparatus, device, readable storage medium and product
WO2023005358A1 (en) Style migration model training method, and image style migration method and apparatus
WO2020200081A1 (en) Live streaming control method and apparatus, live streaming device, and storage medium
CN115728382A (en) Fruit maturity detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190611)