CN110392273A - Audio-video processing method, apparatus, electronic device, and storage medium
- Publication number
- CN110392273A (application number CN201910641537.9A)
- Authority
- CN
- China
- Prior art keywords
- video
- audio
- electronic equipment
- voice
- amplitude spectrum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/475—End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Television Signal Processing For Recording (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The embodiments of the present disclosure provide an audio-video processing method, apparatus, electronic device, and storage medium. The method is applied to a server and includes: obtaining a dubbing instruction issued by a first electronic device in a virtual space, where the first electronic device is an electronic device with live-streaming permission in the virtual space; determining a preset dubbing type corresponding to the dubbing instruction; determining a to-be-dubbed video; when a dubbing start instruction issued by the first electronic device is obtained, playing, according to the preset dubbing type, a no-voice video corresponding to the to-be-dubbed video; and, while the no-voice video is playing, obtaining dubbed audio corresponding to the no-voice video and sending the dubbed audio to a second electronic device, where the second electronic device is an electronic device with permission to watch the live stream in the virtual space. With this solution, users can interact in the virtual space by dubbing, which diversifies the modes of interaction and improves the user experience.
Description
Technical field
This disclosure relates to the field of computer technology, and in particular to an audio-video processing method, apparatus, electronic device, and storage medium.
Background
Network live streaming has developed rapidly in recent years and has become popular. In the field of network live streaming, a terminal on which a live-streaming application is installed may be called a user terminal, and a user terminal used to watch an anchor's live stream is a viewer terminal.
During a network live stream, the anchor can broadcast in several ways and can also interact with viewers or with other anchors. For example, viewers can chat with the anchor and give gifts, and anchors can co-stream ("connect mic") with one another or hold co-stream battles. At present, however, whether between anchor and viewers or between anchors, the modes of interaction in network live streaming are limited.
Summary of the invention
To overcome the problems in the related art, the embodiments of the present disclosure provide an audio-video processing method, apparatus, electronic device, and storage medium. The specific technical solutions are as follows:
According to a first aspect of the embodiments of the present disclosure, an audio-video processing method is provided, applied to a server. The method includes:
obtaining a dubbing instruction issued by a first electronic device in a virtual space, where the first electronic device is an electronic device with live-streaming permission in the virtual space;
determining a preset dubbing type corresponding to the dubbing instruction;
determining a to-be-dubbed video;
when a dubbing start instruction issued by the first electronic device is obtained, playing, according to the preset dubbing type, a no-voice video corresponding to the to-be-dubbed video; and
while the no-voice video is playing, obtaining dubbed audio corresponding to the no-voice video and sending the dubbed audio to a second electronic device, where the second electronic device is an electronic device with permission to watch the live stream in the virtual space.
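The server-side sequence above can be sketched as a small session object. All class, method, and message names here are hypothetical illustrations — the text only fixes the order of the steps, not any API.

```python
from dataclasses import dataclass, field

@dataclass
class DubSession:
    """Minimal sketch of the server-side dubbing flow (names are illustrative)."""
    viewers: list                 # second electronic devices (viewer terminals)
    dub_type: str = ""            # preset dubbing type resolved from the instruction
    video_id: str = ""            # the to-be-dubbed video
    sent: list = field(default_factory=list)

    def on_dub_instruction(self, instruction: dict) -> None:
        # Steps 1-3: receive the dubbing instruction, resolve its preset
        # type, and determine the to-be-dubbed video.
        self.dub_type = instruction["type"]
        self.video_id = instruction["video_id"]

    def on_start(self) -> str:
        # Step 4: on the start instruction, play the matching no-voice video
        # according to the preset dubbing type.
        return f"play:{self.video_id}:no-voice:{self.dub_type}"

    def on_dub_audio(self, chunk: bytes) -> None:
        # Step 5: forward the dubbed audio to every viewer device.
        for viewer in self.viewers:
            self.sent.append((viewer, chunk))

s = DubSession(viewers=["viewer-1", "viewer-2"])
s.on_dub_instruction({"type": "anchor-show", "video_id": "clip42"})
print(s.on_start())   # play:clip42:no-voice:anchor-show
s.on_dub_audio(b"pcm-chunk")
print(len(s.sent))    # 2 (one forwarded copy per viewer)
```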
In one implementation, the preset dubbing type is an anchor show type; and the step of playing, according to the preset dubbing type, the no-voice video corresponding to the to-be-dubbed video includes:
controlling the first electronic device and the second electronic device to play the no-voice video corresponding to the to-be-dubbed video simultaneously.
In one implementation, the preset dubbing type is a multi-anchor battle type; and the step of playing, according to the preset dubbing type, the no-voice video corresponding to the to-be-dubbed video includes:
determining a battle order for the first electronic device corresponding to each anchor; and
according to the battle order, controlling each first electronic device and its corresponding second electronic devices to play the no-voice video corresponding to the to-be-dubbed video in turn.
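The turn-taking in battle mode amounts to ordering the anchors' devices by their battle positions. A minimal sketch, with hypothetical names (the text does not specify how the order is represented):

```python
def playback_order(battle_positions: dict) -> list:
    """Map each anchor's device id to its battle position and return the
    device ids in the order they take their dubbing turn."""
    ranked = sorted(battle_positions.items(), key=lambda kv: kv[1])
    return [device for device, _position in ranked]

order = playback_order({"anchor-B": 2, "anchor-A": 1, "anchor-C": 3})
print(order)  # ['anchor-A', 'anchor-B', 'anchor-C']
```

Each device in the returned order would then play the no-voice video together with its own viewers before the next anchor's turn begins.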
In one implementation, the preset dubbing type is a multi-person dubbing type; and the step of playing, according to the preset dubbing type, the no-voice video corresponding to the to-be-dubbed video includes:
controlling each second electronic device corresponding to a user in an instant-messaging region of the virtual space to play the no-voice video corresponding to the to-be-dubbed video simultaneously.
In one implementation, the step of controlling each second electronic device corresponding to a user in the instant-messaging region of the virtual space to play the no-voice video simultaneously includes:
when a broadcast message sent by the first electronic device is obtained, sending the to-be-dubbed video and a start instruction to each second electronic device corresponding to a user in the instant-messaging region of the virtual space, so that each second electronic device, upon receiving the start instruction, plays the no-voice video corresponding to the to-be-dubbed video at the same time.
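The broadcast step above can be sketched as fanning out one (video + start-instruction) message per viewer device in the instant-messaging region, so that playback begins on all of them together. Field names are hypothetical:

```python
def broadcast_start(region_devices: list, video_id: str) -> list:
    """Build one start message per viewer device in the instant-messaging
    region; each device begins playing the no-voice video on receipt."""
    return [{"to": device, "video": video_id, "cmd": "start"}
            for device in region_devices]

messages = broadcast_start(["viewer-1", "viewer-2"], "clip42")
print(len(messages))       # 2
print(messages[0]["cmd"])  # start
```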
In one implementation, the step of determining the to-be-dubbed video includes:
obtaining a video uploaded by the first electronic device; and
determining the uploaded video as the to-be-dubbed video.
In one implementation, the no-voice video is obtained by:
determining an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
inputting the amplitude spectrum into a network model trained in advance to obtain a voice mask matrix corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding voice mask matrices and encodes the correspondence between amplitude spectra and voice mask matrices;
computing a no-voice amplitude spectrum from the voice mask matrix and the amplitude spectrum; and
determining, based on the no-voice amplitude spectrum, the no-voice video corresponding to the to-be-dubbed video.
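The mask step can be sketched on a toy magnitude spectrogram. The model's output is stubbed with a hand-written mask, and combining mask and spectrum as `(1 - mask) * magnitude` is an assumption — the text only states that the no-voice amplitude spectrum is computed from the voice mask matrix and the amplitude spectrum.

```python
import numpy as np

def no_voice_magnitude(mixture_mag: np.ndarray, voice_mask: np.ndarray) -> np.ndarray:
    """Keep, in every time-frequency bin, only the energy that the
    (model-predicted) voice mask does not attribute to speech."""
    return (1.0 - voice_mask) * mixture_mag

mag = np.array([[1.0, 2.0],    # frequency bin 0 over two frames
                [3.0, 4.0]])   # frequency bin 1
mask = np.array([[0.0, 0.0],   # bin 0: no voice detected
                 [1.0, 1.0]])  # bin 1: attributed entirely to voice
acc = no_voice_magnitude(mag, mask)
print(acc)  # bin 0 untouched, bin 1 zeroed out
```

In a full pipeline the resulting no-voice amplitude spectrum would be inverted back to audio and remuxed with the video frames.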
In one implementation, the no-voice video is obtained by:
determining an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
inputting the amplitude spectrum into a network model trained in advance to obtain no-voice audio corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding no-voice audio and encodes the correspondence between amplitude spectra and no-voice audio; and
determining, based on the no-voice audio, the no-voice video corresponding to the to-be-dubbed video.
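When a model predicts audio from an amplitude spectrum, some phase information must be supplied to get back to the time domain. A common choice, assumed here (the text does not specify one), is to reuse the mixture's phase; this single-frame sketch illustrates that step.

```python
import numpy as np

def frame_from_magnitude(mixture_frame: np.ndarray, predicted_mag: np.ndarray) -> np.ndarray:
    """Combine a predicted magnitude spectrum with the mixture's phase and
    invert back to a time-domain frame."""
    phase = np.angle(np.fft.rfft(mixture_frame))
    return np.fft.irfft(predicted_mag * np.exp(1j * phase), n=len(mixture_frame))

t = np.arange(64)
frame = np.sin(2 * np.pi * t / 16.0)
# Sanity check: predicting the mixture's own magnitude reproduces the frame.
recon = frame_from_magnitude(frame, np.abs(np.fft.rfft(frame)))
print(np.allclose(recon, frame))  # True
```

A real system would apply this per windowed frame with overlap-add; the sketch shows only the magnitude-plus-phase inversion.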
According to a second aspect of the embodiments of the present disclosure, an audio-video processing method is provided, applied to a first electronic device, where the first electronic device is an electronic device with live-streaming permission in a virtual space. The method includes:
obtaining a dubbing instruction in the virtual space;
determining a preset dubbing type corresponding to the dubbing instruction;
determining a to-be-dubbed video;
when a dubbing start instruction is obtained, playing, according to the preset dubbing type, a no-voice video corresponding to the to-be-dubbed video; and
while the no-voice video is playing, obtaining dubbed audio corresponding to the no-voice video and sending the dubbed audio to a server.
In one implementation, the preset dubbing type is an anchor show type; and the step of playing, according to the preset dubbing type, the no-voice video corresponding to the to-be-dubbed video includes:
playing the no-voice video corresponding to the to-be-dubbed video, and controlling a second electronic device to play the same no-voice video simultaneously, where the second electronic device is an electronic device with permission to watch the live stream in the virtual space.
In one implementation, the preset dubbing type is a multi-anchor battle type; and the step of playing, according to the preset dubbing type, the no-voice video corresponding to the to-be-dubbed video includes:
determining a battle order for the first electronic device corresponding to each anchor; and
according to the battle order, controlling each first electronic device and its corresponding second electronic devices to play the no-voice video corresponding to the to-be-dubbed video in turn.
In one implementation, the preset dubbing type is a multi-person dubbing type; and the step of playing, according to the preset dubbing type, the no-voice video corresponding to the to-be-dubbed video includes:
controlling each second electronic device corresponding to a user in an instant-messaging region of the virtual space to play the no-voice video corresponding to the to-be-dubbed video simultaneously.
In one implementation, the step of controlling each second electronic device corresponding to a user in the instant-messaging region of the virtual space to play the no-voice video simultaneously includes:
sending a broadcast message to the server, so that the server sends the to-be-dubbed video and a start instruction to each second electronic device corresponding to a user in the instant-messaging region of the virtual space, and each second electronic device, upon receiving the start instruction, plays the no-voice video corresponding to the to-be-dubbed video at the same time.
In one implementation, the step of determining the to-be-dubbed video includes:
obtaining a video uploaded by a user; and
determining the uploaded video as the to-be-dubbed video.
In one implementation, the no-voice video is obtained by:
determining an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
inputting the amplitude spectrum into a network model trained in advance to obtain a voice mask matrix corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding voice mask matrices and encodes the correspondence between amplitude spectra and voice mask matrices;
computing a no-voice amplitude spectrum from the voice mask matrix and the amplitude spectrum; and
determining, based on the no-voice amplitude spectrum, the no-voice video corresponding to the to-be-dubbed video.
In one implementation, the no-voice video is obtained by:
determining an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
inputting the amplitude spectrum into a network model trained in advance to obtain no-voice audio corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding no-voice audio and encodes the correspondence between amplitude spectra and no-voice audio; and
determining, based on the no-voice audio, the no-voice video corresponding to the to-be-dubbed video.
According to a third aspect of the embodiments of the present disclosure, an audio-video processing method is provided, applied to a second electronic device, where the second electronic device is an electronic device with permission to watch the live stream in the virtual space. The method includes:
when a dubbing start instruction is obtained in the virtual space, playing a pre-obtained no-voice video corresponding to a to-be-dubbed video; and
while the no-voice video is playing, when dubbed audio corresponding to the no-voice video is obtained, playing the dubbed audio.
In one implementation, the step of playing, when the dubbing start instruction is obtained in the virtual space, the pre-obtained no-voice video corresponding to the to-be-dubbed video includes:
upon receiving the to-be-dubbed video and a start instruction sent by the server in the virtual space, playing the no-voice video corresponding to the received to-be-dubbed video.
In one implementation, the no-voice video is obtained by:
determining an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
inputting the amplitude spectrum into a network model trained in advance to obtain a voice mask matrix corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding voice mask matrices and encodes the correspondence between amplitude spectra and voice mask matrices;
computing a no-voice amplitude spectrum from the voice mask matrix and the amplitude spectrum; and
determining, based on the no-voice amplitude spectrum, the no-voice video corresponding to the to-be-dubbed video.
In one implementation, the no-voice video is obtained by:
determining an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
inputting the amplitude spectrum into a network model trained in advance to obtain no-voice audio corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding no-voice audio and encodes the correspondence between amplitude spectra and no-voice audio; and
determining, based on the no-voice audio, the no-voice video corresponding to the to-be-dubbed video.
According to a fourth aspect of the embodiments of the present disclosure, an audio-video processing apparatus is provided, applied to a server. The apparatus includes:
a first dubbing-instruction obtaining module, configured to obtain a dubbing instruction issued by a first electronic device in a virtual space, where the first electronic device is an electronic device with live-streaming permission in the virtual space;
a first preset-dubbing-type determining module, configured to determine a preset dubbing type corresponding to the dubbing instruction;
a first to-be-dubbed-video determining module, configured to determine a to-be-dubbed video;
a first no-voice-video playing module, configured to play, when a dubbing start instruction issued by the first electronic device is obtained, a no-voice video corresponding to the to-be-dubbed video according to the preset dubbing type; and
a first dubbed-audio sending module, configured to obtain, while the no-voice video is playing, dubbed audio corresponding to the no-voice video and send the dubbed audio to a second electronic device, where the second electronic device is an electronic device with permission to watch the live stream in the virtual space.
In one implementation, the preset dubbing type is an anchor show type; and the first no-voice-video playing module includes:
a first no-voice-video playing submodule, configured to control the first electronic device and the second electronic device to play the no-voice video corresponding to the to-be-dubbed video simultaneously.
In one implementation, the preset dubbing type is a multi-anchor battle type; and the first no-voice-video playing module includes:
a battle-order determining submodule, configured to determine a battle order for the first electronic device corresponding to each anchor; and
a second no-voice-video playing submodule, configured to control, according to the battle order, each first electronic device and its corresponding second electronic devices to play the no-voice video corresponding to the to-be-dubbed video in turn.
In one implementation, the preset dubbing type is a multi-person dubbing type; and the first no-voice-video playing module includes:
a third no-voice-video playing submodule, configured to control each second electronic device corresponding to a user in an instant-messaging region of the virtual space to play the no-voice video corresponding to the to-be-dubbed video simultaneously.
In one implementation, the third no-voice-video playing submodule includes:
a first no-voice-video playing unit, configured to send, when a broadcast message sent by the first electronic device is obtained, the to-be-dubbed video and a start instruction to each second electronic device corresponding to a user in the instant-messaging region of the virtual space, so that each second electronic device, upon receiving the start instruction, plays the no-voice video corresponding to the to-be-dubbed video at the same time.
In one implementation, the first to-be-dubbed-video determining module includes:
a first video obtaining submodule, configured to obtain a video uploaded by the first electronic device; and
a first to-be-dubbed-video determining submodule, configured to determine the uploaded video as the to-be-dubbed video.
In one implementation, the audio-video processing apparatus further includes a first no-voice-video determining module, which includes:
a first amplitude-spectrum determining submodule, configured to determine an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
a first voice-mask-matrix determining submodule, configured to input the amplitude spectrum into a network model trained in advance to obtain a voice mask matrix corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding voice mask matrices and encodes the correspondence between amplitude spectra and voice mask matrices;
a first no-voice-amplitude-spectrum determining submodule, configured to compute a no-voice amplitude spectrum from the voice mask matrix and the amplitude spectrum; and
a first no-voice-video determining submodule, configured to determine, based on the no-voice amplitude spectrum, the no-voice video corresponding to the to-be-dubbed video.
In one implementation, the audio-video processing apparatus further includes a second no-voice-video determining module, which includes:
a second amplitude-spectrum determining submodule, configured to determine an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
a first no-voice-audio determining submodule, configured to input the amplitude spectrum into a network model trained in advance to obtain no-voice audio corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding no-voice audio and encodes the correspondence between amplitude spectra and no-voice audio; and
a second no-voice-video determining submodule, configured to determine, based on the no-voice audio, the no-voice video corresponding to the to-be-dubbed video.
According to a fifth aspect of the embodiments of the present disclosure, an audio-video processing apparatus is provided, applied to a first electronic device, where the first electronic device is an electronic device with live-streaming permission in a virtual space. The apparatus includes:
a second dubbing-instruction obtaining module, configured to obtain a dubbing instruction in the virtual space;
a second preset-dubbing-type determining module, configured to determine a preset dubbing type corresponding to the dubbing instruction;
a second to-be-dubbed-video determining module, configured to determine a to-be-dubbed video;
a second no-voice-video playing module, configured to play, when a dubbing start instruction is obtained, a no-voice video corresponding to the to-be-dubbed video according to the preset dubbing type; and
a second dubbed-audio sending module, configured to obtain, while the no-voice video is playing, dubbed audio corresponding to the no-voice video and send the dubbed audio to a server.
In one implementation, the preset dubbing type is an anchor show type; and the second no-voice-video playing module includes:
a fourth no-voice-video playing submodule, configured to play the no-voice video corresponding to the to-be-dubbed video and control the second electronic device to play the same no-voice video simultaneously, where the second electronic device is an electronic device with permission to watch the live stream in the virtual space.
In one implementation, the preset dubbing type is a multi-anchor battle type; and the second no-voice-video playing module includes:
a battle-order determining submodule, configured to determine a battle order for the first electronic device corresponding to each anchor; and
a fifth no-voice-video playing submodule, configured to control, according to the battle order, each first electronic device and its corresponding second electronic devices to play the no-voice video corresponding to the to-be-dubbed video in turn.
In one implementation, the preset dubbing type is a multi-person dubbing type; and the second no-voice-video playing module includes:
a sixth no-voice-video playing submodule, configured to control each second electronic device corresponding to a user in an instant-messaging region of the virtual space to play the no-voice video corresponding to the to-be-dubbed video simultaneously.
In one implementation, the sixth no-voice-video playing submodule includes:
a second no-voice-video playing unit, configured to send a broadcast message to the server, so that the server sends the to-be-dubbed video and a start instruction to each second electronic device corresponding to a user in the instant-messaging region of the virtual space, and each second electronic device, upon receiving the start instruction, plays the no-voice video corresponding to the to-be-dubbed video at the same time.
In one implementation, the second to-be-dubbed-video determining module includes:
a second video obtaining submodule, configured to obtain a video uploaded by a user; and
a second to-be-dubbed-video determining submodule, configured to determine the uploaded video as the to-be-dubbed video.
In one implementation, the audio-video processing apparatus further includes a third no-voice-video determining module, which includes:
a third amplitude-spectrum determining submodule, configured to determine an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
a second voice-mask-matrix determining submodule, configured to input the amplitude spectrum into a network model trained in advance to obtain a voice mask matrix corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding voice mask matrices and encodes the correspondence between amplitude spectra and voice mask matrices;
a second no-voice-amplitude-spectrum determining submodule, configured to compute a no-voice amplitude spectrum from the voice mask matrix and the amplitude spectrum; and
a third no-voice-video determining submodule, configured to determine, based on the no-voice amplitude spectrum, the no-voice video corresponding to the to-be-dubbed video.
In one implementation, the audio-video processing apparatus further includes a fourth no-voice-video determining module, which includes:
a fourth amplitude-spectrum determining submodule, configured to determine an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
a second no-voice-audio determining submodule, configured to input the amplitude spectrum into a network model trained in advance to obtain no-voice audio corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding no-voice audio and encodes the correspondence between amplitude spectra and no-voice audio; and
a fourth no-voice-video determining submodule, configured to determine, based on the no-voice audio, the no-voice video corresponding to the to-be-dubbed video.
According to a sixth aspect of the embodiments of the present disclosure, an audio-video processing apparatus is provided, applied to a second electronic device, where the second electronic device is an electronic device with permission to watch the live stream in the virtual space. The apparatus includes:
a third no-voice-video playing module, configured to play, when a dubbing start instruction is obtained in the virtual space, a pre-obtained no-voice video corresponding to a to-be-dubbed video; and
a dubbed-audio playing module, configured to play, while the no-voice video is playing, dubbed audio corresponding to the no-voice video when that dubbed audio is obtained.
In one implementation, the third no-voice-video playing module includes:
a seventh no-voice-video playing submodule, configured to play, upon receiving the to-be-dubbed video and a start instruction sent by the server in the virtual space, the no-voice video corresponding to the received to-be-dubbed video.
In one implementation, the audio-video processing apparatus further includes a fifth no-voice-video determining module, which includes:
a fifth amplitude-spectrum determining submodule, configured to determine an amplitude spectrum corresponding to the audio signal of the to-be-dubbed video;
a third voice-mask-matrix determining submodule, configured to input the amplitude spectrum into a network model trained in advance to obtain a voice mask matrix corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude-spectrum samples and their corresponding voice mask matrices and encodes the correspondence between amplitude spectra and voice mask matrices;
a third no-voice-amplitude-spectrum determining submodule, configured to compute a no-voice amplitude spectrum from the voice mask matrix and the amplitude spectrum; and
a fifth no-voice-video determining submodule, configured to determine, based on the no-voice amplitude spectrum, the no-voice video corresponding to the to-be-dubbed video.
In one implementation, the audio/video processing apparatus further includes a sixth voice-free-video determining module, which includes:
a sixth amplitude-spectrum determining submodule, configured to determine the amplitude spectrum corresponding to the audio signal of the video to be dubbed;
a third voice-free-audio determining submodule, configured to input the amplitude spectrum into a pre-trained network model to obtain the voice-free audio corresponding to the video to be dubbed, wherein the network model is trained based on amplitude spectrum samples acquired in advance and their corresponding voice-free audio, and includes a correspondence between amplitude spectra and voice-free audio;
a sixth voice-free-video determining submodule, configured to determine, based on the voice-free audio, the voice-free video corresponding to the video to be dubbed.
According to a seventh aspect of the embodiments of the present disclosure, a server is provided, including:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the audio/video processing method described in the first aspect above.
According to an eighth aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the audio/video processing method described in the second or third aspect above.
According to a ninth aspect of the embodiments of the present disclosure, a storage medium is provided. When instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the audio/video processing method described in any of the above aspects.
In the solutions provided by the embodiments of the present disclosure, the server may acquire a dubbing instruction issued by a first electronic device in the virtual space, determine a preset dubbing type corresponding to the dubbing instruction, and then determine a video to be dubbed. Upon acquiring a dubbing start instruction issued by the first electronic device, the server plays, according to the preset dubbing type, the voice-free video corresponding to the video to be dubbed; during playback of the voice-free video, the server acquires the dubbed audio corresponding to the voice-free video and sends the dubbed audio to the second electronic devices. The first electronic device is an electronic device with streaming permission in the virtual space, and a second electronic device is an electronic device with permission to watch the live stream in the virtual space. With this solution, users can interact in the virtual space by dubbing, which enriches the ways of interaction and improves user experience. It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure; they do not constitute an improper limitation of the present disclosure.
Fig. 1 is a flowchart of a first audio/video processing method according to an exemplary embodiment;
Fig. 2 is a schematic diagram of a dubbing button according to an exemplary embodiment;
Fig. 3 is a first flowchart of step S104 of the embodiment shown in Fig. 1 according to an exemplary embodiment;
Fig. 4 is a first flowchart of a manner of acquiring a voice-free video according to an exemplary embodiment;
Fig. 5 is a second flowchart of a manner of acquiring a voice-free video according to an exemplary embodiment;
Fig. 6 is a flowchart of a second audio/video processing method according to an exemplary embodiment;
Fig. 7 is a flowchart of a third audio/video processing method according to an exemplary embodiment;
Fig. 8 is a structural block diagram of a first audio/video processing apparatus according to an exemplary embodiment;
Fig. 9 is a structural block diagram of a second audio/video processing apparatus according to an exemplary embodiment;
Fig. 10 is a structural block diagram of a third audio/video processing apparatus according to an exemplary embodiment;
Fig. 11 is a structural block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
To help those of ordinary skill in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the specification, claims, and accompanying drawings of the present disclosure are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.
To enrich the ways of interaction in a virtual space and improve user experience, the embodiments of the present disclosure provide an audio/video processing method, apparatus, server, electronic device, and computer-readable storage medium.
The first audio/video processing method provided by the embodiments of the present disclosure is introduced first. It can be applied to a server of a live-streaming application.
As shown in Fig. 1, an audio/video processing method applied to a server includes the following steps.
In step S101, acquiring a dubbing instruction issued by a first electronic device in a virtual space, wherein the first electronic device is an electronic device with streaming permission in the virtual space.
In step S102, determining a preset dubbing type corresponding to the dubbing instruction.
In step S103, determining a video to be dubbed.
In step S104, upon acquiring a dubbing start instruction issued by the first electronic device, playing, according to the preset dubbing type, the voice-free video corresponding to the video to be dubbed.
In step S105, during playback of the voice-free video, acquiring the dubbed audio corresponding to the voice-free video, and sending the dubbed audio to a second electronic device, wherein the second electronic device is an electronic device with permission to watch the live stream in the virtual space.
As it can be seen that in scheme provided by the embodiment of the present disclosure, the first electronic equipment in the available Virtual Space of server
What is issued dubs instruction, determine dub instruction it is corresponding it is default dub type, then determine wait match audio-video, and then in acquisition the
When dubbing sign on of one electronic equipment sending plays according to default type of dubbing to regard with the corresponding no voice of audio-video
Frequently, it during playing without voice video, obtains and dubs audio without voice video is corresponding, while audio will be dubbed and be sent to the
Two electronic equipments.Wherein, the first electronic equipment is with the electronic equipment that permission is broadcast live in Virtual Space, the second electronic equipment
For the electronic equipment with the viewing live streaming permission in Virtual Space.Use this programme user can be in Virtual Space to dub
Mode interacts, and increases the diversity of interaction mode, and user experience is improved.
The first electronic device is the electronic device with streaming permission in the virtual space; a streamer can use it to stream. While streaming, the streamer may interact with viewers or other streamers by dubbing, in which case the streamer issues a dubbing instruction. For ease of operation, a user interface may be provided in the live-streaming interface of the first electronic device. For example, as shown in Fig. 2, the live-streaming interface of the first electronic device may display a "Play Dubbing" button 201, and the streamer can click the button 201 to issue the dubbing instruction.
Accordingly, in step S101, the server can acquire the dubbing instruction issued by the first electronic device in the virtual space, which indicates that the streamer wants to interact with viewers or other streamers by dubbing. Since the virtual space may support multiple dubbing modes, the server then determines the preset dubbing type corresponding to the acquired dubbing instruction, that is, executes step S102.
In one embodiment, different user interfaces may be provided in the live-streaming interface of the first electronic device, each corresponding to a different preset dubbing type. The preset dubbing type corresponding to a dubbing instruction can then be determined from the user interface through which the instruction was issued.
The preset dubbing types can be set according to user needs. For example, a type may be a one-person dubbing performance by the streamer, a dubbing battle among multiple streamers, or a dubbing completed jointly by the streamer and viewers; no specific limitation is imposed here.
After acquiring the dubbing instruction issued by the first electronic device in the virtual space, the server can execute step S103, that is, determine the video to be dubbed. To help the user select a suitable video as the video to be dubbed, a video selection panel may be displayed in the live-streaming interface of the first electronic device, which may include videos downloaded by the streamer, videos popular on the network, videos recommended as suitable for the user, and so on; no specific limitation is imposed here. The streamer can select one of the videos, and the server then determines that video as the video to be dubbed.
To help the user become familiar with the content of the video to be dubbed and thus achieve a better dubbing effect, the first electronic device may play the video to be dubbed for the streamer to watch; meanwhile, the server may control each second electronic device to play it simultaneously for the viewers. A second electronic device is an electronic device with permission to watch the live stream in the virtual space; viewers can use it to watch the streamer's live stream.
Next, upon acquiring the dubbing start instruction issued by the first electronic device, which indicates that the user wants to start dubbing, the server can play the voice-free video corresponding to the video to be dubbed according to the preset dubbing type. For ease of operation, a corresponding user interface may be provided in the live-streaming interface of the first electronic device; for example, it may display a "Start Dubbing" button, which the streamer clicks to issue the dubbing start instruction.
Upon acquiring the dubbing start instruction issued by the first electronic device, the server can, according to the preset dubbing type, control the first electronic device, the second electronic devices, and the first electronic devices used by other streamers to start playing the voice-free video corresponding to the video to be dubbed. The voice-free video is the video with the voice removed and only the background music retained.
In one implementation, the voice-free video may be pre-stored on the server or locally on the electronic device used by each user. When the voice-free video is stored on the server, the server can send it to the electronic device used by each user so that each device plays it. In another implementation, after the video to be dubbed is determined, the server can process it to obtain the corresponding voice-free video for later use. Both are reasonable.
During playback of the voice-free video, the server can acquire the dubbed audio corresponding to the voice-free video and send it to the second electronic devices, that is, execute step S105, so that viewers can watch the dubbing performance. During playback of the voice-free video, the streamer and/or viewers and/or other streamers can produce audio signals to dub the roles in the video; the corresponding user-side electronic devices collect these audio signals, that is, the dubbed audio, and send it to the server.
The server can then receive the dubbed audio sent by each user-side electronic device and forward it to each second electronic device. Since each second electronic device is playing the voice-free video corresponding to the video to be dubbed, the dubbed audio is played together with the voice-free video, and viewers can watch the dubbing performance.
As one implementation of the embodiments of the present disclosure, the preset dubbing type may be a streamer show type; that is, only the streamer dubs, and viewers watch the streamer's dubbing performance.
For the case where the preset dubbing type is the streamer show type, the step of playing the voice-free video corresponding to the video to be dubbed according to the preset dubbing type may include:
controlling the first electronic device and its corresponding second electronic devices to simultaneously play the voice-free video corresponding to the video to be dubbed.
Since in this case the streamer dubs and the viewers watch the performance, the server can control the first electronic device and its corresponding second electronic devices to simultaneously play the voice-free video corresponding to the video to be dubbed. In this way, while the streamer dubs, the server sends the dubbed audio to each second electronic device, and each second electronic device plays the dubbed audio while playing the voice-free video, so viewers can watch the streamer's dubbing performance.
As can be seen, in this embodiment the streamer can give a dubbing performance to interact with viewers, which can enhance the interactivity and fun of the virtual space and improve user experience.
As one implementation of the embodiments of the present disclosure, the preset dubbing type may be a multi-streamer battle type; that is, multiple streamers each dub the video to be dubbed, and viewers can watch the dubbing battle among them.
In one embodiment, a "Play Dubbing" button may be displayed in a secondary menu of the talent-battle feature in the live-streaming interface of the first electronic device; when a streamer clicks this button, it can be determined that the streamer wants to take part in a multi-streamer dubbing battle.
The server can match the first electronic devices of the streamers who currently choose the multi-streamer dubbing battle, taking them as the first electronic devices participating in the battle. The video to be dubbed may be selected by any one of the streamers, or determined by other rules; for example, it may be selected by the streamer with the fewest viewers in the virtual space, to increase that streamer's popularity, and so on.
For the case where the preset dubbing type is the multi-streamer battle type, as shown in Fig. 3, the step of playing the voice-free video corresponding to the video to be dubbed according to the preset dubbing type may include:
S301: determining the battle order corresponding to the first electronic device of each streamer.
Since multiple streamers take part in the dubbing battle, and each streamer needs to dub in turn so that viewers experience the battle, the server can determine the battle order corresponding to the first electronic device of each streamer.
In one embodiment, the server can randomly determine the battle order of the first electronic devices and inform each first electronic device of its order. In another embodiment, the battle order may be determined by one of the streamers. In yet another embodiment, the streamers may agree on the battle order by co-streaming (mic-linking). All of these are reasonable.
S302: controlling, according to the battle order, the first electronic devices and their corresponding second electronic devices to play the voice-free video corresponding to the video to be dubbed in turn.
After the battle order is determined, the streamers can start the dubbing battle; that is, starting from the first streamer and ending with the last, each dubs the video to be dubbed in battle order. During this process, the server can control each first electronic device and its corresponding second electronic devices to play the voice-free video corresponding to the video to be dubbed in order; the streamers dub, and viewers can watch each streamer's dubbing performance.
When each streamer dubs, the corresponding first electronic device can collect the voice signal produced by the streamer and send it to the server; the server can send it, as the dubbed audio, to the other first electronic devices and to the second electronic devices corresponding to all the first electronic devices, so that every streamer and viewer can watch the dubbing battle performance.
As can be seen, in this embodiment multiple streamers can hold a dubbing battle performance to interact with other streamers and viewers, which can further enhance the interactivity and fun of the virtual space and further improve user experience.
As one implementation of the embodiments of the present disclosure, the preset dubbing type may be a multi-person dubbing type; that is, the streamer and viewers each dub different roles in the video to be dubbed and complete the dubbing together. The viewers here are generally users in the instant messaging area of the virtual space, for example, users in the chat room of the live-streaming room.
In this case, the video to be dubbed may be selected by the streamer according to the number of participants, or recommended by the server according to the number of users in the instant messaging area of the virtual space; both are reasonable, and no specific limitation is imposed here. For convenience, the streamer and the participating users can agree on the role assignment in the instant messaging area.
For the case where the preset dubbing type is the multi-person dubbing type, the step of playing the voice-free video corresponding to the video to be dubbed according to the preset dubbing type may include:
controlling the second electronic devices corresponding to the users in the instant messaging area of the virtual space to simultaneously play the voice-free video corresponding to the video to be dubbed.
To ensure that the streamer and the users in the instant messaging area can smoothly complete the dubbing of the video to be dubbed, the first electronic device and the second electronic devices corresponding to the users in the instant messaging area need to simultaneously play the voice-free video corresponding to the video to be dubbed; in this way, the streamer and each user can smoothly complete the dubbing interaction.
As can be seen, in this embodiment the streamer and the users in the instant messaging area can cooperate to complete the dubbing performance, so the interaction between the streamer and viewers is stronger and the viewers' sense of participation is enhanced, which can further enhance the interactivity and fun of the virtual space and further improve user experience.
As one implementation of the embodiments of the present disclosure, the step of controlling the second electronic devices corresponding to the users in the instant messaging area of the virtual space to simultaneously play the voice-free video corresponding to the video to be dubbed may include:
upon acquiring a broadcast message sent by the first electronic device, sending the video to be dubbed and a start instruction to the second electronic devices corresponding to the users in the instant messaging area of the virtual space, so that each second electronic device, upon receiving the start instruction, simultaneously plays the voice-free video corresponding to the video to be dubbed.
To ensure that the first electronic device and the second electronic devices corresponding to the participating users can play the voice-free video simultaneously, the first electronic device can issue the dubbing start instruction by sending a broadcast message; upon acquiring the broadcast message sent by the first electronic device, the server sends the video to be dubbed and the start instruction to the second electronic devices corresponding to the users in the instant messaging area of the virtual space.
In this way, each second electronic device starts playing the voice-free video corresponding to the video to be dubbed only upon receiving the start instruction, ensuring that all second electronic devices start playing the voice-free video at the same moment.
During dubbing, to ensure that the dubbed audio played by each user side is synchronized, in one embodiment voice can be transmitted by real-time co-streaming (mic-linking). For example, the voice signal can be collected at 20-millisecond intervals, and the coded data transmitted in UDP (User Datagram Protocol) packets, with network packet loss handled by FEC (Forward Error Correction). After receiving the packets, the receiving end can reorder them by sequence number and recover lost packets by PLC (packet loss concealment). In this way, packets from the sending end can be delivered to the receiving end within 400 milliseconds, which also ensures that each user side plays the dubbed audio simultaneously.
As it can be seen that in the present embodiment, server can when obtaining the broadcast message that the first electronic equipment is sent, send to
With audio-video and sign on into Virtual Space corresponding each second electronic equipment of user in instant messaging region so that each
Two electronic equipments are played when receiving sign on to guarantee to be played simultaneously with the corresponding no voice video of audio-video
Without voice video, it is ensured that dubbing can go on smoothly.
As one implementation of the embodiments of the present disclosure, the step of determining the video to be dubbed may include:
acquiring a video uploaded by the first electronic device, and determining the uploaded video as the video to be dubbed.
When determining the video to be dubbed, the streamer can select a favorite video and upload it to the server through the first electronic device; the server can acquire the video uploaded by the first electronic device and, in turn, determine it as the video to be dubbed.
The server can also perform subtitle recognition on the video uploaded by the first electronic device, obtain a recognition result, and add the result to the uploaded video so that each user can read the subtitles while dubbing. The specific manner of subtitle recognition is not limited in the embodiments of the present disclosure, as long as the subtitles of the video can be recognized. The first electronic device can also store the video selected by the streamer locally, and the uploaded video can be reused in each live stream.
As can be seen, in this embodiment the server can acquire the video uploaded by the first electronic device and determine the uploaded video as the video to be dubbed. This satisfies the streamer's needs and further improves user experience.
As one implementation of the embodiments of the present disclosure, as shown in Fig. 4, the manner of acquiring the voice-free video may include:
S401: determining the amplitude spectrum corresponding to the audio signal of the video to be dubbed.
To process the video to be dubbed and obtain its corresponding voice-free video, the amplitude spectrum corresponding to the audio signal of the video to be dubbed must first be determined. Specifically, the audio signal of the video to be dubbed can be framed to obtain each frame of the audio signal, and each frame can then be transformed into a frequency-domain signal to obtain the amplitude spectrum of each frame.
For example, for an audio signal of the video to be dubbed sampled at 16 kHz, mono, with 16-bit quantization, the audio signal can first be framed with a frame length of 512 samples and a frame shift of 256 samples to obtain each frame of the audio signal; a short-time Fourier transform is then applied to each frame to obtain the phase spectrum and amplitude spectrum corresponding to each frame.
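The framing and transform described above can be sketched in NumPy as follows; this is a minimal illustration, and the Hann analysis window is an assumption, since the disclosure does not name a window function:

```python
import numpy as np

def frame_stft(signal, frame_len=512, hop=256):
    """Frame a mono 16 kHz signal (frame length 512 samples, frame shift 256
    samples) and apply a short-time Fourier transform to each frame,
    returning the amplitude spectrum and phase spectrum."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)       # one row per frame, 257 bins
    return np.abs(spectrum), np.angle(spectrum)  # amplitude spectrum, phase spectrum

# one second of 16 kHz audio yields a (61, 257) amplitude spectrum matrix
audio = np.random.randn(16000).astype(np.float32)
amplitude, phase = frame_stft(audio)
```

Each row of `amplitude` is the amplitude spectrum of one 512-sample frame; `phase` is kept aside so the separated signal can later be transformed back to the time domain.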
S402: inputting the amplitude spectrum into a pre-trained network model to obtain the voice mask matrix corresponding to the video to be dubbed.
Next, the server can input the amplitude spectrum corresponding to each frame of the audio signal into the pre-trained network model. The network model can be trained based on amplitude spectrum samples acquired in advance and their corresponding voice mask matrices, and can include a correspondence between amplitude spectra and voice mask matrices. The network model can therefore determine, according to this correspondence, the voice mask matrix corresponding to the amplitude spectrum of each frame of the audio signal.
The voice mask matrix is a mask matrix capable of removing voice. Each element of the voice mask matrix takes a value between 0 and 1: the closer to 1, the farther the corresponding part is from voice; the closer to 0, the closer it is to voice. Accordingly, a threshold can be set, and all elements of the voice mask matrix below the threshold set to 0, indicating that the corresponding parts of the audio signal are voice.
The network model may be a deep learning network model such as a convolutional neural network or a recurrent neural network, which is not specifically limited here.
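The thresholding just described can be illustrated as follows; the threshold value of 0.5 is an assumed example, as the disclosure does not fix a value:

```python
import numpy as np

def apply_threshold(voice_mask, threshold=0.5):
    """Set mask elements below the threshold to 0: values near 1 mark
    time-frequency bins far from voice, values near 0 mark voiced bins."""
    cleaned = voice_mask.copy()
    cleaned[cleaned < threshold] = 0.0
    return cleaned

mask = np.array([[0.9, 0.2],
                 [0.4, 0.7]])
# the 0.2 and 0.4 entries are treated as voice and zeroed out
assert np.array_equal(apply_threshold(mask), np.array([[0.9, 0.0],
                                                       [0.0, 0.7]]))
```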
S403: computing the voice-free amplitude spectrum from the voice mask matrix and the amplitude spectrum.
Next, the server can take the element-wise product of the voice mask matrix and the amplitude spectrum of each frame of the audio signal to obtain the amplitude spectrum of the separated signal; it can be understood that the separated signal is the voice-free audio.
S404: determining, based on the voice-free amplitude spectrum, the voice-free video corresponding to the video to be dubbed.
After obtaining the amplitude spectrum of the separated signal, the server can combine it with the phase spectrum described above and transform the result back into a time-domain signal, yielding the time-domain separated signal, that is, the voice-free audio. Combining this voice-free audio with the image part of the video to be dubbed then yields the voice-free video corresponding to the video to be dubbed.
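The signal path of steps S403 and S404 — element-wise masking, phase reattachment, inverse transform, and overlap-add back to a time-domain signal — can be sketched as follows; this is a simplified illustration, as the disclosure does not specify the synthesis procedure:

```python
import numpy as np

def remove_voice(amplitude, phase, voice_mask, frame_len=512, hop=256):
    """Multiply the amplitude spectrum element-wise by the voice mask matrix,
    recombine with the original phase spectrum, and overlap-add the inverse
    FFT of each frame to recover the time-domain voice-free signal."""
    masked = amplitude * voice_mask                  # voice-free amplitude spectrum
    spectrum = masked * np.exp(1j * phase)           # reattach the phase spectrum
    frames = np.fft.irfft(spectrum, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):               # overlap-add synthesis
        out[i * hop:i * hop + frame_len] += frame
    return out

# an all-ones mask passes this synthetic spectrum through unchanged
amplitude = np.ones((3, 257))
phase = np.zeros((3, 257))
voice_free = remove_voice(amplitude, phase, np.ones_like(amplitude))
assert voice_free.shape == (1024,)  # hop * (3 - 1) + frame_len samples
```

The resulting time-domain signal would then be muxed with the image track of the video to be dubbed to form the voice-free video.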
As it can be seen that in the present embodiment, server can use the network model of training completion in advance and obtain wait match audio-video pair
Answer without voice video, can rapidly and accurately determine to further increase user's body with the corresponding no voice video of audio-video
It tests.
As one implementation of the embodiments of the present disclosure, as shown in Fig. 5, the manner of acquiring the voice-free video may include:
S501: determining the amplitude spectrum corresponding to the audio signal of the video to be dubbed.
Step S501 is the same as step S401 above; for details, refer to the description and explanation of step S401, which are not repeated here.
S502: inputting the amplitude spectrum into a pre-trained network model to obtain the voice-free audio corresponding to the video to be dubbed.
The server can input the amplitude spectrum obtained in step S501 into the pre-trained network model. The network model can be trained based on amplitude spectrum samples acquired in advance and their corresponding voice-free audio, and can include a correspondence between amplitude spectra and voice-free audio. The network model can therefore determine the voice-free audio corresponding to the input amplitude spectrum according to this correspondence and output it.
Specifically, the network model can first determine the voice mask matrix corresponding to the amplitude spectrum of each frame of the audio signal, then take the element-wise product of the voice mask matrix and the amplitude spectrum of each frame to obtain the amplitude spectrum of the separated signal, combine it with the phase spectrum described above, and transform the result back into a time-domain signal to obtain the time-domain separated signal, that is, the voice-free audio.
Above-mentioned network model may be convolutional neural networks, Recognition with Recurrent Neural Network even depth learning network model, again
It is not specifically limited.
In S503, the voice-free video corresponding to the to-be-dubbed video is determined based on the voice-free audio.
In turn, the server can combine the voice-free audio output by the above network model with the image portion of the to-be-dubbed video to obtain the voice-free video corresponding to the to-be-dubbed video.
It can be seen that, in this embodiment, the server can use the pre-trained network model to obtain the voice-free video corresponding to the to-be-dubbed video, so the voice-free video can be determined quickly and accurately, further improving the user experience.
The above-mentioned voice-free video may be a video in which the voice is removed and the background music is retained, a video in which both the voice and the background music are removed and only some rhythm information is retained, or a video with no sound at all; all of these are reasonable. Specifically, the voice mask matrix can be set according to the dubbing requirements so as to achieve the corresponding effect.
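The three effects described above differ only in how the playback mask is derived from the model's voice mask. A hypothetical sketch (the effect names, and the use of a threshold for the rhythm-only case, are illustrative assumptions, not from the disclosure):

```python
import numpy as np

def playback_mask(voice_mask: np.ndarray, effect: str) -> np.ndarray:
    """Derive the mask applied at playback time from the model's voice
    mask (values in [0, 1], near 0 in voice-dominated bins)."""
    if effect == "keep_background":      # remove voice, keep background music
        return voice_mask
    if effect == "silent":               # remove all sound
        return np.zeros_like(voice_mask)
    if effect == "rhythm_only":          # keep only clearly non-voice bins,
        return (voice_mask > 0.9).astype(float)  # binarized as a rough proxy
    raise ValueError(f"unknown effect: {effect}")

m = np.array([[1.0, 0.95],
              [0.2, 0.0]])
```

The key point is that no retraining is needed: one separation model can serve all three dubbing effects by post-processing its mask differently.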
As an implementation of the embodiment of the present disclosure, after the above-mentioned voice-free audio is obtained, in a first implementation, the above method may further include:
determining the amplitude spectrum corresponding to the voice-free audio; inputting the amplitude spectrum into a pre-trained network model to obtain the instrument mask matrix corresponding to the voice-free audio; calculating a target instrument amplitude spectrum using the instrument mask matrix and the amplitude spectrum; and determining the target instrument audio corresponding to the voice-free audio based on the target instrument amplitude spectrum.
Here, the network model is trained on pre-obtained amplitude spectrum samples and their corresponding instrument mask matrices, and includes a correspondence between amplitude spectra and instrument mask matrices. An instrument mask matrix is a matrix that removes other audio signals while retaining the audio signal of a certain instrument.
Since the manner of determining the target instrument audio is essentially the same as the first manner of determining the voice-free audio described above, it is not repeated here.
In a second implementation, the above method may further include:
determining the amplitude spectrum corresponding to the voice-free audio; and inputting the amplitude spectrum into a pre-trained network model to obtain the target instrument audio corresponding to the voice-free audio.
Here, the network model is trained on pre-obtained amplitude spectrum samples and their corresponding instrument audio, and includes a correspondence between amplitude spectra and instrument audio. Since the manner of determining the target instrument audio is essentially the same as the second manner of determining the voice-free audio described above, it is not repeated here.
The above-mentioned target instrument can be set according to actual needs; for example, it may be an instrument such as a piano, a guitar, or a drum.
It can be seen that various target instrument audios can be obtained in the above two manners. The server may replace the target instrument audio in the voice-free audio with the audio of another instrument, and may also determine the rhythm information of the voice-free audio from the instrument audio, among other uses. This provides convenience for diverse dubbing modes, further enriches the diversity of the dubbing interaction, and improves the user experience.
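Replacing the target instrument inside the voice-free audio can be sketched as subtracting the separated instrument track and adding the substitute, under a linear-mixing assumption that the disclosure does not state explicitly:

```python
import numpy as np

def replace_instrument(voice_free: np.ndarray,
                       target_instrument: np.ndarray,
                       substitute: np.ndarray) -> np.ndarray:
    """Remove the separated target-instrument track from the voice-free
    audio and mix in a substitute track, truncated to a common length."""
    n = min(len(voice_free), len(target_instrument), len(substitute))
    return voice_free[:n] - target_instrument[:n] + substitute[:n]

# Toy samples: the voice-free mix is drums + piano; swap piano for guitar.
drums = np.array([0.1, -0.2, 0.3])
piano = np.array([0.05, 0.05, -0.05])
guitar = np.array([0.2, 0.0, 0.1])
result = replace_instrument(drums + piano, piano, guitar)
```

In practice the separated track is only an estimate, so residual leakage of the original instrument would remain after subtraction; the identity is exact only in this idealized linear setting.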
As an implementation of the embodiment of the present disclosure, after the dubbing is completed and an upload instruction issued by the user is received, the server may encode the above-mentioned dubbing audio and the voice-free video into a dubbed video and distribute it to the live-streaming software platform for users to download and view.
The embodiment of the present disclosure further provides a second audio-video processing method. The second audio-video processing method provided by the embodiment of the present disclosure can be applied to a first electronic device on which a live-streaming application is installed.
Here, the first electronic device is an electronic device with live-streaming permission in the Virtual Space, and an anchor can conduct a live stream through the first electronic device.
As shown in Fig. 6, an audio-video processing method is applied to a first electronic device, where the first electronic device is an electronic device with live-streaming permission in a Virtual Space, and the method includes:
In step S601, a dubbing instruction in the Virtual Space is obtained;
In step S602, the preset dubbing type corresponding to the dubbing instruction is determined;
In step S603, a to-be-dubbed video is determined;
In step S604, when a dubbing start instruction is obtained, the voice-free video corresponding to the to-be-dubbed video is played according to the preset dubbing type;
In step S605, during playback of the voice-free video, the dubbing audio corresponding to the voice-free video is obtained and simultaneously sent to the server.
It can be seen that, in the solution provided by the embodiment of the present disclosure, the first electronic device can obtain a dubbing instruction in the Virtual Space, determine the preset dubbing type corresponding to the dubbing instruction, then determine the to-be-dubbed video, and, when a dubbing start instruction is obtained, play the voice-free video corresponding to the to-be-dubbed video according to the preset dubbing type; during playback of the voice-free video, it obtains the dubbing audio corresponding to the voice-free video and simultaneously sends the dubbing audio to the server. With this solution, users can interact in the Virtual Space by way of dubbing, which increases the diversity of interaction modes and improves the user experience.
During a live stream, an anchor may interact with the audience or with other anchors by way of dubbing; at this time, the anchor can issue a dubbing instruction through the first electronic device. In turn, in step S601 above, the first electronic device can obtain the dubbing instruction issued by the anchor in the Virtual Space, which indicates that the anchor needs to interact with the audience or other anchors by way of dubbing. Since there can be multiple dubbing modes in the Virtual Space, the first electronic device can then determine the preset dubbing type corresponding to the obtained dubbing instruction, that is, execute step S602.
After obtaining the dubbing instruction issued by the anchor in the Virtual Space, the first electronic device can execute step S603, that is, determine the to-be-dubbed video. Next, when the dubbing start instruction issued by the anchor is obtained, indicating that the anchor needs to start dubbing, the voice-free video corresponding to the to-be-dubbed video can be played according to the above-mentioned preset dubbing type.
In turn, when the dubbing start instruction issued by the anchor is obtained, the first electronic device can play the voice-free video corresponding to the to-be-dubbed video according to the preset dubbing type; during playback of the voice-free video, it obtains the dubbing audio corresponding to the voice-free video and simultaneously sends the dubbing audio to the server. The server can send the dubbing audio to the second electronic devices and to the first electronic devices used by other anchors. Here, the voice-free video is a video in which the voice is removed and only the background music is retained, and a second electronic device is an electronic device with live-stream viewing permission in the Virtual Space.
Since the manners in which the first electronic device determines the preset dubbing type corresponding to the dubbing instruction, determines the to-be-dubbed video, and obtains the dubbing audio corresponding to the voice-free video are respectively the same as the manners in which the above-mentioned server determines the preset dubbing type corresponding to the dubbing instruction, determines the to-be-dubbed video, and obtains the dubbing audio corresponding to the voice-free video, the details are not repeated here.
As an implementation of the embodiment of the present disclosure, the above-mentioned preset dubbing type may be an anchor show type.
Correspondingly, the above step of playing the voice-free video corresponding to the to-be-dubbed video according to the preset dubbing type may include:
playing the voice-free video corresponding to the to-be-dubbed video, and controlling the second electronic devices to simultaneously play the voice-free video corresponding to the to-be-dubbed video.
When playing the voice-free video corresponding to the to-be-dubbed video, the first electronic device can send a request to the server so that the server controls the second electronic devices to play the voice-free video corresponding to the to-be-dubbed video at the same time, ensuring that the anchor's audience can watch the anchor's dubbing performance simultaneously.
It can be seen that, in this embodiment, the anchor can give a dubbing performance so as to interact with the audience, which can enhance the interactivity and interest of the Virtual Space and improve the user experience.
As an implementation of the embodiment of the present disclosure, the above-mentioned preset dubbing type may be a multi-anchor battle type.
Correspondingly, the above step of playing the voice-free video corresponding to the to-be-dubbed video according to the preset dubbing type may include:
determining the battle order corresponding to the first electronic device of each anchor; and, according to the battle order, controlling the first electronic devices and their corresponding second electronic devices to play the voice-free video corresponding to the to-be-dubbed video in sequence.
Since multiple anchors need to conduct a dubbing battle, each anchor needs to give a dubbing performance in turn in order to ensure the audience's experience of watching the battle. Therefore, the first electronic device used by the above-mentioned anchor can determine the battle order corresponding to the first electronic device of each anchor, and then, according to the battle order, control the first electronic devices and their corresponding second electronic devices to play the voice-free video corresponding to the to-be-dubbed video in sequence.
In one implementation, a first electronic device can send a dubbing switch request to the server; after receiving the dubbing switch request, the server can control each first electronic device and its corresponding second electronic devices to play the voice-free video corresponding to the to-be-dubbed video in sequence, so that the anchors can dub in turn and the audience can watch each anchor's dubbing performance.
It can be seen that, in this embodiment, multiple anchors can give a dubbing battle performance so as to interact with other anchors and with the audience, which can further enhance the interactivity and interest of the Virtual Space and further improve the user experience.
As an implementation of the embodiment of the present disclosure, the above-mentioned preset dubbing type may be a multi-person dubbing type.
Correspondingly, the above step of playing the voice-free video corresponding to the to-be-dubbed video according to the preset dubbing type may include:
controlling each second electronic device corresponding to a user in the instant messaging region of the Virtual Space to simultaneously play the voice-free video corresponding to the to-be-dubbed video.
In order to ensure that the anchor and the users in the instant messaging region can smoothly complete the dubbing of the to-be-dubbed video, the first electronic device and each second electronic device corresponding to a user in the instant messaging region need to play the voice-free video corresponding to the to-be-dubbed video simultaneously; in this way, the anchor and each user can smoothly complete the dubbing interaction.
It can be seen that, in this embodiment, the anchor and the users in the instant messaging region can cooperate to complete a dubbing performance. The interaction between the anchor and the audience is stronger and the audience's sense of participation is enhanced, which can further enhance the interactivity and interest of the Virtual Space and further improve the user experience.
As an implementation of the embodiment of the present disclosure, the above step of controlling each second electronic device corresponding to a user in the instant messaging region of the Virtual Space to simultaneously play the voice-free video corresponding to the to-be-dubbed video may include:
sending a broadcast message to the server, so that the server sends the to-be-dubbed video and a start instruction to each second electronic device corresponding to a user in the instant messaging region of the Virtual Space, so that each second electronic device, upon receiving the start instruction, simultaneously plays the voice-free video corresponding to the to-be-dubbed video.
In order to ensure that the first electronic device and each second electronic device corresponding to a user participating in the dubbing can play the voice-free video simultaneously, the first electronic device can issue the dubbing start instruction by sending a broadcast message; upon obtaining the broadcast message sent by the first electronic device, the server sends the above-mentioned to-be-dubbed video and the start instruction to each second electronic device corresponding to a user in the instant messaging region of the Virtual Space.
In this way, each second electronic device starts to play the voice-free video corresponding to the to-be-dubbed video when it receives the start instruction, ensuring that all second electronic devices start playing the voice-free video at the same moment.
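The broadcast fan-out can be sketched as follows; the class and message names are hypothetical, and a real deployment would also have to account for network delay (for example by scheduling a common start timestamp rather than starting on arrival):

```python
class SecondDevice:
    """Stand-in for a second electronic device: starts playback as soon
    as it receives the start instruction for a video."""
    def __init__(self):
        self.playing = None

    def receive(self, video_id, instruction):
        if instruction == "START":
            self.playing = video_id

class Server:
    """On a broadcast message from the first electronic device, forward
    the to-be-dubbed video id and a start instruction to every second
    electronic device in the instant messaging region."""
    def __init__(self, second_devices):
        self.second_devices = second_devices

    def on_broadcast(self, video_id):
        for dev in self.second_devices:
            dev.receive(video_id, "START")

devices = [SecondDevice() for _ in range(3)]
Server(devices).on_broadcast("video_42")
```

The design choice here is that the server, not the first electronic device, fans the instruction out, so a single broadcast message reaches every participant through one trusted hop.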
It can be seen that, in this embodiment, upon obtaining the broadcast message sent by the first electronic device, the server can send the to-be-dubbed video and the start instruction to each second electronic device corresponding to a user in the instant messaging region of the Virtual Space, so that each second electronic device plays the voice-free video corresponding to the to-be-dubbed video when it receives the start instruction. This ensures that the voice-free video is played simultaneously and that the dubbing can proceed smoothly.
As an implementation of the embodiment of the present disclosure, the above step of determining the to-be-dubbed video may include:
obtaining a video uploaded by the user; and determining the uploaded video as the to-be-dubbed video.
When determining the to-be-dubbed video, the anchor can choose a video he or she likes and upload it; the first electronic device can obtain the video uploaded by the user and, in turn, determine it as the to-be-dubbed video.
It can be seen that, in this embodiment, the first electronic device can obtain the video uploaded by the user and then determine the uploaded video as the to-be-dubbed video. In this way, the anchor's needs can be met, further improving the user experience.
As an implementation of the embodiment of the present disclosure, the manner of obtaining the above-mentioned voice-free video may include:
determining the amplitude spectrum corresponding to the audio signal of the to-be-dubbed video; inputting the amplitude spectrum into a pre-trained network model to obtain the voice mask matrix corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice mask matrices and includes a correspondence between amplitude spectra and voice mask matrices; calculating a voice-free amplitude spectrum using the voice mask matrix and the amplitude spectrum; and determining the voice-free video corresponding to the to-be-dubbed video based on the voice-free amplitude spectrum.
As an implementation of the embodiment of the present disclosure, the manner of obtaining the above-mentioned voice-free video may alternatively include:
determining the amplitude spectrum corresponding to the audio signal of the to-be-dubbed video; inputting the amplitude spectrum into a pre-trained network model to obtain the voice-free audio corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice-free audio and includes a correspondence between amplitude spectra and voice-free audio; and determining the voice-free video corresponding to the to-be-dubbed video based on the voice-free audio.
Since the manner in which the first electronic device obtains the voice-free video is the same as the manner in which the above-mentioned server obtains the voice-free video, refer to the explanation of the server's manner of obtaining the voice-free video; the details are not repeated here.
The embodiment of the present disclosure further provides a third audio-video processing method. The third audio-video processing method provided by the embodiment of the present disclosure can be applied to a second electronic device on which a live-streaming application is installed.
Here, the second electronic device is an electronic device with live-stream viewing permission in the Virtual Space, and audience members can watch the live stream through the second electronic device.
As shown in Fig. 7, an audio-video processing method is applied to a second electronic device, and the method includes:
In step S701, when a dubbing start instruction in the Virtual Space is obtained, a pre-obtained voice-free video corresponding to the to-be-dubbed video is played;
In step S702, during playback of the voice-free video, when the dubbing audio corresponding to the voice-free video is obtained, the dubbing audio is played.
It can be seen that, in the solution provided by the embodiment of the present disclosure, when obtaining a dubbing start instruction in the Virtual Space, the second electronic device can play the pre-obtained voice-free video corresponding to the to-be-dubbed video, and, during playback of the voice-free video, play the dubbing audio when the dubbing audio corresponding to the voice-free video is obtained. With this solution, users can interact in the Virtual Space by way of dubbing, which increases the diversity of interaction modes and improves the user experience.
Audience members can watch the anchor's live stream through the above-mentioned second electronic device. When the second electronic device obtains the dubbing start instruction in the Virtual Space, indicating that the anchor or other audience members are about to start a dubbing performance, it can play the pre-obtained voice-free video corresponding to the to-be-dubbed video.
The dubbing start instruction may be generated by the server and sent to the second electronic device, or it may be sent by the first electronic device to the server and forwarded by the server to the second electronic device; both are reasonable.
After the to-be-dubbed video is determined by the server or the first electronic device, the to-be-dubbed video itself can be sent to the second electronic device, or an identifier of the to-be-dubbed video can be sent instead; the second electronic device can then determine that the video corresponding to the identifier is the to-be-dubbed video and thereby obtain the voice-free video corresponding to it.
In step S702 above, during playback of the voice-free video, when the second electronic device obtains the dubbing audio corresponding to the voice-free video, it can play the dubbing audio, so that the audience members also perceive the dubbing performance. The dubbing audio may be dubbing audio that the server receives from the first electronic device, or from a second electronic device used by another audience member, and forwards to the second electronic device.
When an anchor gives a dubbing performance, the first electronic device can obtain the dubbing audio produced by the anchor and send it to the server. When other audience members give a dubbing performance, the second electronic devices they use can obtain the dubbing audio those audience members produce and send it to the server.
As an implementation of the embodiment of the present disclosure, the above step of playing the pre-obtained voice-free video corresponding to the to-be-dubbed video when a dubbing start instruction in the Virtual Space is obtained may include:
when receiving the to-be-dubbed video and the start instruction in the Virtual Space sent by the server, playing the received voice-free video corresponding to the to-be-dubbed video.
In order to ensure that the first electronic device and each second electronic device corresponding to a user participating in the dubbing can play the voice-free video simultaneously, the first electronic device can issue the dubbing start instruction by sending a broadcast message; upon obtaining the broadcast message sent by the first electronic device, the server sends the above-mentioned to-be-dubbed video and the start instruction to each second electronic device corresponding to a user in the instant messaging region of the Virtual Space.
In this way, each second electronic device starts to play the voice-free video corresponding to the to-be-dubbed video when it receives the start instruction, ensuring that all second electronic devices start playing the voice-free video at the same moment.
It can be seen that, in this embodiment, upon obtaining the broadcast message sent by the first electronic device, the server can send the to-be-dubbed video and the start instruction to each second electronic device corresponding to a user in the instant messaging region of the Virtual Space, so that each second electronic device plays the voice-free video corresponding to the to-be-dubbed video when it receives the start instruction. This ensures that the voice-free video is played simultaneously and that the dubbing can proceed smoothly.
As an implementation of the embodiment of the present disclosure, the manner of obtaining the above-mentioned voice-free video may include:
determining the amplitude spectrum corresponding to the audio signal of the to-be-dubbed video; inputting the amplitude spectrum into a pre-trained network model to obtain the voice mask matrix corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice mask matrices and includes a correspondence between amplitude spectra and voice mask matrices; calculating a voice-free amplitude spectrum using the voice mask matrix and the amplitude spectrum; and determining the voice-free video corresponding to the to-be-dubbed video based on the voice-free amplitude spectrum.
As an implementation of the embodiment of the present disclosure, the manner of obtaining the above-mentioned voice-free video may alternatively include:
determining the amplitude spectrum corresponding to the audio signal of the to-be-dubbed video; inputting the amplitude spectrum into a pre-trained network model to obtain the voice-free audio corresponding to the to-be-dubbed video, where the network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice-free audio and includes a correspondence between amplitude spectra and voice-free audio; and determining the voice-free video corresponding to the to-be-dubbed video based on the voice-free audio.
Since the manner in which the second electronic device obtains the voice-free video is the same as the manner in which the above-mentioned server obtains the voice-free video, refer to the explanation of the server's manner of obtaining the voice-free video; the details are not repeated here.
Fig. 8 is a block diagram of a first audio-video processing apparatus according to an exemplary embodiment.
As shown in Fig. 8, an audio-video processing apparatus is applied to a server, and the apparatus includes:
a first dubbing instruction obtaining module 810, configured to obtain a dubbing instruction issued by a first electronic device in a Virtual Space,
where the first electronic device is an electronic device with live-streaming permission in the Virtual Space;
a first preset dubbing type determining module 820, configured to determine the preset dubbing type corresponding to the dubbing instruction;
a first to-be-dubbed video determining module 830, configured to determine a to-be-dubbed video;
a first voice-free video playing module 840, configured to play, according to the preset dubbing type, the voice-free video corresponding to the to-be-dubbed video when a dubbing start instruction issued by the first electronic device is obtained; and
a first dubbing audio sending module 850, configured to obtain, during playback of the voice-free video, the dubbing audio corresponding to the voice-free video and simultaneously send the dubbing audio to a second electronic device,
where the second electronic device is an electronic device with live-stream viewing permission in the Virtual Space.
It can be seen that, in the solution provided by the embodiment of the present disclosure, the server can obtain the dubbing instruction issued by the first electronic device in the Virtual Space, determine the preset dubbing type corresponding to the dubbing instruction, then determine the to-be-dubbed video, and, when the dubbing start instruction issued by the first electronic device is obtained, play the voice-free video corresponding to the to-be-dubbed video according to the preset dubbing type; during playback of the voice-free video, it obtains the dubbing audio corresponding to the voice-free video and simultaneously sends the dubbing audio to the second electronic device. Here, the first electronic device is an electronic device with live-streaming permission in the Virtual Space, and the second electronic device is an electronic device with live-stream viewing permission in the Virtual Space. With this solution, users can interact in the Virtual Space by way of dubbing, which increases the diversity of interaction modes and improves the user experience.
As an implementation of the embodiment of the present disclosure, the above-mentioned preset dubbing type may be an anchor show type;
the above-mentioned first voice-free video playing module 840 may include:
a first voice-free video playing submodule (not shown in Fig. 8), configured to control the first electronic device and the second electronic device to simultaneously play the voice-free video corresponding to the to-be-dubbed video.
As an implementation of the embodiment of the present disclosure, the above-mentioned preset dubbing type may be a multi-anchor battle type;
the above-mentioned first voice-free video playing module 840 may include:
a battle order determining submodule (not shown in Fig. 8), configured to determine the battle order corresponding to the first electronic device of each anchor; and
a second voice-free video playing submodule (not shown in Fig. 8), configured to control, according to the battle order, the first electronic devices and their corresponding second electronic devices to play the voice-free video corresponding to the to-be-dubbed video in sequence.
As an implementation of the embodiment of the present disclosure, the above-mentioned preset dubbing type may be a multi-person dubbing type;
the above-mentioned first voice-free video playing module 840 may include:
a third voice-free video playing submodule (not shown in Fig. 8), configured to control each second electronic device corresponding to a user in the instant messaging region of the Virtual Space to simultaneously play the voice-free video corresponding to the to-be-dubbed video.
As an implementation of the embodiment of the present disclosure, the above-mentioned third voice-free video playing submodule may include:
a first voice-free video playing unit (not shown in Fig. 8), configured to send, upon obtaining the broadcast message sent by the first electronic device, the to-be-dubbed video and the start instruction to each second electronic device corresponding to a user in the instant messaging region of the Virtual Space, so that each second electronic device simultaneously plays the voice-free video corresponding to the to-be-dubbed video upon receiving the start instruction.
As an implementation of the embodiment of the present disclosure, the above-mentioned first to-be-dubbed video determining module 830 may include:
a first video obtaining submodule (not shown in Fig. 8), configured to obtain the video uploaded by the first electronic device; and
a first to-be-dubbed video determining submodule (not shown in Fig. 8), configured to determine the uploaded video as the to-be-dubbed video.
As an implementation of the embodiments of the present disclosure, the audio/video processing apparatus may further include a first voice-free video determining module (not shown in Fig. 8);
The first voice-free video determining module may include:
A first amplitude spectrum determining submodule (not shown in Fig. 8), configured to determine the amplitude spectrum corresponding to the audio signal of the audio/video to be dubbed;
A first voice mask matrix determining submodule (not shown in Fig. 8), configured to input the amplitude spectrum into a pre-trained network model to obtain the voice mask matrix corresponding to the audio/video to be dubbed, wherein the network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice mask matrices, and encodes the correspondence between amplitude spectra and voice mask matrices;
A first voice-free amplitude spectrum determining submodule (not shown in Fig. 8), configured to calculate a voice-free amplitude spectrum from the voice mask matrix and the amplitude spectrum;
A first voice-free video determining submodule (not shown in Fig. 8), configured to determine, based on the voice-free amplitude spectrum, the voice-free video corresponding to the audio/video to be dubbed.
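The masking step can be illustrated with a minimal sketch. The patent does not specify how the mask is combined with the amplitude spectrum; a common convention, assumed here, is a ratio mask in [0, 1] per time-frequency bin, where `(1 - mask) * spectrum` keeps only the non-voice energy (the voice-free spectrum would then be converted back to a waveform using the mixture's original phase):

```python
import numpy as np

def voice_free_spectrum(amplitude_spec, voice_mask):
    """Element-wise removal of the voiced energy: the mask estimates, per
    time-frequency bin, the fraction of energy belonging to the human
    voice, so (1 - mask) * spectrum retains only the background."""
    return (1.0 - voice_mask) * amplitude_spec

# Toy 2-frame x 4-bin amplitude spectrum, and a mask such as a trained
# network model might emit (1.0 = bin fully voiced, 0.0 = no voice).
spec = np.array([[1.0, 2.0, 3.0, 4.0],
                 [4.0, 3.0, 2.0, 1.0]])
mask = np.array([[1.0, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 1.0]])

clean = voice_free_spectrum(spec, mask)
# clean == [[0., 1., 3., 4.],
#           [4., 3., 1., 0.]]
```

The fully voiced bins are zeroed, half-voiced bins are attenuated by half, and unvoiced bins pass through untouched, which is exactly the behavior the voice-free amplitude spectrum determining submodule relies on.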
As an implementation of the embodiments of the present disclosure, the apparatus may further include a second voice-free video determining module (not shown in Fig. 8);
The second voice-free video determining module may include:
A second amplitude spectrum determining submodule (not shown in Fig. 8), configured to determine the amplitude spectrum corresponding to the audio signal of the audio/video to be dubbed;
A first voice-free audio determining submodule (not shown in Fig. 8), configured to input the amplitude spectrum into a pre-trained first network model to obtain the voice-free audio corresponding to the audio/video to be dubbed, wherein the first network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice-free audio, and encodes the correspondence between amplitude spectra and voice-free audio;
A second voice-free video determining submodule (not shown in Fig. 8), configured to determine, based on the voice-free audio, the voice-free video corresponding to the audio/video to be dubbed.
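Unlike the mask-based module, this variant has the first network model regress the voice-free output directly from the mixture spectrum, with no explicit mask stage. A stub of that interface, where an arbitrary fixed per-bin attenuation stands in for the learned weights (purely illustrative, not the trained model):

```python
import numpy as np

def first_network_model(amplitude_spec):
    """Stand-in for the pre-trained first network model: it maps the
    mixture amplitude spectrum directly to a voice-free amplitude
    spectrum. A fixed per-bin gain plays the role of learned weights."""
    learned_gain = np.array([0.0, 0.5, 1.0, 1.0])  # illustrative only
    return amplitude_spec * learned_gain

spec = np.array([[2.0, 2.0, 2.0, 2.0]])  # one frame, four bins
voice_free = first_network_model(spec)   # -> [[0., 1., 2., 2.]]
```

The trade-off between the two module designs: regressing the output directly gives the model more freedom, while a mask constrains the output to be an attenuation of the input, which tends to be easier to train for separation tasks.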
Fig. 9 is a block diagram of a second audio/video processing apparatus according to an exemplary embodiment.
As shown in Fig. 9, an audio/video processing apparatus is applied to a first electronic device, where the first electronic device is an electronic device with live-streaming permission in a virtual space, and the apparatus includes:
A second dubbing instruction acquisition module 910, configured to obtain a dubbing instruction in the virtual space;
A second preset dubbing type determining module 920, configured to determine the preset dubbing type corresponding to the dubbing instruction;
A second to-be-dubbed audio/video determining module 930, configured to determine the audio/video to be dubbed;
A second voice-free video playing module 940, configured to play, upon obtaining a dubbing start instruction, the voice-free video corresponding to the audio/video to be dubbed according to the preset dubbing type;
A second dubbed audio sending module 950, configured to obtain, while the voice-free video is playing, the dubbed audio corresponding to the voice-free video and send the dubbed audio to a server.
It can be seen that, in the solution provided by the embodiments of the present disclosure, the first electronic device can obtain a dubbing instruction in the virtual space, determine the preset dubbing type corresponding to the dubbing instruction, and determine the audio/video to be dubbed; then, upon obtaining a dubbing start instruction, it plays the voice-free video corresponding to the audio/video to be dubbed according to the preset dubbing type, and, while the voice-free video is playing, obtains the dubbed audio corresponding to the voice-free video and sends the dubbed audio to the server. With this solution, users can interact in the virtual space by dubbing, which diversifies the modes of interaction and improves the user experience.
As an implementation of the embodiments of the present disclosure, the preset dubbing type is an anchor show type, and the second voice-free video playing module 940 may include:
A fourth voice-free video playing submodule (not shown in Fig. 9), configured to play the voice-free video corresponding to the audio/video to be dubbed and control a second electronic device to play it simultaneously, where the second electronic device is an electronic device with live-streaming viewing permission in the virtual space.
As an implementation of the embodiments of the present disclosure, the preset dubbing type is a multi-anchor battle type, and the second voice-free video playing module 940 may include:
A battle order determining submodule (not shown in Fig. 9), configured to determine the battle order of the first electronic device corresponding to each anchor;
A fifth voice-free video playing submodule (not shown in Fig. 9), configured to control, according to the battle order, the first electronic devices and their corresponding second electronic devices to play, in turn, the voice-free video corresponding to the audio/video to be dubbed.
As an implementation of the embodiments of the present disclosure, the preset dubbing type is a multi-person dubbing type, and the second voice-free video playing module 940 may include:
A sixth voice-free video playing submodule (not shown in Fig. 9), configured to control the second electronic devices corresponding to the users in the instant messaging region of the virtual space to play the voice-free video corresponding to the audio/video to be dubbed simultaneously.
As an implementation of the embodiments of the present disclosure, the sixth voice-free video playing submodule may include:
A second voice-free video playing unit (not shown in Fig. 9), configured to send a broadcast message to the server, so that the server sends the audio/video to be dubbed and a start instruction to each second electronic device corresponding to a user in the instant messaging region of the virtual space, and each second electronic device plays the voice-free video corresponding to the audio/video to be dubbed simultaneously upon receiving the start instruction.
As an implementation of the embodiments of the present disclosure, the second to-be-dubbed audio/video determining module 930 may include:
A second video acquisition submodule (not shown in Fig. 9), configured to obtain a video uploaded by the user;
A second to-be-dubbed audio/video determining submodule (not shown in Fig. 9), configured to determine the uploaded video as the audio/video to be dubbed.
As an implementation of the embodiments of the present disclosure, the audio/video processing apparatus may further include a third voice-free video determining module (not shown in Fig. 9);
The third voice-free video determining module may include:
A third amplitude spectrum determining submodule (not shown in Fig. 9), configured to determine the amplitude spectrum corresponding to the audio signal of the audio/video to be dubbed;
A second voice mask matrix determining submodule (not shown in Fig. 9), configured to input the amplitude spectrum into a pre-trained network model to obtain the voice mask matrix corresponding to the audio/video to be dubbed, wherein the network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice mask matrices, and encodes the correspondence between amplitude spectra and voice mask matrices;
A second voice-free amplitude spectrum determining submodule (not shown in Fig. 9), configured to calculate a voice-free amplitude spectrum from the voice mask matrix and the amplitude spectrum;
A third voice-free video determining submodule (not shown in Fig. 9), configured to determine, based on the voice-free amplitude spectrum, the voice-free video corresponding to the audio/video to be dubbed.
As an implementation of the embodiments of the present disclosure, the audio/video processing apparatus may further include a fourth voice-free video determining module (not shown in Fig. 9);
The fourth voice-free video determining module may include:
A fourth amplitude spectrum determining submodule (not shown in Fig. 9), configured to determine the amplitude spectrum corresponding to the audio signal of the audio/video to be dubbed;
A second voice-free audio determining submodule (not shown in Fig. 9), configured to input the amplitude spectrum into a pre-trained first network model to obtain the voice-free audio corresponding to the audio/video to be dubbed, wherein the first network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice-free audio, and encodes the correspondence between amplitude spectra and voice-free audio;
A fourth voice-free video determining submodule (not shown in Fig. 9), configured to determine, based on the voice-free audio, the voice-free video corresponding to the audio/video to be dubbed.
Fig. 10 is a block diagram of a third audio/video processing apparatus according to an exemplary embodiment.
As shown in Fig. 10, an audio/video processing apparatus is applied to a second electronic device, where the second electronic device is an electronic device with live-streaming viewing permission in the virtual space, and the apparatus includes:
A third voice-free video playing module 1010, configured to play, upon obtaining a dubbing start instruction in the virtual space, a pre-obtained voice-free video corresponding to the audio/video to be dubbed;
A dubbed audio playing module 1020, configured to play, while the voice-free video is playing, the dubbed audio corresponding to the voice-free video upon obtaining it.
It can be seen that, in the solution provided by the embodiments of the present disclosure, the second electronic device can, upon obtaining a dubbing start instruction in the virtual space, play a pre-obtained voice-free video corresponding to the audio/video to be dubbed, and, while the voice-free video is playing, play the dubbed audio corresponding to the voice-free video upon obtaining it. With this solution, users can interact in the virtual space by dubbing, which diversifies the modes of interaction and improves the user experience.
As an implementation of the embodiments of the present disclosure, the third voice-free video playing module 1010 may include:
A seventh voice-free video playing submodule (not shown in Fig. 10), configured to play, upon receiving the audio/video to be dubbed and the start instruction sent by the server in the virtual space, the voice-free video corresponding to the received audio/video to be dubbed.
As an implementation of the embodiments of the present disclosure, the audio/video processing apparatus may further include a fifth voice-free video determining module;
The fifth voice-free video determining module may include:
A fifth amplitude spectrum determining submodule (not shown in Fig. 10), configured to determine the amplitude spectrum corresponding to the audio signal of the audio/video to be dubbed;
A third voice mask matrix determining submodule (not shown in Fig. 10), configured to input the amplitude spectrum into a pre-trained network model to obtain the voice mask matrix corresponding to the audio/video to be dubbed, wherein the network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice mask matrices, and encodes the correspondence between amplitude spectra and voice mask matrices;
A third voice-free amplitude spectrum determining submodule (not shown in Fig. 10), configured to calculate a voice-free amplitude spectrum from the voice mask matrix and the amplitude spectrum;
A fifth voice-free video determining submodule (not shown in Fig. 10), configured to determine, based on the voice-free amplitude spectrum, the voice-free video corresponding to the audio/video to be dubbed.
As an implementation of the embodiments of the present disclosure, the audio/video processing apparatus may further include a sixth voice-free video determining module (not shown in Fig. 10);
The sixth voice-free video determining module may include:
A sixth amplitude spectrum determining submodule (not shown in Fig. 10), configured to determine the amplitude spectrum corresponding to the audio signal of the audio/video to be dubbed;
A third voice-free audio determining submodule (not shown in Fig. 10), configured to input the amplitude spectrum into a pre-trained first network model to obtain the voice-free audio corresponding to the audio/video to be dubbed, wherein the first network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice-free audio, and encodes the correspondence between amplitude spectra and voice-free audio;
A sixth voice-free video determining submodule (not shown in Fig. 10), configured to determine, based on the voice-free audio, the voice-free video corresponding to the audio/video to be dubbed.
With regard to the apparatuses in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the corresponding methods, and will not be elaborated here.
The embodiments of the present disclosure further provide an electronic device. As shown in Fig. 11, the electronic device may include a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, where the processor 1101, the communication interface 1102 and the memory 1103 communicate with one another via the communication bus 1104.
The memory 1103 is configured to store a computer program.
The processor 1101 is configured to implement, when executing the program stored in the memory 1103, any audio/video processing method in the above embodiments. Specifically, the electronic device may be a server, in which case the processor 1101 implements the first audio/video processing method described in any of the above embodiments; it may be the first electronic device, in which case the processor 1101 implements the second audio/video processing method described in any of the above embodiments; or it may be the second electronic device, in which case the processor 1101 implements the third audio/video processing method described in any of the above embodiments.
It can be seen that, with this solution, users can interact in the virtual space by dubbing, which diversifies the modes of interaction and improves the user experience.
The communication bus of the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include a random access memory (RAM) or a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), or the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The embodiments of the present disclosure further provide a computer-readable storage medium. When the instructions in the storage medium are executed by the processor of a server, the server is enabled to perform any audio/video processing method in the above embodiments.
It can be seen that, with this solution, users can interact in the virtual space by dubbing, which diversifies the modes of interaction and improves the user experience.
The embodiments of the present disclosure further provide an application program product, which, when run, performs any audio/video processing method in the above embodiments.
It can be seen that, with this solution, users can interact in the virtual space by dubbing, which diversifies the modes of interaction and improves the user experience.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (10)
1. An audio/video processing method, applied to a server, the method comprising:
obtaining a dubbing instruction issued by a first electronic device in a virtual space, wherein the first electronic device is an electronic device with live-streaming permission in the virtual space;
determining a preset dubbing type corresponding to the dubbing instruction;
determining an audio/video to be dubbed;
upon obtaining a dubbing start instruction issued by the first electronic device, playing a voice-free video corresponding to the audio/video to be dubbed according to the preset dubbing type;
while the voice-free video is playing, obtaining dubbed audio corresponding to the voice-free video and sending the dubbed audio to a second electronic device, wherein the second electronic device is an electronic device with live-streaming viewing permission in the virtual space.
2. The method according to claim 1, wherein the preset dubbing type is an anchor show type, and the step of playing the voice-free video corresponding to the audio/video to be dubbed according to the preset dubbing type comprises:
controlling the first electronic device and the second electronic device to play the voice-free video corresponding to the audio/video to be dubbed simultaneously.
3. The method according to claim 1, wherein the preset dubbing type is a multi-anchor battle type, and the step of playing the voice-free video corresponding to the audio/video to be dubbed according to the preset dubbing type comprises:
determining the battle order of the first electronic device corresponding to each anchor;
controlling, according to the battle order, the first electronic devices and their corresponding second electronic devices to play, in turn, the voice-free video corresponding to the audio/video to be dubbed.
4. The method according to claim 1, wherein the preset dubbing type is a multi-person dubbing type, and the step of playing the voice-free video corresponding to the audio/video to be dubbed according to the preset dubbing type comprises:
controlling the second electronic devices corresponding to the users in an instant messaging region of the virtual space to play the voice-free video corresponding to the audio/video to be dubbed simultaneously.
5. The method according to claim 4, wherein the step of controlling the second electronic devices corresponding to the users in the instant messaging region of the virtual space to play the voice-free video corresponding to the audio/video to be dubbed simultaneously comprises:
upon obtaining a broadcast message sent by the first electronic device, sending the audio/video to be dubbed and a start instruction to each second electronic device corresponding to a user in the instant messaging region of the virtual space, so that each second electronic device plays the voice-free video corresponding to the audio/video to be dubbed simultaneously upon receiving the start instruction.
6. The method according to any one of claims 1 to 5, wherein the step of determining the audio/video to be dubbed comprises:
obtaining a video uploaded by the first electronic device;
determining the uploaded video as the audio/video to be dubbed.
7. The method according to any one of claims 1 to 5, wherein the voice-free video is obtained by:
determining an amplitude spectrum corresponding to the audio signal of the audio/video to be dubbed;
inputting the amplitude spectrum into a pre-trained network model to obtain a voice mask matrix corresponding to the audio/video to be dubbed, wherein the network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice mask matrices, and encodes the correspondence between amplitude spectra and voice mask matrices;
calculating a voice-free amplitude spectrum from the voice mask matrix and the amplitude spectrum;
determining, based on the voice-free amplitude spectrum, the voice-free video corresponding to the audio/video to be dubbed.
8. The method according to any one of claims 1 to 5, wherein the voice-free video is obtained by:
determining an amplitude spectrum corresponding to the audio signal of the audio/video to be dubbed;
inputting the amplitude spectrum into a pre-trained network model to obtain voice-free audio corresponding to the audio/video to be dubbed, wherein the network model is trained on pre-obtained amplitude spectrum samples and their corresponding voice-free audio, and encodes the correspondence between amplitude spectra and voice-free audio;
determining, based on the voice-free audio, the voice-free video corresponding to the audio/video to be dubbed.
9. An audio/video processing method, applied to a first electronic device, wherein the first electronic device is an electronic device with live-streaming permission in a virtual space, the method comprising:
obtaining a dubbing instruction in the virtual space;
determining a preset dubbing type corresponding to the dubbing instruction;
determining an audio/video to be dubbed;
upon obtaining a dubbing start instruction, playing a voice-free video corresponding to the audio/video to be dubbed according to the preset dubbing type;
while the voice-free video is playing, obtaining dubbed audio corresponding to the voice-free video and sending the dubbed audio to a server.
10. An audio/video processing method, applied to a second electronic device, wherein the second electronic device is an electronic device with live-streaming viewing permission in a virtual space, the method comprising:
upon obtaining a dubbing start instruction in the virtual space, playing a pre-obtained voice-free video corresponding to an audio/video to be dubbed;
while the voice-free video is playing, upon obtaining dubbed audio corresponding to the voice-free video, playing the dubbed audio.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910641537.9A CN110392273B (en) | 2019-07-16 | 2019-07-16 | Audio and video processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110392273A true CN110392273A (en) | 2019-10-29 |
CN110392273B CN110392273B (en) | 2023-08-08 |
Family
ID=68284991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910641537.9A Active CN110392273B (en) | 2019-07-16 | 2019-07-16 | Audio and video processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110392273B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640442A (en) * | 2020-06-01 | 2020-09-08 | 北京猿力未来科技有限公司 | Method for processing audio packet loss, method for training neural network and respective devices |
CN112261435A (en) * | 2020-11-06 | 2021-01-22 | 腾讯科技(深圳)有限公司 | Social interaction method, device, system, equipment and storage medium |
CN112954377A (en) * | 2021-02-04 | 2021-06-11 | 广州繁星互娱信息科技有限公司 | Live broadcast fighting picture display method, live broadcast fighting method and device |
Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101261864A (en) * | 2008-04-21 | 2008-09-10 | 中兴通讯股份有限公司 | A method and system for mixing recording voice at a mobile terminal |
US20110008022A1 (en) * | 2009-07-13 | 2011-01-13 | Lee Alex Y | System and Methods for Recording a Compressed Video and Audio Stream |
US8010692B1 (en) * | 2009-11-05 | 2011-08-30 | Adobe Systems Incorporated | Adapting audio and video content for hardware platform |
CN102325173A (en) * | 2011-08-30 | 2012-01-18 | 重庆抛物线信息技术有限责任公司 | Mixed audio and video sharing method and system |
CN102752499A (en) * | 2011-12-29 | 2012-10-24 | 新奥特(北京)视频技术有限公司 | System dubbing through dubbing-free workstation |
US20140098177A1 (en) * | 2012-10-09 | 2014-04-10 | Tv Ears, Inc. | Mobile application for accessing television audio |
US20140143218A1 (en) * | 2012-11-20 | 2014-05-22 | Apple Inc. | Method for Crowd Sourced Multimedia Captioning for Video Content |
CN104135667A (en) * | 2014-06-10 | 2014-11-05 | 腾讯科技(深圳)有限公司 | Video remote explanation synchronization method, terminal equipment and system |
CN105847913A (en) * | 2016-05-20 | 2016-08-10 | 腾讯科技(深圳)有限公司 | Live video broadcast control method, mobile terminal and system |
WO2016184295A1 (en) * | 2015-05-19 | 2016-11-24 | 腾讯科技(深圳)有限公司 | Instant messenger method, user equipment and system |
CN106534618A (en) * | 2016-11-24 | 2017-03-22 | 广州爱九游信息技术有限公司 | Method, device and system for realizing pseudo field interpretation |
WO2017181594A1 (en) * | 2016-04-19 | 2017-10-26 | 乐视控股(北京)有限公司 | Video display method and apparatus |
CN107452389A (en) * | 2017-07-20 | 2017-12-08 | 大象声科(深圳)科技有限公司 | A kind of general monophonic real-time noise-reducing method |
CN107484016A (en) * | 2017-09-05 | 2017-12-15 | 深圳Tcl新技术有限公司 | Video dubs switching method, television set and computer-readable recording medium |
CN107492383A (en) * | 2017-08-07 | 2017-12-19 | 上海六界信息技术有限公司 | Screening technique, device, equipment and the storage medium of live content |
WO2018018482A1 (en) * | 2016-07-28 | 2018-02-01 | 北京小米移动软件有限公司 | Method and device for playing sound effects |
WO2018095219A1 (en) * | 2016-11-24 | 2018-05-31 | 腾讯科技(深圳)有限公司 | Media information processing method and device |
CN108668151A (en) * | 2017-03-31 | 2018-10-16 | 腾讯科技(深圳)有限公司 | Audio/video interaction method and device |
US20180318713A1 (en) * | 2016-03-03 | 2018-11-08 | Tencent Technology (Shenzhen) Company Limited | A content presenting method, user equipment and system |
CN109119063A (en) * | 2018-08-31 | 2019-01-01 | 腾讯科技(深圳)有限公司 | Video dubs generation method, device, equipment and storage medium |
CN109151592A (en) * | 2018-09-21 | 2019-01-04 | 广州华多网络科技有限公司 | Connect the interactive approach, device and server of wheat across channel |
CN109151565A (en) * | 2018-09-04 | 2019-01-04 | 北京达佳互联信息技术有限公司 | Play method, apparatus, electronic equipment and the storage medium of voice |
CN109361954A (en) * | 2018-11-02 | 2019-02-19 | 腾讯科技(深圳)有限公司 | Method for recording, device, storage medium and the electronic device of video resource |
CN109361930A (en) * | 2018-11-12 | 2019-02-19 | 广州酷狗计算机科技有限公司 | Method for processing business, device and computer readable storage medium |
CN109587509A (en) * | 2018-11-27 | 2019-04-05 | 广州市百果园信息技术有限公司 | Live-broadcast control method, device, computer readable storage medium and terminal |
CN109710798A (en) * | 2018-12-28 | 2019-05-03 | 北京金山安全软件有限公司 | Music performance evaluation method and device |
CN109830245A (en) * | 2019-01-02 | 2019-05-31 | 北京大学 | A kind of more speaker's speech separating methods and system based on beam forming |
Patent Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101261864A (en) * | 2008-04-21 | 2008-09-10 | 中兴通讯股份有限公司 | A method and system for mixing recording voice at a mobile terminal |
US20110008022A1 (en) * | 2009-07-13 | 2011-01-13 | Lee Alex Y | System and Methods for Recording a Compressed Video and Audio Stream |
US8010692B1 (en) * | 2009-11-05 | 2011-08-30 | Adobe Systems Incorporated | Adapting audio and video content for hardware platform |
CN102325173A (en) * | 2011-08-30 | 2012-01-18 | 重庆抛物线信息技术有限责任公司 | Mixed audio and video sharing method and system |
CN102752499A (en) * | 2011-12-29 | 2012-10-24 | 新奥特(北京)视频技术有限公司 | System dubbing through dubbing-free workstation |
US20140098177A1 (en) * | 2012-10-09 | 2014-04-10 | Tv Ears, Inc. | Mobile application for accessing television audio |
US20140143218A1 (en) * | 2012-11-20 | 2014-05-22 | Apple Inc. | Method for Crowd Sourced Multimedia Captioning for Video Content |
CN104135667A (en) * | 2014-06-10 | 2014-11-05 | 腾讯科技(深圳)有限公司 | Remote video commentary synchronization method, terminal device, and system |
WO2016184295A1 (en) * | 2015-05-19 | 2016-11-24 | 腾讯科技(深圳)有限公司 | Instant messaging method, user equipment, and system |
US20180318713A1 (en) * | 2016-03-03 | 2018-11-08 | Tencent Technology (Shenzhen) Company Limited | A content presenting method, user equipment and system |
WO2017181594A1 (en) * | 2016-04-19 | 2017-10-26 | 乐视控股(北京)有限公司 | Video display method and apparatus |
CN105847913A (en) * | 2016-05-20 | 2016-08-10 | 腾讯科技(深圳)有限公司 | Live video broadcast control method, mobile terminal and system |
WO2018018482A1 (en) * | 2016-07-28 | 2018-02-01 | 北京小米移动软件有限公司 | Method and device for playing sound effects |
CN106534618A (en) * | 2016-11-24 | 2017-03-22 | 广州爱九游信息技术有限公司 | Method, apparatus, and system for simulated live commentary |
WO2018095219A1 (en) * | 2016-11-24 | 2018-05-31 | 腾讯科技(深圳)有限公司 | Media information processing method and device |
CN108668151A (en) * | 2017-03-31 | 2018-10-16 | 腾讯科技(深圳)有限公司 | Audio/video interaction method and device |
CN107452389A (en) * | 2017-07-20 | 2017-12-08 | 大象声科(深圳)科技有限公司 | General single-channel real-time noise reduction method |
CN107492383A (en) * | 2017-08-07 | 2017-12-19 | 上海六界信息技术有限公司 | Live content screening method, apparatus, device, and storage medium |
CN107484016A (en) * | 2017-09-05 | 2017-12-15 | 深圳Tcl新技术有限公司 | Video dubbing switching method, television, and computer-readable storage medium |
CN109119063A (en) * | 2018-08-31 | 2019-01-01 | 腾讯科技(深圳)有限公司 | Video dubbing generation method, apparatus, device, and storage medium |
CN109151565A (en) * | 2018-09-04 | 2019-01-04 | 北京达佳互联信息技术有限公司 | Voice playback method, apparatus, electronic device, and storage medium |
CN109151592A (en) * | 2018-09-21 | 2019-01-04 | 广州华多网络科技有限公司 | Cross-channel co-streaming (Lianmai) interaction method, apparatus, and server |
CN109361954A (en) * | 2018-11-02 | 2019-02-19 | 腾讯科技(深圳)有限公司 | Video resource recording method, apparatus, storage medium, and electronic device |
CN109361930A (en) * | 2018-11-12 | 2019-02-19 | 广州酷狗计算机科技有限公司 | Service processing method, apparatus, and computer-readable storage medium |
CN109587509A (en) * | 2018-11-27 | 2019-04-05 | 广州市百果园信息技术有限公司 | Live broadcast control method, apparatus, computer-readable storage medium, and terminal |
CN109710798A (en) * | 2018-12-28 | 2019-05-03 | 北京金山安全软件有限公司 | Music performance evaluation method and device |
CN109830245A (en) * | 2019-01-02 | 2019-05-31 | 北京大学 | Multi-speaker speech separation method and system based on beamforming |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640442A (en) * | 2020-06-01 | 2020-09-08 | 北京猿力未来科技有限公司 | Method for processing audio packet loss, method for training neural network and respective devices |
CN111640442B (en) * | 2020-06-01 | 2023-05-23 | 北京猿力未来科技有限公司 | Method for processing audio packet loss, method for training neural network and respective devices |
CN112261435A (en) * | 2020-11-06 | 2021-01-22 | 腾讯科技(深圳)有限公司 | Social interaction method, device, system, equipment and storage medium |
CN112261435B (en) * | 2020-11-06 | 2022-04-08 | 腾讯科技(深圳)有限公司 | Social interaction method, device, system, equipment and storage medium |
CN112954377A (en) * | 2021-02-04 | 2021-06-11 | 广州繁星互娱信息科技有限公司 | Live battle screen display method, live battle method, and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN110392273B (en) | 2023-08-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107027050B (en) | Audio and video processing method and device for assisting live broadcast | |
Rottondi et al. | An overview on networked music performance technologies | |
CN110392273A (en) | Audio and video processing method, apparatus, electronic device, and storage medium | |
CN111818004B (en) | Cloud game live broadcast method and system and computer readable storage medium | |
CN110910860B (en) | Online KTV implementation method and device, electronic equipment and storage medium | |
CN112019874B (en) | Live co-streaming (Lianmai) method and related device | |
CN110390927B (en) | Audio processing method and device, electronic equipment and computer readable storage medium | |
Alexandraki et al. | Exploring new perspectives in network music performance: The DIAMOUSES framework | |
WO2007111842A2 (en) | Method and system for low latency high quality music conferencing | |
US20060242676A1 (en) | Live streaming broadcast method, live streaming broadcast device, live streaming broadcast system, program, recording medium, broadcast method, and broadcast device | |
CN102845076A (en) | Display apparatus, control apparatus, television receiver, method of controlling display apparatus, program, and recording medium | |
CN107018466A (en) | Enhanced audio recording | |
CN108616800A (en) | Audio playback method and apparatus, storage medium, and electronic device | |
CN103945258B (en) | Channel switching method and television receiver | |
US7559079B2 (en) | Realtime service system using the interactive data communication and method thereof | |
CN110099242A (en) | Remote live broadcast method and apparatus | |
CN114339302B (en) | Broadcast directing method, apparatus, device, and computer storage medium | |
Cairns et al. | Recording music in the metaverse: a case study of XR BBC Maida Vale Recording Studios | |
CN104954730B (en) | Video playback method and apparatus | |
CN110798640A (en) | Full high-definition recording and broadcasting method | |
RU2527732C2 (en) | Method of providing sound for a video broadcast | |
CN109951650B (en) | Campus radio station system | |
KR100874024B1 (en) | Station and method for internet broadcasting of interactive content, and recording medium storing a program implementing the same | |
CN112133300B (en) | Multi-device interaction method, related device and system | |
TW201630416A (en) | Network synchronous coordinating performance system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||