CN109410912A - Audio processing method, apparatus, electronic device, and computer-readable storage medium - Google Patents

Audio processing method, apparatus, electronic device, and computer-readable storage medium

Info

Publication number
CN109410912A
CN109410912A CN201811400323.4A CN109410912B
Authority
CN
China
Prior art keywords
audio
frequency information
sound
information
microphone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811400323.4A
Other languages
Chinese (zh)
Other versions
CN109410912B (en)
Inventor
马永振 (Ma Yongzhen)
朱旭光 (Zhu Xuguang)
梅航 (Mei Hang)
叶希喆 (Ye Xizhe)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Information Technology Co Ltd
Original Assignee
Shenzhen Tencent Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Information Technology Co Ltd
Priority to CN201811400323.4A (patent CN109410912B)
Publication of CN109410912A
Application granted
Publication of CN109410912B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L2013/021 Overlap-add techniques

Abstract

Embodiments of the present application provide an audio processing method, apparatus, electronic device, and computer-readable storage medium, relating to the field of multimedia technology. The method comprises: obtaining audio information to be processed and audio information recorded by a dummy-head microphone; determining audio information of a preset type from the audio information to be processed, and processing the audio information of the preset type with a preset plug-in; and then mixing the audio information recorded by the dummy-head microphone with the processed audio information. The embodiments of the present application can improve the sense of sound localization and the spatial impression, and thereby improve the user's audio experience when watching video.

Description

Audio processing method, apparatus, electronic device, and computer-readable storage medium
Technical field
This application relates to the field of multimedia technology, and in particular to an audio processing method, apparatus, electronic device, and computer-readable storage medium.
Background technique
With the development of information technology, the video field has also advanced; for example, mobile-game computer graphics (CG), virtual reality (VR) game CG, motion comics, and the like all require the audio information synthesized into the video content to be processed appropriately so that users can better experience that content. How to process the audio information synthesized into video content, so that users have a better audio experience when watching it, has therefore become a key issue.
In the prior art, the audio information synthesized with the video content is processed by means of Ambisonics surround-sound reproduction. However, because Ambisonics is by nature a technique that blurs sound-source localization, and because it is limited by poor far-field localization, the resulting sense of sound localization and spatial impression are insufficient, giving users a poor audio experience when watching video.
Summary of the invention
This application provides an audio processing method, apparatus, electronic device, and computer-readable storage medium, intended to solve the problem that the sense of sound localization and spatial impression is insufficient and the user's experience when watching video is poor. The technical solution is as follows:
In a first aspect, an audio processing method is provided, the method comprising:
Obtaining audio information to be processed and audio information recorded by a dummy-head microphone;
Determining audio information of a preset type from the audio information to be processed, and processing the audio information of the preset type with a preset plug-in;
Mixing the audio information recorded by the dummy-head microphone with the processed audio information.
In a possible implementation, before obtaining the audio information to be processed and the audio information recorded by the dummy-head microphone, the method further comprises:
During audio recording, determining the microphone to be used for the current recording based on the distance between the sound source and each microphone;
Recording the corresponding audio information with the determined microphone.
In a possible implementation, determining the microphone to be used for the current recording based on the distance between the sound source and each microphone, and recording the corresponding audio information with the determined microphone, comprises:
When it is detected that the distance between the sound source and the dummy-head microphone meets a first preset condition, determining that the microphone used for the current recording is the dummy-head microphone, and recording the corresponding audio information with the dummy-head microphone;
When it is detected that the distance between the sound source and the condenser microphone meets a second preset condition, determining that the microphone used for the current recording is the condenser microphone, and recording the corresponding audio information with the condenser microphone.
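The distance-based microphone choice above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the claims only name unspecified "preset conditions", so the 0.5 m near-field threshold here is a hypothetical stand-in.

```python
def select_microphone(dist_dummy_head_m, dist_condenser_m, near_m=0.5):
    """Pick the recording microphone from source-to-microphone distances.

    near_m is a hypothetical threshold for the patent's unspecified
    "first preset condition" (source close to the dummy-head mic)."""
    if dist_dummy_head_m <= near_m:
        return "dummy-head"   # close sources: binaural capture shines
    return "condenser"        # otherwise fall back to the condenser mic
```

In practice both distances would come from the recording setup (e.g. blocking marks on the studio floor); only the comparison logic is fixed by the claim.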
In a possible implementation, mixing the audio information recorded by the dummy-head microphone with the processed audio information comprises:
Mixing the audio information recorded by the dummy-head microphone with the processed audio information by means of linear superposition.
In a possible implementation, mixing the audio information recorded by the dummy-head microphone with the processed audio information by means of linear superposition comprises:
Linearly superposing the audio information recorded by the dummy-head microphone and the processed audio information;
Dividing the superposed audio signal into at least two audio-signal intensity intervals according to audio intensity;
Applying an intensity reduction to each audio-signal intensity interval with a corresponding reduction ratio;
Superposing the at least two intensity-reduced audio-signal intensity intervals;
Wherein the reduction ratio applied to an audio-signal intensity interval is inversely proportional to the audio intensity of that interval.
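The superpose-divide-shrink-recombine steps above can be sketched as follows, under stated assumptions: the claim fixes neither the interval boundaries nor the ratios, so the single 0.5 threshold and the (1.0, 0.5) gains are hypothetical; per the inverse-proportion clause, the louder interval receives the smaller gain.

```python
import numpy as np

def mix_with_interval_compression(binaural, processed,
                                  thresholds=(0.5,), gains=(1.0, 0.5)):
    """Linearly superpose two signals, assign each sample to an intensity
    interval by absolute amplitude, scale each interval by its gain, and
    recombine. Thresholds/gains are hypothetical illustration values."""
    mixed = binaural + processed                   # linear superposition
    idx = np.digitize(np.abs(mixed), thresholds)   # interval index per sample
    out = np.zeros_like(mixed)
    for i, g in enumerate(gains):
        out += np.where(idx == i, mixed * g, 0.0)  # shrink, then superpose
    return out
```

This behaves like a crude per-sample dynamic-range compressor: quiet samples pass through, loud samples are attenuated so the superposed mix stays within range.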
In a possible implementation, after mixing the audio information recorded by the dummy-head microphone with the processed audio information, the method further comprises:
Synthesizing the mixed audio information with the video information to be synthesized.
In a possible implementation, synthesizing the mixed audio information with the video information to be synthesized comprises:
Encoding the mixed audio information and the video information to be synthesized separately, to obtain encoded audio information and encoded video information;
Synthesizing the encoded audio information with the encoded video information.
In a possible implementation, after encoding the mixed audio information and the video information to be synthesized separately to obtain encoded audio information and encoded video information, the method further comprises:
Determining the video frame rate of the encoded video information;
Interleaving the encoded audio information and the encoded video information based on that video frame rate, to obtain an encoded interleaved queue;
Synthesizing the encoded audio information with the encoded video information then comprises:
Synthesizing the encoded interleaved queue.
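The frame-rate-driven interleaving above can be sketched as follows. The patent only fixes the idea of pacing audio by the encoded video frame rate; the packet representation and the 20 ms audio packet duration used in the example are hypothetical.

```python
def build_interleaved_queue(video_packets, audio_packets, fps, audio_ms):
    """Build an interleaved queue: each encoded video frame is followed by
    the encoded audio packets covering its 1000/fps ms display window."""
    per_frame = round((1000.0 / fps) / audio_ms)  # audio packets per frame
    queue, ai = [], 0
    for vp in video_packets:
        queue.append(("V", vp))
        for _ in range(per_frame):
            if ai < len(audio_packets):
                queue.append(("A", audio_packets[ai]))
                ai += 1
    queue.extend(("A", ap) for ap in audio_packets[ai:])  # trailing audio
    return queue
```

At 25 fps with 20 ms audio packets, every video packet is followed by two audio packets, which keeps the two streams roughly time-aligned in the container.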
In a possible implementation, the preset plug-in is a head-related transfer function (HRTF) plug-in.
In a second aspect, an audio processing apparatus is provided, the apparatus comprising:
An obtaining module, configured to obtain audio information to be processed and audio information recorded by a dummy-head microphone;
A first determining module, configured to determine audio information of a preset type from the audio information to be processed obtained by the obtaining module;
A plug-in processing module, configured to process, with a preset plug-in, the audio information of the preset type determined by the first determining module;
A mixing module, configured to mix the audio information recorded by the dummy-head microphone with the audio information processed by the plug-in processing module.
In a possible implementation, the audio information to be processed includes at least one of the following:
Ambient sound information; sound-effect information; audio information recorded by a condenser microphone; background music information.
In a possible implementation, the apparatus further comprises a second determining module and a recording module;
The second determining module is configured to determine, during audio recording, the microphone to be used for the current recording based on the distance between the sound source and each microphone;
The recording module is configured to record the corresponding audio information with the microphone determined by the second determining module.
In a possible implementation, the second determining module is specifically configured to determine, when it is detected that the distance between the sound source and the dummy-head microphone meets a first preset condition, that the microphone used for the current recording is the dummy-head microphone;
The recording module is specifically configured to record the corresponding audio information with the dummy-head microphone determined by the second determining module;
The second determining module is specifically configured to determine, when it is detected that the distance between the sound source and the condenser microphone meets a second preset condition, that the microphone used for the current recording is the condenser microphone;
The recording module is specifically configured to record the corresponding audio information with the condenser microphone determined by the second determining module.
In a possible implementation, the mixing module is specifically configured to mix the audio information recorded by the dummy-head microphone with the processed audio information by means of linear superposition.
In a possible implementation, the mixing module comprises a superposition unit, a division unit, and an intensity-reduction unit;
The superposition unit is configured to linearly superpose the audio information recorded by the dummy-head microphone and the processed audio information;
The division unit is configured to divide the audio signal linearly superposed by the superposition unit into at least two audio-signal intensity intervals according to audio intensity;
The intensity-reduction unit is configured to apply an intensity reduction, with a corresponding reduction ratio, to each audio-signal intensity interval divided by the division unit;
The superposition unit is further configured to superpose the at least two audio-signal intensity intervals reduced by the intensity-reduction unit;
Wherein the reduction ratio applied to an audio-signal intensity interval is inversely proportional to the audio intensity of that interval.
In a possible implementation, the apparatus further comprises a synthesis module;
The synthesis module is configured to synthesize the audio information mixed by the mixing module with the video information to be synthesized.
In a possible implementation, the synthesis module comprises an encoding unit and a synthesis unit;
The encoding unit is configured to encode the mixed audio information and the video information to be synthesized separately, to obtain encoded audio information and encoded video information;
The synthesis unit is configured to synthesize the audio information encoded by the encoding unit with the encoded video information.
In a possible implementation, the apparatus further comprises a third determining module and an interleaving module;
The third determining module is configured to determine the video frame rate of the encoded video information;
The interleaving module is configured to interleave the encoded audio information and the encoded video information based on the video frame rate determined by the third determining module, to obtain an encoded interleaved queue;
The synthesis module is specifically configured to synthesize the encoded interleaved queue produced by the interleaving module.
In a possible implementation, the preset plug-in is a head-related transfer function (HRTF) plug-in.
In a third aspect, an electronic device is provided, the electronic device comprising:
One or more processors;
A memory;
One or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the audio processing method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the audio processing method according to the first aspect.
The technical solution provided by the embodiments of the present application has the following beneficial effects:
This application provides an audio processing method, apparatus, electronic device, and computer-readable storage medium. Compared with the prior art, in which the audio information synthesized with video is processed by means of Ambisonics surround-sound reproduction, this application obtains audio information to be processed and audio information recorded by a dummy-head microphone, determines audio information of a preset type from the audio information to be processed, processes that audio information with a preset plug-in, and then mixes the audio information recorded by the dummy-head microphone with the processed audio information. In other words, after the audio information of the preset type has been processed by the preset plug-in, it is combined with the audio information recorded by the dummy-head microphone. Because the audio is both recorded by a dummy-head microphone and processed by the preset plug-in, the spatial positioning effect of the audio information can be improved, which improves its sense of sound localization and spatial impression, and in turn the user's audio experience, especially when watching video.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below.
Fig. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of another audio processing apparatus provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device for audio processing provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of a dummy-head microphone;
Fig. 6 is a schematic diagram of loading effect processors onto grouped track categories;
Fig. 7 is a schematic diagram of the plug-in that processes the audio information of the preset type;
Fig. 8 is a schematic diagram of the Cartesian model coordinate system;
Fig. 9 is a schematic diagram of determining the three-dimensional position of a sound by entering values in the plug-in;
Fig. 10 is a schematic diagram of the positional relationship between a sound and the listener;
Fig. 11 is a schematic diagram of adjusting the volume of the input sound source with the GAIN control in the plug-in;
Fig. 12 is a schematic diagram of adjusting the distance between the six room surfaces and the listener;
Fig. 13 is a schematic diagram of the damping controls used to tune the frequency ranges of the sound;
Fig. 14 is a schematic diagram of the real-time auralisation (REALTIME AURALISATION) button;
Fig. 15 is a schematic diagram of the reverberation processing modes;
Fig. 16a is a schematic diagram of the output parameter settings for 3D audio played back over headphones;
Fig. 16b is a schematic diagram of the output parameter settings for 3D audio played back over loudspeakers;
Fig. 17 is a schematic diagram of the degree of transition between three-dimensional audio re-recorded binaurally and three-dimensional audio obtained by binaural synthesis;
Fig. 18 is a schematic diagram of adjusting volume with a compressor during mixing;
Fig. 19 is a schematic diagram of the interleaved queue before encoding in an embodiment of the present application;
Fig. 20 is a schematic diagram of the video format parameters of the synthesized multimedia information;
Fig. 21 is a schematic diagram of the format settings of the output audio information after synthesis;
Fig. 22 is a schematic diagram of the overall production flow in an embodiment of the present application, taking a motion comic with sound as an example.
Specific embodiment
Embodiments of the present application are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numbers throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the application, and are not to be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the description of the present application indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intermediate elements may also be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any units and all combinations of one or more of the associated listed items.
To make the purposes, technical solutions, and advantages of the application clearer, embodiments of the application are described in further detail below with reference to the drawings.
First, several terms involved in this application are introduced and explained:
Three-dimensional (3D) audio: 3D audio is a relative concept. Existing surround sound, such as the 5.1 or 7.1 standards, describes sound on a two-dimensional plane, whereas a 3D sound standard is one in which the listener can also perceive the height and depth of sounds. The mainstream view at present is to regard sound played back with HRTF processing as the presentation form of 3D audio.
Ambisonics: the theoretical basis of Ambisonics, simply put, is that the sound pressure at any point on a plane in a known region can be calculated from the sound-pressure gradient. In a three-dimensional sound field, the coordinate system is spherical and each layer of the spherical surface is called an order; Ambisonics picks up the sound pressure of the original sound field decomposed three-dimensionally in each direction (plane), and finally reconstructs the original sound field through loudspeakers reasonably and evenly distributed on a spherical structure in the playback system.
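As a concrete illustration of how a point source is placed in an Ambisonics sound field, the classic first-order B-format encoding of a mono sample can be sketched as below. This is standard Ambisonics background, not the patent's own method; the conventional -3 dB scaling of the omnidirectional W channel is assumed.

```python
import math

def encode_b_format(sample, azimuth, elevation):
    """Encode one mono sample into first-order B-format (W, X, Y, Z)
    for a source at the given azimuth/elevation in radians."""
    w = sample / math.sqrt(2.0)                          # omni, -3 dB
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = sample * math.sin(elevation)                      # up-down
    return w, x, y, z
```

A source straight ahead (azimuth 0, elevation 0) lands entirely in W and X, which is why rotating the whole sound field with the listener's head reduces to a matrix operation on these four channels.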
Binaural recording: binaural recording is a pickup technique, distinct from normal stereo pickup, in which the sound is captured after travelling to the human ear and being filtered by the structures of the body itself. It is realized by placing microphones in the left and right ear canals of a head model.
Head-related transfer function (HRTF): in connection with binaural recording, the filtering performed by the human body can be expressed as a filter code. Deriving the HRTF can follow the approach used for convolution reverberation: in a space with very little environmental influence, an impulse response (usually a transient pulse or a swept-frequency signal) is picked up with binaural recording; the impulse-response signal obtained is called the head-related impulse response (HRIR), and comparing it with the original impulse response yields the HRTF. In this way, applying HRTF processing to a recorded monophonic sound produces different signals for the left and right ears, that is, a binaural sound, and a binaural sound produced by HRTF processing contains three-dimensional spatial information.
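The step of turning a mono recording into a binaural signal, as described above, amounts to convolving the mono signal with a left and a right HRIR. A minimal sketch, assuming the HRIRs are already available as arrays (real ones come from measurement sets; the two-tap responses in the test are toys):

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair to obtain a
    two-channel binaural signal carrying the spatial cues of the HRIRs."""
    left = np.convolve(mono, hrir_left)    # per-ear filtering
    right = np.convolve(mono, hrir_right)
    return np.vstack([left, right])        # shape (2, n + taps - 1)
```

In a plug-in this convolution runs per source, with the HRIR pair selected from the source's direction relative to the listener.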
Sound spatialization: in a real acoustic environment, this is the spatial information about a sound that we perceive with our two ears alone, including the localization of the sound and its spatial impression (the distance of the sound source from ourselves);
Condenser microphone: a microphone that converts a sound signal into an electrical signal using changes in capacitance;
Dummy-head microphone: a dummy-head microphone has auricles, ear canals, a skull, hair and shoulders, and even skin and bone, manufactured from materials as close as possible to the human body. The microphone performs two-channel audio recording via the "simulated human head": two miniature omnidirectional microphones are placed in the ear canals of a dummy head closely resembling a real human head (near the position of the eardrum), simulating the entire process by which the human ear hears a sound.
In the prior art, in videos with a 360-degree rotatable viewing angle or virtual reality (VR) videos, sound sources are placed and encoded using Ambisonics, the discrete transformation of the sound sources is carried out as required, and the result is decoded with HRTF for final playback over headphones.
The sound effect produced by this technique is that the sound follows the rotation of the camera in real time and changes accordingly. However, because Ambisonics is itself a technical means that blurs sound-source localization, and is further limited by poor far-field localization, the resulting sense of localization and spatial impression are insufficient.
To solve the above problems, so that the sound performance restores the auditory sense of a real scene, and users can experience a complete 3D soundscape in a 2D plane or VR video, or even beyond the visual content, making the user a participant in the story told by the sound rather than a mere listener, the following approach is used:
The embodiments of the present application combine an HRTF plug-in in the form of an immersive 3D audio experience, technically carry out the post-production of sound based on dummy-head recording, and match the audio experience to the video content.
The technical solution of the application, and how it solves the above technical problems, are described in detail below with specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the drawings.
An embodiment of the present application provides an audio processing method. As shown in Fig. 1, the method comprises:
Step S101: obtaining audio information to be processed and audio information recorded by a dummy-head microphone.
For the embodiments of the present application, recording audio with a dummy-head microphone means using a dummy-head recording technique that replicates the human auricle, ear canal, skull, shoulders and so on, all of which affect a sound through the reflection, refraction, and diffraction of sound waves. Acoustically, this influence is described by the HRTF, the head-related transfer function. It is precisely because of the influence of the HRTF that the human brain can judge, from experience, the direction and distance from which a sound is emitted. In the embodiments of the present application, the advantage of dummy-head recording is that its rendition of close-to-the-ear sounds, such as blowing and whispering, is far better than that of a plug-in; this kind of rendition is comparable to autonomous sensory meridian response (ASMR) style sound. Its disadvantages: everything must be recorded simultaneously, and the requirements on the recording space are high; it is difficult to record in stages; the localization and spatial impression of recorded content cannot be changed in post-production; blocking and performance requirements during recording are high, the auditory result is hard to predict, and extensive testing is needed; and for far-field sounds the sense of localization is mediocre. In the embodiments of the present application, the dummy-head microphone is as shown in Fig. 5.
For the embodiments of the present application, the audio information to be processed includes at least one of the following:
Ambient sound information; sound-effect information; audio information recorded by a condenser microphone; background music information.
For the embodiments of the present application, the audio information recorded by a condenser microphone may be the voice of a voice actor recorded with a condenser microphone; the ambient sound information, sound-effect information, and background music information may be prerecorded audio information or presynthesized audio information. This is not limited in the embodiments of the present application.
Step S102: determining the audio information of the preset type from the audio information to be processed, and processing the audio information of the preset type with the preset plug-in.
For the embodiments of the present application, the audio information to be processed can be divided, according to the grouping in the project, into 3D-class audio information and non-3D-class audio information. For example, the 3D class may include dialogue audio and action sound effects, while the non-3D class includes ambient sound information, background music information, and special-effect audio information. For example, when grouping the material, as shown in Fig. 6, the groups from top to bottom are dialogue audio, ambient audio, and special-effect audio, and each group contains many tracks; effect processors can be loaded only under each group rather than under every track, saving resources.
For the embodiments of the present application, the audio information of the preset type may be the 3D-class audio information.
For the embodiments of the present application, the audio information of the preset type (the 3D-class audio) is processed with the preset plug-in so that the processed audio has a surround effect. The advantage of processing with the preset plug-in is that, because point sources are processed individually, later adjustment of localization and spatial impression is easy. The disadvantages: close-to-the-ear sounds are not rendered distinctly, and when there are too many point sources, the amount of source control needed in post-production grows.
In the embodiment of the present application, the choice of stereo material for the ambient sound of the present application was verified by testing. Specifically, tests were performed in the following two ways:
(1) Ambient sound in B-format was downloaded and decoded into binaural stereo using the Ambisonics Tool Kit (ATK) and the FB360 plug-in, where B-format is a standard Ambisonics audio format and FB360 is an audio plug-in produced by Facebook;

(2) Mono and stereo ambient sounds were first encoded (i.e., format-converted) using the ATK plug-in and the FB360 plug-in, and finally decoded into binaural stereo.
In both ways, the resulting ambient sound differs audibly from direct stereo, but its surround impression of real sound is mediocre. First, the four-track sound based on B-format is not a particularly good recording type for expressing ambient-sound localization, because its recording is in essence three-dimensional Mid/Side (MS) recording and ignores intensity differences; ambient recording based on the quadraphonic (Quad) layout works better.
Furthermore, the reason why Ambisonics is popular in VR sound applications is mainly that its sound-field rotation combines conveniently with head rotation; at the same time, as a format that blurs the sound fields of individual point sources into a reconstructed whole, it is suitable for ambient sound, forming an overall sound-field rotation when the head turns and saving resource consumption. In game scenes, however, the sound changes very little when approaching or moving away from a configured point source or a particular recording.

In the embodiment of the present application, however, these advantages of Ambisonics are not necessarily needed, because ambient sound is mostly used as a transitional expression; therefore, stereo material is used for the ambient sound.
Step S103: performing audio mixing processing on the audio information recorded by the dummy-head microphone and the processed audio information.

In the embodiment of the present application, the audio information after the mixing processing is output through earphones.
The embodiment of the present application provides a method of audio processing. Compared with the prior-art way of processing the audio information of a video composition through surround-sound reproduction (Ambisonics), the embodiment of the present application obtains the audio information to be processed and the audio information recorded by the dummy-head microphone, then determines the audio information of the preset type from the audio information to be processed and processes it through the preset plug-in, and then performs audio mixing on the audio information recorded by the dummy-head microphone and the processed audio information. That is, the embodiment of the present application first processes the audio information belonging to the preset type through the preset plug-in and then synthesizes it with the audio information recorded by the dummy-head microphone. Because part of the audio is recorded by the dummy-head microphone and the rest is processed by the preset plug-in, the spatial positioning effect of the audio information can be improved, so that the localization and spatial impression of the audio information are improved, which in turn improves the user's listening experience, especially when watching video.
In another possible implementation of the embodiment of the present application, the preset plug-in is a head-related transfer function (HRTF) plug-in.
In the embodiment of the present application, the above advantages and disadvantages of recording through the dummy-head microphone were compared with those of post-processing through the plug-in. Because dummy-head microphone recording cannot serve as the means used for most of a work, while the close-to-ear expression exclusive to dummy-head recording is nevertheless a highlight, several passages recorded through the dummy-head microphone were set up for the works concerned (for example, mobile game CG, VR game CG, and motion comics).
In the embodiment of the present application, when determining the plug-in used in post-production, several mainstream plug-ins were compared. Since the audio information concerned accompanies video content, plug-ins usable in a DAW were considered, for example DearVR, Oculus, Ambipan, and FB360. DearVR was finally chosen as the plug-in for processing the audio information of the preset type for the following reasons, as shown in Fig. 7. The embodiment of the present application is mainly described with the DearVR plug-in, but is not limited to the DearVR plug-in.
The advantages of DearVR as the plug-in are: (1) DearVR integrates reverberation and early reflection within the plug-in, with adjustable Damping and Gain, and the reverberation can also select a room shape;

(2) Choice of output mode: the output of DearVR is flexible and is not limited to binaural stereo output;

(3) In the configuration of sound automation it is easier to use than Oculus, and as the number of sources grows it supports single-track display of sources; the Oculus plug-in for DAWs is a test version of a game-engine plug-in, and its experience in a DAW is not particularly good.
In the embodiment of the present application, when DearVR is selected as the preset plug-in as above, processing the audio information of the preset type through the preset plug-in in step S102 may specifically include processing the audio information of the preset type through the DearVR plug-in as follows:
(1) Cartesian Mode is selected, whose coordinate system is shown in Fig. 8. As shown in Fig. 9, the white dot in the region indicates the sound source; it can be positioned by dragging with the mouse, or values can be entered in the XYZ input boxes in Fig. 9, for example -9.58 in the input box for the X axis, 0.00 in the input box for the Y axis, and 6.33 in the input box for the Z axis, where the X, Y, and Z axes in Fig. 9 correspond to the X, Y, and Z directions of the coordinate system in Fig. 8. Entering values in this way determines the three-dimensional position of the sound. Further, for a target sound that is not at a fixed position, the movement of the target sound is configured through automation of its estimated trajectory. For example, as shown in Fig. 10, X, Y, and Z respectively indicate the distance relation between the sound and the listener; there the sound is 2.48 meters (m) from the listener in the X direction, 0.47 m in the Y direction, and 0.00 m in the Z direction;
(2) The volume Gain of the input source is adjusted (this gain does not affect the later reflection and reverberation); further, without affecting the spatial impression, the volume balance between the individual sounds is adjusted, specifically by adjusting the GAIN knob shown in Fig. 11;
(3) Early reflection is used: the reflection module of DearVR generates early reflections according to the position of the sound source; when the target sound moves, the reflection pattern also follows the change of the signal in real time, for example generating real-time changes for six surfaces (left, front, right, top, back, bottom).

Using early reflection in step (3) may specifically include: a. adjusting the distance of each of the six surfaces from the listener, the six surfaces being left, front, right, top, back, and bottom, in the manner shown in Fig. 12; b. adjusting the low-pass filter of the early reflections, to avoid the harsher high-frequency sound that early reflections can generate, where the frequency range to be optimized is 500 Hz to 19999 Hz and the optimization is done by adjusting the DAMPING knob in Fig. 13; c. selecting real-time change, i.e. the REALTIME AURALISATION button shown in Fig. 14, which represents the change of the relative position between the target sound and the walls and lets the listener hear the correspondingly generated early reflections in real time. The real-time change of early reflections enhances the three-dimensional localization of the sound source;
(4) Reverberation processing: reverberation is loaded onto the sound to enhance the spatial impression, mainly by selecting a preset space, setting the space size, and controlling the low-pass filter. As shown in Fig. 15, for example, in the reverberation (REVERB) module of the DearVR plug-in, the cinema (Cinema) effect can be selected as the space under real-time auditory simulation (VIRTUAL ACOUSTICS), the space size can be set by adjusting Size, and the low-pass filter can be controlled by adjusting Damping.
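As a small worked illustration of the Cartesian positioning in item (1) above: the distance from the listener (taken as the origin) to a source placed at entered XYZ values is simply the Euclidean norm. This is a plain geometric sketch under that assumption, not part of any real DearVR API:

```python
import math

def source_distance(x, y, z):
    """Distance (in meters) from the listener at the origin to a
    point source placed with Cartesian XYZ values, as entered in a
    Cartesian-mode panner. Purely geometric; illustrative only."""
    return math.sqrt(x * x + y * y + z * z)
```

For the Fig. 10 example (2.48 m, 0.47 m, 0.00 m) this gives an overall source-to-listener distance of about 2.52 m.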
In the embodiment of the present application, after the 3D-type audio information has been processed by DearVR, it is output in a predetermined manner. Specifically, for 3D audio played through earphones, binaural stereo (Binaural) is chosen, as shown in Fig. 16a; for 3D audio played through loudspeakers, 2.0 stereo (Stereo) is chosen, as shown in Fig. 16b. The only difference between the two is whether the HRTF parameters are loaded.
In another possible implementation of the embodiment of the present application, step S101 is further preceded by step Sa (not shown) and step Sb (not shown), wherein:

Step Sa: during the recording of the audio information, the microphone currently used for recording is determined based on the distance between the sound source and each microphone.
In the embodiment of the present application, the design is that sounds the listener hears at close range are recorded with the dummy-head microphone. In the actual recording, the voice-over actors are planned to dub within a radius of 3-5 meters (m) centered on the dummy-head microphone; meanwhile, some lines are recorded with a distance between the voice-over actor and the dummy-head microphone of less than 10 cm and serve as close-to-ear sound, used as an exaggerated sound expression to achieve a certain ASMR effect.
Step Sb: the corresponding audio information is recorded through the determined microphone.

In another possible implementation of the embodiment of the present application, steps Sa and Sb include step Sab1 (not shown) and step Sab2 (not shown), wherein:
Step Sab1: when it is detected that the distance between the sound source and the dummy-head microphone meets a first preset condition, it is determined that the microphone currently used for recording is the dummy-head microphone, and the corresponding audio information is recorded through the dummy-head microphone.

For example, if the first preset condition is not more than 5 m, then when the detected distance between the sound source and the dummy-head microphone is less than 5 m, the audio information is recorded through the dummy-head microphone.
Step Sab2: when it is detected that the distance between the sound source and the condenser microphone meets a second preset condition, it is determined that the microphone currently used for recording is the condenser microphone, and the corresponding audio information is recorded through the condenser microphone.

For example, if the second preset condition is greater than 5 m, then when the detected distance between the sound source and the condenser microphone is greater than 5 m, the audio information is recorded through the condenser microphone.
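The distance-based microphone choice of steps Sab1 and Sab2 can be sketched as a small selection routine. The function name and return labels are hypothetical; the 5 m threshold follows the example above:

```python
def select_microphone(distance_m, threshold_m=5.0):
    """Choose the recording microphone from the source-to-microphone
    distance (steps Sab1/Sab2): within the threshold the dummy-head
    microphone is used, beyond it the condenser microphone.
    Distances under ~10 cm would additionally be treated as
    close-to-ear (ASMR) material on the dummy-head microphone."""
    if distance_m <= threshold_m:
        return "dummy-head"
    return "condenser"
```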
In the embodiment of the present application, recording through the dummy-head microphone and processing the 3D-type audio information through the preset plug-in both serve to highlight the 3D sound. In the sound design, the 3D sound can additionally be highlighted by the following means:
(1) A sound occurring behind the body attracts the audience's attention more strongly, so in the plot the sounding point (sound source) of many sounds can be placed behind the body;

(2) High-frequency sounds, such as those of metal or high-heeled shoes, show distance and localization more clearly;

(3) If the intention is to provoke fear or a special listening experience in the audience, a sound can be made to suddenly move from far to near (ideally reaching a close-to-ear impression);

(4) If a longer sound needs to show direction and spatial impression for some time, a circular or surrounding trajectory can be used (that is, the sound image changes through front, back, left, and right), which lets the audience recognize the 3D sound expression more clearly. This means usually requires the sound's dynamics to be balanced and comparatively restrained, similar to a passage of quiet music or a monologue with little dynamic contrast;

(5) When DearVR is used, a larger change in distance is usually chosen if the trajectory change of the 3D sound is to be shown.
In another possible implementation of the embodiment of the present application, before step S103 the method may include: adjusting the frequency ranges of the binaurally synthesized three-dimensional audio (i.e., the processed audio information) and the binaurally re-recorded three-dimensional audio (i.e., the audio information recorded by the dummy-head microphone) so that their listening impressions match.

Specifically, for a moving target sound, the audio transitions from the binaurally synthesized three-dimensional audio to the binaurally re-recorded three-dimensional audio, or from the binaurally re-recorded three-dimensional audio to the binaurally synthesized three-dimensional audio. Because the two are acquired in different ways, their frequency ranges need a certain optimization to make the transition of the listening impression smoother. Since an HRTF is in essence a tone filter, an EQ effect unit is used in this transition optimization; at the same time, overuse of EQ should be avoided so as not to weaken the localization effect. As shown in Fig. 17, FREQ, GAIN, and Q are adjusted so that the transition between the binaurally synthesized three-dimensional audio and the binaurally re-recorded three-dimensional audio (in either direction) is smooth; the spectral line in the middle of the figure characterizes the degree of this transition.
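At the level of amplitudes, the transition between the two renditions of a moving voice is a crossfade (the EQ matching itself happens in the DAW). A minimal linear-crossfade sketch, assuming both renditions are equal-length lists of per-sample amplitudes:

```python
def crossfade(from_audio, to_audio):
    """Linearly crossfade between two renditions of the same sound,
    e.g. from the binaurally synthesized to the binaurally
    re-recorded version. The weight moves from 0 to 1 across the
    overlap; both inputs are plain per-sample amplitude lists."""
    n = min(len(from_audio), len(to_audio))
    out = []
    for i in range(n):
        w = i / (n - 1) if n > 1 else 1.0
        out.append((1.0 - w) * from_audio[i] + w * to_audio[i])
    return out
```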
In the embodiment of the present application, step S103 performs audio mixing on the audio information recorded by the dummy-head microphone and the processed audio information; the mixing covers both the three-dimensional audio information and the non-three-dimensional audio information. During the mixing, for the voice parts in which the three-dimensional audio effect should stand out, the volume of the non-three-dimensional audio is reduced, and a compression effect unit is used to adjust the volume dynamically. As shown in Fig. 18, when the non-three-dimensional sound track receives the level of the 3D track, the volume of the non-three-dimensional sound track changes according to the settings of the compression effect unit; the spectrum in Fig. 18 characterizes the change of volume.
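The dynamic volume reduction described above is, in effect, sidechain "ducking": a compressor on the non-3D track keyed by the 3D track's level. A heavily simplified per-sample sketch (real compressors use envelope followers with attack and release times; the threshold and ratio values are illustrative assumptions):

```python
def duck_non_3d(non_3d, key_3d, threshold=0.5, ratio=4.0):
    """Reduce the non-3D track's gain whenever the 3D (key) track
    exceeds the threshold, so the 3D content stays prominent."""
    out = []
    for sample, key in zip(non_3d, key_3d):
        level = abs(key)
        if level > threshold:
            # Gain reduction grows with the overshoot above the
            # threshold, scaled by the compression ratio.
            gain = 1.0 / (1.0 + ratio * (level - threshold))
        else:
            gain = 1.0
        out.append(sample * gain)
    return out
```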
In another possible implementation of the embodiment of the present application, step S103 may specifically include step S1031 (not shown), wherein:

Step S1031: the audio information recorded by the dummy-head microphone and the processed audio information are mixed by way of linear superposition.

In the embodiment of the present application, step S1031 may specifically be: mixing the audio information recorded by the dummy-head microphone and the processed audio information by linear superposition followed by averaging; the mixing may also be performed through steps S10311 to S10314, which are detailed below and not repeated here.
In the embodiment of the present application, in order to avoid distortion after the linear superposition, the result of the linear summation is averaged; that is, if there are N channels to be mixed, the sum is divided by N, which is equivalent to multiplying each channel's data by a weight coefficient of 1/N. This processing effectively avoids the distortion problem.
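A minimal sketch of the 1/N-weighted linear superposition described above, assuming the tracks are equal-length lists of float samples:

```python
def mix_linear_average(tracks):
    """Mix N tracks by linear superposition, then divide by N
    (equivalently, weight each track by 1/N) so the mix cannot
    exceed the range of its loudest input (step S1031)."""
    n = len(tracks)
    if n == 0:
        return []
    return [sum(samples) / n for samples in zip(*tracks)]
```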
In another possible implementation of the embodiment of the present application, step S1031 may specifically include step S10311 (not shown), step S10312 (not shown), step S10313 (not shown), and step S10314 (not shown), wherein:

Step S10311: the audio information recorded by the dummy-head microphone and the processed audio information are linearly superposed.

Step S10312: the linearly superposed audio signal is divided into at least two mixed-signal intensity intervals according to audio intensity.

In the embodiment of the present application, step S10312 may specifically include: according to a plurality of pre-divided audio intensity distribution intervals of equal length, the portions of the audio signal lying in different audio intensity distribution intervals are assigned to at least two audio signal intensity intervals.

In the plurality of pre-divided audio intensity distribution intervals of equal length, the n-th audio intensity distribution interval is:

[(n-1) × 2^(Q-1), n × 2^(Q-1)], where n ≥ 1 and Q is a preset constant.
Step S10313: audio intensity shrinking is performed on each audio signal intensity interval using the corresponding shrink ratio.

The shrink ratio used for an audio signal intensity interval is inversely related to the audio intensity corresponding to that interval.

In the embodiment of the present application, step S10313 may include: the shrink ratio corresponding to the mixed-signal intensity interval in the n-th audio intensity distribution interval is [(k-1)/k] × (1/k)^n, where k is a preset shrink factor.
In the embodiment of the present application, because medium- and low-intensity signals occur with higher probability in a voice signal than high-intensity signals, different shrinking schemes can be applied to high-intensity signals and to medium- and low-intensity signals, compressing the linearly superposed audio signal interval by interval. Lower-intensity signals use a larger shrink ratio, ensuring that the identifiability of the low-intensity signal is preserved while it is still shrunk to some degree; high-intensity signals use a smaller shrink ratio, ensuring that the audio signal does not overflow while still retaining some identifiability. The shrink ratio is the ratio between the signal intensity after shrinking and the original signal intensity; for example, if the original intensity is 100 and the intensity after shrinking is 50, the shrink ratio is 50%.
For example, taking the above division where the n-th audio intensity distribution interval is [(n-1) × 2^(Q-1), n × 2^(Q-1)], the linearly superposed audio signal is divided into signals of multiple intensity intervals, and the shrink ratio corresponding to the audio signal intensity interval in the n-th audio intensity distribution interval is [(k-1)/k] × (1/k)^n, where k is a preset shrink factor, usually a multiple of 2 such as 8 or 16. In a preferred embodiment, k is 8 and Q is 16.
Step S10314: the at least two audio signal intensity intervals that have undergone intensity shrinking are superposed.

In the embodiment of the present application, with the above mixed-audio processing method, the linearly superposed audio signal is partitioned by intensity, and the different audio signal intensity intervals are then shrunk with different shrink ratios, achieving dynamic-range compression while avoiding overflow distortion. Since the shrink ratios are independent of the number of mixed channels, of time, and so on, problems such as sudden loudness jumps or unintelligibility do not occur.
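Steps S10311 to S10314 can be sketched as follows. The per-sample application of the interval shrink ratio is an assumption made for illustration (the text does not fix how the partitioned portions are recombined); the values k = 8 and Q = 16 are the preferred ones given above:

```python
def shrink_mixed_signal(mixed, k=8, q=16):
    """Piecewise intensity shrinking of a linearly superposed mix.
    Each sample falls into the n-th intensity interval
    [(n-1)*2**(q-1), n*2**(q-1)] by magnitude and is scaled by
    ((k-1)/k) * (1/k)**n, so higher-intensity intervals are shrunk
    more strongly (the ratio is inversely related to intensity)."""
    width = 2 ** (q - 1)
    out = []
    for sample in mixed:
        n = int(abs(sample) // width) + 1  # interval index, n >= 1
        ratio = ((k - 1) / k) * (1.0 / k) ** n
        out.append(sample * ratio)
    return out
```

With k = 8 and Q = 16, a sample of magnitude 1000 (first interval) keeps the ratio 7/64, while a sample of magnitude 100000 (fourth interval) is scaled by the much smaller 7/32768, illustrating the inverse relation between ratio and intensity.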
In the embodiment of the present application, step S103 may also include: sending the audio information recorded by the dummy-head microphone and the processed audio information to a terminal device; the terminal device performs mixing decoding with a number of target players equal to the number of audio information streams, where each target player is identical in format to the object it decodes with that target player.

The format may be the Flash Video (FLV) streaming-media format.
In the embodiment of the present application, as for volume adjustment during the mixing, for the voice parts in which the 3D audio should stand out, the volume of the non-3D audio information is reduced; specifically, the volume is adjusted dynamically using a compression effect unit.
In another possible implementation of the embodiment of the present application, step S103 may also be followed by step S104 (not shown), wherein:

Step S104: the audio information after the mixing processing is synthesized with the video information to be synthesized.

In the embodiment of the present application, the audio information after the mixing processing is synthesized with the video information to be synthesized to obtain multimedia information, for example an audio comic, which is then output.
In another possible implementation of the embodiment of the present application, step S104 may specifically include step S1041 (not shown) and step S1042 (not shown), wherein:

Step S1041: the audio information after the mixing processing and the video information to be synthesized are encoded separately, obtaining encoded audio information and encoded video information.

Step S1042: the encoded audio information and the encoded video information are synthesized.
In the embodiment of the present application, before step S1041 the method may also include: based on the video frame rate of the video information to be synthesized, interleaving the audio information after the mixing processing with the video information to be synthesized to form a pre-encoding interleave queue.

In the embodiment of the present application, interleaving the audio information after the mixing processing with the video information to be synthesized before encoding to form the pre-encoding interleave queue keeps the audio and video of the played media file synchronized. As shown in Fig. 19, in the pre-encoding interleave queue, video information frames Vi and audio information frames Ai are arranged alternately, where every video information frame Vi has its corresponding audio information frame Ai; specifically, a video information frame V2 has its corresponding audio information frame A2.
In the embodiment of the present application, the interleaving is performed according to formula (1) to obtain the pre-encoding interleave queue, wherein:

nBitA = nChannel × nSampleRate × nBit × (1/nFramerate) / 8    (1)

Formula (1) calculates the number of bytes nBitA contained in the audio information frame Ai corresponding to any video information frame Vi in the pre-encoding interleave queue, where nChannel is the number of channels of the mixed audio information, nSampleRate is the sample rate of the mixed audio information, nBit is the number of quantization bits of the mixed audio information, and nFramerate is the video frame rate of the video information to be synthesized. For example, suppose the video frame rate nFramerate of the video information to be synthesized is 30 frames per second and the other video parameters are not considered; the parameters of the mixed audio information are: nChannel is 2 channels, nSampleRate is 48000 Hz, and nBit is 24 bits. Then the number of bytes contained in any audio information frame Ai to be synthesized can be calculated from formula (1) as nBitA = 2 × 48000 × 24 × (1/30) / 8 = 9600.
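Formula (1) can be checked directly in code; this small sketch uses the formula's own parameter names (the function name is ours):

```python
def audio_frame_bytes(n_channel, n_sample_rate, n_bit, n_framerate):
    """Bytes of mixed audio paired with one video frame in the
    pre-encoding interleave queue, per formula (1):
    nBitA = nChannel * nSampleRate * nBit * (1/nFramerate) / 8."""
    return n_channel * n_sample_rate * n_bit / n_framerate / 8
```

With the worked example above (2 channels, 48000 Hz, 24 bits, 30 frames per second) this returns 9600 bytes per video frame.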
In another possible implementation of the embodiment of the present application, step S1041 may also be followed by step Sc (not shown) and step Sd (not shown), wherein:

Step Sc: the video frame rate corresponding to the encoded video information is determined.

Step Sd: based on the video frame rate corresponding to the encoded video information, the encoded audio information and the encoded video information are interleaved to obtain a post-encoding interleave queue.
In the embodiment of the present application, steps Sc and Sd may include: separately collecting the number of bytes consumed by each encoded audio information frame and by each encoded video information frame in the encoded audio-video queue, so as to obtain the duration of each encoded audio information frame and the duration of each encoded video information frame; and, based on the duration of each encoded audio information frame and the duration of each encoded video information frame, interleaving the encoded audio information and the encoded video information in the encoded audio-video queue to obtain the post-encoding interleave queue.

In the post-encoding interleave queue, the difference between the duration of any encoded video information frame and the duration of its corresponding encoded audio information frame is less than or equal to a preset threshold.
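The duration-driven interleaving of steps Sc and Sd can be sketched as follows. Frames are represented as (label, duration) pairs; after each video frame, audio frames are appended until the accumulated audio duration has caught up with the accumulated video duration. The tuple representation and the drift threshold value are illustrative assumptions:

```python
def interleave_encoded(video_frames, audio_frames, max_drift=0.001):
    """Build the post-encoding interleave queue: append each encoded
    video frame, then enough encoded audio frames to keep the
    accumulated audio duration within max_drift seconds of the
    accumulated video duration (steps Sc/Sd)."""
    queue = []
    audio_time = video_time = 0.0
    i = 0
    for label, duration in video_frames:
        queue.append(label)
        video_time += duration
        while i < len(audio_frames) and audio_time + max_drift < video_time:
            a_label, a_duration = audio_frames[i]
            queue.append(a_label)
            audio_time += a_duration
            i += 1
    queue.extend(label for label, _ in audio_frames[i:])  # flush the rest
    return queue
```

For equal frame durations this reproduces the V1, A1, V2, A2 alternation illustrated in Fig. 19.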
In the embodiment of the present application, interleaving the encoded audio information and the encoded video information avoids the problem of audio and video being out of sync in the composite file, thereby improving the user experience.
In another possible implementation of the embodiment of the present application, step S1042 may specifically include step S10421 (not shown), wherein:

Step S10421: the post-encoding interleave queue is synthesized.

In the embodiment of the present application, the synthesized multimedia information (including the video information and the audio information) can be packaged using the motion still-image (frame-by-frame) compression technique, i.e., the Motion Joint Photographic Experts Group (MJPEG) compression format, together with the MOV container format. As shown in Fig. 20, the video codec (Video codec) is MJPEG and the container format is MOV.
In the embodiment of the present application, when the synthesized audio information is output, the overall sound output uses the 48 kHz, 24-bit PCM, WAV format, output in stereo. As shown in Fig. 21, the sample rate (Sample Rate) is 48000, the output format (Output format) is WAV, the bit depth is 24-bit Pulse Code Modulation (PCM), and the channel mode is stereo (Stereo).
The above embodiments can be applied in many fields, including but not limited to mobile games, VR games, and motion comics. Specifically:

Taking an audio comic as an example, as shown in Fig. 22, after the intellectual property (IP) authorization for the audio comic is obtained, the audio corresponding to the audio comic is designed and adapted. For each frame image, the audio information to be synthesized for that frame image is obtained (including at least one of: the voice-over actor's audio information recorded by the dummy-head microphone, the voice-over actor's audio information recorded by the condenser microphone, background music audio information, ambient sound information, and sound effect audio information). Then at least one of the voice-over actor's audio information recorded by the condenser microphone, the background music information, the ambient sound information, and the sound effect audio information is processed through the preset plug-in; the plug-in-processed audio information is mixed with the voice-over actor's audio information recorded by the dummy-head microphone; and the mixed audio information is then synthesized with the frame image, completing the production of the audio comic.
In the embodiment of the present application, still taking an audio comic as an example, the product side is introduced as follows. An audio comic is a video that uses multiple still pictures, generates dynamic switching effects between the pictures through special-effect processing, and is accompanied by audio. Specifically:

The audio information of the audio comic is designed by the following means:
(1) it makes rational planning for the sound source moving direction of sound and the corresponding relationship of tableaux;
(2) guarantee screen switching, when scene switching, sound mobile continuity and reasonability;
(3) nearly body patch otoacoustic emission sound is cleverly designed, feels to be no different with real world in sense of hearing.Conventional acoustic is expressed and is developed At sound performance when participating in the cintest;
(4) it on sound type, avoids the form of aside monologue from promoting story plot, is more converted to the side of dialogue Formula, to reappear the scene content of story out.
By processing the audio information and designing it as described above, the sound comic product can achieve the following effects:
(1) the story is presented by restoring real scenes: the plot advances through dialogue within character scenes, and the sound content is enriched by motion sounds and ambient sounds, similar to film sound design, restoring a more realistic audio experience; surreal artistic voice techniques are used only in certain specific passages;
(2) with earphones as the listening medium, the plot is presented through immersive sound, abandoning the traditional voice-over and monologue approach; the voice elements carry realistic distance cues, so the audience feels present at the scene as the story unfolds, and the close-to-the-ear listening experience and positional judgment surpass those of traditional film and television works;
(3) in terms of scene reproduction, beyond enriching the sound elements, the overall sound takes 3D audio as its form of expression, using dummy-head recording and an HRTF plug-in as production methods, so that the sound performance is no longer flat and two-dimensional; the three-dimensional sound performance further achieves the design goal of real-scene sound.
Figure 2 is a schematic structural diagram of an apparatus for audio processing provided by an embodiment of the present application. As shown in Fig. 2, the audio processing apparatus 20 of this embodiment may include: an acquisition module 21, a first determination module 22, a plug-in processing module 23, a mixing processing module 24, and a synthesis module 25, wherein
the acquisition module 21 is configured to obtain audio information to be processed and audio information recorded by a dummy-head microphone;
the first determination module 22 is configured to determine audio information of a preset type from the audio information to be processed obtained by the acquisition module 21;
the plug-in processing module 23 is configured to process, through a preset plug-in, the audio information of the preset type determined by the first determination module 22;
the mixing processing module 24 is configured to mix the audio information recorded by the dummy-head microphone obtained by the acquisition module 21 with the audio information processed by the plug-in processing module 23.
An embodiment of the present application provides an apparatus for audio processing. Compared with the prior-art approach of processing the audio information to be synthesized with video through Ambisonics (full-sphere surround sound reproduction), this embodiment obtains audio information to be processed together with audio information recorded by a dummy-head microphone, determines audio information of a preset type from the audio information to be processed, processes that preset-type audio information through a preset plug-in, and then mixes the audio information recorded by the dummy-head microphone with the processed audio information. In other words, by first processing the preset-type audio information through the preset plug-in and then combining it with the dummy-head recording, the spatial positioning effect of the audio information is improved, because both dummy-head recording and processing through the preset plug-in enhance localization. This improves the sense of sound position and the sense of space of the audio information, and in turn improves the listening experience of the user, especially when watching video.
The apparatus for audio processing of this embodiment can perform the method of audio processing provided by the above method embodiments; the implementation principles are similar and are not repeated here.
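The flow described above (select the preset-type tracks, run them through the plug-in, then mix with the dummy-head recording by superposition) can be sketched as follows. `hrtf_plugin` is a hypothetical stand-in for the real HRTF plug-in (here an identity pass-through), and the type labels are illustrative assumptions, not names from the patent.

```python
# Illustrative preset types; the patent lists ambient sound, sound effects,
# condenser-microphone voice, and background music as candidates.
PRESET_TYPES = {"ambient", "sound_effect", "condenser_voice", "background_music"}

def hrtf_plugin(samples):
    """Hypothetical stand-in for the preset HRTF plug-in (identity here)."""
    return list(samples)

def process_audio(tracks, dummy_head_track):
    """tracks: dict mapping an audio-type label to a list of float samples.

    Preset-type tracks are routed through the plug-in; everything is then
    mixed with the dummy-head recording by linear superposition.
    """
    processed = []
    for label, samples in tracks.items():
        processed.append(hrtf_plugin(samples) if label in PRESET_TYPES
                         else list(samples))
    lengths = [len(dummy_head_track)] + [len(t) for t in processed]
    mixed = []
    for i in range(max(lengths)):
        s = dummy_head_track[i] if i < len(dummy_head_track) else 0.0
        for t in processed:
            if i < len(t):
                s += t[i]
        mixed.append(s)
    return mixed
```

A real pipeline would operate on sample buffers from the recording chain and apply direction-dependent HRTF filtering instead of the identity pass-through used here.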
Figure 3 is a schematic structural diagram of another apparatus for audio processing provided by an embodiment of the present application. As shown in Fig. 3, the audio processing apparatus 30 of this embodiment may include: an acquisition module 31, a first determination module 32, a plug-in processing module 33, and a mixing processing module 34, wherein
the acquisition module 31 is configured to obtain audio information to be processed and audio information recorded by a dummy-head microphone;
wherein the acquisition module 31 in Fig. 3 has the same or a similar function as the acquisition module 21 in Fig. 2;
the first determination module 32 is configured to determine audio information of a preset type from the audio information to be processed obtained by the acquisition module 31;
wherein the first determination module 32 in Fig. 3 has the same or a similar function as the first determination module 22 in Fig. 2;
the plug-in processing module 33 is configured to process, through a preset plug-in, the audio information of the preset type determined by the first determination module 32;
wherein the plug-in processing module 33 in Fig. 3 has the same or a similar function as the plug-in processing module 23 in Fig. 2;
the mixing processing module 34 is configured to mix the audio information recorded by the dummy-head microphone obtained by the acquisition module 31 with the audio information processed by the plug-in processing module 33;
wherein the mixing processing module 34 in Fig. 3 has the same or a similar function as the mixing processing module 24 in Fig. 2.
In another possible implementation of this embodiment, the audio information to be processed includes at least one of the following:
ambient sound information; sound-effect audio information; audio information recorded by a condenser microphone; background music information.
Further, as shown in Fig. 3, the apparatus 30 further includes: a second determination module 36 and a recording module 37, wherein
the second determination module 36 is configured to determine, during the audio recording process, the microphone currently used for recording based on the distance between the sound source and each microphone.
In this embodiment, the second determination module 36 and the first determination module 32 may be the same determination module or two different determination modules; this embodiment does not limit this.
The recording module 37 is configured to record the corresponding audio information through the microphone determined by the second determination module 36.
In one possible implementation of this embodiment, the second determination module 36 is specifically configured to determine, when it detects that the distance between the sound source and the dummy-head microphone meets a first preset condition, that the microphone currently used for recording is the dummy-head microphone.
The recording module 37 is specifically configured to record the corresponding audio information through the dummy-head microphone determined by the second determination module 36.
The second determination module 36 is further specifically configured to determine, when it detects that the distance between the sound source and the condenser microphone meets a second preset condition, that the microphone currently used for recording is the condenser microphone.
The recording module 37 is further specifically configured to record the corresponding audio information through the condenser microphone determined by the second determination module 36.
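The distance-based microphone selection above can be sketched as follows. The patent only speaks of a "first" and "second" preset condition, so the concrete 0.5 m near-field threshold below is an assumed example, not a value from the embodiment.

```python
NEAR_THRESHOLD_M = 0.5  # assumed example value for the "first preset condition"

def select_microphone(distance_to_dummy_head, distance_to_condenser):
    """Return which microphone should record, given source distances in meters.

    First preset condition (assumed): the source is within NEAR_THRESHOLD_M
    of the dummy-head microphone, so the dummy-head microphone records the
    near-field, close-to-the-ear material. Second preset condition (assumed):
    otherwise the condenser microphone records the source.
    """
    if distance_to_dummy_head <= NEAR_THRESHOLD_M:
        return "dummy_head"
    return "condenser"
```

In production, both conditions could be arbitrary predicates on the measured distances; the two-way split here is just the simplest instance consistent with the text.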
In another possible implementation of this embodiment, the mixing processing module 34 is specifically configured to mix the audio information recorded by the dummy-head microphone and the processed audio information by means of linear superposition.
In another possible implementation, as shown in Fig. 3, the mixing processing module 34 includes: a superposition unit 341, a division unit 342, and an audio-intensity shrinking unit 343, wherein
the superposition unit 341 is configured to linearly superpose the audio information recorded by the dummy-head microphone and the processed audio information;
the division unit 342 is configured to divide the audio signal linearly superposed by the superposition unit 341 into at least two audio-signal intensity intervals according to audio intensity;
the audio-intensity shrinking unit 343 is configured to apply audio-intensity shrinking to each intensity interval divided by the division unit 342, using the shrinkage ratio corresponding to that interval;
the superposition unit 341 is further configured to superpose the at least two audio-signal intensity intervals shrunk by the audio-intensity shrinking unit 343;
wherein the shrinkage ratio used for an audio-signal intensity interval is inversely proportional to the audio intensity corresponding to that interval.
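A minimal sketch of the mixing scheme described above: linear superposition, division of the summed samples into intensity intervals, shrinking each interval with a ratio that decreases as intensity grows (the inverse relation), and recombination. The two-interval boundary and the concrete ratios below are assumptions for illustration; the embodiment only requires at least two intervals with inversely related ratios.

```python
def mix_with_compression(dummy_head, processed, boundary=0.5,
                         low_ratio=1.0, high_ratio=0.5):
    """Mix two float tracks (values in [-1, 1]) by linear superposition,
    then apply piecewise intensity shrinking: the portion of a sample
    below `boundary` keeps ratio `low_ratio`, the portion above it is
    shrunk by the smaller `high_ratio` (higher intensity, smaller ratio).
    boundary/ratios are assumed example values.
    """
    n = max(len(dummy_head), len(processed))
    mixed = []
    for i in range(n):
        a = dummy_head[i] if i < len(dummy_head) else 0.0
        b = processed[i] if i < len(processed) else 0.0
        s = a + b                          # linear superposition
        mag = abs(s)
        sign = 1.0 if s >= 0 else -1.0
        if mag <= boundary:                # low-intensity interval
            out = mag * low_ratio
        else:                              # high-intensity interval, shrunk harder
            out = boundary * low_ratio + (mag - boundary) * high_ratio
        mixed.append(sign * out)
    return mixed
```

This keeps quiet passages untouched while pulling loud sums back toward full scale, which is the practical point of applying smaller ratios to louder intervals after superposition.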
Further, as shown in Fig. 3, the apparatus 30 further includes a synthesis module 35, wherein
the synthesis module 35 is configured to synthesize the audio information mixed by the mixing processing module 34 with the video information to be synthesized.
In another possible implementation, as shown in Fig. 3, the synthesis module 35 includes: an encoding unit 351 and a synthesis unit 352, wherein
the encoding unit 351 is configured to encode the mixed audio information and the video information to be synthesized separately, obtaining encoded audio information and encoded video information;
the synthesis unit 352 is configured to synthesize the audio information and the video information encoded by the encoding unit 351.
In another possible implementation, as shown in Fig. 3, the apparatus 30 further includes: a third determination module 38 and an interleaving module 39, wherein
the third determination module 38 is configured to determine the video frame rate corresponding to the encoded video information.
In this embodiment, the third determination module 38, the second determination module 36, and the first determination module 32 may all be the same determination module, may each be a different determination module, or any two of them may be the same module; this embodiment does not limit this.
Fig. 3 shows the third determination module 38, the second determination module 36, and the first determination module 32 as different determination modules, but the arrangement is not limited to the one shown in Fig. 3.
The interleaving module 39 is configured to interleave the encoded audio information and the encoded video information based on the video frame rate of the encoded video information determined by the third determination module 38, obtaining an encoded interleaved queue;
the synthesis module 35 is specifically configured to synthesize the encoded interleaved queue produced by the interleaving module 39.
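The interleaving step can be sketched as follows: given the encoded video frame rate, encoded audio and video packets are merged into one queue ordered by presentation time, so the muxer downstream receives them roughly alternating. The packet representation and the fixed audio-packet duration are assumptions made for this illustration.

```python
def interleave(audio_packets, video_packets, frame_rate, audio_packet_dur):
    """Merge encoded audio/video packets into one queue ordered by
    presentation time. Each list entry is treated as opaque encoded data;
    video packet i is timestamped i / frame_rate, audio packet j is
    timestamped j * audio_packet_dur (assumed fixed packet duration).
    """
    timed = [("video", i / frame_rate, p) for i, p in enumerate(video_packets)]
    timed += [("audio", j * audio_packet_dur, p) for j, p in enumerate(audio_packets)]
    # sort by timestamp; on ties, let audio precede video ("audio" < "video")
    timed.sort(key=lambda t: (t[1], t[0]))
    return [(kind, p) for kind, _, p in timed]
```

Real containers (e.g. MP4, MKV) interleave on decode/presentation timestamps derived from the codec timebases, but the ordering principle is the same as in this sketch.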
In another possible implementation of this embodiment, the preset plug-in is a head-related transfer function (HRTF) plug-in.
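An HRTF plug-in essentially convolves a mono source with a pair of head-related impulse responses (HRIRs) measured for the desired direction, producing a binaural left/right pair. The sketch below shows that core operation; the two- and three-tap HRIRs used in the usage note are toy values for illustration, since real HRIRs come from measured datasets.

```python
def convolve(signal, kernel):
    """Plain FIR convolution (output length len(signal) + len(kernel) - 1)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def hrtf_render(mono, hrir_left, hrir_right):
    """Render a mono source to a binaural (left, right) pair by filtering
    it with the left-ear and right-ear impulse responses for one direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

For a source on the listener's left, the left-ear HRIR is typically louder and earlier than the right-ear one, e.g. `hrtf_render([1.0, 0.0], [0.9, 0.1], [0.0, 0.5, 0.2])`, which yields a stronger, earlier left channel.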
An embodiment of the present application provides another apparatus for audio processing. Compared with the prior-art approach of processing the audio information to be synthesized with video through Ambisonics (full-sphere surround sound reproduction), this embodiment obtains audio information to be processed together with audio information recorded by a dummy-head microphone, determines audio information of a preset type from the audio information to be processed, processes that preset-type audio information through a preset plug-in, and then mixes the audio information recorded by the dummy-head microphone with the processed audio information. Because the recording is made with a dummy-head microphone and the audio information is processed through the preset plug-in, the spatial positioning effect of the audio information is improved, which improves the sense of sound position and the sense of space, and in turn the listening experience of the user, especially when watching video.
The apparatus for audio processing of this embodiment can perform the method of audio processing shown in the above method embodiments; the implementation principles are similar and are not repeated here.
An embodiment of the present application provides an electronic device. As shown in Fig. 4, the electronic device 4000 includes: a processor 4001 and a memory 4003, where the processor 4001 is connected to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may also include a transceiver 4004. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 is applied in this embodiment to realize the functions of the acquisition module, the first determination module, the plug-in processing module, and the mixing processing module shown in Fig. 2 or Fig. 3, and/or the functions of the synthesis module, the second determination module, the recording module, the third determination module, and the interleaving module shown in Fig. 3. The transceiver 4004 includes a receiver and a transmitter, and is applied in this embodiment to exchange information with other electronic devices.
The processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logic blocks, modules, and circuits described in connection with the present disclosure. The processor 4001 may also be a combination that realizes computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path for transferring information between the above components. The bus 4002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 4, but this does not mean that there is only one bus or only one type of bus.
The memory 4003 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, DVDs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 4003 is used to store the application program code for executing the solution of the present application, and execution is controlled by the processor 4001. The processor 4001 executes the application program code stored in the memory 4003 to realize the actions of the apparatus for audio processing provided by the embodiment shown in Fig. 2 or Fig. 3.
An embodiment of the present application provides an electronic device. Compared with the prior-art approach of processing the audio information to be synthesized with video through Ambisonics (full-sphere surround sound reproduction), this embodiment obtains audio information to be processed together with audio information recorded by a dummy-head microphone, determines audio information of a preset type from the audio information to be processed, processes that preset-type audio information through a preset plug-in, and then mixes the audio information recorded by the dummy-head microphone with the processed audio information. Because the recording is made with a dummy-head microphone and the audio information is processed through the preset plug-in, the spatial positioning effect of the audio information is improved, which improves the sense of sound position and the sense of space, and in turn the listening experience of the user, especially when watching video.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method of audio processing described in the above method embodiments is realized.
Compared with the prior-art approach of processing the audio information to be synthesized with video through Ambisonics (full-sphere surround sound reproduction), the computer-readable storage medium of this embodiment causes audio information to be processed and audio information recorded by a dummy-head microphone to be obtained, audio information of a preset type to be determined from the audio information to be processed and processed through a preset plug-in, and the audio information recorded by the dummy-head microphone to be mixed with the processed audio information. Because the recording is made with a dummy-head microphone and the audio information is processed through the preset plug-in, the spatial positioning effect of the audio information is improved, which improves the sense of sound position and the sense of space, and in turn the listening experience of the user, especially when watching video.
The computer-readable storage medium provided by this embodiment is applicable to any of the above method embodiments and is not described again here.
It should be understood that, although the steps in the flowcharts of the drawings are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that, for a person of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (13)

1. A method of audio processing, characterized by comprising:
obtaining audio information to be processed and audio information recorded by a dummy-head microphone;
determining audio information of a preset type from the audio information to be processed, and processing the audio information of the preset type through a preset plug-in;
mixing the audio information recorded by the dummy-head microphone with the processed audio information.
2. The method according to claim 1, wherein the audio information to be processed comprises at least one of the following:
ambient sound information; sound-effect audio information; audio information recorded by a condenser microphone; background music information.
3. The method according to claim 1 or 2, wherein before obtaining the audio information to be processed and the audio information recorded by the dummy-head microphone, the method further comprises:
during the audio recording process, determining the microphone currently used for recording based on the distance between the sound source and each microphone;
recording the corresponding audio information through the determined microphone.
4. The method according to claim 3, wherein determining the microphone currently used for recording based on the distance between the sound source and each microphone, and recording the corresponding audio information through the determined microphone, comprises:
when detecting that the distance between the sound source and the dummy-head microphone meets a first preset condition, determining that the microphone currently used for recording is the dummy-head microphone, and recording the corresponding audio information through the dummy-head microphone;
when detecting that the distance between the sound source and the condenser microphone meets a second preset condition, determining that the microphone currently used for recording is the condenser microphone, and recording the corresponding audio information through the condenser microphone.
5. The method according to claim 1, wherein mixing the audio information recorded by the dummy-head microphone with the processed audio information comprises:
mixing the audio information recorded by the dummy-head microphone and the processed audio information by means of linear superposition.
6. The method according to claim 5, wherein mixing the audio information recorded by the dummy-head microphone and the processed audio information by means of linear superposition comprises:
linearly superposing the audio information recorded by the dummy-head microphone and the processed audio information;
dividing the audio signal after linear superposition into at least two audio-signal intensity intervals according to audio intensity;
applying audio-intensity shrinking to each audio-signal intensity interval using the corresponding shrinkage ratio;
superposing the at least two audio-signal intensity intervals after audio-intensity shrinking;
wherein the shrinkage ratio used for an audio-signal intensity interval is inversely proportional to the audio intensity corresponding to that interval.
7. The method according to claim 1, wherein after mixing the audio information recorded by the dummy-head microphone with the processed audio information, the method further comprises:
synthesizing the mixed audio information with video information to be synthesized.
8. The method according to claim 7, wherein synthesizing the mixed audio information with the video information to be synthesized comprises:
encoding the mixed audio information and the video information to be synthesized separately, obtaining encoded audio information and encoded video information;
synthesizing the encoded audio information with the encoded video information.
9. The method according to claim 8, wherein after encoding the mixed audio information and the video information to be synthesized separately and obtaining the encoded audio information and the encoded video information, the method further comprises:
determining the video frame rate corresponding to the encoded video information;
interleaving the encoded audio information and the encoded video information based on the video frame rate corresponding to the encoded video information, obtaining an encoded interleaved queue;
wherein synthesizing the encoded audio information with the encoded video information comprises:
synthesizing the encoded interleaved queue.
10. The method according to claim 1, wherein:
the preset plug-in is a head-related transfer function (HRTF) plug-in.
11. An apparatus for audio processing, characterized by comprising:
an acquisition module, configured to obtain audio information to be processed and audio information recorded by a dummy-head microphone;
a first determination module, configured to determine audio information of a preset type from the audio information to be processed obtained by the acquisition module;
a plug-in processing module, configured to process, through a preset plug-in, the audio information of the preset type determined by the first determination module;
a mixing processing module, configured to mix the audio information recorded by the dummy-head microphone with the audio information processed by the plug-in processing module.
12. An electronic device, characterized by comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to: perform the method of audio processing according to any one of claims 1 to 10.
13. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, realizes the method of audio processing according to any one of claims 1 to 10.
CN201811400323.4A 2018-11-22 2018-11-22 Audio processing method and device, electronic equipment and computer readable storage medium Active CN109410912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811400323.4A CN109410912B (en) 2018-11-22 2018-11-22 Audio processing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811400323.4A CN109410912B (en) 2018-11-22 2018-11-22 Audio processing method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109410912A true CN109410912A (en) 2019-03-01
CN109410912B CN109410912B (en) 2021-12-10

Family

ID=65474610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811400323.4A Active CN109410912B (en) 2018-11-22 2018-11-22 Audio processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109410912B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225432A (en) * 2019-05-10 2019-09-10 中国船舶重工集团公司第七一五研究所 A kind of sonar target solid listens to method
CN111866664A (en) * 2020-07-20 2020-10-30 深圳市康冠商用科技有限公司 Audio processing method, device, equipment and computer readable storage medium
CN112530589A (en) * 2020-12-01 2021-03-19 中国科学院深圳先进技术研究院 Method, device and system for triggering ASMR, electronic equipment and storage medium
CN112951199A (en) * 2021-01-22 2021-06-11 杭州网易云音乐科技有限公司 Audio data generation method and device, data set construction method, medium and equipment
WO2021212287A1 (en) * 2020-04-20 2021-10-28 深圳市大疆创新科技有限公司 Audio signal processing method, audio processing device, and recording apparatus
CN113971969A (en) * 2021-08-12 2022-01-25 荣耀终端有限公司 Recording method, device, terminal, medium and product

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102404573A (en) * 2011-11-28 2012-04-04 深圳市万兴软件有限公司 Method and device for synchronously processing audio and video
US20120328107A1 (en) * 2011-06-24 2012-12-27 Sony Ericsson Mobile Communications Ab Audio metrics for head-related transfer function (hrtf) selection or adaptation
CN105263093A (en) * 2015-10-12 2016-01-20 深圳东方酷音信息技术有限公司 Omnibearing audio acquisition apparatus, omnibearing audio editing apparatus, and omnibearing audio acquisition and editing system
CN105719653A (en) * 2016-01-28 2016-06-29 腾讯科技(深圳)有限公司 Mixing processing method and device
CN106531177A (en) * 2016-12-07 2017-03-22 腾讯科技(深圳)有限公司 Audio treatment method, a mobile terminal and system
KR101725952B1 (en) * 2015-12-21 2017-04-11 서울대학교산학협력단 The method and system regarding down mix sound source of n chanel to optimized binaural sound source for user by using user's head related transfer function information
CN108777832A (en) * 2018-06-13 2018-11-09 上海艺瓣文化传播有限公司 A kind of real-time 3D sound fields structure and mixer system based on the video object tracking

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120328107A1 (en) * 2011-06-24 2012-12-27 Sony Ericsson Mobile Communications Ab Audio metrics for head-related transfer function (hrtf) selection or adaptation
CN102404573A (en) * 2011-11-28 2012-04-04 深圳市万兴软件有限公司 Method and device for synchronously processing audio and video
CN105263093A (en) * 2015-10-12 2016-01-20 深圳东方酷音信息技术有限公司 Omnidirectional audio acquisition apparatus, omnidirectional audio editing apparatus, and omnidirectional audio acquisition and editing system
KR101725952B1 (en) * 2015-12-21 2017-04-11 서울대학교산학협력단 Method and system for downmixing an n-channel sound source into a binaural sound source optimized for the user by using the user's head-related transfer function information
CN105719653A (en) * 2016-01-28 2016-06-29 腾讯科技(深圳)有限公司 Mixing processing method and device
CN106531177A (en) * 2016-12-07 2017-03-22 腾讯科技(深圳)有限公司 Audio processing method, mobile terminal, and system
CN108777832A (en) * 2018-06-13 2018-11-09 上海艺瓣文化传播有限公司 Real-time 3D sound field construction and mixing system based on video object tracking

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225432A (en) * 2019-05-10 2019-09-10 中国船舶重工集团公司第七一五研究所 Stereoscopic listening method for sonar targets
WO2021212287A1 (en) * 2020-04-20 2021-10-28 深圳市大疆创新科技有限公司 Audio signal processing method, audio processing device, and recording apparatus
CN111866664A (en) * 2020-07-20 2020-10-30 深圳市康冠商用科技有限公司 Audio processing method, device, equipment and computer readable storage medium
CN112530589A (en) * 2020-12-01 2021-03-19 中国科学院深圳先进技术研究院 Method, device and system for triggering ASMR, electronic equipment and storage medium
CN112951199A (en) * 2021-01-22 2021-06-11 杭州网易云音乐科技有限公司 Audio data generation method and device, data set construction method, medium and equipment
CN112951199B (en) * 2021-01-22 2024-02-06 杭州网易云音乐科技有限公司 Audio data generation method and device, data set construction method, medium and equipment
CN113971969A (en) * 2021-08-12 2022-01-25 荣耀终端有限公司 Recording method, device, terminal, medium and product
CN113971969B (en) * 2021-08-12 2023-03-24 荣耀终端有限公司 Recording method, device, terminal, medium and product

Also Published As

Publication number Publication date
CN109410912B (en) 2021-12-10

Similar Documents

Publication Publication Date Title
CN109410912A (en) Audio processing method, apparatus, electronic device, and computer-readable storage medium
EP2205007B1 (en) Method and apparatus for three-dimensional acoustic field encoding and optimal reconstruction
RU2736274C1 (en) Principle of generating an improved description of the sound field or modified description of the sound field using DirAC technology with depth expansion or other technologies
CN105379309B (en) For reproducing the arrangement and method of the audio data of acoustics scene
US7590249B2 (en) Object-based three-dimensional audio system and method of controlling the same
US7489788B2 (en) Recording a three dimensional auditory scene and reproducing it for the individual listener
TWI517028B (en) Audio spatialization and environment simulation
TW201810249A (en) Distance panning using near/far-field rendering
US20080004729A1 (en) Direct encoding into a directional audio coding format
CN107533843A (en) System and method for capturing, encoding, being distributed and decoding immersion audio
Wiggins An investigation into the real-time manipulation and control of three-dimensional sound fields
BR112020000759A2 (en) apparatus for generating a modified sound field description of a sound field description and metadata in relation to spatial information of the sound field description, method for generating an enhanced sound field description, method for generating a modified sound field description of a description of sound field and metadata in relation to spatial information of the sound field description, computer program, enhanced sound field description
Rafaely et al. Spatial audio signal processing for binaural reproduction of recorded acoustic scenes–review and challenges
Llorach et al. Towards realistic immersive audiovisual simulations for hearing research: Capture, virtual scenes and reproduction
CN114067810A (en) Audio signal rendering method and device
US20220386060A1 (en) Signalling of audio effect metadata in a bitstream
San Martín et al. Influence of recording technology on the determination of binaural psychoacoustic indicators in soundscape investigations
Paterson et al. Producing 3-D audio
KR101319892B1 (en) A method for modeling head-related transfer function for implementing 3d virtual sound, and method and apparatus for implementing 3d virtual sound using the same
WO2022034805A1 (en) Signal processing device and method, and audio playback system
WO2023085186A1 (en) Information processing device, information processing method, and information processing program
Llopis et al. Effects of the order of Ambisonics on localization for different reverberant conditions in a novel 3D acoustic virtual reality system
San Martín Murugarren et al. Influence of recording technology on the determination of binaural psychoacoustic indicators in soundscape investigations
TW202410705A (en) Method and apparatus for determining virtual speaker set
Sumner The Digital Ears: A Binaural Spatialization Plugin

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant