CN106356067A - Recording method, device and terminal - Google Patents
- Publication number
- CN106356067A CN106356067A CN201610729168.5A CN201610729168A CN106356067A CN 106356067 A CN106356067 A CN 106356067A CN 201610729168 A CN201610729168 A CN 201610729168A CN 106356067 A CN106356067 A CN 106356067A
- Authority
- CN
- China
- Prior art keywords
- sound
- voice data
- sector
- sound source
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
Abstract
The embodiments of the invention provide a recording method, a device and a terminal. The method comprises: receiving a plurality of audio data sent by at least two sound sources; determining the sound source direction and/or position of each of the at least two sound sources based on the received audio data; determining at least two target sectors corresponding one-to-one to the at least two sound sources and assigning a sector identifier to each target sector; and generating at least one audio file containing the correspondence between the audio data and the sector identifiers. The method can identify the sector to which each piece of audio data belongs, set a sector identifier for each piece of audio data collected by the sound collection devices, and then generate at least one audio file containing the correspondence between the audio data and the sector identifiers, so that the audio data corresponding to a given sector identifier can be easily retrieved, thereby simplifying the sound-content retrieval flow, saving time and improving efficiency.
Description
Technical field
The present invention relates to the field of audio processing, and more particularly to a recording method, device and terminal.
Background technology
Recording is the process of converting audio data into electrical signals by means of a microphone and an amplifier, and recording them on a medium using various materials and techniques. Currently, the recording file obtained after recording contains the audio data of every sounding object the microphone picks up during the recording process. For example, in a conference, the session recording captures the speech signals of all speakers participating in the meeting, as well as the noise produced by participants' body movements and so on.
The inventors found, in the course of realizing the embodiments of the present invention, that because a recording file contains the speech signals of multiple speakers received by the microphone over different time periods, and the voices of the individual speakers are difficult to tell apart by ear, retrieving the speech content of a specific speaker from a recording file may require playing the file back repeatedly, which wastes time and energy and is inefficient.
Summary of the invention
To overcome the problems in the related art, the present invention provides a recording method, device and terminal.
According to a first aspect of the embodiments of the present invention, a recording method is provided, comprising:
receiving a plurality of audio data sent by at least two sound sources;
determining the sound source direction and/or position of each of the at least two sound sources according to the received plurality of audio data;
according to the determined sound source direction and/or position of each of the at least two sound sources, determining at least two target sectors corresponding one-to-one to the at least two sound sources, and assigning a sector identifier to each of the determined at least two target sectors;
generating at least one audio file containing the correspondence between the audio data and the sector identifiers.
Optionally, the at least two target sectors do not overlap one another, and each target sector covers only the sound source direction and/or position of its corresponding sound source.
Optionally, the method further comprises:
obtaining the audio data having a common sector identifier;
extracting voiceprint features from the audio data;
judging, according to the voiceprint features, whether the audio data in the target sector originate from the same sound source;
when the audio data in the target sector do not originate from the same sound source, setting different sound source identifiers for the audio data originating from different sound sources in the target sector.
Optionally, generating at least one audio file containing the correspondence between the audio data and the sector identifiers comprises:
generating a first audio file, wherein the plurality of audio data in the first audio file are sorted in order of acquisition time, and each of the plurality of audio data is provided with its corresponding sector identifier.
Optionally, generating at least one audio file containing the correspondence between the audio data and the sector identifiers further comprises:
generating at least two second audio files, wherein each second audio file is used to store the audio data having a common sector identifier.
Optionally, receiving a plurality of audio data sent by at least two sound sources comprises:
obtaining the acoustic information of the audio data collected by each sound collection device;
determining, according to the acoustic information, the sound collection device nearest to the sound source position as the main sound collection device, and determining the sound collection devices other than the main sound collection device as auxiliary sound collection devices;
determining the main audio data collected by the main sound collection device, and determining the auxiliary audio data collected by the auxiliary sound collection devices;
superimposing the inverted phase of the auxiliary audio data onto the main audio data to obtain sound source data, and determining the sound source data as the audio data of the sound source collected by the sound collection devices.
According to a second aspect of the embodiments of the present invention, a recording device is provided, applied to a terminal comprising a plurality of sound collection devices, the device comprising:
a receiving module, configured to receive a plurality of audio data sent by at least two sound sources;
a first determining module, configured to determine the sound source direction and/or position of each of the at least two sound sources according to the received plurality of audio data;
a second determining module, configured to determine, according to the determined sound source direction and/or position of each of the at least two sound sources, at least two target sectors corresponding one-to-one to the at least two sound sources, and to assign a sector identifier to each of the determined at least two target sectors;
a generation module, configured to generate at least one audio file containing the correspondence between the audio data and the sector identifiers.
Optionally, the second determining module is further configured such that the at least two target sectors do not overlap one another, and each target sector covers only the sound source direction and/or position of its corresponding sound source.
Optionally, the device further comprises:
an acquisition module, configured to obtain the audio data having a common sector identifier;
an extraction module, configured to extract voiceprint features from the audio data;
a judgment module, configured to judge, according to the voiceprint features, whether the audio data in the target sector originate from the same sound source;
a setting module, configured to set, when the audio data in the target sector do not originate from the same sound source, different sound source identifiers for the audio data originating from different sound sources in the target sector.
Optionally, the generation module is configured to:
generate a first audio file, wherein the plurality of audio data in the first audio file are sorted in order of acquisition time, and each of the plurality of audio data is provided with its corresponding sector identifier.
Optionally, the generation module is further configured to:
generate at least two second audio files, wherein each second audio file is used to store the audio data having a common sector identifier.
Optionally, the distance between any two of the plurality of sound collection devices is greater than a preset distance, and the receiving module comprises:
an acquisition submodule, configured to obtain the acoustic information of the audio data collected by each sound collection device;
a determination submodule, configured to determine, according to the acoustic information, the sound collection device nearest to the sound source position as the main sound collection device, and to determine the sound collection devices other than the main sound collection device as auxiliary sound collection devices;
a first determination submodule, configured to determine the main audio data collected by the main sound collection device and the auxiliary audio data collected by the auxiliary sound collection devices;
a superposition submodule, configured to superimpose the inverted phase of the auxiliary audio data onto the main audio data to obtain sound source data;
a third determination submodule, configured to determine the sound source data as the audio data of the sound source collected by the sound collection devices.
According to a third aspect of the embodiments of the present invention, a terminal is provided, the terminal comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
receive a plurality of audio data sent by at least two sound sources;
determine the sound source direction and/or position of each of the at least two sound sources according to the received plurality of audio data;
according to the determined sound source direction and/or position of each of the at least two sound sources, determine at least two target sectors corresponding one-to-one to the at least two sound sources, and assign a sector identifier to each of the determined at least two target sectors; and
generate at least one audio file containing the correspondence between the audio data and the sector identifiers.
According to a fourth aspect of the embodiments of the present invention, a computer storage medium is further provided, wherein the computer storage medium can store a program, and the program, when executed, can implement some or all of the steps of each implementation of the recording method provided in the first aspect of the present invention.
The technical solutions provided by the embodiments of the invention can include the following beneficial effects:
The present invention first receives a plurality of audio data sent by at least two sound sources, and determines the sound source direction and/or position of each of the at least two sound sources according to the received plurality of audio data; it then determines at least two target sectors corresponding one-to-one to the at least two sound sources, and assigns a sector identifier to each of the determined at least two target sectors; and it finally generates at least one audio file containing the correspondence between the audio data and the sector identifiers.
With the method provided by the embodiments of the present invention, the sector to which each piece of audio data belongs can be identified from the sound, the plurality of audio data collected by the sound collection devices can each be provided with a sector identifier, and at least one audio file containing the correspondence between the audio data and the sector identifiers can then be generated. In this way the audio data corresponding to a given sector identifier can be easily retrieved, which simplifies the sound-content retrieval flow, saves time and improves efficiency.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present invention.
Brief description of the drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification; they show embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.
Fig. 1 is a flow chart of a recording method according to an exemplary embodiment;
Fig. 2 is another flow chart of a recording method according to an exemplary embodiment;
Fig. 3 is a flow chart of step S101 in Fig. 1;
Fig. 4 is a structural diagram of a recording device according to an exemplary embodiment;
Fig. 5 is another structural diagram of a recording device according to an exemplary embodiment;
Fig. 6 is a block diagram of a terminal according to an exemplary embodiment.
Specific embodiment
Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. In the following description, where the accompanying drawings are referred to, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; on the contrary, they are merely examples of apparatuses and methods consistent with some aspects of the present invention as detailed in the appended claims.
Because a recording file contains the speech signals of multiple speakers received by the microphone over different time periods, and the voices of the individual speakers are difficult to tell apart by ear, retrieving the speech content of a specific speaker from a recording file may require playing the file back repeatedly, which wastes time and energy and is inefficient. Therefore, as shown in Fig. 1, one embodiment of the present invention provides a recording method, applied to a terminal comprising a plurality of sound collection devices. The number of sound collection devices can be 3, 4, 5 or more, and the distance between any two of the sound collection devices can be greater than a preset distance, where the preset distance can be greater than or equal to 30 millimeters, for example 30 millimeters, 35 millimeters or 40 millimeters, as determined by the actual size of the terminal. The method includes the following steps.
In step S101, a plurality of audio data sent by at least two sound sources are received.
In the embodiments of the present invention, audio data can refer to all the audio data collected by the sound collection devices while in operation. The audio data here can be the acoustic signals sent by multiple sound sources, such as the speech signals of people talking, the acoustic signals of objects colliding as a result of body movements, and the noise of the indoor environment. Each sound collection device can collect the audio data within its effective pickup range.
In this step, after a sound collection device collects audio data, it can send the collected audio data to the processor in the terminal, and the processor receives the audio data collected by the plurality of sound collection devices.
In step S102, the sound source direction and/or position of each of the at least two sound sources is determined according to the received plurality of audio data.
In this step, with the terminal as the center, the sound emitted from any point within the effective pickup range of the sound collection devices reaches each sound collection device with a different time delay, loudness and phase, so the sound source direction and/or position of each sound source can be determined from the plurality of received audio data.
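The patent does not specify a localization algorithm. As a minimal illustrative sketch (not the patent's implementation), assuming a far-field source and a single pair of microphones at the 30 mm minimum spacing mentioned earlier, the inter-microphone arrival-time difference alone already yields a bearing:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius

def azimuth_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate a source azimuth (degrees) for one microphone pair from
    the arrival-time difference (TDOA). A far-field source at angle theta
    produces a path-length difference of mic_spacing * sin(theta), i.e. a
    delay of that distance divided by the speed of sound."""
    # Clamp to the physically valid range before taking the arcsine.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * delay_s / mic_spacing_m))
    return math.degrees(math.asin(ratio))

# With 30 mm spacing, a delay of about 43.7 microseconds corresponds to
# a source at roughly 30 degrees off the broadside direction.
angle = azimuth_from_delay(43.7e-6, 0.03)
```

A real terminal would combine several such pair-wise estimates (and the loudness and phase cues the text mentions) to resolve direction unambiguously; this sketch shows only the core delay-to-angle relation.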
In step S103, at least two target sectors corresponding one-to-one to the at least two sound sources are determined according to the determined sound source direction and/or position of each of the at least two sound sources, and a sector identifier is assigned to each of the determined at least two target sectors.
In the embodiments of the present invention, the effective pickup range of the sound collection devices can be abstracted as a 2D plane, and the 2D plane can be evenly divided in advance into several preset sound recognition sectors; for example, the 2D plane can be evenly divided into 4 preset sound recognition sectors, 6 preset sound recognition sectors, 8 preset sound recognition sectors, and so on.
In this step, the preset sound recognition sector to which each piece of audio data belongs can be determined according to the sound source direction and/or position, and the preset sound recognition sectors that cover the sound source directions and/or positions of the audio data are determined as the target sectors. The at least two target sectors do not overlap one another, and each target sector covers only the sound source direction and/or position of its corresponding sound source. A sector identifier, such as a, b or c, can be assigned to each target sector.
For example, when the sound collection devices simultaneously collect three pieces of audio data — audio data 1, audio data 2 and audio data 3 — the sound source positions of audio data 1, audio data 2 and audio data 3 can first be determined. Taking as an example an effective pickup range divided into 4 preset sound recognition sectors centered on the terminal (with sector identifiers a, b, c and d respectively), suppose the sound source position of audio data 1 lies in the preset sound recognition sector corresponding to a, while audio data 2 and audio data 3 lie in the preset sound recognition sector corresponding to c. It can then be determined that the preset sound recognition sectors corresponding to a and c are the target sectors, so that the sector identifier corresponding to audio data 1 is a, the sector identifier corresponding to audio data 2 is c, and the sector identifier corresponding to audio data 3 is c.
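The sector assignment in this example reduces to binning an estimated azimuth into equal angular sectors. A minimal sketch under the assumption that sector "a" starts at 0 degrees and identifiers run counter-clockwise (the patent does not fix this mapping):

```python
def sector_for_azimuth(azimuth_deg: float, num_sectors: int = 4) -> str:
    """Map a terminal-centred source azimuth (degrees) to a sector
    identifier 'a', 'b', ... by evenly dividing the 2D plane into
    num_sectors preset sound recognition sectors."""
    width = 360.0 / num_sectors
    index = int(azimuth_deg % 360.0 // width)
    return chr(ord('a') + index)

# With 4 sectors: [0, 90) -> 'a', [90, 180) -> 'b',
# [180, 270) -> 'c', [270, 360) -> 'd'.
```

Under this assumed layout, two sources at 200 and 220 degrees would both receive identifier "c", matching the situation of audio data 2 and 3 in the example above.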
In step S104, at least one audio file containing the correspondence between the audio data and the sector identifiers is generated.
In this step, a single audio file can be generated, in which the plurality of audio data are sorted in order of acquisition time and each piece of audio data is labelled with its corresponding sector identifier; and/or at least two audio files can be generated, each second audio file containing at least one piece of audio data having a common sector identifier.
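The two file layouts of step S104 can be sketched as plain data structures, assuming each recorded segment carries a capture time and a sector identifier (the function and field names here are illustrative, not from the patent):

```python
from collections import defaultdict

def build_files(segments):
    """segments: list of (capture_time, sector_id, audio) tuples.

    Returns (first_file, second_files): the first file keeps every
    segment in acquisition order, each paired with its sector
    identifier; the second files group segments that share a sector
    identifier, one group per identifier."""
    ordered = sorted(segments, key=lambda s: s[0])
    first_file = [(sector, audio) for _, sector, audio in ordered]
    second_files = defaultdict(list)
    for _, sector, audio in ordered:
        second_files[sector].append(audio)
    return first_file, dict(second_files)
```

For the worked example above (audio data 1 in sector a, audio data 2 and 3 in sector c), the second files would contain one group for "a" and one two-segment group for "c", so a listener can jump straight to one sector's content.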
The present invention first receives a plurality of audio data sent by at least two sound sources, and determines the sound source direction and/or position of each of the at least two sound sources according to the received plurality of audio data; it then determines at least two target sectors corresponding one-to-one to the at least two sound sources, and assigns a sector identifier to each of the determined at least two target sectors; and it finally generates at least one audio file containing the correspondence between the audio data and the sector identifiers.
With the method provided by the embodiments of the present invention, the sector to which each piece of audio data belongs can be identified from the sound, the plurality of audio data collected by the sound collection devices can each be provided with a sector identifier, and at least one audio file containing the correspondence between the audio data and the sector identifiers can then be generated. In this way the audio data corresponding to a given sector identifier can be easily retrieved, which simplifies the sound-content retrieval flow, saves time and improves efficiency.
Due in actual applications, two sound sources or more in same preset sound identification sector, may be comprised, or
When multiple spokesman are in same orientation, in same voice recognition sector, the voice data of each sound source is still difficult to by human ear
Distinguish, for this reason, as shown in Fig. 2 in another embodiment of the present invention, can be further discriminated between by the way of vocal print, described side
Method is further comprising the steps of.
In step S201, the audio data having a common sector identifier are obtained.
In this step, the audio data corresponding to the sector identifier of each target sector can be looked up; for example, audio data 1 can be found according to sector identifier "a", and audio data 2 and audio data 3 can be found according to sector identifier "c".
In step S202, voiceprint features are extracted from the audio data.
In this step, the voiceprint features in the audio data can be extracted using voiceprint recognition techniques and the like.
In step S203, whether the audio data in the target sector originate from the same sound source is judged according to the voiceprint features.
In this step, because different sound sources have different voiceprints, it can be determined from the voiceprint features whether the audio data in a target sector originate from the same sound source: when the voiceprints of the audio data in a target sector differ, it can be determined that the audio data in the target sector do not originate from the same sound source.
When the audio data in the target sector do not originate from the same sound source, in step S204, different sound source identifiers are set for the audio data originating from different sound sources in the target sector.
In this step, a sound source identifier, for example (1), (2) or (3), can be set for each piece of audio data in the target sector. Suppose the sector identifier of a target sector is c, and a given piece of audio data is sent by sound source (1) in the preset sound recognition sector corresponding to c; the sound source identifier of that audio data can then be set to c(1), and so on.
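As a toy illustration of this labelling scheme (the numbering-by-first-appearance rule is an assumption, not stated in the patent), distinct voiceprints within one sector can be numbered to produce identifiers like c(1) and c(2):

```python
def assign_source_marks(sector_id, voiceprints):
    """voiceprints: one voiceprint label per audio segment within a
    single target sector. Returns a sound source identifier such as
    'c(1)' for each segment, numbering distinct voiceprints in order
    of first appearance."""
    seen = {}   # voiceprint label -> source number within this sector
    marks = []
    for vp in voiceprints:
        if vp not in seen:
            seen[vp] = len(seen) + 1
        marks.append(f"{sector_id}({seen[vp]})")
    return marks
```

With two speakers sharing sector "c", their segments come back as c(1) and c(2), so a listener can pull out one speaker's content even within a shared sector.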
The present invention first obtains the audio data having a common sector identifier, then extracts the voiceprint features from the audio data, and then judges, according to the voiceprint features, whether the audio data in the target sector originate from the same sound source; when the audio data in the target sector do not originate from the same sound source, a sound source identifier can be set for the audio data of each sound source in the target sector.
With the method provided by this embodiment of the present invention, when two or more sound sources fall within the same preset sound recognition sector, or multiple speakers are in the same direction, the audio data of the multiple sound sources within the same sound recognition sector can be distinguished by voiceprint recognition, and different sound source identifiers can be set for the audio data originating from different sound sources. In this way the audio data corresponding to a given identifier can be easily retrieved, which simplifies the sound-content retrieval flow, saves time and improves efficiency.
In another embodiment of the present invention, step S104 includes:
generating a first audio file, wherein the plurality of audio data in the first audio file are sorted in order of acquisition time, and each of the plurality of audio data is provided with its corresponding sector identifier.
In this step, a first audio file containing the plurality of audio data can be generated; in the first audio file, each piece of audio data carries a label with its sector identifier, which facilitates subsequent queries by the user.
In another embodiment of the present invention, step S104 further includes:
generating at least two second audio files, wherein each second audio file is used to store the audio data having a common sector identifier.
In this step, an audio file can be generated for each sector identifier; for example, audio data 2 and audio data 3, which share the sector identifier "c", can be combined into one audio file, and audio data 1, which has the sector identifier "a", can be placed in another audio file.
In practical applications, the audio data collected by the sound collection devices can contain a great deal of ambient sound data, for example environmental noise; moreover, the sound emitted by any one sound source reaches each sound collection device with a different time delay, loudness and/or phase. In order to obtain high-quality audio data for the different sound sources, as shown in Fig. 3, in yet another embodiment of the present invention, step S101 includes the following steps.
In step S301, the acoustic information of the audio data collected by each sound collection device is obtained.
In the embodiments of the present invention, acoustic information can refer to the time delay, loudness and/or phase of the audio data, and the like.
In this step, acoustic information such as the time delay, loudness and/or phase of the audio data received by each sound collection device can be extracted.
In step S302, the sound collection device nearest to the sound source position is determined as the main sound collection device according to the acoustic information, and the sound collection devices other than the main sound collection device are determined as auxiliary sound collection devices.
In this step, the sound collection device nearest to the sound source position can be determined by comparing loudness and time delay; that nearest device is determined as the main sound collection device, and the other sound collection devices in the terminal are determined as auxiliary sound collection devices.
In step S303, the main audio data collected by the main sound collection device is determined, and the auxiliary audio data collected by the auxiliary sound collection devices is determined.
In the embodiments of the present invention, both the main audio data and the auxiliary audio data contain sound source data and ambient sound data. The acoustic energy of the auxiliary audio data can be treated as ambient sound (noise or non-main sound sources), while the acoustic energy of the main audio data is treated as the main sound source plus ambient sound.
In step S304, the inverted phase of the auxiliary audio data is superimposed onto the main audio data to obtain the sound source data.
In the embodiments of the present invention, because ambient sound is concentrated at low frequencies while the main audio data has characteristic energy at medium and high frequencies, this can serve as the basis for distinguishing the source data from the ambient sound. Moreover, because the ambient-sound energy is essentially the same for all the sound collection devices, the auxiliary audio data can be phase-inverted (if the phase of the auxiliary audio data is taken to be 0 degrees, the inverted phase is 180 degrees) and added to the acoustic energy of the main audio data so that the ambient components cancel, ensuring that the sounds of other noise sources are filtered out and only the sound source data emitted by the sound source is obtained.
After this step, correction methods such as filtering, steady-state de-noising and non-steady-state energy compensation can be used to fully replenish the energy of the sound source data and sufficiently attenuate the noise and ambient sound, improving the signal-to-noise ratio of the recording.
In step S305, the sound source data is determined as the audio data of the sound source collected by the sound collection devices.
In this step, the sound source data obtained can be determined as the audio data collected by the sound collection devices.
As shown in Fig. 4, another embodiment of the present invention provides a recording device, applied to a terminal comprising a plurality of sound collection devices, the device comprising: a receiving module 41, a first determining module 42, a second determining module 43 and a generation module 44.
The receiving module 41 is configured to receive a plurality of audio data sent by at least two sound sources.
The first determining module 42 is configured to determine the sound source direction and/or position of each of the at least two sound sources according to the received plurality of audio data.
The second determining module 43 is configured to determine, according to the determined sound source direction and/or position of each of the at least two sound sources, at least two target sectors corresponding one-to-one to the at least two sound sources, and to assign a sector identifier to each of the determined at least two target sectors.
The generation module 44 is configured to generate at least one audio file containing the correspondence between the audio data and the sector identifiers.
In another embodiment of the present invention, the second determining module is further configured such that the at least two target sectors do not overlap each other, and each target sector covers only the sound source direction and/or position of its corresponding sound source.
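One way to picture non-overlapping target sectors is the angular sketch below. The patent does not specify how sector boundaries are computed; the azimuth model, the `width` parameter and the `sector-N` mark format are all assumptions made for illustration.

```python
def assign_sectors(source_azimuths, width=60.0):
    """Build one target sector per sound source direction and give each a
    sector mark. A sector is an angular range centred on its source; when
    two sources are closer than `width` degrees apart, the shared edge is
    pulled back to the midpoint between them so sectors never overlap."""
    ordered = sorted(source_azimuths)
    sectors = {}
    for i, az in enumerate(ordered):
        lo, hi = az - width / 2.0, az + width / 2.0
        if i > 0:                          # do not cross the left neighbour
            lo = max(lo, (ordered[i - 1] + az) / 2.0)
        if i < len(ordered) - 1:           # do not cross the right neighbour
            hi = min(hi, (az + ordered[i + 1]) / 2.0)
        sectors[f"sector-{i}"] = (lo, hi)  # sector mark -> angular range
    return sectors

print(assign_sectors([30.0, 70.0, 200.0]))
# {'sector-0': (0.0, 50.0), 'sector-1': (50.0, 100.0), 'sector-2': (170.0, 230.0)}
```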
As shown in figure 5, in another embodiment of the present invention, the device further includes: an acquisition module 51, an extraction module 52, a judgment module 53 and a setup module 54.

The acquisition module 51 is configured to obtain the audio data having a common sector mark.

The extraction module 52 is configured to extract the voiceprint features from the audio data.

The judgment module 53 is configured to judge, according to the voiceprint features, whether the audio data in the target sector comes from the same sound source.

The setup module 54 is configured, when the audio data in the target sector does not all come from the same sound source, to assign different sound source marks to the audio data from different sound sources in the target sector.
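The voiceprint check performed by modules 51–54 can be sketched as below. The feature vectors, the cosine-similarity measure and the 0.9 threshold are illustrative assumptions; the patent only states that voiceprint features are extracted and compared to decide whether segments in one sector share a source.

```python
import numpy as np

def mark_sources(voiceprints, threshold=0.9):
    """Assign a sound-source mark to each audio segment in one sector.

    `voiceprints` is a list of feature vectors (e.g. averaged spectral
    features) for segments sharing a sector mark. A segment whose cosine
    similarity to an existing source's reference print reaches the
    threshold reuses that source's mark; otherwise a new mark is created."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    refs = []    # one reference print per discovered source
    marks = []
    for vp in voiceprints:
        for i, ref in enumerate(refs):
            if cosine(vp, ref) >= threshold:
                marks.append(f"source-{i}")
                break
        else:
            refs.append(vp)
            marks.append(f"source-{len(refs) - 1}")
    return marks

prints = [np.array([1.0, 0.0]), np.array([0.99, 0.1]), np.array([0.0, 1.0])]
print(mark_sources(prints))  # ['source-0', 'source-0', 'source-1']
```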
In another embodiment of the present invention, the generation module is configured to: generate a first audio file, wherein the multiple pieces of audio data in the first audio file are sorted in order of acquisition time, and each piece of audio data is provided with its corresponding sector mark.
In another embodiment of the present invention, the generation module is further configured to: generate at least two second audio files, wherein each second audio file is used to save the audio data having one common sector mark.
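The two kinds of generated files can be sketched together. The tuple record layout and the placeholder string segments are assumptions for illustration; the patent only requires time-ordered, sector-marked data in the first file and per-sector-mark grouping in the second files.

```python
from collections import defaultdict

def build_audio_files(records):
    """Arrange recorded segments into the two kinds of files described above.

    `records` is a list of (acquisition_time, sector_mark, audio_data)
    tuples. The first file keeps every segment in order of acquisition
    time, each carrying its sector mark; the second kind groups the
    segments that share one sector mark."""
    first_file = sorted(records, key=lambda r: r[0])
    second_files = defaultdict(list)
    for _, mark, data in first_file:
        second_files[mark].append(data)
    return first_file, dict(second_files)

records = [(2.0, "sector-1", "b"), (1.0, "sector-0", "a"), (3.0, "sector-0", "c")]
first, second = build_audio_files(records)
print(first)   # [(1.0, 'sector-0', 'a'), (2.0, 'sector-1', 'b'), (3.0, 'sector-0', 'c')]
print(second)  # {'sector-0': ['a', 'c'], 'sector-1': ['b']}
```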
In another embodiment of the present invention, the distance between any two of the multiple sound collection devices is greater than a preset distance, and the receiver module includes: an acquisition submodule, a determination submodule, a first determination submodule, a superposition submodule and a third determination submodule.

The acquisition submodule is configured to obtain the acoustic information of the audio data collected by each sound collection device.

The determination submodule is configured to determine, according to the acoustic information, the sound collection device nearest to the sound source position as the main sound collection device, and to determine the sound collection devices other than the main sound collection device as auxiliary sound collection devices.

The first determination submodule is configured to determine the main audio data collected by the main sound collection device and the auxiliary audio data collected by the auxiliary sound collection devices.

The superposition submodule is configured to superpose the main audio data with the inverted phase of the auxiliary audio data to obtain the sound source data.

The third determination submodule is configured to determine the sound source data as the audio data of the sound source collected by the sound collection devices.
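A sketch of the main/auxiliary split follows. The patent says the device nearest the sound source position is chosen from the acoustic information but does not fix the criterion; inferring nearness from mean signal energy, and the `mic-a`/`mic-b` names, are assumptions made here for illustration.

```python
import numpy as np

def split_main_aux(device_signals):
    """Pick the main sound collection device and the auxiliary ones.

    `device_signals` maps a device id to its captured signal. Following
    the description above, the device nearest the source becomes the main
    device; nearness is inferred here from mean signal energy (the
    loudest capture is assumed closest)."""
    energies = {dev: float(np.mean(np.square(sig)))
                for dev, sig in device_signals.items()}
    main_dev = max(energies, key=energies.get)
    aux_devs = [d for d in device_signals if d != main_dev]
    return main_dev, aux_devs

signals = {"mic-a": np.array([0.9, -0.8, 0.7]),   # strong capture: nearest
           "mic-b": np.array([0.1, -0.1, 0.2])}   # weak capture: auxiliary
print(split_main_aux(signals))
```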
Fig. 6 is a block diagram of a device according to an exemplary embodiment. With reference to Fig. 6, the device includes:

a processor 21; and

a memory 22 for storing instructions executable by the processor 21;

wherein the processor 21 is configured to:
receive multiple pieces of audio data sent by at least two sound sources;

determine the sound source direction and/or position of each of the at least two sound sources according to the received audio data;

determine, according to the determined sound source direction and/or position of each of the at least two sound sources, at least two target sectors in one-to-one correspondence with the at least two sound sources, and assign a sector mark to each of the determined target sectors; and

generate at least one audio file containing the correspondence between the audio data and the sector marks.
The embodiment of the present invention also provides a computer-readable storage medium storing a program which, when executed, can implement some or all of the steps of the recording method in each of the implementations provided by the embodiments shown in Fig. 1 to Fig. 3.
Those skilled in the art, after considering this description and practising the invention disclosed herein, will readily conceive of other embodiments of the present invention. This application is intended to cover any modifications, uses or adaptations of the present invention that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The description and embodiments are to be considered exemplary only, with the true scope and spirit of the invention being indicated by the appended claims.

It should be appreciated that the invention is not limited to the precise constructions described above and illustrated in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
Claims (13)
1. A recording method, characterised by comprising:
receiving multiple pieces of audio data sent by at least two sound sources;
determining the sound source direction and/or position of each of the at least two sound sources according to the received audio data;
determining, according to the determined sound source direction and/or position of each of the at least two sound sources, at least two target sectors in one-to-one correspondence with the at least two sound sources, and assigning a sector mark to each of the determined target sectors; and
generating at least one audio file containing the correspondence between the audio data and the sector marks.
2. The method according to claim 1, characterised in that the at least two target sectors do not overlap each other, and each target sector covers only the sound source direction and/or position of its corresponding sound source.
3. The method according to claim 1, characterised in that the method further comprises:
obtaining the audio data having a common sector mark;
extracting the voiceprint features from the audio data;
judging, according to the voiceprint features, whether the audio data in the target sector comes from the same sound source; and
when the audio data in the target sector does not come from the same sound source, assigning different sound source marks to the audio data from different sound sources in the target sector.
4. The method according to claim 1, characterised in that generating the at least one audio file containing the correspondence between the audio data and the sector marks comprises:
generating a first audio file, wherein the multiple pieces of audio data in the first audio file are sorted in order of acquisition time, and each piece of audio data is provided with its corresponding sector mark.
5. The method according to claim 1, characterised in that generating the at least one audio file containing the correspondence between the audio data and the sector marks further comprises:
generating at least two second audio files, wherein each second audio file is used to save the audio data having one common sector mark.
6. The method according to claim 1, characterised in that receiving the multiple pieces of audio data sent by the at least two sound sources comprises:
obtaining the acoustic information of the audio data collected by each sound collection device;
determining, according to the acoustic information, the sound collection device nearest to the sound source position as the main sound collection device, and determining the sound collection devices other than the main sound collection device as auxiliary sound collection devices;
determining the main audio data collected by the main sound collection device and the auxiliary audio data collected by the auxiliary sound collection devices;
superposing the main audio data with the inverted phase of the auxiliary audio data to obtain the sound source data; and
determining the sound source data as the audio data of the sound source collected by the sound collection devices.
7. A recording device, characterised in that it is applied to a terminal comprising multiple sound collection devices, and comprises:
a receiver module, configured to receive multiple pieces of audio data sent by at least two sound sources;
a first determining module, configured to determine the sound source direction and/or position of each of the at least two sound sources according to the received audio data;
a second determining module, configured to determine, according to the determined sound source direction and/or position of each of the at least two sound sources, at least two target sectors in one-to-one correspondence with the at least two sound sources, and to assign a sector mark to each of the determined target sectors; and
a generation module, configured to generate at least one audio file containing the correspondence between the audio data and the sector marks.
8. The device according to claim 7, characterised in that the second determining module is further configured such that the at least two target sectors do not overlap each other, and each target sector covers only the sound source direction and/or position of its corresponding sound source.
9. The device according to claim 7, characterised in that the device further comprises:
an acquisition module, configured to obtain the audio data having a common sector mark;
an extraction module, configured to extract the voiceprint features from the audio data;
a judgment module, configured to judge, according to the voiceprint features, whether the audio data in the target sector comes from the same sound source; and
a setup module, configured, when the audio data in the target sector does not come from the same sound source, to assign different sound source marks to the audio data from different sound sources in the target sector.
10. The device according to claim 7, characterised in that the generation module is configured to:
generate a first audio file, wherein the multiple pieces of audio data in the first audio file are sorted in order of acquisition time, and each piece of audio data is provided with its corresponding sector mark.
11. The device according to claim 7, characterised in that the generation module is further configured to:
generate at least two second audio files, wherein each second audio file is used to save the audio data having one common sector mark.
12. The device according to claim 7, characterised in that the receiver module comprises:
an acquisition submodule, configured to obtain the acoustic information of the audio data collected by each sound collection device;
a determination submodule, configured to determine, according to the acoustic information, the sound collection device nearest to the sound source position as the main sound collection device, and to determine the sound collection devices other than the main sound collection device as auxiliary sound collection devices;
a first determination submodule, configured to determine the main audio data collected by the main sound collection device and the auxiliary audio data collected by the auxiliary sound collection devices;
a superposition submodule, configured to superpose the main audio data with the inverted phase of the auxiliary audio data to obtain the sound source data; and
a third determination submodule, configured to determine the sound source data as the audio data of the sound source collected by the sound collection devices.
13. A terminal, characterised in that the terminal comprises:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
receive multiple pieces of audio data sent by at least two sound sources;
determine the sound source direction and/or position of each of the at least two sound sources according to the received audio data;
determine, according to the determined sound source direction and/or position of each of the at least two sound sources, at least two target sectors in one-to-one correspondence with the at least two sound sources, and assign a sector mark to each of the determined target sectors; and
generate at least one audio file containing the correspondence between the audio data and the sector marks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610729168.5A CN106356067A (en) | 2016-08-25 | 2016-08-25 | Recording method, device and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106356067A true CN106356067A (en) | 2017-01-25 |
Family
ID=57854270
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610729168.5A Pending CN106356067A (en) | 2016-08-25 | 2016-08-25 | Recording method, device and terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106356067A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000352996A (en) * | 1999-03-26 | 2000-12-19 | Canon Inc | Information processing device |
CN1610294A (en) * | 2003-10-24 | 2005-04-27 | 阿鲁策株式会社 | Vocal print authentication system and vocal print authentication program |
CN1652205A (en) * | 2004-01-14 | 2005-08-10 | 索尼株式会社 | Audio signal processing apparatus and audio signal processing method |
KR20100098104A (en) * | 2009-02-27 | 2010-09-06 | 고려대학교 산학협력단 | Method and apparatus for space-time voice activity detection using audio and video information |
CN102254559A (en) * | 2010-05-20 | 2011-11-23 | 盛乐信息技术(上海)有限公司 | Identity authentication system and method based on vocal print |
CN105070304A (en) * | 2015-08-11 | 2015-11-18 | 小米科技有限责任公司 | Method, device and electronic equipment for realizing recording of object audio |
CN105679356A (en) * | 2014-11-17 | 2016-06-15 | 中兴通讯股份有限公司 | Recording method, device and terminal |
CN105895102A (en) * | 2015-11-15 | 2016-08-24 | 乐视移动智能信息技术(北京)有限公司 | Recording editing method and recording device |
Non-Patent Citations (1)
Title |
---|
Harvey Richard Schiffman: "Sensation and Perception", 31 October 2014 *
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110809879A (en) * | 2017-06-28 | 2020-02-18 | 株式会社OPTiM | Computer system, Web conference audio support method, and program |
CN108564961A (en) * | 2017-11-29 | 2018-09-21 | 华北计算技术研究所(中国电子科技集团公司第十五研究所) | A kind of voice de-noising method of mobile communication equipment |
CN110033773A (en) * | 2018-12-13 | 2019-07-19 | 蔚来汽车有限公司 | For the audio recognition method of vehicle, device, system, equipment and vehicle |
CN110033773B (en) * | 2018-12-13 | 2021-09-14 | 蔚来(安徽)控股有限公司 | Voice recognition method, device, system and equipment for vehicle and vehicle |
CN109817225A (en) * | 2019-01-25 | 2019-05-28 | 广州富港万嘉智能科技有限公司 | A kind of location-based meeting automatic record method, electronic equipment and storage medium |
CN109887508A (en) * | 2019-01-25 | 2019-06-14 | 广州富港万嘉智能科技有限公司 | A kind of meeting automatic record method, electronic equipment and storage medium based on vocal print |
CN109934731A (en) * | 2019-01-25 | 2019-06-25 | 广州富港万嘉智能科技有限公司 | A kind of method of ordering based on image recognition, electronic equipment and storage medium |
CN109979447A (en) * | 2019-01-25 | 2019-07-05 | 广州富港万嘉智能科技有限公司 | The location-based control method of ordering of one kind, electronic equipment and storage medium |
CN110459239A (en) * | 2019-03-19 | 2019-11-15 | 深圳壹秘科技有限公司 | Role analysis method, apparatus and computer readable storage medium based on voice data |
CN110223684A (en) * | 2019-05-16 | 2019-09-10 | 华为技术有限公司 | A kind of voice awakening method and equipment |
CN112151041A (en) * | 2019-06-26 | 2020-12-29 | 北京小米移动软件有限公司 | Recording method, device and equipment based on recorder program and storage medium |
CN112151041B (en) * | 2019-06-26 | 2024-03-29 | 北京小米移动软件有限公司 | Recording method, device, equipment and storage medium based on recorder program |
CN110349584A (en) * | 2019-07-31 | 2019-10-18 | 北京声智科技有限公司 | A kind of audio data transmission method, device and speech recognition system |
CN113539269A (en) * | 2021-07-20 | 2021-10-22 | 上海明略人工智能(集团)有限公司 | Audio information processing method, system and computer readable storage medium |
CN115811574A (en) * | 2023-02-03 | 2023-03-17 | 合肥炬芯智能科技有限公司 | Sound signal processing method and device, main equipment and split type conference system |
CN115811574B (en) * | 2023-02-03 | 2023-06-16 | 合肥炬芯智能科技有限公司 | Sound signal processing method and device, main equipment and split conference system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106356067A (en) | Recording method, device and terminal | |
CN108766418B (en) | Voice endpoint recognition method, device and equipment | |
CN103811020B (en) | A kind of intelligent sound processing method | |
CN107591152B (en) | Voice control method, device and equipment based on earphone | |
CN105488227B (en) | A kind of electronic equipment and its method that audio file is handled based on vocal print feature | |
US11869481B2 (en) | Speech signal recognition method and device | |
CN111883168B (en) | Voice processing method and device | |
CN109935226A (en) | A kind of far field speech recognition enhancing system and method based on deep neural network | |
CN112053691B (en) | Conference assisting method and device, electronic equipment and storage medium | |
CN109767757A (en) | A kind of minutes generation method and device | |
CN103635962A (en) | Voice recognition system, recognition dictionary logging system, and audio model identifier series generation device | |
CN104269172A (en) | Voice control method and system based on video positioning | |
CN109524013B (en) | Voice processing method, device, medium and intelligent equipment | |
CN109410956A (en) | A kind of object identifying method of audio data, device, equipment and storage medium | |
Kürby et al. | Bag-of-Features Acoustic Event Detection for Sensor Networks. | |
CN109560941A (en) | Minutes method, apparatus, intelligent terminal and storage medium | |
JP2019197136A (en) | Signal processor, signal processing method, and program | |
CN111868823A (en) | Sound source separation method, device and equipment | |
WO2022087251A1 (en) | Multi channel voice activity detection | |
CN114762039A (en) | Conference data processing method and related equipment | |
CN109215688B (en) | Same-scene audio processing method, device, computer readable storage medium and system | |
KR101976937B1 (en) | Apparatus for automatic conference notetaking using mems microphone array | |
CN110737422B (en) | Sound signal acquisition method and device | |
CN107452408B (en) | Audio playing method and device | |
CN111988705B (en) | Audio processing method, device, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20170125 |