CN106847300A

CN106847300A - Voice data processing method and device

Info

Publication number: CN106847300A
Application number: CN201710123431.0A
Authority: CN
Inventors: 禹业茂; 皮慧斌; 王金宝
Original assignee: Beijing Zed-3 Technology Co Ltd
Current assignee: Beijing Zed-3 Technology Co Ltd
Priority date: 2017-03-03
Filing date: 2017-03-03
Publication date: 2017-06-13
Anticipated expiration: 2037-03-03
Also published as: CN106847300B

Abstract

The invention provides a voice data processing method and device. The method is applied to each intercom receiving end in an intercom system and includes: when voice data sent by the intercom transmitting end is received, retrieving the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay; calculating the cross-correlation between the voice data and the current ambient voice data, and judging whether the cross-correlation is less than a threshold; when the cross-correlation is less than the threshold, playing the voice data; and when the cross-correlation is not less than the threshold, discarding the voice data. This prevents the self-excitation that arises when the transmitting end and the receiving end are too close together and an acoustic loop forms, thereby eliminating howling.

Description

Voice data processing method and device

Technical field

The present invention relates to the technical field of intercom communication, and more specifically to a voice data processing method and device.

Background art

Intercom systems are mainly used in industries such as public security, transportation, construction and services, for contact between group members and for command and dispatch.

At present, the intercom ends in an intercom system communicate in push-to-talk (PTT) mode, which is half-duplex, one-way voice: at any time at most one intercom transmitting end in the system can generate voice information, while the other intercom ends act as intercom receiving ends and receive the transmitting end's voice information, thereby realizing communication within the system.

An intercom system that communicates using half-duplex one-way voice generally does not suffer from self-excited howling. However, when the intercom transmitting end and an intercom receiving end are too close together, the voice played by the receiving end may be fed back to the transmitting end, forming an acoustic loop that causes self-excitation and, in turn, howling.

Summary of the invention

In view of this, the present invention provides a voice data processing method and device to solve the howling problem that may occur in existing intercom systems. The technical solution is as follows:

A voice data processing method, applied to each intercom receiving end in an intercom system, including:

when voice data sent by the intercom transmitting end is received, retrieving the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

calculating the cross-correlation between the voice data and the current ambient voice data, and judging whether the cross-correlation is less than a threshold;

when the cross-correlation is less than the threshold, playing the voice data;

when the cross-correlation is not less than the threshold, discarding the voice data.

Preferably, monitoring and capturing the current ambient voice data in advance includes:

allocating a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay;

starting the microphone and capturing ambient voice data in real time, the duration of the ambient voice data being less than the set duration;

storing the ambient voice data in the storage region in sequence;

updating the voice data in the storage region in real time according to the set duration, and determining all ambient voice data currently in the storage region as the current ambient voice data.

Preferably, calculating the cross-correlation between the voice data and the current ambient voice data includes:

selecting at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data;

calculating the cross-correlation between each piece of target ambient voice data and the voice data;

selecting the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.
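One possible way to write this selection as a formula is sketched below. The source only refers to "a cross-correlation function", so the normalized form is an assumption, as are all symbols: x is the received voice data with N samples, y is the current ambient voice data with M ≥ N samples, k is the start offset of a target segment, and T is the threshold. Evaluating every offset is also an assumption, since the text requires only at least one target segment.

```latex
\rho_k = \frac{\sum_{n=0}^{N-1} x[n]\, y[n+k]}
              {\sqrt{\sum_{n=0}^{N-1} x[n]^2 \; \sum_{n=0}^{N-1} y[n+k]^2}},
\qquad k = 0, 1, \dots, M-N

\rho = \max_{0 \le k \le M-N} \rho_k,
\qquad
\text{play if } \rho < T, \quad \text{discard if } \rho \ge T
```

Normalizing by the segment energies keeps each value within [-1, 1], so a single threshold can be applied regardless of signal level.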

A voice data processing device, including: a retrieval module, a calculation and judgment module, a voice playing module and a voice discarding module, the retrieval module including a monitoring and capture unit;

the monitoring and capture unit is configured to monitor and capture the current ambient voice data in advance;

the retrieval module is configured to, when voice data sent by the intercom transmitting end is received, retrieve the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

the calculation and judgment module is configured to calculate the cross-correlation between the voice data and the current ambient voice data, and to judge whether the cross-correlation is less than a threshold;

the voice playing module is configured to play the voice data when the cross-correlation is less than the threshold;

the voice discarding module is configured to discard the voice data when the cross-correlation is not less than the threshold.

Preferably, the monitoring and capture unit is specifically configured to:

allocate a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay; start the microphone and capture ambient voice data in real time, the duration of the ambient voice data being less than the set duration; store the ambient voice data in the storage region in sequence; update the voice data in the storage region in real time according to the set duration, and determine all ambient voice data currently in the storage region as the current ambient voice data.

Preferably, the calculation and judgment module that calculates the cross-correlation between the voice data and the current ambient voice data is specifically configured to:

select at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data; calculate the cross-correlation between each piece of target ambient voice data and the voice data; and select the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.

Compared with the prior art, the present invention achieves the following beneficial effects:

In the voice data processing method and device provided by the present invention, the method is applied to each intercom receiving end in an intercom system. The receiving end calculates the cross-correlation between the received voice data sent by the intercom transmitting end and the current ambient voice data monitored and captured in advance, and judges whether the cross-correlation is less than a threshold in order to determine whether its distance from the transmitting end exceeds a distance threshold. When the cross-correlation is less than the threshold, that is, when the distance from the transmitting end exceeds the distance threshold, the voice data is played; when the cross-correlation is not less than the threshold, that is, when the distance from the transmitting end does not exceed the distance threshold, the voice data is discarded. This prevents the self-excitation that arises when the transmitting end and the receiving end are too close together and an acoustic loop forms, thereby eliminating howling.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of a voice data processing method disclosed in Embodiment One of the present invention;

Fig. 2 is a partial flow chart of a voice data processing method disclosed in Embodiment Two of the present invention;

Fig. 3 is another partial flow chart of a voice data processing method disclosed in Embodiment Two of the present invention;

Fig. 4 is a schematic structural diagram of a voice data processing device disclosed in Embodiment Three of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Embodiment one

Embodiment One of the present invention discloses a voice data processing method applied to each intercom receiving end in an intercom system. Its flow chart is shown in Fig. 1, and it includes the following steps:

S101: when voice data sent by the intercom transmitting end is received, retrieve the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

When step S101 is performed, the intercom receiving end starts a voice receiving thread and a microphone capture thread at the same time, the microphone capture thread being used to capture voice data from the environment around the receiving end. Because there is a certain network delay between the transmitting end generating the voice data and the receiving end receiving it, the receiving end needs to retrieve the current ambient voice data monitored and captured in advance, and the duration of that data is not less than the preset network delay.

S102: calculate the cross-correlation between the voice data and the current ambient voice data, and judge whether the cross-correlation is less than a threshold;

When step S102 is performed, the cross-correlation between the voice data and the current ambient voice data can be calculated with a cross-correlation function. The cross-correlation characterizes the degree of correlation between the voice data and the current ambient voice data, and the farther the receiving end is from the transmitting end, the smaller the cross-correlation. The calculated cross-correlation can therefore be used to judge whether the distance from the transmitting end exceeds a distance threshold.

S103: when the cross-correlation is less than the threshold, play the voice data;

S104: when the cross-correlation is not less than the threshold, discard the voice data.
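The decision in steps S102 to S104 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the `correlate` callable and the `play` callback are hypothetical, and the cross-correlation function itself is passed in because the text does not fix a particular one.

```python
from typing import Callable, Sequence

Samples = Sequence[float]


def should_play(received: Samples, ambient: Samples, threshold: float,
                correlate: Callable[[Samples, Samples], float]) -> bool:
    """S102: compute the cross-correlation and compare it with the threshold.

    Returns True when the voice data should be played (S103) and False when
    it should be discarded (S104). The comparison is strict: a value equal
    to the threshold counts as "not less than" and is discarded.
    """
    score = correlate(received, ambient)
    return score < threshold


def handle_voice_data(received: Samples, ambient: Samples, threshold: float,
                      correlate: Callable[[Samples, Samples], float],
                      play: Callable[[Samples], None]) -> bool:
    """Play or discard one block of received voice data.

    `ambient` stands for the current ambient voice data retrieved in S101.
    """
    if should_play(received, ambient, threshold, correlate):
        play(received)   # S103
        return True
    return False         # S104: the data is simply dropped
```

Keeping the correlation function injectable makes it easy to swap in whichever measure a real device uses without touching the play-or-discard logic.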

It should be noted that, when the cross-correlation is less than the threshold, the voice data can also be played at a volume chosen according to a preset mapping between cross-correlation and volume level, in order to preserve the user's intercom experience. The mapping must, however, ensure that the played voice is not fed back to the intercom transmitting end, so that no acoustic loop forms and no howling occurs.
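A preset mapping of this kind could look like the sketch below. The band boundaries, volume levels, default threshold and the function name `playback_volume` are all hypothetical; the source does not give values, and the assumption that a higher cross-correlation maps to a lower volume is an interpretation of the requirement that playback must not feed back to the transmitting end.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical bands: upper bounds of each cross-correlation band (ascending,
# all below the discard threshold) and the playback volume for each band.
BAND_UPPER_BOUNDS = [0.1, 0.2, 0.3]
VOLUME_LEVELS = [1.0, 0.7, 0.4, 0.2]  # one more entry than the bounds


def playback_volume(correlation: float, threshold: float = 0.4) -> Optional[float]:
    """Return a volume in [0, 1], or None when the data must be discarded."""
    if correlation >= threshold:
        return None  # not less than the threshold: discard, do not play
    # bisect_right finds the band the correlation falls into.
    return VOLUME_LEVELS[bisect_right(BAND_UPPER_BOUNDS, correlation)]
```

In practice the bands would have to be tuned on the actual hardware so that even the loudest level used near the threshold cannot close the acoustic loop.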

The voice data processing method disclosed in this embodiment calculates the cross-correlation between the received voice data sent by the intercom transmitting end and the current ambient voice data monitored and captured in advance, and judges whether the cross-correlation is less than a threshold in order to determine whether the distance from the transmitting end exceeds a distance threshold. When the cross-correlation is less than the threshold, that is, when the distance from the transmitting end exceeds the distance threshold, the voice data is played; when the cross-correlation is not less than the threshold, that is, when the distance does not exceed the distance threshold, the voice data is discarded. This prevents the self-excitation that arises when the transmitting end and the receiving end are too close together and an acoustic loop forms, thereby eliminating howling.

Embodiment two

With reference to the voice data processing method disclosed in Embodiment One above, the specific process of monitoring and capturing the current ambient voice data in advance in step S101 shown in Fig. 1 is shown in Fig. 2 and includes the following steps:

S201: allocate a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay;

S202: start the microphone and capture ambient voice data in real time, the duration of the ambient voice data being less than the set duration;

When step S202 is performed, ambient voice data at the current location is captured in real time, for example 20 ms of ambient voice data at a time.

S203: store the ambient voice data in the storage region in sequence;

When step S203 is performed, each captured 20 ms block of ambient voice data is, for example, stored in the storage region in the order of its capture time.

S204: update the voice data in the storage region in real time according to the set duration, and determine all ambient voice data currently in the storage region as the current ambient voice data;

When step S204 is performed, because the duration of voice data that the storage region can hold is the set duration, each time captured ambient voice data is stored in the storage region the intercom receiving end judges whether the difference between the duration of all ambient voice data in the storage region and the set duration is greater than 0. If so, it deletes the earliest-stored ambient voice data whose duration equals that difference, and determines all ambient voice data currently in the storage region as the current ambient voice data.

For example, suppose the set duration is 500 ms and the intercom receiving end captures 20 ms of ambient voice data at a time. If, after the most recently captured 20 ms of ambient voice data is stored, the duration of all ambient voice data in the storage region exceeds 500 ms by 20 ms, the earliest-stored 20 ms of ambient voice data is deleted from the storage region, thereby updating the voice data in the storage region.
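Steps S201 to S204 describe what is effectively a fixed-duration sliding buffer. The sketch below illustrates that behaviour with the 500 ms and 20 ms figures from the example; the class name `AmbientBuffer` and its methods are hypothetical, and each captured block is modelled simply as a list of samples.

```python
from collections import deque
from typing import Deque, List


class AmbientBuffer:
    """Keeps at most `set_duration_ms` of the most recent ambient audio."""

    def __init__(self, set_duration_ms: int = 500, frame_ms: int = 20) -> None:
        if frame_ms <= 0 or set_duration_ms < frame_ms:
            raise ValueError("frame_ms must be positive and no longer than set_duration_ms")
        self.set_duration_ms = set_duration_ms   # S201: capacity of the region
        self.frame_ms = frame_ms                 # S202: duration of one capture
        self._frames: Deque[List[float]] = deque()

    def duration_ms(self) -> int:
        """Total duration of the audio currently stored."""
        return len(self._frames) * self.frame_ms

    def push(self, frame: List[float]) -> None:
        """S203: store a captured block; S204: drop the oldest excess audio."""
        self._frames.append(frame)
        while self.duration_ms() > self.set_duration_ms:
            self._frames.popleft()

    def current(self) -> List[float]:
        """Return all stored samples in capture order (current ambient data)."""
        return [sample for frame in self._frames for sample in frame]
```

With the default values, pushing a 26th block brings the stored duration to 520 ms, so the oldest 20 ms block is removed and the buffer returns to 500 ms, matching the example above.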

The voice data processing method disclosed in this embodiment calculates the cross-correlation between the received voice data sent by the intercom transmitting end and the current ambient voice data monitored and captured in advance, and judges whether the cross-correlation is less than a threshold in order to determine whether the distance from the transmitting end exceeds a distance threshold. When the cross-correlation is less than the threshold, the voice data is played; when the cross-correlation is not less than the threshold, the voice data is discarded. This prevents the self-excitation that arises when the transmitting end and the receiving end are too close together and an acoustic loop forms, thereby eliminating howling.

With reference to the voice data processing method disclosed in Embodiment One above, the specific process of calculating the cross-correlation between the voice data and the current ambient voice data in step S102 shown in Fig. 1 is shown in Fig. 3 and includes the following steps:

S301: select at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data;

S302: calculate the cross-correlation between each piece of target ambient voice data and the voice data;

S303: select the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.
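Steps S301 to S303 can be sketched as a sliding-window search. The code below is an illustration under stated assumptions: the normalized correlation measure, the `step` parameter and both function names are hypothetical choices, since the text only requires that at least one equal-length target segment be compared and that the maximum be kept.

```python
import math
from typing import Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag correlation of two equal-length blocks, scaled to [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    energy_a = sum(x * x for x in a)
    energy_b = sum(y * y for y in b)
    if energy_a == 0 or energy_b == 0:
        return 0.0  # a silent block is treated as uncorrelated
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(energy_a * energy_b)


def max_cross_correlation(received: Sequence[float],
                          ambient: Sequence[float],
                          step: int = 1) -> float:
    """S301-S303: best match of `received` against windows of `ambient`."""
    n = len(received)
    if n == 0 or len(ambient) < n or step < 1:
        raise ValueError("ambient must be at least as long as received")
    return max(
        # S301: each window is a target segment as long as the received data
        # S302: correlate the window with the received data
        normalized_correlation(received, ambient[start:start + n])
        for start in range(0, len(ambient) - n + 1, step)
    )  # S303: keep the maximum
```

Searching across offsets is what lets the method tolerate an unknown network delay: the received block is matched against wherever its acoustic copy happens to sit in the buffered ambient audio. A larger `step` trades accuracy for fewer comparisons.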

Embodiment three

Based on the voice data processing method disclosed in the embodiments above, Embodiment Three of the present invention correspondingly discloses a device that performs the method. As shown in Fig. 4, the voice data processing device 100 includes: a retrieval module 101, a calculation and judgment module 102, a voice playing module 103 and a voice discarding module 104, the retrieval module 101 including a monitoring and capture unit 1011;

the monitoring and capture unit 1011 is configured to monitor and capture the current ambient voice data in advance;

the retrieval module 101 is configured to, when voice data sent by the intercom transmitting end is received, retrieve the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

the calculation and judgment module 102 is configured to calculate the cross-correlation between the voice data and the current ambient voice data, and to judge whether the cross-correlation is less than a threshold;

the voice playing module 103 is configured to play the voice data when the cross-correlation is less than the threshold;

the voice discarding module 104 is configured to discard the voice data when the cross-correlation is not less than the threshold.
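The module structure of device 100 could be mirrored in software roughly as follows. This is only a structural sketch: all class and method names are hypothetical, the capture unit is reduced to a bounded queue of blocks, and the correlation function and playback callback are supplied by the caller.

```python
from collections import deque
from typing import Callable, List, Sequence


class MonitoringCaptureUnit:
    """Unit 1011: holds the most recent ambient blocks (bounded)."""

    def __init__(self, max_frames: int) -> None:
        self._frames = deque(maxlen=max_frames)  # oldest block drops out first

    def store(self, frame: Sequence[float]) -> None:
        self._frames.append(list(frame))

    def current(self) -> List[float]:
        return [s for frame in self._frames for s in frame]


class RetrievalModule:
    """Module 101: fetches the current ambient data when voice data arrives."""

    def __init__(self, capture_unit: MonitoringCaptureUnit) -> None:
        self.capture_unit = capture_unit

    def retrieve(self) -> List[float]:
        return self.capture_unit.current()


class CalculationJudgmentModule:
    """Module 102: computes the cross-correlation and compares it."""

    def __init__(self, correlate: Callable[[Sequence[float], Sequence[float]], float],
                 threshold: float) -> None:
        self.correlate = correlate
        self.threshold = threshold

    def is_below_threshold(self, received: Sequence[float],
                           ambient: Sequence[float]) -> bool:
        return self.correlate(received, ambient) < self.threshold


class VoiceDataProcessor:
    """Device 100: wires the modules together."""

    def __init__(self, retrieval: RetrievalModule,
                 judge: CalculationJudgmentModule,
                 play: Callable[[Sequence[float]], None]) -> None:
        self.retrieval = retrieval
        self.judge = judge
        self.play = play

    def on_voice_data(self, received: Sequence[float]) -> bool:
        ambient = self.retrieval.retrieve()
        if self.judge.is_below_threshold(received, ambient):
            self.play(received)  # role of voice playing module 103
            return True
        return False             # role of voice discarding module 104
```

Here the playing and discarding modules are collapsed into the two branches of `on_voice_data`; a fuller design might give each its own class, for example to log discarded blocks.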

It should be noted that the monitoring and capture unit 1011 is specifically configured to:

allocate a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay; start the microphone and capture ambient voice data in real time, the duration of the ambient voice data being less than the set duration; store the ambient voice data in the storage region in sequence; update the voice data in the storage region in real time according to the set duration, and determine all ambient voice data currently in the storage region as the current ambient voice data.

It should also be noted that the calculation and judgment module, which calculates the cross-correlation between the voice data and the current ambient voice data, is specifically configured to:

select at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data; calculate the cross-correlation between each piece of target ambient voice data and the voice data; and select the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.

The voice data processing device disclosed in this embodiment calculates the cross-correlation between the received voice data sent by the intercom transmitting end and the current ambient voice data monitored and captured in advance, and judges whether the cross-correlation is less than a threshold in order to determine whether the distance from the transmitting end exceeds a distance threshold. When the cross-correlation is less than the threshold, the voice data is played; when the cross-correlation is not less than the threshold, the voice data is discarded. This prevents the self-excitation that arises when the transmitting end and the receiving end are too close together and an acoustic loop forms, thereby eliminating howling.

The voice data processing method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the invention.

It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts may be cross-referenced between embodiments. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.

It should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

The foregoing description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. The present invention is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A voice data processing method, characterized in that it is applied to each intercom receiving end in an intercom system and includes:

when voice data sent by the intercom transmitting end is received, retrieving the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

calculating the cross-correlation between the voice data and the current ambient voice data, and judging whether the cross-correlation is less than a threshold;

2. The method according to claim 1, characterized in that monitoring and capturing the current ambient voice data in advance includes:

allocating a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay;

starting the microphone and capturing ambient voice data in real time, the duration of the ambient voice data being less than the set duration;

storing the ambient voice data in the storage region in sequence;

updating the voice data in the storage region in real time according to the set duration, and determining all ambient voice data currently in the storage region as the current ambient voice data.

3. The method according to claim 1, characterized in that calculating the cross-correlation between the voice data and the current ambient voice data includes:

selecting at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data;

selecting the maximum of the cross-correlations as the cross-correlation between the voice data and the current ambient voice data.

4. A voice data processing device, characterized in that it includes: a retrieval module, a calculation and judgment module, a voice playing module and a voice discarding module, the retrieval module including a monitoring and capture unit;

the retrieval module is configured to, when voice data sent by the intercom transmitting end is received, retrieve the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

the calculation and judgment module is configured to calculate the cross-correlation between the voice data and the current ambient voice data, and to judge whether the cross-correlation is less than a threshold;

5. The device according to claim 4, characterized in that the monitoring and capture unit is specifically configured to:

allocate a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay; start the microphone and capture ambient voice data in real time, the duration of the ambient voice data being less than the set duration; store the ambient voice data in the storage region in sequence; update the voice data in the storage region in real time according to the set duration, and determine all ambient voice data currently in the storage region as the current ambient voice data.

6. The device according to claim 4, characterized in that the calculation and judgment module that calculates the cross-correlation between the voice data and the current ambient voice data is specifically configured to:

select at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data; calculate the cross-correlation between each piece of target ambient voice data and the voice data; and select the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.