CN106847300B

CN106847300B - Voice data processing method and device

Info

Publication number: CN106847300B
Application number: CN201710123431.0A
Authority: CN
Inventors: 禹业茂; 皮慧斌; 王金宝
Original assignee: Beijing Zed-3 Technology Co Ltd
Current assignee: Beijing Zed-3 Technology Co Ltd
Priority date: 2017-03-03
Filing date: 2017-03-03
Publication date: 2018-06-22
Anticipated expiration: 2037-03-03
Also published as: CN106847300A

Abstract

The present invention provides a voice data processing method and device. The method is applied to each intercom receiving end in an intercom system and comprises: when voice data sent by an intercom transmitting end is received, retrieving current ambient voice data that has been monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay; calculating the cross-correlation between the voice data and the current ambient voice data, and judging whether the cross-correlation is less than a threshold; playing the voice data when the cross-correlation is less than the threshold; and discarding the voice data when the cross-correlation is not less than the threshold. This prevents a transmitting end and a receiving end that are too close together from forming a voice feedback loop that causes self-excitation, and thereby eliminates howling.

Description

Voice data processing method and device

Technical field

The present invention relates to the technical field of intercom communication, and more specifically to a voice data processing method and device.

Background art

Intercom systems are mainly used in industries such as public security, transportation, construction and services, for communication among group members and for command and dispatch.

At present, each intercom terminal in an intercom system communicates in PTT (push-to-talk) mode, that is, half-duplex one-way voice: at any time the system contains at most one intercom transmitting end that generates voice information, while the other terminals act as intercom receiving ends and receive the voice information from the transmitting end. This is how communication within the intercom system is realized.

An intercom system that communicates using half-duplex one-way voice normally does not suffer from self-excited howling. However, when the transmitting end and a receiving end are too close together, the voice played by the receiving end may be fed back to the transmitting end, forming a voice feedback loop that causes self-excitation and, in turn, howling.

Summary of the invention

In view of this, the present invention provides a voice data processing method and device to solve the howling problem that may occur in existing intercom systems. The technical solution is as follows:

A voice data processing method, applied to each intercom receiving end in an intercom system, comprising:

when voice data sent by an intercom transmitting end is received, retrieving current ambient voice data that has been monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

calculating the cross-correlation between the voice data and the current ambient voice data, and judging whether the cross-correlation is less than a threshold;

playing the voice data when the cross-correlation is less than the threshold;

discarding the voice data when the cross-correlation is not less than the threshold.

Preferably, monitoring and capturing the current ambient voice data in advance comprises:

allocating a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay;

starting a microphone and capturing ambient voice data in real time, the duration of the ambient voice data being less than the set duration;

storing the ambient voice data in the storage region in sequence;

updating the voice data in the storage region in real time according to the set duration, and determining all ambient voice data currently in the storage region as the current ambient voice data.

Preferably, calculating the cross-correlation between the voice data and the current ambient voice data comprises:

selecting at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data;

calculating the cross-correlation between each piece of target ambient voice data and the voice data;

selecting the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.

A voice data processing device, comprising: a retrieval module, a calculation and judgment module, a voice playing module and a voice discarding module, the retrieval module comprising a monitoring and capture unit;

the monitoring and capture unit is configured to monitor and capture current ambient voice data in advance;

the retrieval module is configured to retrieve, when voice data sent by an intercom transmitting end is received, the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

the calculation and judgment module is configured to calculate the cross-correlation between the voice data and the current ambient voice data, and to judge whether the cross-correlation is less than a threshold;

the voice playing module is configured to play the voice data when the cross-correlation is less than the threshold;

the voice discarding module is configured to discard the voice data when the cross-correlation is not less than the threshold.

Preferably, the monitoring and capture unit is specifically configured to:

allocate a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay; start a microphone and capture ambient voice data in real time, the duration of the ambient voice data being less than the set duration; store the ambient voice data in the storage region in sequence; and update the voice data in the storage region in real time according to the set duration, and determine all ambient voice data currently in the storage region as the current ambient voice data.

Preferably, the calculation and judgment module, when calculating the cross-correlation between the voice data and the current ambient voice data, is specifically configured to:

select at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data; calculate the cross-correlation between each piece of target ambient voice data and the voice data; and select the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.

Compared with the prior art, the present invention achieves the following beneficial effects:

In the voice data processing method and device provided by the present invention, the method is applied to each intercom receiving end in an intercom system. The cross-correlation between the received voice data sent by the intercom transmitting end and the current ambient voice data monitored and captured in advance is calculated, and whether the distance from the transmitting end exceeds a distance threshold is determined by judging whether the cross-correlation is less than a threshold. When the cross-correlation is less than the threshold, that is, when the distance from the transmitting end exceeds the distance threshold, the voice data is played; when the cross-correlation is not less than the threshold, that is, when the distance from the transmitting end does not exceed the distance threshold, the voice data is discarded. This prevents a transmitting end and a receiving end that are too close together from forming a voice feedback loop that causes self-excitation, and thereby eliminates howling.

Description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a voice data processing method disclosed in Embodiment one of the present invention;

Fig. 2 is a partial flowchart of a voice data processing method disclosed in Embodiment two of the present invention;

Fig. 3 is a partial flowchart of another voice data processing method disclosed in Embodiment two of the present invention;

Fig. 4 is a structural diagram of a voice data processing device disclosed in Embodiment three of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Embodiment one

Embodiment one of the present invention discloses a voice data processing method, applied to each intercom receiving end in an intercom system. Its flowchart is shown in Fig. 1, and it comprises the following steps:

S101: when voice data sent by an intercom transmitting end is received, retrieve the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

While performing step S101, the intercom receiving end starts a voice receiving thread and a microphone capture thread at the same time, the microphone capture thread being used to capture voice data from the environment around the receiving end. Because there is a certain network delay between the transmitting end generating the voice data and the receiving end receiving it, the receiving end needs to retrieve current ambient voice data that was monitored and captured in advance, and the duration of that data must be not less than the preset network delay.
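The two threads described for step S101 can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `SharedAmbientStore`, `capture_worker` and `receive_worker` are hypothetical, the microphone and the network are replaced by short in-memory lists of frames, and trimming the store to the set duration is left out for brevity.

```python
import threading


class SharedAmbientStore:
    """Lock-guarded list of captured ambient frames (hypothetical helper)."""

    def __init__(self):
        self._frames = []
        self._lock = threading.Lock()

    def push(self, frame):
        with self._lock:
            self._frames.append(frame)

    def snapshot(self):
        # Return a copy so the caller can read it without holding the lock.
        with self._lock:
            return list(self._frames)


def capture_worker(mic_frames, store):
    # Stand-in for the microphone capture thread: store frames in capture order.
    for frame in mic_frames:
        store.push(frame)


def receive_worker(network_frames, store, handled):
    # Stand-in for the voice receiving thread: pair each received frame with
    # whatever ambient audio has been captured up to that moment.
    for frame in network_frames:
        handled.append((frame, store.snapshot()))


store = SharedAmbientStore()
handled = []
capture = threading.Thread(target=capture_worker, args=([[0.1], [0.2], [0.3]], store))
receive = threading.Thread(target=receive_worker, args=([[0.2]], store, handled))
capture.start()
receive.start()
capture.join()
receive.join()
```

Which ambient frames the receiving thread sees depends on thread scheduling, which is why the receiving end relies on audio captured before the voice data arrives rather than on audio captured afterwards.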

S102: calculate the cross-correlation between the voice data and the current ambient voice data, and judge whether the cross-correlation is less than a threshold;

While performing step S102, the cross-correlation between the voice data and the current ambient voice data can be calculated with a cross-correlation function. The cross-correlation characterizes the degree of correlation between the voice data and the current ambient voice data; moreover, the greater the distance from the transmitting end, the smaller the cross-correlation. The cross-correlation can therefore be used to judge whether the distance from the transmitting end exceeds a distance threshold.
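The text does not say which cross-correlation function is used. As one possible reading, the sketch below computes a zero-lag cross-correlation normalized by the energy of both segments, so that the result does not depend on how loudly the speech was picked up. The normalization, the function name and the test signals are all assumptions made for illustration.

```python
import math


def normalized_cross_correlation(received, ambient):
    """Zero-lag normalized cross-correlation of two equal-length sample lists."""
    if len(received) != len(ambient):
        raise ValueError("segments must have the same length")
    dot = sum(r * a for r, a in zip(received, ambient))
    energy = math.sqrt(sum(r * r for r in received) * sum(a * a for a in ambient))
    # A silent segment has zero energy; treat it as uncorrelated.
    return dot / energy if energy > 0 else 0.0


# Illustrative signals: a tone, an attenuated copy of it, and an unrelated signal.
tone = [math.sin(2 * math.pi * k / 8) for k in range(64)]
attenuated_copy = [0.5 * s for s in tone]
unrelated = [math.cos(2 * math.pi * k / 8) for k in range(64)]

near_score = normalized_cross_correlation(tone, attenuated_copy)
far_score = normalized_cross_correlation(tone, unrelated)
```

Here `near_score` is close to 1 because the ambient segment is a scaled copy of the received data, while `far_score` is close to 0 because the two segments are orthogonal.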

S103: when the cross-correlation is less than the threshold, play the voice data;

S104: when the cross-correlation is not less than the threshold, discard the voice data.
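The branch between S103 and S104 can be written as a small decision function. This is a minimal sketch: the `Action` enum and the threshold value 0.6 are hypothetical, since the text gives no concrete threshold.

```python
from enum import Enum


class Action(Enum):
    PLAY = "play"
    DISCARD = "discard"


def decide(cross_correlation, threshold):
    """Play when strictly below the threshold (S103), discard otherwise (S104)."""
    return Action.PLAY if cross_correlation < threshold else Action.DISCARD


THRESHOLD = 0.6  # hypothetical value chosen only for this example
far_result = decide(0.2, THRESHOLD)
boundary_result = decide(0.6, THRESHOLD)
near_result = decide(0.9, THRESHOLD)
```

Note the boundary case: a cross-correlation exactly equal to the threshold is "not less than" it, so the voice data is discarded.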

It should be noted that, when the cross-correlation is less than the threshold, the voice data may also be played at a corresponding volume according to a preset mapping between cross-correlation and volume level, so as to ensure a good intercom experience for the user. The mapping must, however, ensure that the played voice is not fed back to the transmitting end to form a voice feedback loop, so that no howling is produced.
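The mapping between cross-correlation and volume level is not specified in the text. The following sketch assumes a simple banded mapping in which a higher cross-correlation (a closer transmitting end) leads to a lower playback volume; the band boundaries, the volume values and the default threshold are all hypothetical.

```python
import bisect

# Hypothetical mapping: upper bounds of the cross-correlation bands and the
# playback volume (0.0 to 1.0) used inside each band.
BAND_UPPER_BOUNDS = [0.2, 0.4, 0.6]
BAND_VOLUMES = [1.0, 0.7, 0.4]


def playback_volume(cross_correlation, threshold=0.6):
    """Return a volume for playable data, or None when the data is discarded."""
    if cross_correlation >= threshold:
        return None
    band = bisect.bisect_right(BAND_UPPER_BOUNDS, cross_correlation)
    # Clamp so that thresholds above the last band bound reuse the quietest band.
    return BAND_VOLUMES[min(band, len(BAND_VOLUMES) - 1)]
```

With these assumed bands, `playback_volume(0.1)` gives full volume, `playback_volume(0.5)` gives the quietest level, and `playback_volume(0.6)` returns `None` because the data would be discarded. In practice the volume levels would have to be chosen so that even the loudest one cannot be picked up again by the transmitting end.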

In the voice data processing method disclosed in this embodiment of the present invention, the cross-correlation between the received voice data sent by the intercom transmitting end and the current ambient voice data monitored and captured in advance is calculated, and whether the distance from the transmitting end exceeds a distance threshold is determined by judging whether the cross-correlation is less than a threshold. When the cross-correlation is less than the threshold, that is, when the distance from the transmitting end exceeds the distance threshold, the voice data is played; when the cross-correlation is not less than the threshold, that is, when the distance from the transmitting end does not exceed the distance threshold, the voice data is discarded. This prevents a transmitting end and a receiving end that are too close together from forming a voice feedback loop that causes self-excitation, and thereby eliminates howling.

Embodiment two

With reference to the voice data processing method disclosed in Embodiment one above, the specific procedure in step S101 of Fig. 1 for monitoring and capturing the current ambient voice data in advance is shown in Fig. 2 and comprises the following steps:

S201: allocate a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay;

S202: start a microphone and capture ambient voice data in real time, the duration of the ambient voice data being less than the set duration;

While performing step S202, the ambient voice data at the current location is captured in real time, for example 20 ms of ambient voice data per capture.

S203: store the ambient voice data in the storage region in sequence;

While performing step S203, for example, the 20 ms pieces of ambient voice data are stored in the storage region one after another in the order of their capture times.

S204: update the voice data in the storage region in real time according to the set duration, and determine all ambient voice data currently in the storage region as the current ambient voice data;

While performing step S204, because the duration of voice data that the storage region can hold is fixed, each time newly captured ambient voice data is stored in the storage region, the intercom receiving end judges whether the difference between the duration of all ambient voice data in the storage region and the set duration is greater than 0. If so, it deletes the earliest-stored ambient voice data whose duration equals that difference, and determines all ambient voice data currently in the storage region as the current ambient voice data;

For example, with a set duration of 500 ms and the receiving end capturing 20 ms of ambient voice data each time, if storing the most recently captured 20 ms of ambient voice data makes the duration of all ambient voice data in the storage region exceed 500 ms by 20 ms, the earliest-stored 20 ms of ambient voice data is deleted from the storage region, thereby updating the voice data in the storage region.
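Steps S201 to S204 amount to a fixed-duration buffer that always holds the most recent audio. The sketch below uses the 500 ms and 20 ms figures from the example; the class and method names are hypothetical. It also assumes that all frames have the same duration, so that deleting whole frames is equivalent to deleting exactly the excess duration.

```python
class AmbientBuffer:
    """Keeps at most `set_duration_ms` of the most recent ambient audio."""

    def __init__(self, set_duration_ms=500):
        # S201: the storage region is sized by the set duration.
        self.set_duration_ms = set_duration_ms
        self._frames = []  # (duration_ms, samples) pairs, oldest first

    def stored_duration_ms(self):
        return sum(duration for duration, _ in self._frames)

    def store(self, samples, duration_ms=20):
        # S203: append the newly captured frame in capture order.
        self._frames.append((duration_ms, samples))
        # S204: while the stored duration exceeds the set duration,
        # delete the earliest-stored frame.
        while self.stored_duration_ms() > self.set_duration_ms:
            self._frames.pop(0)

    def current_ambient(self):
        # Everything currently stored is the current ambient voice data.
        return [sample for _, frame in self._frames for sample in frame]


buffer = AmbientBuffer(set_duration_ms=500)
for index in range(26):
    # One placeholder sample per 20 ms frame keeps the example short.
    buffer.store([index], duration_ms=20)
```

After 26 frames of 20 ms have been stored, the buffer holds 500 ms again: the 26th frame pushed the total to 520 ms, so the oldest frame (index 0) was deleted.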

In the voice data processing method disclosed in this embodiment of the present invention, the cross-correlation between the received voice data sent by the intercom transmitting end and the current ambient voice data monitored and captured in advance is calculated, and whether the distance from the transmitting end exceeds a distance threshold is determined by judging whether the cross-correlation is less than a threshold. When the cross-correlation is less than the threshold, the voice data is played; when the cross-correlation is not less than the threshold, the voice data is discarded. This prevents a transmitting end and a receiving end that are too close together from forming a voice feedback loop that causes self-excitation, and thereby eliminates howling.

With reference to the voice data processing method disclosed in Embodiment one above, the specific procedure in step S102 of Fig. 1 for calculating the cross-correlation between the voice data and the current ambient voice data is shown in Fig. 3 and comprises the following steps:

S301: select at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data;

S302: calculate the cross-correlation between each piece of target ambient voice data and the voice data;

S303: select the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.
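Steps S301 to S303 can be read as sliding a window the length of the received voice data across the buffered ambient data and keeping the best match. The sketch below follows that reading; how the candidate segments are chosen is not specified in the text, so the `hop` parameter and the normalized correlation helper are assumptions.

```python
import math


def _correlation(a, b):
    # Zero-lag normalized cross-correlation (an assumed choice of function).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def max_cross_correlation(received, ambient, hop=1):
    """Slide a window the length of `received` over `ambient` and keep the maximum."""
    n = len(received)
    if n == 0 or len(ambient) < n:
        raise ValueError("ambient data must be at least as long as the received data")
    scores = [
        _correlation(received, ambient[start:start + n])  # S302
        for start in range(0, len(ambient) - n + 1, hop)  # S301
    ]
    return max(scores)  # S303


received = [1.0, -2.0, 3.0, -1.0, 2.0]
# The ambient buffer contains an attenuated copy of the received data that
# starts 7 samples in, as if the speech had been heard over the air earlier.
ambient = [0.0] * 7 + [0.5 * s for s in received] + [0.0] * 3
best = max_cross_correlation(received, ambient)
```

Taking the maximum over several offsets is what makes the buffer duration matter: the over-the-air copy of the speech precedes the networked copy by roughly the network delay, so the buffer must reach at least that far back for a matching segment to exist.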

Embodiment three

Based on the voice data processing methods disclosed in the above embodiments, Embodiment three of the present invention correspondingly discloses a device for performing the above voice data processing method. As shown in Fig. 4, the voice data processing device 100 comprises: a retrieval module 101, a calculation and judgment module 102, a voice playing module 103 and a voice discarding module 104, the retrieval module 101 comprising a monitoring and capture unit 1011;

the monitoring and capture unit 1011 is configured to monitor and capture current ambient voice data in advance;

the retrieval module 101 is configured to retrieve, when voice data sent by an intercom transmitting end is received, the current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

the calculation and judgment module 102 is configured to calculate the cross-correlation between the voice data and the current ambient voice data, and to judge whether the cross-correlation is less than a threshold;

the voice playing module 103 is configured to play the voice data when the cross-correlation is less than the threshold;

the voice discarding module 104 is configured to discard the voice data when the cross-correlation is not less than the threshold.

It should be noted that the monitoring and capture unit 1011 is specifically configured to:

allocate a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay; start a microphone and capture ambient voice data in real time, the duration of the ambient voice data being less than the set duration; store the ambient voice data in the storage region in sequence; and update the voice data in the storage region in real time according to the set duration, and determine all ambient voice data currently in the storage region as the current ambient voice data.

It should also be noted that the calculation and judgment module, when calculating the cross-correlation between the voice data and the current ambient voice data, is specifically configured to:

select at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data; calculate the cross-correlation between each piece of target ambient voice data and the voice data; and select the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.
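One way to picture how modules 101 to 104 fit together is to model each as a pluggable part of a single object. This is only a structural sketch: the class name, field names and the stub functions in the example are hypothetical, and lists stand in for actual playback and discarding.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class VoiceDataProcessingDevice:
    """Hypothetical wiring of modules 101 to 104; all names are illustrative."""

    retrieve_ambient: Callable[[], Sequence[float]]                  # retrieval module 101
    correlate: Callable[[Sequence[float], Sequence[float]], float]   # calculation part of 102
    threshold: float                                                 # judgment part of 102
    played: List[Sequence[float]] = field(default_factory=list)      # stands in for 103
    discarded: List[Sequence[float]] = field(default_factory=list)   # stands in for 104

    def on_voice_data(self, voice):
        ambient = self.retrieve_ambient()
        score = self.correlate(voice, ambient)
        # Below the threshold the data is played, otherwise it is discarded.
        target = self.played if score < self.threshold else self.discarded
        target.append(voice)
        return score


# Stub modules for illustration: the ambient data is fixed, and the correlation
# is high only when the received data matches it exactly.
device = VoiceDataProcessingDevice(
    retrieve_ambient=lambda: [0.0, 1.0],
    correlate=lambda voice, ambient: 0.9 if list(voice) == list(ambient) else 0.1,
    threshold=0.6,
)
device.on_voice_data([0.0, 1.0])  # matches the ambient data, so it is discarded
device.on_voice_data([1.0, 0.0])  # does not match, so it is played
```

Passing the modules in as callables keeps the sketch independent of any particular buffer or correlation implementation.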

In the voice data processing device disclosed in this embodiment of the present invention, the cross-correlation between the received voice data sent by the intercom transmitting end and the current ambient voice data monitored and captured in advance is calculated, and whether the distance from the transmitting end exceeds a distance threshold is determined by judging whether the cross-correlation is less than a threshold. When the cross-correlation is less than the threshold, the voice data is played; when the cross-correlation is not less than the threshold, the voice data is discarded. This prevents a transmitting end and a receiving end that are too close together from forming a voice feedback loop that causes self-excitation, and thereby eliminates howling.

The voice data processing method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and embodiments of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, the specific embodiments and the scope of application may vary according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

It should be noted that the embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, it is described relatively briefly, and the relevant details can be found in the description of the method.

It should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A voice data processing method, characterized in that it is applied to each intercom receiving end in an intercom system and comprises:

when voice data sent by an intercom transmitting end is received, retrieving current ambient voice data that has been monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

calculating the cross-correlation between the voice data and the current ambient voice data, and judging whether the cross-correlation is less than a threshold;

discarding the voice data when the cross-correlation is not less than the threshold; wherein monitoring and capturing the current ambient voice data in advance comprises:

allocating a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay;

starting a microphone and capturing ambient voice data in real time, the duration of the ambient voice data being less than the set duration;

storing the ambient voice data in the storage region in sequence;

updating the voice data in the storage region in real time according to the set duration, and determining all ambient voice data currently in the storage region as the current ambient voice data.

2. The method according to claim 1, characterized in that calculating the cross-correlation between the voice data and the current ambient voice data comprises:

selecting at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data;

selecting the maximum of the individual cross-correlations as the cross-correlation between the voice data and the current ambient voice data.

3. A voice data processing device, characterized in that it comprises: a retrieval module, a calculation and judgment module, a voice playing module and a voice discarding module, the retrieval module comprising a monitoring and capture unit;

the retrieval module is configured to retrieve, when voice data sent by an intercom transmitting end is received, current ambient voice data monitored and captured in advance, the duration of the current ambient voice data being not less than a preset network delay;

the calculation and judgment module is configured to calculate the cross-correlation between the voice data and the current ambient voice data, and to judge whether the cross-correlation is less than a threshold;

the voice discarding module is configured to discard the voice data when the cross-correlation is not less than the threshold; wherein the monitoring and capture unit is specifically configured to:

allocate a storage region for storing voice data of a set duration, the set duration being not less than the preset network delay; start a microphone and capture ambient voice data in real time, the duration of the ambient voice data being less than the set duration; store the ambient voice data in the storage region in sequence; and update the voice data in the storage region in real time according to the set duration, and determine all ambient voice data currently in the storage region as the current ambient voice data.

4. The device according to claim 3, characterized in that the calculation and judgment module, when calculating the cross-correlation between the voice data and the current ambient voice data, is specifically configured to:

select at least one piece of target ambient voice data from the current ambient voice data, the duration of the target ambient voice data being equal to the duration of the voice data; calculate the cross-correlation between each piece of target ambient voice data and the voice data; and select the maximum of these cross-correlations as the cross-correlation between the voice data and the current ambient voice data.