CN110473562A - Audio data processing method, device and system

Info

Publication number: CN110473562A
Application number: CN201810442504.7A
Authority: CN
Inventors: 王伟杰
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2018-05-10
Filing date: 2018-05-10
Publication date: 2019-11-19
Anticipated expiration: 2038-05-10
Also published as: CN110473562B

Abstract

This application discloses an audio data processing method, device, and system, belonging to the technical field of audio processing. The method includes: judging whether a beat exists between stored reference audio data and collected acquired audio data; if a beat exists, writing N blank audio data frames into the cache that stores the reference audio data and reading N reference audio data frames, where a blank audio data frame is an audio data frame whose volume is zero; and reading N acquired audio data frames and performing echo cancellation based on the N reference audio data frames and the N acquired audio data frames. When a beat exists between the reference audio data and the acquired audio data, the application first eliminates the beat by filling in blank audio data whose volume is zero and then performs echo cancellation, so that each party to the call hears only the voice of the other party, which effectively improves the echo cancellation result.

Description

Audio data processing method, device and system

Technical field

This application relates to the technical field of audio processing, and in particular to an audio data processing method, device, and system.

Background technique

Echo cancellation, also called echo suppression, is a very important technique for improving voice quality in voice devices such as circuit telephones, mobile phones, intercoms, and VoIP (Voice over Internet Protocol) phones. Echo essentially refers to one's own voice returning to one's own ears after a delay. For example, the echo phenomenon can be summarized as follows: the sound played by the loudspeaker of the proximal (near-end) device is picked up again by the microphone of the proximal device and sent back to the remote device, so that the remote user hears his or her own voice.

For example, during a call between the proximal device and the remote device, factors such as network fluctuation often cause data sent by the remote device to the proximal device to be lost, which directly leads to a beat between the reference audio data and the acquired audio data. Here, the reference audio data is the data sent by the remote device to the proximal device, and the acquired audio data is the audio data collected locally by the microphone of the proximal device. The beat phenomenon can be summarized as follows: the acquired audio data, which contains the echo data, and the reference audio data do not correspond to each other in the time domain.

Because the beat phenomenon degrades echo cancellation, the beat usually has to be eliminated before echo cancellation is performed. In other words, how to process audio data when data is lost, so as to eliminate the beat and improve echo cancellation so that each party to the call hears only the voice of the other party, has become a focus of attention for those skilled in the art.

Summary of the invention

This application provides an audio data processing method, device, and system that address the poor echo cancellation results of the related art. The technical solution is as follows:

In a first aspect, an audio data processing method is provided, the method comprising:

judging whether a beat exists between stored reference audio data and collected acquired audio data;

if a beat exists between the reference audio data and the acquired audio data, writing N blank audio data frames into the cache that stores the reference audio data, and reading N reference audio data frames, where a blank audio data frame is an audio data frame whose volume is zero and N is a positive integer not less than 1; and

reading N acquired audio data frames, and performing echo cancellation based on the N reference audio data frames and the N acquired audio data frames.

In another embodiment, judging whether a beat exists between the stored reference audio data and the collected acquired audio data comprises:

obtaining the playback duration of the stored reference audio data;

judging whether the playback duration is less than a set target threshold; and

if the playback duration is less than the target threshold, determining that a beat exists between the reference audio data and the acquired audio data.

In another embodiment, every audio data frame has the same playback duration, and the target threshold is the sum of a set expected delay and the playback duration of N audio data frames;

where the expected delay is the time difference between a first moment and a second moment, the first moment being the moment at which the received reference audio data is stored in the cache, and the second moment being the moment at which the received reference audio data is collected again after being played.

In another embodiment, the method further comprises:

setting the expected delay when the call starts; or setting the expected delay during the call.

In another embodiment, before obtaining the playback duration of the stored reference audio data, the method further comprises:

based on the expected delay, moving the write data pointer in the cache backward from its current first position to a second position; and

writing audio data whose volume is zero between the first position and the second position.

In another embodiment, the method further comprises:

if a beat exists between the stored reference audio data and the collected acquired audio data, cancelling the audio data read operation.

In another embodiment, writing N blank audio data frames into the cache that stores the reference audio data and reading N reference audio data frames comprises:

writing the N blank audio data frames backward from a third position, where the write data pointer of the cache is currently located; and

reading the N reference audio data frames backward from a fourth position, where the read data pointer of the cache is currently located.

In a second aspect, an audio data processing device is provided, the device comprising:

a judgment module, configured to judge whether a beat exists between stored reference audio data and collected acquired audio data;

a first processing module, configured to, if a beat exists between the reference audio data and the acquired audio data, write N blank audio data frames into the cache that stores the reference audio data and read N reference audio data frames, where a blank audio data frame is an audio data frame whose volume is zero and N is a positive integer not less than 1; and

a second processing module, configured to read N acquired audio data frames and perform echo cancellation based on the N reference audio data frames and the N acquired audio data frames.

In another embodiment, the judgment module is configured to obtain the playback duration of the stored reference audio data; judge whether the playback duration is less than a set target threshold; and, if the playback duration is less than the target threshold, determine that a beat exists between the reference audio data and the acquired audio data.

In another embodiment, the device further comprises:

a setting module, configured to set the expected delay when the call starts, or to set the expected delay during the call.

In another embodiment, the device further comprises:

a third processing module, configured to, based on the expected delay, move the write data pointer in the cache backward from its current first position to a second position, and write audio data whose volume is zero between the first position and the second position.

In another embodiment, the second processing module is further configured to cancel the audio data read operation if a beat exists between the stored reference audio data and the collected acquired audio data.

In another embodiment, the first processing module is configured to write the N blank audio data frames backward from a third position, where the write data pointer of the cache is currently located, and to read the N reference audio data frames backward from a fourth position, where the read data pointer of the cache is currently located.

In a third aspect, an audio data processing system is provided, the system comprising a first device and a second device;

the first device is configured to send reference audio data to the second device, and the second device is configured to perform the audio data processing method of the first aspect after receiving the reference audio data sent by the first device.

The technical solution provided by this application has the following beneficial effects:

When a beat exists between the reference audio data and the acquired audio data, the application first eliminates the beat by filling in blank audio data whose volume is zero and then performs echo cancellation, so that each party to the call hears only the voice of the other party. The echo cancellation result is therefore effectively improved.

Brief description of the drawings

To explain the technical solution of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the implementation environment involved in the audio data processing method provided by this application;

Fig. 2 is a flow chart of an audio data processing method provided by this application;

Fig. 3 is a schematic flow diagram of delay adjustment of the first cache provided by this application;

Fig. 4 is a schematic flow diagram of delay adjustment of the first cache provided by this application;

Fig. 5 is a schematic diagram of delay adjustment of the first cache provided by this application;

Fig. 6 is a schematic diagram of read and write operations on the first cache when no beat exists, provided by this application;

Fig. 7 is a schematic diagram of read and write operations on the first cache when a beat exists, provided by this application;

Fig. 8 is an overall flow chart of audio data processing provided by this application;

Fig. 9 is a structural schematic diagram of an audio data processing device provided by this application;

Fig. 10 is a structural schematic diagram of a device for audio data processing provided by this application.

Detailed description

To make the purposes, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the drawings.

Before the application is explained in detail, some terms used in this application are first explained.

Echo: one's own voice returning to one's own ears after a delay.

In one optional embodiment, the echo phenomenon can be summarized as follows: the sound played by the loudspeaker of the proximal device is picked up again by the microphone of the proximal device and sent back to the remote device, so that the remote user hears his or her own voice.

In another optional embodiment, the echo phenomenon can also be summarized as follows: the sound played by the loudspeaker of the remote device is picked up again by the microphone of the remote device and sent back to the proximal device, so that the near-end user hears his or her own voice.

Here, the near-end user is the local user, and the remote user is the user who is located remotely and is in a call with the local user.

Reference audio data: in one optional embodiment, from the near-end user's perspective, the reference audio data is the data received by the proximal device from the remote device.

In another optional embodiment, from the remote user's perspective, the reference audio data is the data received by the remote device from the proximal device.

Acquired audio data: the audio data collected by a microphone.

In one optional embodiment, from the near-end user's perspective, the acquired audio data is the data collected by the microphone of the proximal device. For example, from the near-end user's perspective the acquired audio data may include the near-end user's audio data and the remote user's audio data that was played by the loudspeaker of the proximal device and then collected again by the microphone of the proximal device. Once the remote user's audio data has been collected on the proximal device side and played back by the loudspeaker of the remote device, the remote user hears his or her own voice; for this reason, this portion of the audio data is also called echo data.

In another optional embodiment, from the remote user's perspective, the acquired audio data is the data collected by the microphone of the remote device. For example, from the remote user's perspective the acquired audio data may include the remote user's audio data and the near-end user's audio data that was played by the loudspeaker of the remote device and then collected again by the microphone of the remote device.

Data alignment: the operation of aligning the reference audio data with the acquired audio data containing the echo data in the time domain.

Audio data frame: in the audio field, just as video can be divided into individual image frames, audio data can be divided into individual audio data frames. Each audio data frame corresponds to a playback duration; for example, the playback duration of one audio data frame may be 10 ms, 20 ms, 30 ms, and so on, which this application does not specifically limit. In general, the playback duration of an audio data frame varies with the sampling frequency.

Audio data frames include reference audio data frames, acquired audio data frames, and blank audio data frames. Reference audio data frames come from the audio data sent by the remote device; acquired audio data frames come from the audio data collected locally by the proximal device; a blank audio data frame is an audio data frame whose volume is zero, that is, a frame that contains audio data whose volume is zero, also called blank audio data. Reference audio data frames, acquired audio data frames, and blank audio data frames can all have the same playback duration.
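The relationship between sampling frequency, frame duration, and a zero-volume frame can be illustrated with a minimal sketch. The function names and the choice of signed 16-bit mono PCM are assumptions made for illustration only; the application does not prescribe a sample format.

```python
def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Playback duration of one frame that holds a fixed number of samples."""
    return 1000.0 * samples_per_frame / sample_rate_hz


def frame_size_bytes(sample_rate_hz: int, frame_ms: int,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size in bytes of one frame of the given playback duration."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * bytes_per_sample * channels


def blank_frame(size_bytes: int) -> bytes:
    """A zero-volume frame. Assumes signed PCM, where all-zero bytes are silence."""
    return bytes(size_bytes)


# The same 160-sample frame lasts 20 ms at 8 kHz but only 10 ms at 16 kHz.
print(frame_duration_ms(160, 8000), frame_duration_ms(160, 16000))
```

Under these assumptions a 20 ms frame at 8 kHz occupies 320 bytes, and `blank_frame(320)` yields a blank audio data frame with the same playback duration as any other frame of that size.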

The implementation environment involved in the audio data processing method provided by this application is described below. The implementation environment includes a proximal device and a remote device. The types of the proximal device and the remote device include, but are not limited to, circuit telephones, mobile phones, intercoms, and VoIP phones.

Referring to Fig. 1, this application is illustrated only with the example in which the proximal device and the remote device are both mobile phones.

Because echo is usually present during a call, echo cancellation is generally required to improve call quality. Echo cancellation needs two sets of data: the acquired audio data and the reference audio data. In addition, to improve the echo cancellation result, the following situation usually has to be considered before echo cancellation is performed:

Taking the case in which the proximal device processes the audio data as an example, during a call the proximal device may fail to receive data sent by the remote device because of factors such as network fluctuation; that is, data sent by the remote device is lost. This in turn causes a beat between the reference audio data and the acquired audio data. The presence of a beat further degrades echo cancellation, so the beat has to be eliminated first, before echo cancellation is performed.

For example, the technical problems to be solved by this application include at least the following: how to improve the echo cancellation result when the proximal device loses data sent by the remote device. The application mainly processes the reference audio data in order to eliminate the beat between the reference audio data and the acquired audio data, so that the two are aligned with each other and the echo cancellation result improves.

It should be noted that this application is described only with the example in which the proximal device processes the audio data. The remote device can process audio data in a similar way, which this application does not specifically limit.

Fig. 2 is a flow chart of an audio data processing method provided by this application. The example assumes that the proximal device performs the audio processing, that the received reference audio data is stored in a first cache, and that the collected acquired audio data is stored in a second cache. Referring to Fig. 2, the method flow provided by this application includes:

201. The proximal device receives the reference audio data sent by the remote device and stores the received reference audio data in both the first cache and a third cache.

The first cache stores the reference audio data sent by the remote device; in this application the first cache is also called the reference data cache.

The third cache is the playback cache: when the player of the proximal device plays audio locally, it reads data from the playback cache.

202. The proximal device prepares to read reference audio data from the first cache and judges whether a beat exists between the reference audio data stored in the first cache and the acquired audio data stored in the second cache; if so, it performs step 203 below; if not, it performs step 205 below.

In one possible embodiment, judging whether a beat exists between the reference audio data and the acquired audio data can be implemented by judging whether data sent by the remote device has been lost. Whether data sent by the remote device has been lost can in turn be judged from the playback duration of the reference audio data currently stored in the first cache. In other words, this application uses the playback duration of the reference audio data stored in the first cache to judge whether data sent by the remote device has been lost. The specific steps are as follows:

(1) Obtain the playback duration of the reference audio data currently stored in the first cache, and judge whether the playback duration is less than the set target threshold.

The target threshold is the sum of the set expected delay and the playback duration of N audio data frames.

The expected delay is the time difference between a first moment and a second moment. The first moment is the moment at which the proximal device stores the received reference audio data in the first cache; the second moment is the moment at which the received reference audio data, after being played locally by the player of the proximal device, is collected again by the microphone.

In other words, suppose that after receiving the reference audio data sent by the remote device, the proximal device stores it in the first cache and at the same time sends it to the third cache for playback at moment T1, and that the reference audio data played by the player is collected again by the microphone of the proximal device at moment T2. The expected delay is then T2 - T1.

In addition, for a given proximal device the expected delay is normally relatively stable; that is, its value generally remains unchanged.

In another embodiment, to avoid the time cost of repeatedly computing the expected delay, the expected delay is set in advance. It may be set when the proximal device starts a call with the remote device, or during the call between the proximal device and the remote device; this application does not specifically limit this.

N is a positive integer greater than or equal to 1. If echo cancellation is performed frame by frame, N is 1; if echo cancellation is performed two frames at a time, N is 2; and so on.

(2) If the playback duration of the stored reference audio data is less than the set target threshold, it is determined that a beat exists between the reference audio data and the acquired audio data; if the playback duration of the stored reference audio data is greater than the set target threshold, it is determined that no beat exists between the reference audio data and the acquired audio data.
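Steps (1) and (2) can be sketched as a simple threshold comparison. This is a minimal illustration, not the applicant's implementation; the function and parameter names are hypothetical, and all durations are assumed to be expressed in milliseconds.

```python
def target_threshold_ms(expected_delay_ms: int, n_frames: int, frame_ms: int) -> int:
    """Target threshold: expected delay plus the playback duration of N frames."""
    return expected_delay_ms + n_frames * frame_ms


def has_beat(buffered_ms: int, expected_delay_ms: int,
             n_frames: int, frame_ms: int) -> bool:
    """A beat is reported when the reference cache holds less audio than the threshold.

    The text only states the "less than" and "greater than" cases; treating an
    exactly equal duration as "no beat" is an assumption of this sketch.
    """
    return buffered_ms < target_threshold_ms(expected_delay_ms, n_frames, frame_ms)


# With a 40 ms expected delay, N = 1 and 20 ms frames, the threshold is 60 ms.
print(has_beat(50, 40, 1, 20))  # reference cache is short of data
print(has_beat(80, 40, 1, 20))  # reference cache holds enough data
```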

In another embodiment, even if the data sent by the remote device is not lost, the echo data contained in the acquired audio data is normally not aligned with the reference audio data; there is some delay between them. This delay is mainly caused by the time difference between the reference audio data being played by the player of the proximal device and the same data being collected again by the microphone of the proximal device. If this is not handled, residual echo remains after echo cancellation.

To solve this problem, the expected delay can serve as the basis for performing delay adjustment on the first cache. That is, as shown in Fig. 3, the preset expected delay is used as the offset of the first cache: delay adjustment is performed on the first cache based on the expected delay so that, by operating on the first cache, the reference audio data is aligned with the echo portion of the acquired audio data.

Further, as shown in Fig. 4, the expected delay ultimately serves as the offset for adjusting the first cache, so that audio 1 in the first cache is aligned with echo 1 in the second cache. The second cache stores the data collected locally by the microphone of the proximal device. Echo 1 is the echo matching audio 1, echo 2 is the echo matching audio 2, echo 3 is the echo matching audio 3, and so on.

Further, when performing delay adjustment on the first cache based on the expected delay, the proximal device may proceed as follows: based on the set expected delay, the write data pointer in the first cache is moved backward from its current first position to a second position, and blank audio data whose volume is zero is written between the first position and the second position; that is, the offset portion is filled with blank audio data.

As shown in Fig. 5, the write data pointer is adjusted using the expected delay as the default offset. After the adjustment, the write data pointer has in effect moved backward by a data length equal to the expected delay.
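The delay adjustment described above can be sketched on a flat byte buffer. The representation of the first cache as a `bytearray`, the function name, and the `bytes_per_ms` parameter are assumptions of this sketch; for signed 16-bit mono PCM at 8 kHz, one millisecond of audio would occupy 16 bytes.

```python
def apply_delay_offset(cache: bytearray, write_pos: int,
                       expected_delay_ms: int, bytes_per_ms: int) -> int:
    """Move the write pointer from its first position to a second position.

    The span between the two positions is filled with zero-volume audio.
    Returns the second position, which becomes the new write pointer.
    """
    second_pos = write_pos + expected_delay_ms * bytes_per_ms
    if len(cache) < second_pos:
        # Grow the buffer so that the offset span exists.
        cache.extend(bytes(second_pos - len(cache)))
    # Overwrite the offset span with blank audio data (all zeros).
    cache[write_pos:second_pos] = bytes(second_pos - write_pos)
    return second_pos


reference_cache = bytearray()
write_pos = apply_delay_offset(reference_cache, 0, expected_delay_ms=40, bytes_per_ms=16)
print(write_pos, len(reference_cache))
```

Reference audio data that arrives later is then written starting at the returned position, so it sits one expected delay behind the read pointer.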

It should be noted that the judgment of whether a beat exists, and the subsequent processing based on the result of that judgment, are performed after the above delay adjustment.

203. If a beat exists between the reference audio data and the acquired audio data, the proximal device writes N blank audio data frames into the first cache, reads N reference audio data frames from the first cache, and reads N acquired audio data frames from the second cache.

If the proximal device determines that the playback duration of the reference audio data stored in the first cache is less than the sum of the set expected delay and the playback duration of N audio data frames, it determines that the cached reference audio data is insufficient at this time, which confirms that data sent by the remote device has been lost and that a beat exists between the reference audio data and the acquired audio data.

In this case, the proximal device first writes blank audio data backward from the third position, where the write data pointer of the first cache is currently located, to fill in the missing data; that is, blank audio data is written into the first cache as a substitute for the reference audio data. Specifically, because audio processing is performed N frames at a time, N blank audio data frames are written. After writing the N blank audio data frames, the proximal device reads N reference audio data frames backward from the fourth position, where the read data pointer of the first cache is currently located.

As shown in Fig. 6, the third position corresponds to the position of the write data pointer before the filling in Fig. 6, and the fourth position corresponds to the position of the write data pointer after the filling in Fig. 6.

The first point to note is that if the proximal device receives reference audio data sent by the remote device at this time, then, as shown in Fig. 6, the data is written backward starting from the fourth position.
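The fill-then-read behaviour of step 203 can be sketched with a frame queue. Modelling the first cache as a `collections.deque`, with the write data pointer at the tail and the read data pointer at the head, is an assumption made for brevity; the function name is hypothetical.

```python
from collections import deque
from typing import Deque, List


def pad_and_read(reference_cache: Deque[bytes], n: int, frame_bytes: int) -> List[bytes]:
    """Write N blank frames at the write pointer, then read N frames at the read pointer."""
    for _ in range(n):
        # Blank audio data frame: zero volume, same size as a normal frame.
        reference_cache.append(bytes(frame_bytes))
    # Read N frames starting from the oldest unread frame.
    return [reference_cache.popleft() for _ in range(n)]


cache: Deque[bytes] = deque([b"\x01" * 4])   # only one real frame is buffered
frames = pad_and_read(cache, n=2, frame_bytes=4)
print(frames, list(cache))
```

In this sketch, reference audio data that arrives afterwards would simply be appended at the tail of the queue, after the blank frames.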

The second point to note is that the beat elimination approach described above inserts N blank audio data frames into the first cache and then reads N reference audio data frames from the first cache for echo cancellation. In another embodiment, as shown in Fig. 2, this application also includes the beat elimination approach of step 204 below.

204. If a beat exists between the reference audio data and the acquired audio data, the proximal device cancels the audio data read operation.

In this case, the proximal device may also consider that, because the data has been lost, the acquired audio data contains no echo of the lost data. It therefore need not fill in data, need not read N frames of reference audio data from the first cache, and need not perform echo cancellation.

It should be noted that if the proximal device receives reference audio data sent by the remote device at this time, the data is written backward starting from the position where the write data pointer is currently located.
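The alternative of step 204 can be sketched as a branch that leaves the reference cache untouched when a beat is detected. The function name and the `cancel_echo` callable are hypothetical, and returning the acquired frames unchanged in the beat case is an assumption of this sketch; the text only states that no read and no echo cancellation take place.

```python
from collections import deque
from typing import Callable, Deque, List

Frames = List[bytes]


def process_skipping_on_beat(reference_cache: Deque[bytes], acquired: Frames,
                             beat: bool, n: int,
                             cancel_echo: Callable[[Frames, Frames], Frames]) -> Frames:
    """Step 204 variant: on a beat, skip the read and skip echo cancellation."""
    if beat:
        # Lost data produced no echo, so the acquired frames are passed through
        # (assumption) and the reference cache is not read.
        return list(acquired)
    reference = [reference_cache.popleft() for _ in range(n)]
    return cancel_echo(reference, acquired)


# Placeholder canceller for demonstration only: it returns the input unchanged.
identity = lambda reference, acquired: list(acquired)
cache: Deque[bytes] = deque([b"\x01\x01"])
print(process_skipping_on_beat(cache, [b"\x05\x05"], True, 1, identity), len(cache))
```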

205. If no beat exists between the reference audio data and the acquired audio data, the proximal device reads N reference audio data frames from the first cache and reads N acquired audio data frames from the second cache.

In this case, because no beat exists, the proximal device directly reads N reference audio data frames backward from the fifth position, where the read data pointer of the first cache is currently located.

As shown in Fig. 7, the fifth position corresponds to the position of the read data pointer in Fig. 7. Further, if the proximal device receives reference audio data sent by the remote device at this time, then, as shown in Fig. 7, the data is written backward starting from the position where the write data pointer is currently located.

206. The proximal device performs echo cancellation based on the N reference audio data frames and the N acquired audio data frames that were read.

The N reference audio data frames and the N acquired audio data frames that were read can be fed into the echo processing module of the proximal device, which then performs echo cancellation. The audio data obtained after echo cancellation can be sent by the proximal device to the remote device.

For example, referring to Fig. 8 and taking the case in which the proximal device processes the audio data, the processing flow of the audio data processing method provided by this application includes:

(a) The proximal device receives the reference audio data sent by the remote device.

(b) The proximal device writes the received reference audio data into both the playback cache and the reference data cache.

(c) The proximal device prepares to read data from the reference data cache and judges whether the playback duration of the reference audio data currently stored in the reference data cache is less than the sum of the expected delay and the playback duration of N audio data frames.

(d) If not, no beat exists between the acquired audio data and the reference audio data; N reference audio data frames are read from the reference data cache, and N acquired audio data frames are read from the cache that stores the acquired audio data.

(e) If so, a beat exists between the acquired audio data and the reference audio data; N blank audio data frames are written into the reference data cache for data alignment, after which N reference audio data frames are read from the reference data cache and N acquired audio data frames are read from the cache that stores the acquired audio data.

Step (e) can also be replaced by step 204 above.

(f) The N reference audio data frames and the N acquired audio data frames that were read are fed into the echo processing module for echo cancellation.

(g) The audio data after echo cancellation is output.
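Steps (c) through (g) can be combined into one processing pass, sketched below. The two caches are modelled as frame queues, every frame is assumed to have the same playback duration, and the echo processing module is represented by a caller-supplied `cancel_echo` function; all of these names are hypothetical. The sketch also assumes that the acquired-data cache already holds at least N frames.

```python
from collections import deque
from typing import Callable, Deque, List

Frames = List[bytes]


def process_once(reference_cache: Deque[bytes], acquired_cache: Deque[bytes],
                 n: int, frame_ms: int, frame_bytes: int, expected_delay_ms: int,
                 cancel_echo: Callable[[Frames, Frames], Frames]) -> Frames:
    """One pass of steps (c) to (g) for a batch of N frames."""
    # (c) Compare the buffered playback duration with the target threshold.
    buffered_ms = len(reference_cache) * frame_ms
    if buffered_ms < expected_delay_ms + n * frame_ms:
        # (e) Beat: write N blank frames to realign the reference data.
        reference_cache.extend(bytes(frame_bytes) for _ in range(n))
    # (d)/(e) Read N reference frames and N acquired frames.
    reference = [reference_cache.popleft() for _ in range(n)]
    acquired = [acquired_cache.popleft() for _ in range(n)]
    # (f)/(g) Hand both batches to the echo processing module and output the result.
    return cancel_echo(reference, acquired)


# Placeholder canceller for demonstration only: it returns the acquired frames unchanged.
passthrough = lambda reference, acquired: list(acquired)
ref_cache: Deque[bytes] = deque([b"\x01\x01"])
acq_cache: Deque[bytes] = deque([b"\x05\x05"])
print(process_once(ref_cache, acq_cache, 1, 20, 2, 40, passthrough), list(ref_cache))
```

Passing the canceller in as a parameter keeps the sketch independent of any particular echo cancellation algorithm, which the text leaves to the echo processing module.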

With the method provided by the present application, when a beat exists between the reference audio data and the acquisition audio data, the beat is first eliminated by padding with blank audio data whose volume is zero, and echo cancellation is performed afterwards. As a result, each party to the call hears only the voice of the other party, which effectively improves the result of echo cancellation.

In addition, when a beat exists between the reference audio data and the acquisition audio data, the present application may instead skip reading data from the reference data cache. The lost data is deemed to contain no echo, so no echo cancellation is performed on the current audio data.
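The alternative just described, in which the read from the reference data cache is skipped, might look like the following Python sketch. The function name `process_without_padding`, the `cancel_echo` callback, and the choice to express the delay desired value as a frame count (valid only because every frame has the same playing duration) are assumptions made for illustration.

```python
from collections import deque


def process_without_padding(reference_cache, capture_cache, delay_frames, n, cancel_echo):
    """Read N acquisition frames; run echo cancellation only if no beat exists.

    delay_frames is the delay desired value expressed in frames (an assumption).
    cancel_echo is a hypothetical callback standing in for the echo processing module.
    """
    capture = [capture_cache.popleft() for _ in range(n)]
    if len(reference_cache) < delay_frames + n:
        # Beat: leave the reference cache untouched and pass the acquisition
        # frames through unchanged, since the lost data is deemed echo-free.
        return capture
    reference = [reference_cache.popleft() for _ in range(n)]
    return cancel_echo(reference, capture)
```

Compared with padding, this variant leaves the reference data cache unchanged when a beat is detected.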

In addition, the present application not only avoids repeatedly computing the delay desired value, but can also quickly determine whether data sent by the remote device has been lost and quickly handle the beat caused by the missing data. This reduces the mouth-to-ear delay introduced by echo cancellation while retaining a good echo cancellation effect.

Fig. 9 is a schematic structural diagram of an audio data processing device provided by the present application. Referring to Fig. 9, the device includes:

a judgment module 901, configured to judge whether a beat exists between the stored reference audio data and the collected acquisition audio data;

a first processing module 902, configured to, if a beat exists between the reference audio data and the acquisition audio data, write N blank audio data frames into the cache storing the reference audio data and read N reference audio data frames, where a blank audio data frame is an audio data frame whose volume is zero and N is a positive integer not less than 1;

a second processing module 903, configured to read N acquisition audio data frames and perform echo cancellation based on the N reference audio data frames and the N acquisition audio data frames.

With the device provided by the present application, when a beat exists between the reference audio data and the acquisition audio data, the beat is first eliminated by padding with blank audio data whose volume is zero, and echo cancellation is performed afterwards. As a result, each party to the call hears only the voice of the other party, which effectively improves the result of echo cancellation.
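As a rough illustration of how the three modules divide the work, the Python sketch below maps modules 901 to 903 onto three methods of one class. The class name, the field names, the toy frame length, expressing the delay desired value in frames, and the sample-wise subtraction used in place of a real echo canceller are all assumptions; the sketch also assumes that module 902 reads N frames when no beat exists.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AudioDataProcessor:
    delay_frames: int      # delay desired value, in frames (assumed unit)
    n: int                 # number of frames handled per pass
    frame_len: int = 4     # samples per frame (toy value)
    reference_cache: deque = field(default_factory=deque)
    capture_cache: deque = field(default_factory=deque)

    def judge(self) -> bool:
        """Judgment module 901: return True if a beat exists."""
        return len(self.reference_cache) < self.delay_frames + self.n

    def first_process(self) -> list:
        """First processing module 902: pad on beat, then read N reference frames."""
        if self.judge():
            self.reference_cache.extend([0] * self.frame_len for _ in range(self.n))
        return [self.reference_cache.popleft() for _ in range(self.n)]

    def second_process(self, reference: list) -> list:
        """Second processing module 903: read N acquisition frames and cancel echo."""
        capture = [self.capture_cache.popleft() for _ in range(self.n)]
        # Toy stand-in for echo cancellation: sample-wise subtraction.
        return [[c - r for c, r in zip(cf, rf)] for cf, rf in zip(capture, reference)]
```

A caller would invoke `first_process()` and pass its result to `second_process()`; a practical echo processing module would use an adaptive filter rather than the direct subtraction shown here.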

In another embodiment, the device further includes:

All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present application, which are not described one by one here.

It should be understood that, when the audio data processing device provided by the above embodiment performs audio data processing, the division into the functional modules described above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the audio data processing device provided by the above embodiment and the audio data processing method embodiment belong to the same concept. The specific implementation process is detailed in the method embodiment and is not repeated here.

Figure 10 shows a structural block diagram of a device 1000 for audio data processing provided by an exemplary embodiment of the present application. The device 1000 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer. The device 1000 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.

In general, the device 1000 includes a processor 1001 and a memory 1002.

The processor 1001 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 1001 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor. The main processor, also called a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 1001 may also include an AI (Artificial Intelligence) processor, which handles computing operations related to machine learning.

The memory 1002 may include one or more computer readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory devices. In some embodiments, the non-transitory computer readable storage medium in the memory 1002 stores at least one instruction, and the at least one instruction is executed by the processor 1001 to implement the audio data processing method provided by the method embodiments of the present application.

In some embodiments, the device 1000 optionally further includes a peripheral device interface 1003 and at least one peripheral device. The processor 1001, the memory 1002, and the peripheral device interface 1003 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 1003 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1004, a touch display screen 1005, a camera 1006, an audio circuit 1007, a positioning component 1008, and a power supply 1009.

The peripheral device interface 1003 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1001 and the memory 1002. In some embodiments, the processor 1001, the memory 1002, and the peripheral device interface 1003 are integrated on the same chip or circuit board. In some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral device interface 1003 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 1004 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1004 communicates with communication networks and other communication devices by means of electromagnetic signals. It converts electric signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electric signals. Optionally, the radio frequency circuit 1004 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 1004 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocols include but are not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 may also include circuits related to NFC (Near Field Communication), which is not limited in the present application.

The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, it is also able to collect touch signals on or above its surface. The touch signal may be input to the processor 1001 as a control signal for processing. In this case, the display screen 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1005, arranged on the front panel of the device 1000. In other embodiments, there may be at least two display screens 1005, arranged on different surfaces of the device 1000 or in a folding design. In still other embodiments, the display screen 1005 may be a flexible display screen arranged on a curved surface or a folding surface of the device 1000. The display screen 1005 may even be set to a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 1005 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. In general, the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1006 may also include a flash. The flash may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash is a combination of a warm light flash and a cold light flash and can be used for light compensation at different color temperatures.

The audio circuit 1007 may include a microphone and a loudspeaker. The microphone collects sound waves from the user and the environment, converts them into electric signals, and inputs them to the processor 1001 for processing or to the radio frequency circuit 1004 to realize voice communication. For stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the device 1000. The microphone may also be an array microphone or an omnidirectional microphone. The loudspeaker converts electric signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The loudspeaker may be a conventional diaphragm loudspeaker or a piezoelectric ceramic loudspeaker. A piezoelectric ceramic loudspeaker can convert electric signals not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1007 may also include an earphone jack.

The positioning component 1008 is used to determine the current geographic position of the device 1000 in order to realize navigation or LBS (Location Based Service). The positioning component 1008 may be a positioning component based on the GPS (Global Positioning System) of the United States, China's BeiDou system, Russia's GLONASS system, or the European Union's Galileo system.

The power supply 1009 supplies power to the components in the device 1000. The power supply 1009 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1009 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast charging technology.

In some embodiments, the device 1000 further includes one or more sensors 1010. The one or more sensors 1010 include but are not limited to: an acceleration sensor 1011, a gyroscope sensor 1012, a pressure sensor 1013, a fingerprint sensor 1014, an optical sensor 1015, and a proximity sensor 1016.

The acceleration sensor 1011 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the device 1000. For example, the acceleration sensor 1011 can be used to detect the components of gravitational acceleration on the three coordinate axes. According to the gravitational acceleration signal collected by the acceleration sensor 1011, the processor 1001 can control the touch display screen 1005 to display the user interface in landscape or portrait view. The acceleration sensor 1011 can also be used to collect motion data for games or for the user.

The gyroscope sensor 1012 can detect the body orientation and rotation angle of the device 1000, and can cooperate with the acceleration sensor 1011 to collect the user's 3D actions on the device 1000. Based on the data collected by the gyroscope sensor 1012, the processor 1001 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.

The pressure sensor 1013 may be arranged on the side frame of the device 1000 and/or on the lower layer of the touch display screen 1005. When the pressure sensor 1013 is arranged on the side frame of the device 1000, it can detect the user's grip signal on the device 1000, and the processor 1001 performs left-hand or right-hand recognition or a shortcut operation according to the grip signal collected by the pressure sensor 1013. When the pressure sensor 1013 is arranged on the lower layer of the touch display screen 1005, the processor 1001 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 1005. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 1014 is used to collect the user's fingerprint. Either the processor 1001 identifies the user according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 itself identifies the user according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1014 may be arranged on the front, back, or side of the device 1000. When a physical button or a manufacturer logo is provided on the device 1000, the fingerprint sensor 1014 may be integrated with the physical button or the manufacturer logo.

The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the touch display screen 1005 according to the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1005 is turned up; when the ambient light intensity is low, the display brightness is turned down. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.

The proximity sensor 1016, also called a distance sensor, is generally arranged on the front panel of the device 1000. The proximity sensor 1016 is used to collect the distance between the user and the front of the device 1000. In one embodiment, when the proximity sensor 1016 detects that the distance between the user and the front of the device 1000 is gradually decreasing, the processor 1001 controls the touch display screen 1005 to switch from the screen-on state to the screen-off state; when the proximity sensor 1016 detects that the distance is gradually increasing, the processor 1001 controls the touch display screen 1005 to switch from the screen-off state to the screen-on state.

In an exemplary embodiment, an audio data processing system is also provided. The system includes a first device and a second device. When the first device is the remote device, the second device is the proximal device; when the first device is the proximal device, the second device is the remote device. The first device is used to send reference audio data to the second device, and the second device is used to execute the audio data processing method described in the above embodiments after receiving the reference audio data sent by the first device.

In an exemplary embodiment, a computer readable storage medium is also provided, for example a memory including instructions, where the instructions can be executed by a processor in a terminal to complete the audio data processing method in the above embodiments. For example, the computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

Those skilled in the art will understand that the structure shown in Figure 10 does not limit the device 1000, which may include more or fewer components than illustrated, combine certain components, or adopt a different arrangement of components.

The foregoing is merely the preferred embodiments of the present application and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims

1. An audio data processing method, characterized in that the method comprises:

if a beat exists between the reference audio data and the acquisition audio data, writing N blank audio data frames into the cache storing the reference audio data, and reading N reference audio data frames, wherein a blank audio data frame is an audio data frame whose volume is zero, and N is a positive integer not less than 1;

reading N acquisition audio data frames, and performing echo cancellation based on the N reference audio data frames and the N acquisition audio data frames.

2. The method according to claim 1, characterized in that the judging whether a beat exists between the stored reference audio data and the collected acquisition audio data comprises:

obtaining the playing duration of the stored reference audio data;

if the playing duration is less than the target threshold, determining that a beat exists between the reference audio data and the acquisition audio data.

3. The method according to claim 2, characterized in that the playing duration of each audio data frame is equal, and the target threshold is the sum of a set delay desired value and the playing duration of N audio data frames;

wherein the delay desired value refers to the time difference between a first moment and a second moment, the first moment being the moment at which the received reference audio data is stored into the cache, and the second moment being the moment at which the received reference audio data, after being played, is collected again.

4. The method according to claim 3, characterized in that the method further comprises:

5. The method according to claim 3 or 4, characterized in that, before obtaining the playing duration of the stored reference audio data, the method further comprises:

based on the delay desired value, adjusting the write data pointer in the cache backward from a first position where it is currently located to a second position;

6. The method according to claim 1, characterized in that the method further comprises:

if a beat exists between the stored reference audio data and the collected acquisition audio data, cancelling the audio data read operation.

7. The method according to claim 1, characterized in that the writing N blank audio data frames into the cache storing the reference audio data and reading N reference audio data frames comprises:

writing N blank audio data frames backward from a third position where the write data pointer in the cache is currently located;

reading N reference audio data frames backward from a fourth position where the read data pointer in the cache is currently located.

8. An audio data processing device, characterized in that the device comprises:

a judgment module, configured to judge whether a beat exists between the stored reference audio data and the collected acquisition audio data;

a first processing module, configured to, if a beat exists between the reference audio data and the acquisition audio data, write N blank audio data frames into the cache storing the reference audio data and read N reference audio data frames, wherein a blank audio data frame is an audio data frame whose volume is zero, and N is a positive integer not less than 1;

a second processing module, configured to read N acquisition audio data frames and perform echo cancellation based on the N reference audio data frames and the N acquisition audio data frames.

9. The device according to claim 8, characterized in that the judgment module is configured to obtain the playing duration of the stored reference audio data; judge whether the playing duration is less than a set target threshold; and, if the playing duration is less than the target threshold, determine that a beat exists between the reference audio data and the acquisition audio data.

10. The device according to claim 9, characterized in that the playing duration of each audio data frame is equal, and the target threshold is the sum of a set delay desired value and the playing duration of N audio data frames;

11. The device according to claim 10, characterized in that the device further comprises:

a setting module, configured to set the delay desired value when a call starts, or to set the delay desired value during the call.

12. The device according to claim 10 or 11, characterized in that the device further comprises:

a third processing module, configured to, based on the delay desired value, adjust the write data pointer in the cache backward from a first position where it is currently located to a second position, and write audio data whose volume is zero between the first position and the second position.

13. The device according to claim 8, characterized in that the second processing module is further configured to cancel the audio data read operation if a beat exists between the stored reference audio data and the collected acquisition audio data.

14. The device according to claim 8, characterized in that the first processing module is configured to write N blank audio data frames backward from a third position where the write data pointer in the cache is currently located, and to read N reference audio data frames backward from a fourth position where the read data pointer in the cache is currently located.

15. An audio data processing system, characterized in that the system comprises a first device and a second device;

the first device is used to send reference audio data to the second device, and the second device is used to execute the audio data processing method according to any one of claims 1-7 after receiving the reference audio data sent by the first device.