WO2018040432A1 - Method, system and intelligent conference device for implementing an audio call - Google Patents

Method, system and intelligent conference device for implementing an audio call

Info

Publication number
WO2018040432A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal data
data
audio
buffer area
signal
Prior art date
Application number
PCT/CN2016/113264
Other languages
English (en)
French (fr)
Inventor
刘荣
Original Assignee
广州视源电子科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司
Publication of WO2018040432A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H04L65/764 Media network packet handling at the destination

Definitions

  • the embodiments of the present invention relate to the field of voice signal processing technologies, and in particular, to a method, a system, and an intelligent conference device for implementing an audio call.
  • a voice call between the local device and the peer device is implemented mainly according to a voice call system integrated in the electronic device.
  • The current voice call process can be described as follows: the voice signal data sent by the opposite end is received and buffered into the play buffer area; the voice signal data is then read from the play buffer area and played through the speaker by calling the system's play interface, while the played audio signal data is simultaneously written into the reference buffer area; next, input signal data picked up by the microphone is acquired by calling the system's audio collection interface and written into the input buffer area, where the input signal data includes the echo signal formed after the voice signal data is played and the voice signal data generated by the local speaker; then signal data of a certain length is read from the reference buffer area and the input buffer area for echo cancellation processing; finally, the echo-cancelled signal data is sent to the opposite end.
  • In this process, the voice signal data sent by the opposite end is sometimes not received normally. If the play buffer area does not hold enough voice signal data, playback is delayed, which also breaks the continuity with which audio signal data is written to the reference buffer area.
  • The signal data used for echo cancellation processing is read mainly from the reference buffer area and the input buffer area. If audio signal data cannot be written to the reference buffer area continuously, echo cancellation may not proceed normally, which affects the processing time and processing efficiency of echo cancellation, reduces the stability of echo cancellation performance, and in turn degrades call quality for the entire call.
  • The invention provides a method, a system and an intelligent conference device for implementing an audio call, which ensure the continuity of the signal data used for echo cancellation processing, reduce the processing time of echo cancellation, and provide a basis for improving the stability of echo cancellation performance.
  • an embodiment of the present invention provides a method for implementing an audio call, where the method includes:
  • an embodiment of the present invention further provides an implementation system for an audio call, where the system includes:
  • the signal data receiving module is configured to receive the audio signal data sent by the opposite end and cache the data in the play buffer after establishing a call connection between the local end and the opposite end;
  • a signal data playing module configured to read audio signal data of a set data length from the play buffer area for playing and write the audio signal data to a reference buffer area, and, if the audio signal data in the play buffer area is shorter than the set data length, to play a silent signal of the set data length and write the silent signal into the reference buffer area;
  • a signal data acquisition module configured to acquire input signal data picked up based on the audio input device, and write the input signal data into the input buffer area;
  • the echo cancellation processing module is configured to perform echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area.
  • the embodiment of the present invention further provides an intelligent conference device, which integrates an implementation system of an audio call provided by an embodiment of the present invention.
  • The present invention provides a method, a system and an intelligent conference device for implementing an audio call.
  • The method first receives audio signal data sent by the opposite end and writes it into a play buffer area. It then reads audio signal data of a set data length from the play buffer area for playing and writes the audio signal data into the reference buffer area; if the audio signal data in the play buffer area is shorter than the set data length, a silent signal of the set data length is played and written into the reference buffer area. Afterwards, input signal data picked up by the audio input device is acquired and written into the input buffer area. Finally, echo cancellation processing is performed on the signal data of the set data length read from the reference buffer area and the input buffer area.
  • FIG. 1 is a flowchart of a method for implementing an audio call according to Embodiment 1 of the present invention
  • FIG. 2 is a flowchart of a method for implementing an audio call according to Embodiment 2 of the present invention
  • FIG. 3 is a flowchart of a method for implementing an audio call according to Embodiment 3 of the present invention.
  • FIG. 4 is a structural block diagram of an implementation system of an audio call according to Embodiment 4 of the present invention.
  • FIG. 1 is a flowchart of a method for implementing an audio call according to Embodiment 1 of the present invention.
  • the present embodiment is applicable to a situation in which a voice call is performed based on an electronic device having a call function, and the method may be implemented by an audio call implementation system.
  • the system can be implemented by hardware and/or software, and can generally be integrated into an electronic device having a voice call function.
  • The embodiment of the present invention provides a method for implementing an audio call to solve the problem that signal data cannot be written continuously to the reference buffer area.
  • the electronic device may specifically refer to a device capable of implementing a voice call, such as a mobile phone, a computer, or a smart conference device.
  • the method for implementing an audio call includes:
  • the local end and the opposite end can both refer to an electronic device having a call function. It can be understood that the voice call can be performed only after the local end establishes a call connection with the opposite end based on the corresponding call protocol. Specifically, after the local end establishes a call connection with the opposite end, the local end can receive the audio signal data transmitted by the peer end based on the network. It should be noted that the received audio signal data is not directly played by the local audio output device. Rather, it needs to be cached to the set play buffer.
  • the audio output device may specifically refer to an audio play device for playing audio data, such as an earpiece and a speaker in the electronic device.
  • S102: Read audio signal data of a set data length from the play buffer area for playing, and write the audio signal data to a reference buffer area; if the audio signal data in the play buffer area is shorter than the set data length, play a silent signal of the set data length and write the silent signal to the reference buffer area.
  • In this step, audio signal data of the set data length can be read from the play buffer area and played through the audio output device by calling the system's play interface; while the audio signal data is being played, it is also buffered into the set reference buffer area.
  • When transmission of the audio signal data is disturbed, the audio signal data may not be buffered into the play buffer area in time, so the play buffer area cannot supply audio signal data for normal playback. In that case the audio output device can play silent signal data instead, and the silent signal data is written into the reference buffer area, which ensures the continuity of the signal data buffered to the reference buffer area.
  • Specifically, the silent signal is played when the data length of the audio signal data in the play buffer area is less than the set data length; when the data length of the audio signal data is not less than the set data length, the audio signal data is played normally.
  • The silent signal can be understood as signal data whose value is 0, and the set data length may be set manually or be a system default length.
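The playback step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `play_one_frame`, `play_buffer`, `reference_buffer` and the constant `FRAME_LEN` are hypothetical, and handing the frame to an actual audio output device is left out.

```python
from collections import deque

FRAME_LEN = 160  # assumed set data length, in samples


def play_one_frame(play_buffer, reference_buffer, frame_len=FRAME_LEN):
    """Return the frame to be played and mirror it into the reference buffer."""
    if len(play_buffer) >= frame_len:
        # Enough far-end audio is buffered: consume one full frame.
        frame = [play_buffer.popleft() for _ in range(frame_len)]
    else:
        # Too little far-end audio: leave the play buffer alone and play zeros.
        frame = [0] * frame_len
    # Either way, exactly one frame reaches the reference buffer.
    reference_buffer.extend(frame)
    return frame


play_buffer = deque([7] * 200)
reference_buffer = deque()
first = play_one_frame(play_buffer, reference_buffer)   # real audio
second = play_one_frame(play_buffer, reference_buffer)  # only 40 samples left, so silence
```

Because every call appends one full frame to the reference buffer, the reference buffer grows at a constant rate even when the network stalls.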
  • The set data length may be optimized.
  • Given a known sampling rate, the number of sampling points contained in a unit frame can be determined, and that number can be taken as the data length corresponding to the unit frame. For example, if the sampling rate is 16 kHz and the unit frame duration is set to 10 ms, a unit frame contains 160 sampling points, so the set data length is taken to be 160.
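The arithmetic behind the 160-sample figure can be checked with a short sketch. The function name `samples_per_frame` is hypothetical and only illustrates the calculation.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of sampling points in one unit frame."""
    return sample_rate_hz * frame_ms // 1000


set_data_length = samples_per_frame(16000, 10)  # 16 kHz, 10 ms frame
```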
  • The signal data played by the audio output device may form corresponding associated signal data under the influence of the external environment. The associated signal data may specifically be echo signal data that the played signal data generates in the external environment.
  • The associated signal data can be picked up again by the audio input device, and the audio data generated by the local speaker is also picked up by the audio input device.
  • The signal data picked up by the audio input device is called the input signal data.
  • The audio input device may specifically refer to a device capable of audio pickup, such as the microphone of an electronic device or a standalone microphone.
  • The echo cancellation process may be described as follows. First, the signal data of the set data length acquired from the reference buffer area is adaptively filtered to obtain simulated signal data of the set data length, where the signal data from the reference buffer area may be a silent signal or normal audio signal data. Then the simulated signal data is subtracted from the input signal data of the set data length acquired from the input buffer area, which yields the echo-cancelled signal data to be transmitted and completes the current round of echo cancellation processing.
  • The input signal data includes the associated signal data and the audio data of the local speaker, but the opposite end only needs the audio data of the local speaker, so the associated signal data needs to be eliminated.
  • The simulated signal data obtained by adaptive filtering can be treated as equivalent to the associated signal data, so subtracting the simulated signal data from the input signal data eliminates the associated signal data in the input signal data. The signal data obtained after the echo cancellation processing is referred to as the signal data to be transmitted.
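The text names adaptive filtering without specifying an algorithm. The sketch below assumes a normalized least-mean-squares (NLMS) filter, which is a common choice but is not stated in the source; the function name `nlms_echo_cancel` and the parameters `taps`, `mu` and `eps` are illustrative assumptions.

```python
import random


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `mic` (NLMS, assumed)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for ref_sample, mic_sample in zip(reference, mic):
        history = [ref_sample] + history[:-1]
        # Simulated (estimated) echo for this sample.
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        # Input minus simulated echo is the signal to be transmitted.
        error = mic_sample - echo_estimate
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        out.append(error)
    return out


# Synthetic check: the "echo" is the reference delayed by 2 samples, scaled by 0.6.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * x for x in ref[:-2]]
residual = nlms_echo_cancel(ref, mic)
tail_in = sum(x * x for x in mic[-200:])
tail_out = sum(x * x for x in residual[-200:])
```

When the reference frame is silence, the filter input is all zeros, the echo estimate is zero and the weights are left unchanged, so silent frames pass through this sketch without disturbing the filter.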
  • The method for implementing an audio call first receives audio signal data sent by the opposite end and writes the data into the play buffer area. It then reads audio signal data of the set data length from the play buffer area for playing and writes the audio signal data to the reference buffer area; if the audio signal data in the play buffer area is shorter than the set data length, it plays a silent signal of the set data length and writes the silent signal into the reference buffer area. After that, it acquires the input signal data picked up by the audio input device and writes the input signal data into the input buffer area. Finally, it performs echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area.
  • This embodiment further optimizes the subsequent operation steps when the local end and the opposite end perform an audio call.
  • The method further includes: writing the to-be-sent signal data obtained after the echo cancellation processing into the to-be-sent buffer area, and sending the to-be-sent signal data read from the to-be-sent buffer area to the opposite end.
  • With the above steps, the continuity of the signal data buffered to the reference buffer area can be ensured, so that the signal data can be read continuously from the reference buffer area based on step S103.
  • After the to-be-sent signal data is obtained, it cannot be sent directly to the opposite end; it first needs to be cached in the set to-be-sent buffer area.
  • To-be-sent signal data of the set data length is then read from the to-be-sent buffer area, and finally the read to-be-sent signal data is sent to the opposite end.
  • FIG. 2 is a flowchart of a method for implementing an audio call according to Embodiment 2 of the present invention.
  • Embodiment 2 of the present invention is optimized on the basis of the foregoing embodiment. In this embodiment, the following is added: before the local end establishes a call connection with the opposite end, an echo delay is determined based on a set audio test signal, where the audio test signal is at least one single-frequency signal.
  • Determining the echo delay based on the set audio test signal includes: before the call connection between the local end and the opposite end is established, reading signal data that includes the audio test signal from the play buffer area for playing, and writing the signal data to the reference buffer area; acquiring input test signal data picked up by the audio input device and writing the input test signal data to the input buffer area, where the input test signal data includes associated signal data of the audio test signal; determining the time information at which the audio test signal is found in the reference buffer area and recording it as first time information; determining the time information at which the associated signal data is found in the input buffer area and recording it as second time information; and determining the echo delay according to the first time information and the second time information.
  • The method further includes: deleting, in the input buffer area, the set number of frames of signal data that precede the signal data corresponding to the second time information, and performing echo cancellation processing on the associated signal data in the input buffer area and the audio test signal in the reference buffer area, where the set number of frames equals the number of time frames corresponding to the echo delay.
  • a method for implementing an audio call according to Embodiment 2 of the present invention specifically includes the following operations:
  • steps S201 to S205 specifically describe the determination operation of the echo delay. It should be noted that the determination of the echo delay in this embodiment is mainly performed after the local end initiates the voice call function and before establishing the call connection with the opposite end, and is mainly implemented based on the set audio test signal.
  • In this embodiment, audio signal data including the audio test signal is sent first and, following the call principle described in Embodiment 1, stored in the play buffer area.
  • The audio signal data is then read from the play buffer area and played by the audio output device; while it is being played, the audio signal data is also stored in the reference buffer area.
  • The audio test signal is specifically at least one preset single-frequency signal. The single-frequency signal may be an audio signal of any single frequency and is generally stored in a set signal buffer.
  • A single-frequency signal has a short period and a simple waveform, so it is suitable for use as an audio test signal.
  • The audio signal data can be understood as the signal data, played back through the audio output device, that is required for determining the echo delay. It should be noted that, to ensure effective measurement of the echo delay, the audio signal data in this embodiment also includes a segment of silent signal data.
  • After being played, the audio test signal may form corresponding associated signal data. The associated signal data is picked up by the audio input device and, together with the other signal data picked up by the audio input device, is referred to as input test signal data.
  • The input test signal data is first stored in the set input buffer area and is then read from the input buffer area in a set manner for echo cancellation processing.
  • The echo delay can be understood as the time difference between the moment the audio signal data is written into the reference buffer area and the moment the corresponding associated signal data is written into the input buffer area. It can be obtained through the echo delay determination described in steps S201 to S205. Specifically, steps S203 and S204 show that, to determine the echo delay, signal data of a set length first needs to be read separately from the reference buffer area and the input buffer area, and the audio test signal or the associated signal data is then searched for in the signal data that was read.
  • The recorded time information is relative time period information: the difference between the time at which the audio test signal or the associated signal data is found in the currently read signal data and the time at which signal data first started to be read from the corresponding buffer area (the reference buffer area or the input buffer area).
  • The time period information may be described in terms of a specific time, that is, as the difference between the time at which the audio test signal is found and the time at which signal data was first read from the buffer area (reference buffer area or input buffer area). It may also be described in terms of time frames (each frame has duration T, for example 10 ms): taking the first read of signal data from the buffer area as frame 0, the time period information is the number of the current frame in which the audio test signal is found.
  • Once both are recorded, the time difference between the second time information and the first time information can be determined. When the first time information and the second time information are described in terms of a specific time, the time periods corresponding to the first and second time information are determined and the difference between the two time periods is taken as the echo delay. When they are described in terms of time frames, the numbers of time frames corresponding to the first and second time information are determined and the difference between the two frame numbers is taken as the echo delay.
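The frame-based variant of the delay measurement can be sketched as below. The source does not say how the single-frequency test signal is searched for; this sketch assumes a Goertzel power detector with a fixed threshold, and the names `goertzel_power`, `first_frame_with_tone` and the 1000 Hz test frequency are hypothetical.

```python
import math


def goertzel_power(frame, freq_hz, sample_rate_hz):
    """Signal power of `frame` at `freq_hz`, computed with the Goertzel algorithm."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def first_frame_with_tone(samples, freq_hz, sample_rate_hz=16000,
                          frame_len=160, threshold=1.0):
    """Index of the first frame whose power at `freq_hz` exceeds `threshold`."""
    for index in range(len(samples) // frame_len):
        frame = samples[index * frame_len:(index + 1) * frame_len]
        if goertzel_power(frame, freq_hz, sample_rate_hz) > threshold:
            return index
    return None


# Synthetic buffers: the tone sits in frame 1 of the reference buffer and,
# attenuated, in frame 3 of the input buffer.
silence = [0.0] * 160
tone = [math.sin(2.0 * math.pi * 1000 * n / 16000) for n in range(160)]
reference = silence + tone + silence * 3
mic = silence * 3 + [0.5 * x for x in tone] + silence

first_time = first_frame_with_tone(reference, 1000)   # first time information
second_time = first_frame_with_tone(mic, 1000)        # second time information
delay_frames = second_time - first_time
delay_ms = delay_frames * 10                          # 10 ms per frame
```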
  • The associated signal data of the audio test signal enters the input buffer area later than the audio test signal enters the reference buffer area; that is, when the audio test signal is read from the reference buffer area, the associated signal data cannot yet be read from the input buffer area. Therefore, the signal data corresponding to the echo delay that precedes the associated signal data in the input buffer area needs to be deleted. This ensures that when the echo cancellation module reads the audio test signal from the reference buffer area, it also reads the associated signal data from the input buffer area, which eliminates the effect of the echo delay on the voice call.
  • After the deletion, the signal data buffered in the reference buffer area is longer than the signal data buffered in the input buffer area by the data length corresponding to the echo delay. This ensures that, during an actual call on the voice call system, the audio signal data in the reference buffer area and the associated signal data in the input buffer area can be read at the same time and used for echo cancellation processing, which reduces the time the adaptive filter spends processing the audio signal data and so ensures the efficiency of echo cancellation.
  • The foregoing steps S201 to S206 are equivalent to a pre-processing operation for the audio call. After they are completed, the local end can establish a call connection with the opposite end and start a normal audio call based on steps S207 to S210, with the assurance that the audio signal data in the reference buffer area and the associated signal data in the input buffer area can be read at the same time and subjected to echo cancellation processing.
  • S208: Read audio signal data of a set data length from the play buffer area for playing, and write the audio signal data to a reference buffer area; if the audio signal data in the play buffer area is shorter than the set data length, play a silent signal of the set data length and write the silent signal to the reference buffer area;
  • The method for implementing an audio call according to Embodiment 2 of the present invention adds the process of determining the echo delay, which determines the time difference between the moment signal data is written into the reference buffer area and the moment the corresponding signal data is written into the input buffer area during an audio call.
  • It also adds the deletion of signal data after the echo delay is determined, which ensures that associated signal data in the reference buffer area and the input buffer area are read at the same time and subjected to echo cancellation processing together.
  • This not only eliminates the effect of the signal data buffered in the reference buffer area on the echo cancellation processing, but also reduces the impact of the echo delay on it, which improves the processing efficiency of echo cancellation and thus the performance of the entire audio call.
  • FIG. 3 is a flowchart of a method for implementing an audio call according to Embodiment 3 of the present invention.
  • Embodiment 3 of the present invention is optimized on the basis of the foregoing embodiments.
  • In this embodiment, the following is added: processing the signal data in the reference buffer area and/or the input buffer area so that the data length difference between the signal data in the reference buffer area and in the input buffer area is maintained.
  • Processing the signal data in the reference buffer area and/or the input buffer area includes: acquiring the data length information of the signal data held in the reference buffer area during the current processing period, and the data length information of the signal data held in the input buffer area during that processing period; determining data length difference information for the signal data buffered in the reference buffer area and the input buffer area based on the data length information corresponding to the two buffer areas; determining a comparison result between the data length difference information and set standard data length difference information, and, based on the comparison result, resampling the signal data stored in the reference buffer area and/or the input buffer area.
  • a method for implementing an audio call according to Embodiment 3 of the present invention specifically includes the following operations:
  • S301 Before the local end establishes a call connection with the opposite end, determining an echo delay based on the set audio test signal, where the audio test signal is at least one single frequency signal.
  • the echo delay at the time of an audio call is first determined based on the set audio test signal.
  • After the echo delay is determined, signal data of a set length that precedes the associated signal data in the input buffer area may be deleted.
  • The set length may be the data length corresponding to the duration of the echo delay. For example, if the echo delay is determined to be two unit frames and the data length of a unit frame is 160, the data length of two unit frames is 320, so the 320 signal data points preceding the associated signal data need to be deleted.
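The deletion step can be illustrated with a small sketch. The function name `drop_delay_samples` and the argument `echo_start` (the index at which the associated signal data begins in the input buffer) are hypothetical; a plain list stands in for the input buffer area.

```python
def drop_delay_samples(input_buffer, echo_start, delay_frames, frame_len=160):
    """Remove the `delay_frames` frames that sit just before index `echo_start`."""
    drop = delay_frames * frame_len
    if drop > echo_start:
        raise ValueError("not enough data before the associated signal data")
    return input_buffer[:echo_start - drop] + input_buffer[echo_start:]


# Three frames of other data, then one frame of associated (echo) data.
input_buffer = [0] * 480 + [5] * 160
aligned = drop_delay_samples(input_buffer, echo_start=480, delay_frames=2)
```

With a delay of two frames, 320 samples are removed and the associated signal data moves from frame 3 to frame 1 of the input buffer.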
  • S304: Read audio signal data of a set data length from the play buffer area for playing, and write the audio signal data to a reference buffer area; if the audio signal data in the play buffer area is shorter than the set data length, play a silent signal of the set data length and write the silent signal to the reference buffer area.
  • the input signal data may include associated signal data associated with the played signal data and audio data generated by the local speaker.
  • steps S305 to S307 provide specific operations for performing signal data processing on the reference buffer area and/or the input buffer area in real time.
  • The processing period can be understood as a preset time period, which can be expressed in basic time units (such as seconds) or in units of frames. It is mainly used to collect the data length information of the reference buffer area and the input buffer area within one time period.
  • The data length information may be the total data length obtained by accumulating the data lengths in a buffer area (the reference buffer area or the input buffer area) over the current processing period, or the average data length determined after accumulating the data lengths in the buffer area over the current processing period.
  • The data length is mainly represented by the number of discrete signal data points held in the buffer area.
  • S307. Determine data length difference information of the buffered signal data of the reference buffer area and the input buffer area based on the reference buffer area and the data length information corresponding to the input buffer area.
  • the data length difference information is equal to a difference between the data length information of the reference buffer area and the data length information of the input buffer area.
  • When the data length information is the total data length accumulated in the processing period, the data length difference information is the difference between the total data lengths corresponding to the reference buffer area and the input buffer area in that processing period; when the data length information is the average data length determined after accumulating the data lengths in the processing period, the data length difference information is the difference between the average data lengths corresponding to the reference buffer area and the input buffer area in that processing period.
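The two ways of forming the data length difference can be sketched as follows. The function name `length_difference` and the sample length readings are made up for illustration; each list holds the buffer length observed at successive points within one processing period.

```python
def length_difference(ref_lengths, input_lengths, use_average=False):
    """Reference-minus-input data length difference over one processing period."""
    if use_average:
        ref_avg = sum(ref_lengths) / len(ref_lengths)
        input_avg = sum(input_lengths) / len(input_lengths)
        return ref_avg - input_avg
    return sum(ref_lengths) - sum(input_lengths)


ref_lengths = [480, 482, 484]    # hypothetical readings of the reference buffer
input_lengths = [160, 160, 160]  # hypothetical readings of the input buffer
total_diff = length_difference(ref_lengths, input_lengths)
average_diff = length_difference(ref_lengths, input_lengths, use_average=True)
```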
  • The standard data length difference information can be understood as a preset standard difference between the data length information of the two buffer areas. It can also be understood as a difference determined from the data length information of the two buffer areas while signal data is being processed; for example, the data length difference information obtained the first time signal data processing is performed can be taken as the standard data length difference information.
  • The setting of the standard data length difference information depends on what the data length difference information represents. For example, when the data length difference information represents the difference between average data lengths, the set value of the standard data length difference information is smaller than the set value used when the data length difference information represents the difference between total data lengths.
  • The difference between the data length difference information and the standard data length difference information may be used as the comparison result. Based on the determined comparison result, the signal data stored in the reference buffer area or in the input buffer area is resampled, which ensures that the data length difference between the signal data in the reference buffer area and in the input buffer area remains unchanged.
  • Resampling can be understood as re-sampling signal data that has different sampling rates so that it is converted to the same sampling rate; it includes downsampling and upsampling.
  • When the sampling rate difference between two signals is small, one of the signals may be downsampled, or the other signal may be upsampled by inserting signal data at intervals. Since the difference between the sampling rates corresponding to the reference buffer area and the input buffer area is small in this embodiment, the resampling processing can be performed in either of these two ways.
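A simple way to make small length corrections is sketched below. It assumes that downsampling means discarding evenly spaced samples and that inserted samples repeat their neighbour; the source only says that signal data is inserted at intervals, so both choices, and the names `drop_evenly` and `insert_evenly`, are assumptions.

```python
def _spread_indices(length, count):
    """`count` distinct, evenly spaced indices into a sequence of `length` items."""
    if count > length:
        raise ValueError("count must not exceed the number of samples")
    step = length / count
    return {int(i * step) for i in range(count)}


def drop_evenly(samples, count):
    """Shorten `samples` by `count` by discarding evenly spaced samples."""
    if count <= 0:
        return list(samples)
    skip = _spread_indices(len(samples), count)
    return [x for i, x in enumerate(samples) if i not in skip]


def insert_evenly(samples, count):
    """Lengthen `samples` by `count` by repeating evenly spaced samples."""
    if count <= 0:
        return list(samples)
    repeat = _spread_indices(len(samples), count)
    out = []
    for i, x in enumerate(samples):
        out.append(x)
        if i in repeat:
            out.append(x)
    return out


shorter = drop_evenly(list(range(10)), 2)
longer = insert_evenly([1, 2, 3, 4], 2)
```

In use, the comparison result would choose the direction: if the reference buffer has grown beyond the standard difference, either the reference data is shortened or the input data is lengthened by that amount.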
  • S310 Write the to-be-sent signal data obtained after the echo cancellation process to the to-be-sent buffer area, and send the to-be-sent signal data read from the to-be-sent buffer area to the opposite end.
  • The method for implementing an audio call according to Embodiment 3 of the present invention adds, before the echo cancellation processing of the signal data in the reference buffer area and the input buffer area, the operation of processing the signal data in the reference buffer area or the input buffer area.
  • With this method, the data length difference between the signal data in the reference buffer area and in the input buffer area can be kept unchanged, which ensures in real time during the audio call that the signal data used for echo cancellation in the two buffer areas can be subjected to echo cancellation processing at the same time.
  • The influence of the signal data buffered in the reference buffer area on the echo cancellation processing is eliminated, the influence of the echo delay on it is reduced, and the signal data used for echo cancellation can be processed at the same time throughout the audio call, which improves the processing efficiency of echo cancellation and thus the performance of the entire audio call.
  • FIG. 4 is a structural block diagram of an implementation system for an audio call according to Embodiment 4 of the present invention.
  • The present embodiment is applicable to voice calls made with an electronic device that has a call function.
  • The system can be implemented in hardware and/or software and is generally integrated into an electronic device having a voice call function.
  • The implementation system includes a signal data receiving module 42, a signal data playing module 43, a signal data collecting module 44, and an echo cancellation processing module 45.
  • The signal data receiving module 42 is configured to receive, after a call connection is established between the local end and the opposite end, the audio signal data sent by the opposite end and buffer it into the play buffer area;
  • The signal data playing module 43 is configured to read audio signal data of the set data length from the play buffer area for playing and write the audio signal data to the reference buffer area, and, if the audio signal data in the play buffer area is shorter than the set data length, to play a silent signal of the set data length and write the silent signal into the reference buffer area;
  • The signal data collecting module 44 is configured to acquire the input signal data picked up by the audio input device and write the input signal data into the input buffer area;
  • The echo cancellation processing module 45 is configured to perform echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area.
  • Through the signal data receiving module 42, the system first receives, after the call connection between the local end and the opposite end is established, the audio signal data sent by the opposite end and buffers it into the play buffer area;
  • Through the signal data playing module 43, it then reads audio signal data of the set data length from the play buffer area for playing and writes the audio signal data to the reference buffer area, or, if the audio signal data in the play buffer area is shorter than the set data length, plays a silent signal of the set data length and writes the silent signal to the reference buffer area; next, through the signal data collecting module 44, it acquires the input signal data picked up by the audio input device and writes the input signal data to the input buffer area;
  • Finally, the echo cancellation processing module 45 performs echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area.
  • The implementation system of the audio call provided by the embodiment of the invention ensures in real time that signal data is written continuously into the reference buffer area, which keeps the signal data used for echo cancellation continuous.
  • This reduces the processing time of echo cancellation, improves its processing efficiency, and provides a basis for improving the stability of echo cancellation performance.
  • A signal data sending module 46 is configured to write the to-be-sent signal data obtained after echo cancellation into the to-be-sent buffer area and to send to the opposite end the to-be-sent signal data read from the to-be-sent buffer area.
  • An echo delay determination module 40 is configured to determine the echo delay based on a set audio test signal before the local end establishes a call connection with the opposite end.
  • The audio test signal is at least one single-frequency signal.
  • The echo delay determination module 40 is specifically configured to: before the call connection between the local end and the opposite end is established, read signal data containing the audio test signal from the play buffer area for playing and write that signal data into the reference buffer area; acquire the input test signal data picked up by the audio input device and write it into the input buffer area, where the input test signal data contains the associated signal data of the audio test signal; determine the current time information at which the audio test signal is found in the reference buffer area, recorded as first time information; determine the current time information at which the associated signal data is found in the input buffer area, recorded as second time information; and determine the echo delay based on the first time information and the second time information.
  • The system further adds a signal data deletion module 41.
  • The signal data deletion module 41 is configured to delete, after the echo delay has been determined based on the set audio test signal, a set number of frames of signal data preceding the signal data that corresponds to the second time information in the input buffer area, so that the associated signal data in the input buffer area and the audio test signal in the reference buffer area undergo echo cancellation together.
  • The set number of frames equals the number of time frames corresponding to the echo delay.
  • The system further adds a signal data processing module 47, which is configured to process the signal data in the reference buffer area and/or the input buffer area before echo cancellation is performed on the signal data of the set data length read from the two buffer areas, so that the data length difference between the signal data in the reference buffer area and in the input buffer area remains unchanged.
  • The signal data processing module 47 is specifically configured to: obtain data length information of the signal data held by the reference buffer area in the current processing period and data length information of the signal data held by the input buffer area in that processing period; determine, based on the data length information of the two buffer areas, data length difference information of the signal data buffered in the reference buffer area and the input buffer area; and determine a comparison result between the data length difference information and the set standard data length difference information.
  • Based on the comparison result, the signal data stored in the reference buffer area and/or the input buffer area is resampled.
  • Embodiment 5 of the present invention provides an intelligent conference device that integrates the audio call implementation system provided in Embodiment 4.
  • Using the audio call implementation methods provided by Embodiments 1 to 3, the device can carry out audio calls with other electronic devices.
  • Because the audio call implementation system provided by the embodiment of the present invention is integrated in it, the intelligent conference device has a call function.
  • The intelligent conference device also has an audio input device and an audio output device, and performs the corresponding operations mainly by calling their respective interfaces.
  • With the system integrated, the device can conduct audio calls with other electronic devices that have a call function while ensuring in real time that signal data is written continuously into the reference buffer area.
  • This keeps the signal data used for echo cancellation continuous, reduces the processing time of echo cancellation, improves the processing efficiency of echo cancellation during the audio call, and further improves the user experience of the intelligent conference device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

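The present invention discloses a method and a system for implementing an audio call, and an intelligent conference device. The method includes: receiving audio signal data sent by the opposite end and buffering it into a play buffer area; if the audio signal data in the play buffer area is shorter than a set data length, playing a silent signal of the set data length and writing the silent signal into the reference buffer area; acquiring input signal data picked up by an audio input device and writing it into an input buffer area; and performing echo cancellation processing on signal data of the set data length read from the reference buffer area and the input buffer area. With this method, signal data is written continuously into the reference buffer area in real time, which keeps the signal data used for echo cancellation continuous, thereby reducing the processing time of echo cancellation, improving its processing efficiency, and providing a basis for improving the stability of echo cancellation performance.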

Description

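Method and System for Implementing an Audio Call, and Intelligent Conference Device
Technical Field
Embodiments of the present invention relate to the technical field of speech signal processing, and in particular to a method and a system for implementing an audio call, and an intelligent conference device.
Background
For an electronic device with a call function, voice calls between the local device and the opposite-end device rely mainly on a voice call system integrated in the device. The current voice call process can be described as follows: the voice signal data sent by the opposite end is received and buffered into a play buffer area; the voice signal data is then read from the play buffer area and played through the loudspeaker by calling the system's playback interface, while the audio signal data being played is written into a reference buffer area; next, the input signal data picked up by the microphone is obtained by calling the system's audio capture interface and written into an input buffer area, where the input signal data includes the echo signal formed after the voice signal data is played and the voice signal data produced by the local speaker; signal data of a certain length is then read from the reference buffer area and the input buffer area for echo cancellation processing; finally, the signal data after echo cancellation is sent to the opposite end.
Generally, when the network is congested, the voice signal data sent by the opposite end cannot be received normally. If the play buffer area then lacks sufficient voice signal data, playback is delayed and the continuity of writing audio signal data into the reference buffer area is also affected. Since the signal data used for echo cancellation is read mainly from the reference buffer area and the input buffer area, echo cancellation may not proceed normally if audio signal data cannot be written into the reference buffer area continuously. This affects the processing time and efficiency of echo cancellation, reduces the stability of echo cancellation performance, and in turn degrades the quality of the whole call.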
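Summary of the Invention
The present invention provides a method and a system for implementing an audio call, and an intelligent conference device, which keep the signal data used for echo cancellation continuous, reduce the processing time of echo cancellation, and provide a basis for improving the stability of echo cancellation performance.
Embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for implementing an audio call, the method including:
after a call connection is established between the local end and the opposite end, receiving audio signal data sent by the opposite end and buffering it into a play buffer area;
reading audio signal data of a set data length from the play buffer area for playing and writing the audio signal data into a reference buffer area, and, if the audio signal data in the play buffer area is shorter than the set data length, playing a silent signal of the set data length and writing the silent signal into the reference buffer area;
acquiring input signal data picked up by an audio input device and writing the input signal data into an input buffer area; and performing echo cancellation processing on signal data of the set data length read from the reference buffer area and the input buffer area.
In a second aspect, an embodiment of the present invention further provides a system for implementing an audio call, the system including:
a signal data receiving module, configured to receive, after a call connection is established between the local end and the opposite end, audio signal data sent by the opposite end and buffer it into a play buffer area;
a signal data playing module, configured to read audio signal data of a set data length from the play buffer area for playing and write the audio signal data into a reference buffer area, and, if the audio signal data in the play buffer area is shorter than the set data length, to play a silent signal of the set data length and write the silent signal into the reference buffer area;
a signal data collecting module, configured to acquire input signal data picked up by an audio input device and write the input signal data into an input buffer area;
an echo cancellation processing module, configured to perform echo cancellation processing on signal data of the set data length read from the reference buffer area and the input buffer area.
In a third aspect, an embodiment of the present invention further provides an intelligent conference device that integrates the system for implementing an audio call provided by the embodiments of the present invention.
The present invention provides a method and a system for implementing an audio call, and an intelligent conference device. The method first receives the audio signal data sent by the opposite end and writes it into the play buffer area; it then reads audio signal data of the set data length from the play buffer area for playing and writes the audio signal data into the reference buffer area, or, if the audio signal data in the play buffer area is shorter than the set data length, plays a silent signal of the set data length and writes the silent signal into the reference buffer area; next, it acquires the input signal data picked up by the audio input device and writes it into the input buffer area; finally, it performs echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area. With this method, signal data is written continuously into the reference buffer area in real time, which keeps the signal data used for echo cancellation continuous, thereby reducing the processing time of echo cancellation, improving its processing efficiency, and providing a basis for improving the stability of echo cancellation performance.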
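Brief Description of the Drawings
FIG. 1 is a flowchart of a method for implementing an audio call according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of a method for implementing an audio call according to Embodiment 2 of the present invention;
FIG. 3 is a flowchart of a method for implementing an audio call according to Embodiment 3 of the present invention;
FIG. 4 is a structural block diagram of a system for implementing an audio call according to Embodiment 4 of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments. It will be understood that the specific embodiments described here serve only to explain the invention and do not limit it. Note also that, for ease of description, the drawings show only the parts relevant to the invention rather than everything. Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as sequential, many of them can be carried out in parallel, concurrently, or simultaneously, and the order of the operations can be rearranged. A process may be terminated when its operations are complete, but it may also have additional steps not shown in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram, and so on.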
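Embodiment 1
FIG. 1 is a flowchart of a method for implementing an audio call according to Embodiment 1 of the present invention. This embodiment is applicable to voice calls made with an electronic device that has a call function. The method can be executed by a system for implementing an audio call; the system can be implemented in hardware and/or software and can generally be integrated into an electronic device having a voice call function.
Generally, an electronic device with a voice call function relies on an existing voice call method in the device to talk with other such devices. With the existing method, however, the signal data may not be readable continuously from the reference buffer area during echo cancellation, which affects the processing time of echo cancellation and may in turn affect the performance of the whole voice call. The embodiments of the present invention therefore provide a method for implementing an audio call that addresses this loss of continuity in the reference buffer area. In this embodiment, the electronic device may specifically be a mobile phone, a computer, an intelligent conference device, or any other device capable of voice calls.
As shown in FIG. 1, the method for implementing an audio call provided by this embodiment specifically includes:
S101. After a call connection is established between the local end and the opposite end, receive the audio signal data sent by the opposite end and buffer it into the play buffer area.
In this embodiment, both the local end and the opposite end may be electronic devices with a call function. It will be understood that a voice call is possible only after a call connection has been established between the local end and the opposite end under the corresponding call protocol. Specifically, once the connection is established, the local end can receive the audio signal data that the opposite end transmits over the network. Note that the received audio signal data is not played directly by the local audio output device; it is first buffered into the set play buffer area. The audio output device may specifically be an audio playback component of the electronic device, such as an earpiece or a loudspeaker.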
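S102. Read audio signal data of the set data length from the play buffer area for playing, and write the audio signal data into the reference buffer area; if the audio signal data in the play buffer area is shorter than the set data length, play a silent signal of the set data length and write the silent signal into the reference buffer area.
Generally, after the audio signal data sent by the opposite end has been buffered into the play buffer area, audio signal data of the set data length can be read from the play buffer area and played on the audio output device by calling the system's playback interface; while it is being played, the same audio signal data is buffered into the set reference buffer area.
In this embodiment, if the network is congested during the voice call between the local end and the opposite end, the transmission of audio signal data is affected and the data cannot be buffered into the play buffer area in time. When the audio signal data held in the play buffer area is shorter than the set data length, it cannot be played normally. In that case the audio output device can play silent signal data, and the silent signal data that is played is written into the reference buffer area, which keeps the signal data buffered into the reference buffer area continuous. Note that the playing and buffering of the silent signal are limited in time and length: a silent signal of the set data length is played when the audio signal data in the play buffer area is shorter than the set data length, and normal playback of audio signal data resumes once the audio signal data is no shorter than the set data length.
In this embodiment, the silent signal can be understood as signal data whose value is 0. The set data length may be set manually or be a system default; it is preferably the data length corresponding to one unit time frame. For example, when the sampling rate is known, the number of samples in a unit frame can be determined, and that number can be treated as the data length of the unit frame. Assuming a sampling rate of 16 kHz and a unit frame duration of 10 ms, a unit frame contains 160 samples, so the set data length is 160.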
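Step S102 and the frame-length arithmetic above can be illustrated with a minimal Python sketch. The function name `next_playout_frame` and the use of `deque` objects as buffer areas are assumptions made for illustration only; the source does not prescribe a data structure or an API.

```python
from collections import deque

SAMPLE_RATE_HZ = 16_000   # sampling rate from the example above
FRAME_MS = 10             # duration of one unit frame
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples per unit frame


def next_playout_frame(play_buffer: deque, reference_buffer: deque) -> list:
    """Return one frame to play and mirror it into the reference buffer.

    Hypothetical helper: if the play buffer holds fewer than FRAME_LEN
    samples, a frame of zeros (the silent signal) is played and written
    to the reference buffer instead, so the reference buffer keeps
    receiving data even when the network stalls.
    """
    if len(play_buffer) >= FRAME_LEN:
        frame = [play_buffer.popleft() for _ in range(FRAME_LEN)]
    else:
        frame = [0] * FRAME_LEN  # silent signal: samples with value 0
    reference_buffer.extend(frame)
    return frame
```

In this sketch the partial data left in the play buffer is kept rather than discarded, so normal playback resumes as soon as a full frame is available again.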
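S103. Acquire the input signal data picked up by the audio input device and write the input signal data into the input buffer area; perform echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area.
In this embodiment, the signal data played by the audio output device forms corresponding associated signal data under the influence of the surrounding environment. For example, the associated signal data may be the echo signal data that the played signal data produces in the environment.
In this embodiment, the associated signal data can be picked up again by the audio input device, and the audio data produced by the local speaker is picked up by the audio input device as well. This embodiment refers to the signal data picked up by the audio input device as input signal data, and the collected input signal data is written into the set input buffer area. The audio input device may specifically be a component of the electronic device capable of audio pickup, such as a handset mouthpiece or a microphone.
In this embodiment, echo cancellation can be described as follows. First, adaptive filtering is applied to the signal data of the set data length obtained from the reference buffer area, yielding simulated signal data of the same set data length; the signal data here may be a silent signal or normal audio signal data. Then the simulated signal data is subtracted from the input signal data of the set data length obtained from the input buffer area, yielding the echo-cancelled to-be-sent signal data and completing the current round of echo cancellation.
It will be understood that the input signal data contains both the associated signal data and the audio data of the local speaker, whereas the opposite end needs only the audio data of the local speaker, so the associated signal data must be removed. In this embodiment, the simulated signal data obtained by adaptive filtering can be treated as equivalent to the associated signal data; subtracting the simulated signal data from the input signal data therefore removes the associated signal data from the input signal data. The signal data obtained after echo cancellation is referred to as to-be-sent signal data.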
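The filter-then-subtract procedure can be sketched as follows. The source says only that adaptive filtering is used; the normalized LMS (NLMS) update, the tap count, and the step size below are assumptions chosen for illustration, not the patent's specified algorithm.

```python
def nlms_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo from the input signal.

    reference: samples read from the reference buffer area (played signal)
    mic:       samples read from the input buffer area (echo + local speech)
    Returns the residual, i.e. the to-be-sent signal data.
    NLMS is an assumed choice of adaptive filter for this sketch.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        # Simulated signal data: the filter's estimate of the echo.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        err = d - echo_estimate  # input signal minus simulated signal
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        out.append(err)
    return out
```

When the reference frame is a silent signal, the estimate is zero and the input passes through unchanged, which is why writing silence keeps the two buffer areas in step without disturbing the filter.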
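The method for implementing an audio call provided by Embodiment 1 of the present invention first receives the audio signal data sent by the opposite end and writes it into the play buffer area; it then reads audio signal data of the set data length from the play buffer area for playing and writes the audio signal data into the reference buffer area, or, if the audio signal data in the play buffer area is shorter than the set data length, plays a silent signal of the set data length and writes the silent signal into the reference buffer area; next, it acquires the input signal data picked up by the audio input device and writes it into the input buffer area; finally, it performs echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area. With this method, signal data is written continuously into the reference buffer area in real time, which keeps the signal data used for echo cancellation continuous, thereby reducing the processing time of echo cancellation, improving its processing efficiency, and providing a basis for improving the stability of echo cancellation performance.
On the basis of the above embodiment, this embodiment further adds the subsequent steps of an audio call between the local end and the opposite end. Specifically, after echo cancellation processing is performed on the signal data of the set data length read from the reference buffer area and the input buffer area, the method further includes: writing the to-be-sent signal data obtained after echo cancellation into the to-be-sent buffer area, and sending to the opposite end the to-be-sent signal data read from the to-be-sent buffer area.
In this embodiment, step S102 keeps the signal data buffered into the reference buffer area continuous, so that step S103 can read signal data from the reference buffer area continuously. Specifically, the echo cancellation of step S103 yields the to-be-sent signal data. This data cannot be sent to the opposite end directly: it is first buffered into the set to-be-sent buffer area, to-be-sent signal data of the set data length is then read from that buffer area, and the data read is finally sent to the opposite end. Completing these steps accomplishes one audio call exchange between the local end and the opposite end.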
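Embodiment 2
FIG. 2 is a flowchart of a method for implementing an audio call according to Embodiment 2 of the present invention. Embodiment 2 builds on the above embodiment. In this embodiment, the following is added: before the call connection is established between the local end and the opposite end, an echo delay is determined based on a set audio test signal, where the audio test signal is at least one single-frequency signal.
Further, determining the echo delay based on the set audio test signal includes: before the call connection is established between the local end and the opposite end, reading signal data containing the audio test signal from the play buffer area for playing and writing that signal data into the reference buffer area; acquiring input test signal data picked up by the audio input device and writing it into the input buffer area, where the input test signal data contains the associated signal data of the audio test signal; determining the current time information at which the audio test signal is found in the reference buffer area, recorded as first time information; determining the current time information at which the associated signal data is found in the input buffer area, recorded as second time information; and determining the echo delay based on the first time information and the second time information.
On this basis, after the echo delay is determined based on the set audio test signal, the method further includes: deleting a set number of frames of signal data preceding the signal data that corresponds to the second time information in the input buffer area, so that the associated signal data in the input buffer area and the audio test signal in the reference buffer area undergo echo cancellation together; the set number of frames equals the number of time frames corresponding to the echo delay.
As shown in FIG. 2, the method for implementing an audio call provided by Embodiment 2 of the present invention specifically includes the following operations:
In this embodiment, steps S201 to S205 describe how the echo delay is determined. Note that the echo delay is determined after the local end starts the voice call function and before the call connection with the opposite end is established, and that the determination relies on the set audio test signal.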
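S201. Before the call connection is established between the local end and the opposite end, read signal data containing the audio test signal from the play buffer area for playing, and write that signal data into the reference buffer area.
In this embodiment, once it is detected that the local end has started the voice call function, audio signal data containing the audio test signal is first issued. Following the call principle described in Embodiment 1, this audio signal data is placed in a play buffer area, then read from the play buffer area and played through the audio output device; while it is being played on the audio output device, the audio signal data is also obtained and stored in the reference buffer area.
In this embodiment, the audio test signal is at least one preset single-frequency signal. The single-frequency signal may be an audio signal of any single frequency and is generally stored in a set signal buffer. Because a single-frequency signal has a short period and a simple waveform, it is well suited to serve as the audio test signal. The audio signal data here is the signal data, playable by the audio output device, that is needed to determine the echo delay. Note that, to ensure the echo delay is measured effectively, the audio signal data in this embodiment contains a segment of silent signal data in addition to the audio test signal.
S202. Acquire the input test signal data picked up by the audio input device and write it into the input buffer area, where the input test signal data contains the associated signal data of the audio test signal.
In this embodiment, after the audio test signal is played by the audio output device, it may form corresponding associated signal data. The associated signal data is picked up again by the audio input device and, together with the other signal data picked up by the audio input device, is referred to as input test signal data. After being picked up, the input test signal data is first stored in the set input buffer area and is later read from the input buffer area in a set manner for echo cancellation processing.
S203. Determine the current time information at which the audio test signal is found in the reference buffer area, recorded as first time information.
S204. Determine the current time information at which the associated signal data is found in the input buffer area, recorded as second time information.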
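Generally, the echo delay can be understood as the time difference between the moment audio signal data is written into the reference buffer area and the moment the corresponding associated signal data is written into the input buffer area; this time difference can be obtained through the echo delay determination described in steps S201 to S205. Specifically, as steps S203 and S204 show, determining the echo delay requires first reading signal data of a set length from the reference buffer area and from the input buffer area, and then searching the data read for the audio test signal and the associated signal data respectively. If the audio test signal is found in the reference buffer area, the time information at that moment is recorded; likewise, if the associated signal data is found in the input buffer area, the time information at that moment is recorded. The echo delay can then be determined from the two recorded pieces of time information.
In this embodiment, no time order is imposed on the searches for the audio test signal and the associated signal data in the two buffer areas. When determining the echo delay, the time information recorded for each search is therefore taken to be a relative time span: the difference between the moment the audio test signal or the associated signal data is found in the signal data currently read and the moment reading from the corresponding buffer area (the reference buffer area or the input buffer area) began.
In this embodiment, the time span can be expressed in terms of specific moments, for example as the difference between the moment the audio test signal is found and the moment signal data is first read from the buffer area (the reference buffer area or the input buffer area). It can also be expressed in time frames (each of duration T, for example 10 ms): if the first read of signal data from the buffer area is counted as frame 0, the time span is the current frame number at which the audio test signal is found.
S205. Determine the echo delay based on the first time information and the second time information.
In this embodiment, after the first time information and the second time information have been determined in steps S203 and S204, the time difference between the second time information and the first time information can be determined. For example, when the two pieces of time information are expressed in terms of specific moments, the time span corresponding to the first time information and the time span corresponding to the second time information are determined, and the difference between the two time spans is taken as the echo delay. Alternatively, when they are expressed in time frames, the numbers of time frames corresponding to the first and second time information are determined, and the difference between the two frame counts is taken as the echo delay.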
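The frame-based variant of steps S203 to S205 can be sketched as below. The source does not say how the single-frequency test signal is searched for; this sketch assumes a per-frame Goertzel power estimate with a hypothetical threshold, and all function names are illustrative.

```python
import math


def tone_power(frame, freq_hz, sample_rate_hz):
    """Goertzel estimate of the power of one frequency within a frame."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in frame:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def first_tone_frame(samples, freq_hz, sample_rate_hz=16_000,
                     frame_len=160, threshold=100.0):
    """Frame number at which the tone is first found, or None.

    The first frame read from the buffer counts as frame 0. The
    threshold is an assumed tuning value, not taken from the source.
    """
    for index in range(len(samples) // frame_len):
        frame = samples[index * frame_len:(index + 1) * frame_len]
        if tone_power(frame, freq_hz, sample_rate_hz) > threshold:
            return index
    return None


def echo_delay_frames(reference, mic, freq_hz):
    """Echo delay in frames: second time information minus first."""
    first = first_tone_frame(reference, freq_hz)   # reference buffer area
    second = first_tone_frame(mic, freq_hz)        # input buffer area
    if first is None or second is None:
        return None
    return second - first
```

Multiplying the returned frame count by the frame duration T gives the same delay expressed as a time span.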
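S206. Delete a set number of frames of signal data preceding the signal data that corresponds to the second time information in the input buffer area, so that the associated signal data in the input buffer area and the audio test signal in the reference buffer area undergo echo cancellation together.
In this embodiment, the determination of the echo delay shows that the associated signal data of the audio test signal enters the input buffer area later than the audio test signal enters the reference buffer area; that is, when the audio test signal is read from the reference buffer area, the associated signal data cannot yet be read from the input buffer area. The signal data that precedes the associated signal data in the input buffer area, with a length corresponding to the echo delay, therefore has to be deleted. Only then does the echo cancellation module read the associated signal data from the input buffer area at the same time as it reads the audio test signal from the reference buffer area, which removes the influence of the echo delay on the voice call.
Note that after step S206 of this embodiment has been carried out, the signal data buffered in the reference buffer area is longer than the signal data buffered in the input buffer area by the data length corresponding to the echo delay. This ensures that during an actual call through the voice call system, the audio signal data in the reference buffer area and the associated signal data in the input buffer area can be read together and used for echo cancellation, which shortens the time the adaptive filter spends processing the audio signal data and so safeguards the efficiency of echo cancellation.
S207. After a call connection is established between the local end and the opposite end, receive the audio signal data sent by the opposite end and buffer it into the play buffer area.
In this embodiment, steps S201 to S206 amount to preprocessing for the audio call. Once they are complete, the call connection between the local end and the opposite end can be established and normal audio call operation begins with steps S207 to S210, with the assurance that the audio signal data in the reference buffer area and the associated signal data in the input buffer area can be read together and undergo echo cancellation together.
S208. Read audio signal data of the set data length from the play buffer area for playing, and write the audio signal data into the reference buffer area; if the audio signal data in the play buffer area is shorter than the set data length, play a silent signal of the set data length and write the silent signal into the reference buffer area.
S209. Acquire the input signal data picked up by the audio input device and write the input signal data into the input buffer area; perform echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area.
S210. Write the to-be-sent signal data obtained after echo cancellation into the to-be-sent buffer area, and send to the opposite end the to-be-sent signal data read from the to-be-sent buffer area.
Steps S207 to S210 have been described in detail in Embodiment 1 and are not repeated here.
The method for implementing an audio call provided by Embodiment 2 of the present invention adds the determination of the echo delay, which establishes the time difference between writing signal data into the reference buffer area and writing it into the input buffer area during an audio call. It also adds, after the echo delay is determined, the deletion of signal data, which ensures that associated signal data in the reference buffer area and the input buffer area is read together and undergoes echo cancellation together. With this method, the influence of the signal data buffered in the reference buffer area on echo cancellation is eliminated and the influence of the echo delay on echo cancellation is reduced, which improves the processing efficiency of echo cancellation and, in turn, the performance of the whole audio call.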
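Embodiment 3
FIG. 3 is a flowchart of a method for implementing an audio call according to Embodiment 3 of the present invention. This embodiment builds on the above embodiments. Before echo cancellation processing is performed on the signal data of the set data length read from the reference buffer area and the input buffer area, the following is added: the signal data in the reference buffer area and/or the input buffer area is processed so that the data length difference between the signal data in the reference buffer area and in the input buffer area remains unchanged.
On this basis, processing the signal data in the reference buffer area and/or the input buffer area includes: obtaining data length information of the signal data held by the reference buffer area in the current processing period and data length information of the signal data held by the input buffer area in that processing period; determining, based on the data length information of the two buffer areas, data length difference information of the signal data buffered in the reference buffer area and the input buffer area; and determining a comparison result between the data length difference information and set standard data length difference information, and resampling the signal data stored in the reference buffer area and/or the input buffer area based on the comparison result.
As shown in FIG. 3, the method for implementing an audio call provided by Embodiment 3 of the present invention specifically includes the following operations:
S301. Before the call connection is established between the local end and the opposite end, determine the echo delay based on the set audio test signal, where the audio test signal is at least one single-frequency signal.
For example, the echo delay of the audio call is first determined based on the set audio test signal.
S302. Delete from the input buffer area signal data whose data length corresponds to the echo delay, so that the associated signal data in the input buffer area and the audio test signal in the reference buffer area undergo echo cancellation together.
For example, once the position of the associated signal data in the input buffer area has been determined, signal data of a set length preceding the associated signal data can be deleted. The set length may be the data length corresponding to the duration of the echo delay. If the echo delay is determined to be two unit frames and a unit frame has a data length of 160, two unit frames have a data length of 320, so the 320 samples preceding the associated signal data need to be deleted.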
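The deletion in step S302 amounts to removing `delay_frames * frame_len` samples immediately before the associated signal data. The following minimal sketch assumes the input buffer area is a plain Python list and that the position of the associated signal data is already known; the function name and parameters are hypothetical.

```python
def drop_delay_samples(input_buffer, echo_start, delay_frames, frame_len=160):
    """Delete the samples that precede the associated (echo) signal data.

    input_buffer: list of samples held in the input buffer area
    echo_start:   index of the first sample of the associated signal data
    delay_frames: echo delay expressed in unit frames
    At most echo_start samples can be removed, so the count is clamped.
    """
    drop = min(delay_frames * frame_len, echo_start)
    return input_buffer[:echo_start - drop] + input_buffer[echo_start:]
```

With a delay of two unit frames of 160 samples each, 320 samples are removed, matching the example above.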
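S303. After a call connection is established between the local end and the opposite end, receive the audio signal data sent by the opposite end and buffer it into the play buffer area.
S304. Read audio signal data of the set data length from the play buffer area for playing, and write the audio signal data into the reference buffer area; if the audio signal data in the play buffer area is shorter than the set data length, play a silent signal of the set data length and write the silent signal into the reference buffer area.
S305. Acquire the input signal data picked up by the audio input device and write the input signal data into the input buffer area.
For example, the input signal data may include the associated signal data related to the played signal data as well as the audio data produced by the local speaker.
S306. Obtain data length information of the signal data held by the reference buffer area in the current processing period and data length information of the signal data held by the input buffer area in that processing period.
In this embodiment, steps S306 to S308 give the specific operations for processing the signal data of the reference buffer area and/or the input buffer area in real time. The processing period is a preset time span (expressed either in a basic unit of time such as seconds or in frames) used mainly to gather the data length information of the reference buffer area and the input buffer area over that span.
In this embodiment, the data length information may be the total data length obtained by accumulating the data lengths in a buffer area (the reference buffer area or the input buffer area) over the current processing period, or the average data length determined from that accumulation. The data length is expressed mainly as the number of discrete signal samples held in the buffer area.
S307. Determine, based on the data length information of the reference buffer area and the input buffer area, the data length difference information of the signal data buffered in the two buffer areas.
In this embodiment, the data length difference information equals the data length information of the reference buffer area minus that of the input buffer area. Specifically, when the data length information is the total data length accumulated over the processing period, the data length difference information is the difference between the total data lengths of the reference buffer area and the input buffer area over the processing period; when the data length information is the average data length determined over the processing period, the data length difference information is the difference between the average data lengths of the reference buffer area and the input buffer area over the processing period.
S308. Determine a comparison result between the data length difference information and the set standard data length difference information, and resample the signal data stored in the reference buffer area and/or the input buffer area based on the comparison result.
In this embodiment, the standard data length difference information can be understood as a preset standard difference between the data length information of the two buffer areas, or as a difference determined from the data length information of the two buffer areas while the signal data is being processed; for example, the data length difference information obtained first during signal data processing can be taken as the standard data length difference information. In addition, the setting of the standard data length difference information depends on what the data length difference information represents: when it represents a difference of average data lengths, the standard value is set smaller than the value used when it represents a difference of total data lengths.
In this embodiment, the difference between the data length difference information and the standard data length difference information can be used as the comparison result, and the signal data stored in the reference buffer area or in the input buffer area can be resampled based on that result, which keeps the data length difference between the signal data in the reference buffer area and in the input buffer area unchanged.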
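Steps S306 to S308 up to the comparison result can be sketched as follows. The function `length_drift` and its arguments are hypothetical; the sketch only assumes that the buffer lengths, in samples, are observed several times during one processing period.

```python
def length_drift(ref_lengths, in_lengths, standard_diff, use_average=True):
    """Compare the buffer length difference with the standard difference.

    ref_lengths / in_lengths: lengths (in samples) of the reference and
    input buffer areas observed during one processing period.
    Returns (difference, comparison_result); a positive comparison
    result means the reference buffer area has grown relative to the
    input buffer area.
    """
    if use_average:
        ref_info = sum(ref_lengths) / len(ref_lengths)
        in_info = sum(in_lengths) / len(in_lengths)
    else:
        ref_info = sum(ref_lengths)
        in_info = sum(in_lengths)
    diff = ref_info - in_info  # reference minus input, as defined above
    return diff, diff - standard_diff
```

As the text notes, the standard value has to match the chosen form of the data length information: a standard set for averages is smaller than one set for totals over the same period.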
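In this embodiment, resampling means sampling signal data again so that signal data with different sampling rates can be converted to the same sampling rate; it covers both downsampling and upsampling. Generally, for two signals with different sampling rates, if the rates differ only slightly, one signal can be downsampled by discarding samples at intervals or the other can be upsampled by inserting samples at intervals. Because the sampling rates corresponding to the reference buffer area and the input buffer area differ only slightly in this embodiment, either of these two approaches can be used for resampling.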
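The two interval-based approaches can be illustrated with the sketch below. The source does not state which value an inserted sample takes; repeating the neighbouring sample is an assumption made here, and both function names are hypothetical.

```python
def drop_at_intervals(samples, count):
    """Shorten a block by `count` samples, discarding them at even intervals."""
    if count <= 0 or count >= len(samples):
        return list(samples)
    step = len(samples) / count
    skip = {int(i * step + step / 2) for i in range(count)}
    return [s for i, s in enumerate(samples) if i not in skip]


def insert_at_intervals(samples, count):
    """Lengthen a block by `count` samples, repeating samples at even intervals.

    Repeating the neighbouring sample is an assumed fill rule.
    """
    count = min(count, len(samples))
    if count <= 0:
        return list(samples)
    step = len(samples) / count
    repeat = {int(i * step + step / 2) for i in range(count)}
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        if i in repeat:
            out.append(s)  # inserted sample
    return out
```

Spreading the dropped or inserted samples evenly across the block keeps each individual discontinuity small, which is only reasonable because the rate mismatch is assumed to be slight.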
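S309. Perform echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area.
S310. Write the to-be-sent signal data obtained after echo cancellation into the to-be-sent buffer area, and send to the opposite end the to-be-sent signal data read from the to-be-sent buffer area.
The method for implementing an audio call provided by Embodiment 3 of the present invention adds, before echo cancellation is performed on the signal data in the reference buffer area and the input buffer area, an operation that processes the signal data in the reference buffer area or the input buffer area. As a result, even when the sampling rates corresponding to the reference buffer area and the input buffer area differ, the data length difference between the signal data in the two buffer areas can be kept unchanged, which ensures in real time that the signal data used for echo cancellation in the two buffer areas can be processed together during the audio call. With this method, the influence of the signal data buffered in the reference buffer area on echo cancellation is eliminated, the influence of the echo delay on echo cancellation is reduced, and the signal data used for echo cancellation during the audio call can be processed together in real time, which improves the processing efficiency of echo cancellation and, in turn, the performance of the whole audio call.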
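Embodiment 4
FIG. 4 is a structural block diagram of a system for implementing an audio call according to Embodiment 4 of the present invention. This embodiment is applicable to voice calls made with an electronic device that has a call function; the system can be implemented in hardware and/or software and can generally be integrated into an electronic device having a voice call function. As shown in FIG. 4, the implementation system includes: a signal data receiving module 42, a signal data playing module 43, a signal data collecting module 44, and an echo cancellation processing module 45.
The signal data receiving module 42 is configured to receive, after a call connection is established between the local end and the opposite end, the audio signal data sent by the opposite end and buffer it into the play buffer area;
the signal data playing module 43 is configured to read audio signal data of the set data length from the play buffer area for playing and write the audio signal data into the reference buffer area, and, if the audio signal data in the play buffer area is shorter than the set data length, to play a silent signal of the set data length and write the silent signal into the reference buffer area;
the signal data collecting module 44 is configured to acquire the input signal data picked up by the audio input device and write the input signal data into the input buffer area;
the echo cancellation processing module 45 is configured to perform echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area.
In this embodiment, the system first uses the signal data receiving module 42 to receive, after the call connection between the local end and the opposite end is established, the audio signal data sent by the opposite end and buffer it into the play buffer area. It then uses the signal data playing module 43 to read audio signal data of the set data length from the play buffer area for playing and write the audio signal data into the reference buffer area, or, if the audio signal data in the play buffer area is shorter than the set data length, to play a silent signal of the set data length and write the silent signal into the reference buffer area. Next, it uses the signal data collecting module 44 to acquire the input signal data picked up by the audio input device and write it into the input buffer area. Finally, it uses the echo cancellation processing module 45 to perform echo cancellation processing on the signal data of the set data length read from the reference buffer area and the input buffer area.
The system for implementing an audio call provided by the embodiment of the present invention ensures in real time that signal data is written continuously into the reference buffer area, which keeps the signal data used for echo cancellation continuous, thereby reducing the processing time of echo cancellation, improving its processing efficiency, and providing a basis for improving the stability of echo cancellation performance.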
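On the basis of the above embodiment, the system further adds a signal data sending module 46, which is configured to write the to-be-sent signal data obtained after echo cancellation into the to-be-sent buffer area and to send to the opposite end the to-be-sent signal data read from the to-be-sent buffer area.
Further, the system adds an echo delay determination module 40, which is configured to determine the echo delay based on a set audio test signal before the call connection is established between the local end and the opposite end, where the audio test signal is at least one single-frequency signal.
On this basis, the echo delay determination module 40 is specifically configured to: before the call connection is established between the local end and the opposite end, read signal data containing the audio test signal from the play buffer area for playing and write that signal data into the reference buffer area; acquire the input test signal data picked up by the audio input device and write it into the input buffer area, where the input test signal data contains the associated signal data of the audio test signal; determine the current time information at which the audio test signal is found in the reference buffer area, recorded as first time information; determine the current time information at which the associated signal data is found in the input buffer area, recorded as second time information; and determine the echo delay based on the first time information and the second time information.
On this basis, the system further adds a signal data deletion module 41, which is configured to delete, after the echo delay has been determined based on the set audio test signal, a set number of frames of signal data preceding the signal data that corresponds to the second time information in the input buffer area, so that the associated signal data in the input buffer area and the audio test signal in the reference buffer area undergo echo cancellation together; the set number of frames equals the number of time frames corresponding to the echo delay.
Further, the system adds a signal data processing module 47, which is configured to process the signal data in the reference buffer area and/or the input buffer area before echo cancellation is performed on the signal data of the set data length read from the two buffer areas, so that the data length difference between the signal data in the reference buffer area and in the input buffer area remains unchanged.
On this basis, the signal data processing module 47 is specifically configured to: obtain data length information of the signal data held by the reference buffer area in the current processing period and data length information of the signal data held by the input buffer area in that processing period; determine, based on the data length information of the two buffer areas, data length difference information of the signal data buffered in the reference buffer area and the input buffer area; and determine a comparison result between the data length difference information and the set standard data length difference information, and resample the signal data stored in the reference buffer area and/or the input buffer area based on the comparison result.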
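Embodiment 5
Embodiment 5 of the present invention provides an intelligent conference device that integrates the system for implementing an audio call provided in Embodiment 4. Using the methods for implementing an audio call provided by Embodiments 1 to 3, the device can carry out audio calls with other electronic devices.
In this embodiment, because the system for implementing an audio call provided by the embodiments of the present invention is integrated in the intelligent conference device, the device has a call function. The intelligent conference device also has an audio input device and an audio output device, and performs the corresponding operations mainly by calling their respective interfaces.
Once the system for implementing an audio call provided by the above embodiments is integrated in the intelligent conference device, the device can conduct audio calls with other electronic devices that have a call function while ensuring in real time that signal data is written continuously into the reference buffer area. This keeps the signal data used for echo cancellation continuous, reduces the processing time of echo cancellation, improves the processing efficiency of echo cancellation during the audio call, and further improves the user experience of the intelligent conference device.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from its scope of protection. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them and may include further equivalent embodiments without departing from its concept; the scope of the invention is determined by the appended claims.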

Claims (10)

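  1. A method for implementing an audio call, characterized by comprising:
    after a call connection is established between a local end and an opposite end, receiving audio signal data sent by the opposite end and buffering it into a play buffer area;
    reading audio signal data of a set data length from the play buffer area for playing and writing the audio signal data into a reference buffer area, and, if the audio signal data in the play buffer area is shorter than the set data length, playing a silent signal of the set data length and writing the silent signal into the reference buffer area;
    acquiring input signal data picked up by an audio input device and writing the input signal data into an input buffer area; and performing echo cancellation processing on signal data of the set data length read from the reference buffer area and the input buffer area.
  2. The method according to claim 1, characterized by further comprising:
    before the call connection is established between the local end and the opposite end, determining an echo delay based on a set audio test signal, wherein the audio test signal is at least one single-frequency signal.
  3. The method according to claim 2, characterized in that determining the echo delay based on the set audio test signal comprises:
    reading signal data containing the audio test signal from the play buffer area for playing, and writing the signal data into the reference buffer area;
    acquiring input test signal data picked up by the audio input device and writing the input test signal data into the input buffer area, wherein the input test signal data contains associated signal data of the audio test signal;
    determining the current time information at which the audio test signal is found in the reference buffer area, recorded as first time information;
    determining the current time information at which the associated signal data is found in the input buffer area, recorded as second time information;
    determining the echo delay based on the first time information and the second time information.
  4. The method according to claim 3, characterized by further comprising, after the echo delay is determined based on the set audio test signal:
    deleting a set number of frames of signal data preceding the signal data that corresponds to the second time information in the input buffer area, so that the associated signal data in the input buffer area and the audio test signal in the reference buffer area undergo echo cancellation processing together;
    wherein the set number of frames equals the number of time frames corresponding to the echo delay.
  5. The method according to claim 1, characterized by further comprising, before echo cancellation processing is performed on the signal data of the set data length read from the reference buffer area and the input buffer area:
    processing the signal data in the reference buffer area and/or the input buffer area so that the data length difference between the signal data in the reference buffer area and in the input buffer area remains unchanged.
  6. The method according to claim 5, characterized in that processing the signal data in the reference buffer area and/or the input buffer area comprises:
    obtaining data length information of the signal data held by the reference buffer area in the current processing period and data length information of the signal data held by the input buffer area in the processing period;
    determining, based on the data length information corresponding to the reference buffer area and the input buffer area, data length difference information of the signal data buffered in the reference buffer area and the input buffer area;
    determining a comparison result between the data length difference information and set standard data length difference information, and resampling the signal data stored in the reference buffer area and/or the input buffer area based on the comparison result.
  7. The method according to any one of claims 1 to 6, characterized by further comprising:
    writing the to-be-sent signal data obtained after echo cancellation processing into a to-be-sent buffer area, and sending to the opposite end the to-be-sent signal data read from the to-be-sent buffer area.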
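  8. A system for implementing an audio call, characterized by comprising:
    a signal data receiving module, configured to receive, after a call connection is established between a local end and an opposite end, audio signal data sent by the opposite end and buffer it into a play buffer area;
    a signal data playing module, configured to read audio signal data of a set data length from the play buffer area for playing and write the audio signal data into a reference buffer area, and, if the audio signal data in the play buffer area is shorter than the set data length, to play a silent signal of the set data length and write the silent signal into the reference buffer area;
    a signal data collecting module, configured to acquire input signal data picked up by an audio input device and write the input signal data into an input buffer area;
    an echo cancellation processing module, configured to perform echo cancellation processing on signal data of the set data length read from the reference buffer area and the input buffer area.
  9. The system according to claim 8, characterized by further comprising:
    an echo delay determination module, configured to determine an echo delay based on a set audio test signal before the call connection is established between the local end and the opposite end, wherein the audio test signal is at least one single-frequency signal.
  10. An intelligent conference device, characterized in that the intelligent conference device integrates the system for implementing an audio call according to claim 8 or 9.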
PCT/CN2016/113264 2016-08-31 2016-12-29 Method and system for implementing an audio call, and intelligent conference device WO2018040432A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610795116.8 2016-08-31
CN201610795116.8A CN106385517A (zh) 2016-08-31 2016-08-31 Method and system for implementing an audio call, and intelligent conference device

Publications (1)

Publication Number Publication Date
WO2018040432A1 true WO2018040432A1 (zh) 2018-03-08

Family

ID=57938741

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/113264 WO2018040432A1 (zh) 2016-08-31 2016-12-29 Method and system for implementing an audio call, and intelligent conference device

Country Status (2)

Country Link
CN (1) CN106385517A (zh)
WO (1) WO2018040432A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111713118A (zh) * 2019-05-30 2020-09-25 深圳市大疆创新科技有限公司 音频数据的处理方法、设备、系统及存储介质
CN110380937B (zh) * 2019-07-23 2021-08-31 中国工商银行股份有限公司 应用于电子设备的网络测试方法和装置
CN111724803B (zh) * 2020-06-29 2023-08-08 北京达佳互联信息技术有限公司 音频处理方法、装置、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6724736B1 (en) * 2000-05-12 2004-04-20 3Com Corporation Remote echo cancellation in a packet based network
CN101420586A (zh) * 2007-10-22 2009-04-29 中兴通讯股份有限公司 会议电视系统中消除电学回声的方法及装置
CN102568494A (zh) * 2012-02-23 2012-07-11 贵阳朗玛信息技术股份有限公司 消除回声的优化方法、装置及系统
CN104010100A (zh) * 2014-05-08 2014-08-27 深圳市汇川技术股份有限公司 VoIP通信中的回声消除系统及方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104822001B (zh) * 2015-04-23 2018-12-18 腾讯科技(深圳)有限公司 回声消除数据同步控制方法和装置
CN105304093B (zh) * 2015-11-10 2017-07-25 百度在线网络技术(北京)有限公司 用于语音识别的信号前端处理方法及装置

Also Published As

Publication number Publication date
CN106385517A (zh) 2017-02-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16914993

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16914993

Country of ref document: EP

Kind code of ref document: A1