TW201701275A - Front-end audio processing system - Google Patents

Front-end audio processing system


Publication number
TW201701275A
TW201701275A TW105120417A TW105120417A TW201701275A TW 201701275 A TW201701275 A TW 201701275A TW 105120417 A TW105120417 A TW 105120417A TW 105120417 A TW105120417 A TW 105120417A TW 201701275 A TW201701275 A TW 201701275A
Authority
TW
Taiwan
Prior art keywords
signal
audio
unit
valid
audio signal
Prior art date
Application number
TW105120417A
Other languages
Chinese (zh)
Other versions
TWI581255B (en)
Inventor
施家琪
劉鑫
Original Assignee
芋頭科技(杭州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 芋頭科技(杭州)有限公司
Publication of TW201701275A
Application granted
Publication of TWI581255B


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to the field of intelligent voice interaction, and more particularly to a front-end audio processing system. The system fills a gap in front-end speech processing left by the embedded operating systems commonly used in intelligent robots today. Without modifying the code of the existing embedded operating system, it provides front-end speech noise reduction for back-end speech recognition applications, and it offers high extensibility and flexibility.

Description

Front-end audio processing system

The present invention relates to the field of intelligent voice interaction, and more particularly to a front-end audio processing system.

With the development of embedded technology and artificial intelligence, speech recognition has come into wide use on smart robots, setting off another revolution in human-machine interaction. Speech recognition lets a machine convert a natural speech signal into corresponding text or commands through a process of recognition and understanding. Its key performance metric is the recognition rate: if the rate is too low, the user has to repeat voice commands several times, which disrupts the fluency of human-machine communication. Audio front-end processing is the collective name for the series of steps, from audio capture on the machine through algorithmic preprocessing, whose goal is to raise the signal-to-noise ratio of the valid speech. Common front-end techniques include environmental noise cancellation, self-sound-source cancellation, and automatic gain control. Environmental noise cancellation reduces stationary and non-stationary noise in the real world. Typical techniques work well on stationary noise but poorly on non-stationary noise, which has high energy and little regularity. Self-sound-source cancellation aims to reduce the effect of the robot's own sound output on its audio capture. For example, a robot that reads a newspaper aloud converts the text to speech with TTS and plays it back; the played-back speech can interfere with the robot's speech recognition system, causing misrecognition and a lower recognition rate. Automatic gain control automatically adjusts the gain of the audio captured by the microphone. For a given microphone, if the captured audio energy is too high, the signal clips, its spectrum changes, and the recognition rate drops. In addition, because sound energy attenuates with distance, the energy of the valid audio signal has to be raised when the speaker is far from the robot.

Linux and Android, the operating systems used by most smart robots, already integrate these techniques as separate algorithm modules. In Android, for example, environmental noise cancellation and self-sound-source cancellation are abstracted as audio effects (Audio Effect). These effects are built into a chain of separate algorithms, and a configuration file determines whether the audio service enables them at startup; automatic gain control is optionally implemented in the lower-level driver abstraction layer or in the audio service. Front-end algorithms that live independently in different components are adequate for conventional smart devices such as phones and tablets. However, because the modules are independent of one another while many scenarios require the algorithms to cooperate, and because the reference signal is hard to capture, they cannot meet the needs of smart robots, whose usage scenarios are complex and variable.

The front-end audio processing of current smart operating systems therefore has two problems: one in algorithm design and one in structural design.

First, the algorithms are still designed for conventional smart devices such as tablets and phones. On a conventional phone the goal of environmental noise reduction is to reduce stationary noise, and the parameter configuration pays little attention to eliminating non-stationary noise. Self-sound-source cancellation depends on a reference signal. In a conventional smart operating system the reference audio comes from the system's own audio output buffer, and the unpredictability of that buffer makes the delay between the reference signal and the received signal variable, which degrades the algorithm. For this reason the self-sound-source cancellation algorithms for phones and tablets are fairly conservative, and they perform poorly when the ratio of valid speech to the device's own sound is low. Because conventional smart operating systems mostly target phones and tablets, which usually have directional microphones and are habitually used close to the mouth, automatic gain control is not an essential technique in them.

Second, in terms of structural design, simply adding these algorithm modules to a current smart operating system does not solve the problem. The real environments in which smart robots operate are complex and variable, so front-end audio problems that used to be independent become interrelated. For example, an automatic gain algorithm with incorrect parameters, or one called in the wrong order, amplifies noise that was originally small, which then interferes with the other algorithms.

In view of the above problems, the present invention provides a front-end audio processing system for a home smart robot, comprising: a signal separation unit that separates a captured signal to obtain a valid signal and a reference signal; a first processing unit, connected to the signal separation unit, that receives the valid signal output by the signal separation unit and analyzes it to remove the low-frequency noise signal from it; a second processing unit, connected to both the signal separation unit and the first processing unit, that receives the reference signal output by the signal separation unit and the valid signal, with the low-frequency noise signal removed, output by the first processing unit, and that uses the reference signal and a predetermined algorithm to remove the self-noise signal from the valid signal, forming a clean audio signal; a comparison unit, connected to the second processing unit, that receives the clean audio signal from the second processing unit and compares it with the valid signal to form a comparison result; and a calculation unit that amplifies the valid audio signal when the valid audio signal is smaller than a preset threshold of the clean audio signal, and attenuates the valid audio signal when it is not smaller than that preset threshold.

In a preferred embodiment, the system further comprises an acquisition conversion unit, connected to the signal separation unit, that receives captured signals in different formats from different acquisition units, converts them into captured signals of a predetermined format, and outputs them to the signal separation unit.

In a preferred embodiment, the system further comprises a microphone, placed at the output of the audio playback device, that captures the audio output by the audio playback device and forms the reference signal.

In another preferred embodiment, the signal separation unit distributes the valid signal and the reference signal across a plurality of different channels and separates the valid signal and the reference signal on each channel.

In yet another preferred embodiment, the clean audio signal is obtained by echo delay estimation.

In another preferred embodiment, the clean audio signal is obtained by a normalized least mean square (NLMS) adaptive algorithm.

In another preferred embodiment, the clean audio signal is obtained by nonlinear filtering and comfort noise generation.

In yet another preferred embodiment, the system further comprises an application unit, connected to the calculation unit, that transforms and outputs the valid audio signal output by the calculation unit.

In summary, the front-end audio processing system of the present invention fills the gap in front-end speech processing left by the embedded operating systems commonly found on the market for smart robots. Without modifying the code of the existing embedded operating system, the framework provides front-end speech noise reduction for back-end speech recognition applications, with high extensibility and flexibility.

To make the technical solution and advantages of the present invention easier to understand, a more detailed description follows with reference to the accompanying drawing. The specific embodiments described here only explain the invention and do not limit it.

The core idea of the invention is to process the captured audio data layer by layer to obtain the audio signal required by applications on the home smart robot, providing front-end speech noise reduction for back-end speech recognition applications without modifying the code of the existing embedded operating system, with high extensibility and flexibility.

The invention therefore relates to a front-end audio processing system applied in a home smart robot, comprising: an acquisition conversion unit, which captures audio and preprocesses it; because different operating systems capture audio data in different ways, the acquisition conversion unit is needed to abstract the capture of audio signal data; a signal separation unit, connected to the acquisition conversion unit and used to capture the reference signal, which uses a hardware reference-signal capture method to capture the signal at the analog audio output of the home smart robot, combines the captured signals into a plurality of different channels, and separates the valid signal and the reference signal on each channel; a first processing unit, connected to the signal separation unit, that receives the valid signal output by the signal separation unit and analyzes it to remove the low-frequency noise signal from it; a second processing unit, connected to both the signal separation unit and the first processing unit, that receives the reference signal output by the signal separation unit and the valid signal, with the low-frequency noise signal removed, output by the first processing unit, and that uses the reference signal and a predetermined algorithm to remove the self-noise signal from the valid signal, forming a clean audio signal; a comparison unit, connected to the second processing unit, that receives the clean audio signal from the second processing unit and compares it with the valid signal to form a comparison result; and a calculation unit that amplifies the valid audio signal when the valid audio signal is smaller than a preset threshold of the clean audio signal, and attenuates the valid audio signal when it is not smaller than that preset threshold.

An application interface, connected to the calculation unit, converts the valid audio signal to the channels required by the applications of the home smart robot and passes it to those applications.

Specific embodiments are described below.

As shown in FIG. 1, to address the front-end audio processing problems of current home smart robots, the invention provides a front-end audio processing system for a home smart robot. The system mainly comprises an acquisition conversion unit, a signal separation unit, a first processing unit, a second processing unit, a comparison unit, a calculation unit, and an application interface. The acquisition conversion unit is an audio capture preprocessing module designed for different operating systems: because operating systems differ in how they capture audio data, an acquisition conversion unit is needed to convert the different data formats into a format that the signal separation unit can recognize.
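As an illustration of the format conversion performed by the acquisition conversion unit, the following minimal sketch normalizes raw PCM bytes to floating-point samples. It assumes signed little-endian PCM with 16-bit or 32-bit samples and a float target format; the patent specifies neither, and the function name to_float_samples is hypothetical.

```python
import struct


def to_float_samples(raw, sample_width):
    """Convert signed little-endian PCM bytes to floats in [-1.0, 1.0)."""
    codes = {2: "h", 4: "i"}  # 16-bit and 32-bit only (an assumption)
    if sample_width not in codes:
        raise ValueError("unsupported sample width")
    count = len(raw) // sample_width  # ignore a trailing partial sample
    fmt = "<%d%s" % (count, codes[sample_width])
    values = struct.unpack(fmt, raw[:count * sample_width])
    scale = float(1 << (8 * sample_width - 1))  # 32768 for 16-bit input
    return [v / scale for v in values]


samples = to_float_samples(struct.pack("<hh", 16384, -32768), 2)
```

Whatever the capture source delivers, the later units then see a single sample representation.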

The signal separation unit separates the captured signal to obtain the valid signal and the reference signal. In this scheme the reference signal is obtained through a hardware circuit; for example, a microphone placed at the output of the audio playback device captures the audio that the device outputs and forms the reference signal. The signal separation unit distributes the valid signal and the reference signal across a plurality of different channels and separates the valid signal and the reference signal on each channel.
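The channel separation can be sketched as a de-interleaving step. The sketch assumes interleaved frames in which the microphone sample comes first and the reference sample second, as in the Android embodiment described in this document; split_channels is a hypothetical name.

```python
def split_channels(interleaved, channels=2):
    """Split interleaved samples into one list per channel."""
    if len(interleaved) % channels:
        raise ValueError("incomplete frame")
    # Channel c owns every `channels`-th sample starting at index c.
    return [list(interleaved[c::channels]) for c in range(channels)]


# Frames are (mic, ref) pairs: (10, 1), (20, 2), (30, 3).
mic, ref = split_channels([10, 1, 20, 2, 30, 3])
```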

The first processing unit performs noise reduction on the valid audio signal. This scheme uses an environmental noise reduction algorithm based on an improved Wiener filter. This layer denoises only the valid audio signal; the reference signal is passed to the next layer unprocessed.
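The patent does not disclose how its Wiener filter is improved, so the sketch below shows only the textbook per-frequency-bin Wiener gain, G = max(P_y - P_n, 0) / P_y, where P_y is the power of the noisy signal and P_n an estimate of the noise power in that bin. The gain floor and the name wiener_gains are assumptions.

```python
def wiener_gains(noisy_power, noise_power, floor=0.05):
    """Per-bin spectral gains from noisy-signal and noise power estimates."""
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        if p_y <= 0.0:
            gains.append(floor)  # silent bin: nothing to preserve
            continue
        gain = max(p_y - p_n, 0.0) / p_y  # estimated share of speech power
        gains.append(max(gain, floor))    # floor limits musical-noise artifacts
    return gains


gains = wiener_gains([4.0, 1.0], [1.0, 1.0])
```

Each gain would multiply the corresponding bin of the noisy spectrum before the inverse transform.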

The second processing unit uses the reference signal to cancel the interfering audio, originating from the robot itself, that the acquisition conversion unit has captured. It takes as input the reference signal from the signal separation unit and the denoised valid audio signal from the first processing unit, and uses the reference signal and a predetermined algorithm to remove the self-noise signal from the valid signal, forming the clean audio signal. The clean audio signal may be obtained by any one or a combination of the following: echo delay estimation, a normalized least mean square (NLMS) adaptive algorithm, nonlinear filtering, and comfort noise generation.
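Of the methods named here, the NLMS adaptive filter is the easiest to show compactly. The following is a minimal sketch, not the patent's implementation: the tap count, the step size mu, and the name nlms_cancel are assumptions, and delay estimation, nonlinear filtering, and comfort noise generation are left out.

```python
def nlms_cancel(mic, ref, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echoed reference from mic."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # residual, i.e. the cleaned sample
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm  # step normalized by reference energy
        weights = [w + step * h for w, h in zip(weights, history)]
        out.append(error)
    return out
```

When the microphone signal contains only a filtered copy of the reference, the residual shrinks toward zero as the weights converge; speech that is uncorrelated with the reference remains in the output.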

The comparison unit and the calculation unit process the current audio signal according to the average energy of the current clean audio signal: if the energy of the current audio signal is below the preset threshold, its energy is amplified; if it is above the threshold, its energy is reduced.
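The energy comparison can be sketched as follows. The fixed step factor, the use of a single target energy as the preset threshold, and the name adjust_gain are assumptions; the patent gives no numeric values.

```python
def adjust_gain(frame, target_energy, step=1.25):
    """Scale a frame up if its mean energy is below target, otherwise down."""
    if not frame:
        return []
    energy = sum(s * s for s in frame) / len(frame)  # mean energy per sample
    gain = step if energy < target_energy else 1.0 / step
    return [s * gain for s in frame]


quiet = adjust_gain([0.1, -0.1], target_energy=0.25, step=2.0)
loud = adjust_gain([1.0, -1.0], target_energy=0.25, step=2.0)
```

Applying a small step per frame, rather than jumping straight to the target, is one common way to avoid audible gain pumping.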

The application interface samples the signal and performs the final conversion for the number of channels that the applications on the home smart robot require, then exports the required audio signal to the robot's speech applications.

Structurally, the whole processing chain is a pipeline. Each unit has a worker thread that handles that unit's work, and units exchange data through a lock-free circular buffer. This raises data throughput and keeps the latency added by audio processing as low as possible. In addition, having each worker thread execute only its own module helps improve the branch prediction hit rate on some processors.
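The buffer between two neighboring units can be illustrated with a single-producer, single-consumer circular buffer. This is only a sketch of the index discipline, with the hypothetical class name RingBuffer; a production version in C or C++ would additionally need atomic loads and stores with suitable memory ordering, which plain Python does not express.

```python
class RingBuffer:
    """Single-producer, single-consumer circular buffer.

    One slot is kept empty so that head == tail always means "empty".
    """

    def __init__(self, capacity):
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def push(self, item):
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False  # full: the caller decides whether to drop or retry
        self._slots[self._tail] = item
        self._tail = nxt
        return True

    def pop(self):
        if self._head == self._tail:
            return None  # empty
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return item
```

Because each index is written by exactly one side, neither the producing unit nor the consuming unit has to take a lock to hand over a frame.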

When the system is applied to a home smart robot, for example a voice-interactive home smart robot running an Android-based embedded smart operating system, the front-end audio processing system of the invention ensures that the robot's speech recognition works normally in a variety of scenarios. First, the system's operating-system audio interface wraps Android's audio library tinyalsa, and the acquisition conversion unit is attached on top of this wrapper. The wrapper covers tinyalsa functions such as pcm_open (opens a PCM audio stream), pcm_close (closes a PCM audio stream), pcm_frames_to_bytes (converts a number of audio frames to a number of bytes), pcm_get_buffer_size (returns the buffer size), and pcm_read (reads audio data from tinyalsa). The signal separation unit reads audio data through the xread function provided by the acquisition conversion unit. At this point the system's analog reference audio signal and the captured audio signal have been mixed into two-channel data, in which the first channel is the captured audio signal and the second channel is the reference audio signal from the system itself. The signal separation unit separates the left and right channels, passing one to the second processing unit and the other to the first processing unit. The first processing unit denoises the audio signal captured by the microphone, records the time consumed by the environmental noise reduction algorithm, and passes that time together with the processed audio signal to the second processing unit. The second processing unit performs noise reduction based on the reference audio signal, the captured audio signal, and the reference delay time, and passes the result to the comparison unit and the calculation unit. The comparison unit and the calculation unit adjust the gain of the audio signal according to the average energy of the current audio signal and finally pass it to the buffer of the application interface.
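One plausible reading of the reference delay time is that the second processing unit delays the reference channel by the time the first processing unit spent denoising, so that both inputs line up again. The sketch below assumes that reading, a 16 kHz sample rate, and the hypothetical name delay_reference; none of these is stated in the patent.

```python
def delay_reference(ref, elapsed_seconds, sample_rate=16000):
    """Delay ref by the time the denoiser spent on the microphone path."""
    shift = int(round(elapsed_seconds * sample_rate))  # delay in samples
    if shift <= 0:
        return list(ref)
    # Pad the front with silence and keep the original length.
    return ([0.0] * shift + list(ref))[:len(ref)]


aligned = delay_reference([1.0, 2.0, 3.0, 4.0], elapsed_seconds=2 / 16000)
```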

The front-end audio processing system of the present invention fills the gap in front-end speech processing left by the embedded operating systems commonly found on the market for smart robots. Without modifying the code of the existing embedded operating system, the framework provides front-end speech noise reduction for back-end speech recognition applications, with high extensibility and flexibility.

The description and the drawing give typical embodiments of specific structures; other variations are possible within the spirit of the invention. Although the above presents the currently preferred embodiments, they are not limiting.

Various changes and modifications will no doubt become apparent to those skilled in the art after reading the above description. The appended claims should therefore be regarded as covering all changes and modifications within the true intent and scope of the invention. Any and all equivalent scope and content within the claims should be considered to remain within the intent and scope of the invention.


The embodiments of the invention are described more fully with reference to the accompanying drawing. The drawing is for illustration and explanation only and does not limit the scope of the invention: FIG. 1 is a schematic diagram of the system framework of the invention.

Claims (8)

1. A front-end audio processing system for a home smart robot, comprising: a signal separation unit that separates a captured signal to obtain a valid signal and a reference signal; a first processing unit, connected to the signal separation unit, that receives the valid signal output by the signal separation unit and analyzes it to remove the low-frequency noise signal from it; a second processing unit, connected to both the signal separation unit and the first processing unit, that receives the reference signal output by the signal separation unit and the valid signal, with the low-frequency noise signal removed, output by the first processing unit, and that uses the reference signal and a predetermined algorithm to remove the self-noise signal from the valid signal, forming a clean audio signal; a comparison unit, connected to the second processing unit, that receives the clean audio signal from the second processing unit and compares it with the valid signal to form a comparison result; and a calculation unit that amplifies the valid audio signal when the valid audio signal is smaller than a preset threshold of the clean audio signal, and attenuates the valid audio signal when it is not smaller than that preset threshold.
2. The system of claim 1, further comprising: an acquisition conversion unit, connected to the signal separation unit, that receives captured signals in different formats from different acquisition units, converts them into captured signals of a predetermined format, and outputs them to the signal separation unit.
3. The system of claim 1, further comprising: a microphone, placed at the output of the audio playback device, that captures the audio output by the audio playback device and forms the reference signal.
4. The system of claim 3, wherein the signal separation unit distributes the valid signal and the reference signal across a plurality of different channels and separates the valid signal and the reference signal on each channel.
5. The system of claim 1, wherein the clean audio signal is obtained by echo delay estimation.
6. The system of claim 1, wherein the clean audio signal is obtained by a normalized least mean square (NLMS) adaptive algorithm.
7. The system of claim 1, wherein the clean audio signal is obtained by nonlinear filtering and comfort noise generation.
8. The system of claim 1, further comprising an application interface, connected to the calculation unit, that transforms and outputs the valid audio signal output by the calculation unit.
TW105120417A 2015-06-30 2016-06-29 Front-end audio processing system TWI581255B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510385306.8A CN106328154B (en) 2015-06-30 2015-06-30 A kind of front audio processing system

Publications (2)

Publication Number Publication Date
TW201701275A (en) 2017-01-01
TWI581255B TWI581255B (en) 2017-05-01

Family

ID=57607841

Family Applications (1)

Application Number Title Priority Date Filing Date
TW105120417A TWI581255B (en) 2015-06-30 2016-06-29 Front-end audio processing system

Country Status (4)

Country Link
CN (1) CN106328154B (en)
HK (1) HK1231622A1 (en)
TW (1) TWI581255B (en)
WO (1) WO2017000772A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI672693B (en) * 2017-05-10 2019-09-21 英商思睿邏輯國際半導體有限公司 Combined reference signal for acoustic echo cancellation

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI671738B (en) * 2018-10-04 2019-09-11 塞席爾商元鼎音訊股份有限公司 Sound playback device and reducing noise method thereof
CN109410935A (en) * 2018-11-01 2019-03-01 平安科技(深圳)有限公司 A kind of destination searching method and device based on speech recognition
CN111179931B (en) * 2020-01-03 2023-07-21 青岛海尔科技有限公司 Method and device for voice interaction and household appliance

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0666942B2 (en) * 1985-07-17 1994-08-24 ソニー株式会社 Helical scan magnetic recording / reproducing device
JPH0746083A (en) * 1993-07-27 1995-02-14 Toshiba Corp Sound synthesizing and band limiting circuit and low-frequency sound reinforcing circuit
JP4456601B2 (en) * 2004-06-02 2010-04-28 パナソニック株式会社 Audio data receiving apparatus and audio data receiving method
JP2006074642A (en) * 2004-09-06 2006-03-16 Matsushita Electric Ind Co Ltd Conference telephone system
CN101031963B (en) * 2004-09-16 2010-09-15 法国电信 Method of processing a noisy sound signal and device for implementing said method
CN1809105B (en) * 2006-01-13 2010-05-12 北京中星微电子有限公司 Dual-microphone speech enhancement method and system applicable to mini-type mobile communication devices
CN1946101A (en) * 2006-10-31 2007-04-11 华为技术有限公司 Method and device for realizing mobile terminal audio signal self adaption
ATE448649T1 (en) * 2007-08-13 2009-11-15 Harman Becker Automotive Sys NOISE REDUCTION USING A COMBINATION OF BEAM SHAPING AND POST-FILTERING
US8175871B2 (en) * 2007-09-28 2012-05-08 Qualcomm Incorporated Apparatus and method of noise and echo reduction in multiple microphone audio systems
US8218397B2 (en) * 2008-10-24 2012-07-10 Qualcomm Incorporated Audio source proximity estimation using sensor array for noise reduction
CN101751918B (en) * 2008-12-18 2012-04-18 李双清 Novel silencer and noise reduction method
CN101562669B (en) * 2009-03-11 2012-10-03 上海朗谷电子科技有限公司 Method of adaptive full duplex full frequency band echo cancellation
CN101667426A (en) * 2009-09-23 2010-03-10 中兴通讯股份有限公司 Device and method for eliminating environmental noise
JP2011107603A (en) * 2009-11-20 2011-06-02 Sony Corp Speech recognition device, speech recognition method and program
CN101901601A (en) * 2010-05-17 2010-12-01 天津大学 Method and system for reducing noise of voice communication in vehicle
CN102347027A (en) * 2011-07-07 2012-02-08 瑞声声学科技(深圳)有限公司 Double-microphone speech enhancer and speech enhancement method thereof
CN102800324A (en) * 2012-07-30 2012-11-28 东莞宇龙通信科技有限公司 Audio processing system and method for mobile terminals
CN102831897A (en) * 2012-08-15 2012-12-19 歌尔声学股份有限公司 Multimedia device and multimedia signal processing method
CN104378774A (en) * 2013-08-15 2015-02-25 中兴通讯股份有限公司 Voice quality processing method and device
CN104517607A (en) * 2014-12-16 2015-04-15 佛山市顺德区美的电热电器制造有限公司 Speech-controlled appliance and method of filtering noise therein

Also Published As

Publication number Publication date
CN106328154A (en) 2017-01-11
TWI581255B (en) 2017-05-01
WO2017000772A1 (en) 2017-01-05
CN106328154B (en) 2019-09-17
HK1231622A1 (en) 2017-12-22

Similar Documents

Publication Publication Date Title
US11620983B2 (en) Speech recognition method, device, and computer-readable storage medium
TWI581255B (en) Front-end audio processing system
US20190355354A1 (en) Method, apparatus and system for speech interaction
CN109493877B (en) Voice enhancement method and device of hearing aid device
US11587560B2 (en) Voice interaction method, device, apparatus and server
WO2020147642A1 (en) Voice signal processing method and apparatus, computer readable medium, and electronic device
CN110660407B (en) Audio processing method and device
CN110782907B (en) Voice signal transmitting method, device, equipment and readable storage medium
CN113205803B (en) Voice recognition method and device with self-adaptive noise reduction capability
WO2023284402A1 (en) Audio signal processing method, system, and apparatus, electronic device, and storage medium
CN110992967A (en) Voice signal processing method and device, hearing aid and storage medium
CN111540370A (en) Audio processing method and device, computer equipment and computer readable storage medium
CN108510997A (en) Electronic equipment and echo cancel method applied to electronic equipment
EP4207195A1 (en) Speech separation method, electronic device, chip and computer-readable storage medium
WO2017045512A1 (en) Voice recognition method and apparatus, terminal, and voice recognition device
US20230290335A1 (en) Detection of live speech
CN112243182B (en) Pickup circuit, method and device
US10747494B2 (en) Robot and speech interaction recognition rate improvement circuit and method thereof
CN109473111B (en) Voice enabling device and method
CN113223544A (en) Audio direction positioning detection device and method and audio processing system
CN114023352B (en) Voice enhancement method and device based on energy spectrum depth modulation
WO2018083570A1 (en) Intelligent hearing aid
CN109448724B (en) Intelligent story machine with voice interruption function and implementation method thereof
US11516582B1 (en) Splitting frequency-domain processing between multiple DSP cores
CN111147655A (en) Model generation method and device