TWM619473U - Voice assistant system

Info

Publication number: TWM619473U
Application number: TW110206916U
Authority: TW
Inventors: 林功藝
Original assignee: 神盾股份有限公司
Priority date: 2021-01-13
Filing date: 2021-06-16
Publication date: 2021-11-11
Also published as: WO2022151651A1; TW202228007A; CN216145422U; TWI790647B; CN113411723A

Abstract

A voice assistant system is provided. The voice assistant system includes a microphone module and a signal processor. The microphone module is suitable for being worn on a user, senses the user's in-throat vocalization through a diaphragm, and generates an analog sound signal in response to that vocalization. The signal processor operates in a keyword detection mode or in a speech reception mode. The power consumption of the signal processor operating in the speech reception mode is greater than its power consumption in the keyword detection mode. When the signal processor operates in the keyword detection mode, it performs keyword detection using multiple analog sample voltages of the analog sound signal. In response to detecting a keyword in the keyword detection mode, the signal processor switches from the keyword detection mode to the speech reception mode.

Description

Voice Assistant System

The present utility model relates to a voice assistant system, and in particular to a voice assistant system having a wireless microphone device.

With advances in speech recognition technology, voice assistants have become widely used in everyday life. A voice assistant is a software program running on a terminal device that can converse with the user by voice to complete tasks assigned by the user, such as searching for information, controlling appliances, or operating other applications on the terminal device. If users could use a voice assistant whenever they wished, it would be of great benefit to daily life and work. For example, a user could search for information through the voice assistant at any time and obtain the needed information immediately. At present, users must speak voice commands clearly and loudly to a sound pickup device in order to communicate with a voice assistant. However, in situations that call for quiet, such as meetings or public places, it is not appropriate for users to speak voice commands aloud, because doing so disturbs others. In addition, for users to communicate with a voice assistant anytime and anywhere, they must wear a sound pickup device at all times to capture their voice commands. How to effectively extend the battery life of a sound pickup device worn on the user's body is therefore also a major challenge.

In view of this, the present utility model provides a voice assistant system that can greatly reduce the power consumption of a wireless microphone device and thereby extend its battery life, so that a voice assistant receiving voice messages through the wireless microphone device can be applied more widely and without restriction.

An embodiment of the present utility model provides a voice assistant system that includes a microphone module and a signal processor. The microphone module is suitable for being worn on a user, senses the user's in-throat vocalization through a diaphragm, and generates an analog sound signal in response to that vocalization. The diaphragm is connected to a battery and to the signal processor. The signal processor operates in a speech reception mode or a keyword detection mode. The power consumption of the signal processor operating in the speech reception mode is higher than its power consumption in the keyword detection mode. When the signal processor operates in the keyword detection mode, it performs keyword detection based on multiple analog sample voltages of the analog sound signal. In response to detecting a keyword in the keyword detection mode, the signal processor switches from the keyword detection mode to the speech reception mode.

An embodiment of the present utility model provides a voice assistant system that includes a terminal device, a microphone module, and a signal processor. The microphone module is suitable for being worn on a user, senses the user's in-throat vocalization through a diaphragm, and generates an analog sound signal in response to that vocalization. The diaphragm is connected to a battery and to the signal processor. The signal processor operates in a speech reception mode or a keyword detection mode. The power consumption of the signal processor operating in the speech reception mode is higher than its power consumption in the keyword detection mode.

When the signal processor operates in the keyword detection mode, it performs keyword detection based on multiple analog sample voltages of the analog sound signal. In response to detecting a keyword in the keyword detection mode, the signal processor switches from the keyword detection mode to the speech reception mode. After switching to the speech reception mode, the signal processor performs audio processing on the analog sound signal to generate processed digital audio data. The signal processor provides the processed digital audio data to a voice assistant program running on the terminal device.

Based on the above, in the embodiments of the present utility model, the signal processor of the voice assistant system can switch between the keyword detection mode and the speech reception mode. When the signal processor of the wireless microphone device operates in the keyword detection mode, it determines, with the high-power-consumption component disabled, whether a keyword is detected in the analog sound signal provided by the microphone module. In response to detecting a keyword in the keyword detection mode, the signal processor can switch from the keyword detection mode to the speech reception mode and enable the high-power-consumption component. Accordingly, the wireless microphone device switches from the keyword detection mode to the speech reception mode only when the user, intending to use the voice assistant, speaks the keyword; only then is the high-power-consumption component used to perform digital audio processing on the analog sound signal provided by the microphone module. This prevents the high-power-consumption component from running continuously when it is not needed and wasting the power of the wireless microphone device, thereby extending the battery life of the wireless microphone device.

To make the above features and advantages of the present utility model easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

10: voice assistant system

100: wireless microphone device

200: terminal device

300: earphone

110: microphone module

111: diaphragm

120: signal processor

130: battery

140: wireless transceiver

121: high-power-consumption component

122: analog sampling circuit

123: analog memory

124: speech recognition circuit

121a: power amplifier

121b: analog-to-digital converter

121c: digital signal processor

FIG. 1 is a schematic diagram of a voice assistant system according to an embodiment of the present utility model.

FIG. 2 is a schematic diagram of a usage scenario of a voice assistant system according to an embodiment of the present utility model.

FIG. 3 is a schematic diagram of a wireless microphone device according to an embodiment of the present utility model.

FIG. 4 is a schematic diagram of a voice assistant system according to an embodiment of the present utility model.

FIG. 5 is a schematic diagram of a usage scenario of a voice assistant system according to an embodiment of the present utility model.

FIG. 6 is a schematic diagram of a wireless microphone device according to an embodiment of the present utility model.

To make the content of the present utility model easier to understand, the following embodiments are given as examples by which the utility model can actually be implemented. In addition, wherever possible, elements, components, and steps with the same reference numerals in the drawings and embodiments represent the same or similar parts.

It should be understood that when an element is referred to as being "directly on" or "directly connected to" another element, there are no intervening elements. As used herein, "connected" may refer to a physical and/or electrical connection. Furthermore, "electrically connected" or "coupled" may mean that other elements are present between the two elements.

FIG. 1 is a schematic diagram of a voice assistant system according to an embodiment of the present utility model. Referring to FIG. 1, the voice assistant system 10 may include a wireless microphone device 100 and a terminal device 200. The terminal device 200 runs a voice assistant program and is, for example, a desktop computer, a notebook computer, a smartphone, a tablet computer, or a smart speaker; the present utility model is not limited in this respect. The wireless microphone device 100 can connect to the terminal device 200 via a wireless communication technology.

For example, the wireless microphone device 100 can connect to the terminal device 200 via a wireless communication technology such as Bluetooth, Wi-Fi, or ZigBee; the present utility model does not limit the type of wireless communication technology. The wireless microphone device 100 senses the user's in-throat vocalization so that the user can interact by voice, through the wireless microphone device 100, with the voice assistant program running on the terminal device 200.

In this embodiment, the wireless microphone device 100 may include a microphone module 110, a signal processor 120, and a battery 130.

The microphone module 110 is suitable for being worn on the user and generates an analog sound signal in response to the user's in-throat vocalization. In-throat vocalization produces sound-wave amplitudes that bystanders cannot hear. The microphone module 110 may include a diaphragm 111 for sensing the user's vocalization and is, for example, a microelectromechanical systems (MEMS) microphone. The diaphragm 111 may be a pressure-sensitive membrane that vibrates in response to in-throat vocalization. In one embodiment, the wireless microphone device 100 may be a bone-conduction microphone having a diaphragm that can sense vibrations of the bones or muscles of the head and neck. The microphone module 110 contacts the user's skin and is suitable for being worn on the user's throat or behind the ear. The microphone module 110 can sense sounds the user makes at a very low volume. In another embodiment, the microphone module 110 may include two diaphragms of different sizes. The two diaphragms may respectively sense in-throat vocalization at different frequencies, or they may respectively sense bone or muscle vibrations at different parts of the head and neck. Those skilled in the art may choose the number of diaphragms in the microphone module 110 according to the design requirements of the voice assistant system 10; the present utility model is not limited to the numbers of diaphragms described above.

In more detail, FIG. 2 is a schematic diagram of a usage scenario of the voice assistant system according to an embodiment of the present utility model. Referring to FIG. 2, the wireless microphone device 100 can be worn near the mastoid bone behind the user's ear. When the user vocalizes, the microphone module 110 generates the analog sound signal from the bone or muscle vibrations of the user sensed by the diaphragm 111. Because the microphone module 110 senses the user's vocalization through contact with the user's skin, the user can issue voice messages to the voice assistant program running on the terminal device 200 at a volume that bystanders cannot clearly hear.

The battery 130 is coupled to the microphone module 110 and the signal processor 120 and serves as the power source of the wireless microphone device 100. In other words, the battery 130 supplies power to the microphone module 110 and the signal processor 120.

The signal processor 120 can switch between operating in the speech reception mode and operating in the keyword detection mode. The power consumption of the signal processor 120 operating in the speech reception mode is higher than its power consumption in the keyword detection mode. That is, the signal processor 120 can operate in the higher-power speech reception mode or in the lower-power keyword detection mode. In one embodiment, the signal processor 120 includes a high-power-consumption component 121 and receives the analog sound signal generated by the microphone module 110. When the signal processor 120 operates in the keyword detection mode, the high-power-consumption component 121 is disabled and stops operating. When the signal processor 120 operates in the speech reception mode, the high-power-consumption component 121 is enabled to perform audio processing on the analog sound signal provided by the microphone module 110. In one embodiment, the high-power-consumption component 121 may include an analog-to-digital converter, a digital signal processor, a power amplifier, or a combination thereof.

Therefore, when the signal processor 120 operates in the keyword detection mode, the high-power-consumption component 121, which performs audio processing on the analog sound signal provided by the microphone module 110, does not draw power from the battery 130. Note that the signal processor 120 decides whether to switch from the keyword detection mode to the speech reception mode according to whether the user speaks the keyword. When the user does not speak the keyword, the signal processor 120 remains in the lower-power keyword detection mode. When the user speaks the keyword, the signal processor 120 switches to the higher-power speech reception mode. Depending on the voice assistant program, the keyword is, for example, "Alexa", "Cortana", "Hey Siri", "OK Google", or another user-defined keyword; the present utility model is not limited in this respect.

In one embodiment, when the signal processor 120 operates in the keyword detection mode, it can perform keyword detection on multiple analog sample voltages of the analog sound signal using an artificial neural network (ANN). Specifically, the signal processor 120 samples the analog sound signal in the analog domain to obtain the analog sample voltages. In one embodiment, the signal processor 120 may include an analog artificial intelligence (AI) circuit that implements the artificial neural network, and the network is configured to receive the analog sample voltages for keyword detection. Compared with a digital AI circuit, an analog AI circuit that implements analog multiply-accumulate units consumes less power. That is, in the keyword detection mode the signal processor 120 can continuously detect whether the user speaks the keyword by feeding the analog sample voltages to the analog AI circuit.

Accordingly, in response to detecting the keyword in the keyword detection mode, the signal processor 120 can switch from the keyword detection mode to the speech reception mode and enable the high-power-consumption component 121. After switching to the speech reception mode, the signal processor 120 can use the high-power-consumption component 121 to perform audio processing on the analog sound signal and generate processed digital audio data. The wireless microphone device 100 provides the processed digital audio data to the voice assistant program running on the terminal device 200, so that the voice assistant program can perform related functions based on that data, such as searching for information, controlling appliances, or controlling other applications on the terminal device 200.

On the other hand, in response to no keyword being detected in the keyword detection mode, the signal processor 120 remains in the keyword detection mode and keeps the high-power-consumption component 121 disabled. That is, if the user does not speak the keyword, the signal processor 120 can remain in the keyword detection mode for a long time to save power. When the user wearing the wireless microphone device 100 does not want to use the voice assistant, the user does not speak the keyword, and the signal processor 120 of the wireless microphone device 100 therefore stays in the keyword detection mode. When the user wants to use the voice assistant, the user can speak the keyword at a very low volume to switch the signal processor 120 to the speech reception mode, in which the signal processor 120 performs analog-to-digital conversion and digital audio processing on the analog sound signal provided by the microphone module 110. In other words, the high-power-consumption component 121 is enabled only while the user is issuing a voice message to the voice assistant and is disabled the rest of the time. The power consumption of the wireless microphone device 100 in the voice assistant system 10 can thus be greatly reduced, so that the user can wear the wireless microphone device 100 for a long time without charging it frequently.
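The mode rule described above can be sketched as a small state machine. This is a minimal illustration, not the circuit itself: the class and method names (`SignalProcessor`, `on_detection_result`) are hypothetical, and the sketch models only the transition the text describes (keyword detection mode to speech reception mode), since this passage does not say when the processor returns to keyword detection.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    KEYWORD_DETECTION = auto()  # lower power; high-power components disabled
    SPEECH_RECEPTION = auto()   # higher power; high-power components enabled


@dataclass
class SignalProcessor:
    """Toy model of signal processor 120 and its high-power component 121."""
    mode: Mode = Mode.KEYWORD_DETECTION
    high_power_enabled: bool = False

    def on_detection_result(self, keyword_detected: bool) -> Mode:
        """Apply the mode rule after one keyword-detection pass."""
        if self.mode is Mode.KEYWORD_DETECTION and keyword_detected:
            # Keyword heard: switch modes and enable the ADC/DSP/amplifier.
            self.mode = Mode.SPEECH_RECEPTION
            self.high_power_enabled = True
        # Otherwise nothing changes: the processor stays in its current mode.
        return self.mode


sp = SignalProcessor()
sp.on_detection_result(False)  # no keyword: remains in keyword detection mode
stayed_low_power = (sp.mode is Mode.KEYWORD_DETECTION
                    and not sp.high_power_enabled)
sp.on_detection_result(True)   # keyword: switches to speech reception mode
```

The point of the design is that the costly components are gated by a single cheap decision, so idle wear time is spent almost entirely in the low-power state.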

FIG. 3 is a schematic diagram of a wireless microphone device according to an embodiment of the present utility model. Referring to FIG. 3, the wireless microphone device 100 may include a microphone module 110, a signal processor 120, a battery 130, and a wireless transceiver 140.

Compared with the embodiment of FIG. 1, the wireless microphone device 100 in this embodiment may further include a wireless transceiver 140. The wireless transceiver 140 is coupled to the signal processor 120 and establishes a wireless communication link with the terminal device 200. Specifically, the wireless transceiver 140 can transmit data to the terminal device 200 or receive data from it. The wireless transceiver 140 may include an antenna or other communication-related circuits and is, for example, a Bluetooth transceiver, although the present utility model is not limited to this. Here, the wireless transceiver 140 can transmit the processed digital audio data generated by the signal processor 120 in the speech reception mode to the terminal device 200, so that the voice assistant program running on the terminal device 200 can perform speech recognition on that data and obtain the voice message issued by the user.

In addition, in this embodiment, the signal processor 120 may include an analog sampling circuit 122, an analog memory 123, and a speech recognition circuit 124.

In one embodiment, the analog sampling circuit 122 may include one or more analog sample-and-hold circuits. The analog sampling circuit 122 samples and holds the analog sound signal at a sampling frequency and outputs the resulting sampled-and-held analog sample voltages. In one embodiment, the diaphragm of the microphone module 110 senses the user's bone or muscle vibrations, and the microphone module 110 outputs the analog sound signal to the signal processor 120 accordingly. The analog sampling circuit 122 is coupled to the microphone module 110. It receives the analog sound signal generated by the microphone module 110 and samples it to produce multiple analog sample voltages. In one embodiment, the analog sampling circuit 122 may sample the analog sound signal at a sampling frequency of, for example, 16 kHz.
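As a rough software analogy for the sample-and-hold behavior, the sketch below evaluates a continuous signal at multiples of the sampling period and keeps each value. The function names and the 1 kHz test tone are assumptions for illustration only; the actual circuit holds voltages in the analog domain and never produces digital numbers at this stage.

```python
import math


def sample_and_hold(signal, sampling_rate_hz, duration_s):
    """Return the values of `signal` held every 1/sampling_rate_hz seconds."""
    n_samples = round(sampling_rate_hz * duration_s)
    return [signal(n / sampling_rate_hz) for n in range(n_samples)]


def tone(t):
    """Hypothetical 1 kHz test tone standing in for the analog sound signal."""
    return math.sin(2 * math.pi * 1000 * t)


# 1 ms of signal at the 16 kHz example rate gives 16 held values.
held = sample_and_hold(tone, 16_000, 0.001)
```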

The analog memory 123 is coupled to the analog sampling circuit 122 and records the analog sample voltages coming from it. In one embodiment, the analog memory 123 may be a charge-coupled device (CCD) memory. It may be a three-phase or a four-phase CCD memory; the present utility model is not limited in this respect. Specifically, the analog memory 123 converts each analog sample voltage into a corresponding charge and thereby records the amount of charge corresponding to each sample. Through the charge-transfer effect produced by applying multiple clock signals to the gate electrodes of the CCD memory, the analog memory 123 can temporarily store the analog sample voltages in sampling order.
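One way to picture "temporarily storing the sample voltages in sampling order" is a fixed-length shift buffer: each clock step pushes a new value in and every stored value moves one cell along. The sketch below is only a behavioral analogy; `AnalogShiftBuffer` is a hypothetical name, and the assumption that the oldest value is dropped once the buffer is full is mine, not something the text states.

```python
from collections import deque


class AnalogShiftBuffer:
    """Behavioral stand-in for a CCD-style analog memory (order-preserving)."""

    def __init__(self, capacity: int):
        # Assumption: once full, the oldest stored value is shifted out.
        self._cells = deque(maxlen=capacity)

    def clock_in(self, voltage: float) -> None:
        """One clock step: store a new sample behind the existing ones."""
        self._cells.append(voltage)

    def read_window(self) -> list:
        """Return the stored samples, oldest first."""
        return list(self._cells)


buf = AnalogShiftBuffer(capacity=4)
for v in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]:
    buf.clock_in(v)
window = buf.read_window()
```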

Alternatively, in one embodiment, the analog memory 123 may be a phase-change memory (PCM). Specifically, the analog sample voltages can each be converted into a current pulse with a corresponding pulse width, and these current pulses can be applied to the electrodes of multiple memory cells in the analog memory 123, so that the phase-change material in each memory cell undergoes a change of physical phase and takes on a corresponding resistance state. By converting the analog sample voltages into the resistance states of multiple memory cells in the phase-change memory, the analog memory 123 records the analog sample voltages.

In one embodiment, the analog memory 123 can record the analog sample voltages acquired within one preset sampling period. The preset sampling period is, for example, 250 ms, but the present utility model is not limited to this.

The speech recognition circuit 124 is coupled to the analog memory 123. The speech recognition circuit 124 can obtain from the analog memory 123 the analog sample voltages corresponding to one preset sampling period. Using an artificial neural network, it can extract features from these analog sample voltages to determine whether the keyword is detected. An artificial neural network includes multiple neurons arranged in multiple layers; each neuron performs multiplications and additions according to its weight information, and the outputs of the layers can be regarded as extracted feature vectors. In one embodiment, the speech recognition circuit 124 may include an analog AI circuit that implements analog multiply-accumulate units and performs analog AI operations on the analog sample voltages according to the artificial neural network, thereby extracting their features. Finally, the speech recognition circuit 124 can perform a classification operation on the feature vectors of these analog sample voltages to determine whether the keyword is detected.
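The multiply-and-add work each neuron performs can be illustrated in ordinary software. The sketch below is a digital stand-in for what the text says is done by analog multiply-accumulate circuitry. The layer sizes, weights, bias terms, ReLU activation, and detection threshold are all made-up assumptions chosen to keep the arithmetic easy to follow; the source does not specify any of them.

```python
def dense_layer(inputs, weights, biases):
    """One layer: each neuron forms a weighted sum of the inputs plus a bias,
    then applies ReLU (the bias and ReLU are assumptions of this sketch)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        acc = bias
        for x, w in zip(inputs, neuron_weights):
            acc += x * w  # multiply-accumulate step
        outputs.append(max(0.0, acc))
    return outputs


# Three sample voltages in, two features out, one score out.
sample_voltages = [0.5, -1.0, 2.0]
features = dense_layer(sample_voltages,
                       weights=[[1.0, 0.0, 0.5], [-1.0, 1.0, 0.0]],
                       biases=[0.0, 0.5])
score = dense_layer(features, weights=[[2.0, 1.0]], biases=[-1.0])[0]

THRESHOLD = 1.0  # hypothetical decision threshold for the classification step
keyword_like = score > THRESHOLD
```

Here `features` plays the role of the extracted feature vector and the final comparison plays the role of the classification operation.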

In one embodiment, the keyword may consist of multiple syllables, including at least a first syllable and a second syllable. The speech recognition circuit 124 can determine, based on the artificial neural network, whether a plurality of first sample voltages among the analog sample voltages match the first syllable of the keyword. The first sample voltages are produced by analog sampling within one preset sampling period, and the analog memory 123 can hold all of the sample voltages produced within one preset sampling period at the same time. For example, since a person takes roughly a quarter of a second to speak one syllable, the preset sampling period can be assumed to be 250 ms. Assuming a sampling frequency of 16 kHz (that is, 16,000 analog sample voltages per second), the analog memory 123 holds 4,000 first sample voltages for the preset sampling period. First, the first sample voltages are input to the speech recognition circuit 124, which determines whether they match the first syllable of the keyword.
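The sample count in this example follows directly from the sampling rate and the window length, as the short sketch below shows. The helper names are hypothetical, and cutting the stream into consecutive non-overlapping windows is an assumption made for illustration.

```python
def samples_per_window(sampling_rate_hz: int, window_ms: int) -> int:
    """Number of samples captured in one preset sampling period."""
    return sampling_rate_hz * window_ms // 1000


def split_into_windows(samples, window_len):
    """Cut a sample stream into consecutive, non-overlapping windows.
    An incomplete window at the end of the stream is dropped."""
    last_start = len(samples) - window_len
    return [samples[i:i + window_len]
            for i in range(0, last_start + 1, window_len)]


n = samples_per_window(16_000, 250)   # 16 kHz for 250 ms
one_second = list(range(16_000))      # stand-in for one second of samples
windows = split_into_windows(one_second, n)
```

With these numbers, one second of signal yields four syllable-length windows of 4,000 samples each.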

Next, only in response to determining, based on the artificial neural network, that the first sample voltages match the first syllable of the keyword does the speech recognition circuit 124 determine, based on the artificial neural network, whether a plurality of second sample voltages among the analog sample voltages match the second syllable of the keyword. Conversely, in response to determining that the first sample voltages do not match the first syllable of the keyword, the speech recognition circuit 124 again determines, based on the artificial neural network, whether the second sample voltages match the first syllable of the keyword.

In one embodiment, the speech recognition circuit 124 uses first neural network weight data to determine whether the first sample voltages among the analog sample voltages match the first syllable of the keyword, and uses second neural network weight data to determine whether the second sample voltages match the second syllable of the keyword. That is, for the first and second syllables, which are pronounced differently, the speech recognition circuit 124 can use different trained neural network weight data to make the determination.

也就是說，當語音辨識電路124判定多筆第一取樣電壓符合關鍵詞的第一音節時，語音辨識電路124才會接續判斷後續的其他取樣電壓是否符合關鍵詞的第二音節。否則，語音辨識電路124會繼續判斷類比式記憶體123所暫存的類比取樣電壓是否符合關鍵詞的第一音節。換言之，於一實施例中，當語音辨識電路124基於人工神經網路判定類比取樣電壓依照特定順序符合關鍵詞的多個音節，語音辨識電路124判定檢測到關鍵詞。 In other words, only when the speech recognition circuit 124 determines that the multiple first sample voltages match the first syllable of the keyword does it go on to determine whether subsequent sample voltages match the second syllable of the keyword. Otherwise, the speech recognition circuit 124 continues to determine whether the analog sample voltages buffered in the analog memory 123 match the first syllable of the keyword. Put differently, in one embodiment, when the speech recognition circuit 124 determines, based on the artificial neural network, that the analog sample voltages match the multiple syllables of the keyword in a specific order, the speech recognition circuit 124 determines that the keyword is detected.

舉例而言，以關鍵詞為「ok！google」為例，此關鍵詞會包括4個音節「o」、「k」、「goo」、「gle」。語音辨識電路124可先依據對應至「o」的第一神經網路權重數據來判定第1筆至第i筆類比取樣電壓是否符合關鍵詞的第一音節「o」。若是，語音辨識電路124可依據對應至「k」的第二神經網路權重數據來判定第(i+1)筆至第2i筆類比取樣電壓是否符合關鍵詞的第二音節「k」。若否，語音辨識電路124可再次依據對應至「o」的第一神經網路權重數據來判定第(i+1)筆至第2i筆類比取樣電壓是否符合關鍵詞的第一音節「o」。 For example, taking the keyword "ok! google", the keyword includes four syllables: "o", "k", "goo", and "gle". The speech recognition circuit 124 may first determine, according to first neural network weight data corresponding to "o", whether the 1st to i-th analog sample voltages match the first syllable "o" of the keyword. If so, the speech recognition circuit 124 may determine, according to second neural network weight data corresponding to "k", whether the (i+1)-th to 2i-th analog sample voltages match the second syllable "k" of the keyword. If not, the speech recognition circuit 124 may again determine, according to the first neural network weight data corresponding to "o", whether the (i+1)-th to 2i-th analog sample voltages match the first syllable "o" of the keyword.

若語音辨識電路124判定第(i+1)筆至第2i筆類比取樣電壓未符合關鍵詞的第二音節「k」，語音辨識電路124可再次依據對應至「o」的第一神經網路權重數據來判定第(2i+1)筆至第3i筆類比取樣電壓是否符合關鍵詞的第一音節「o」。若語音辨識電路124判定第(i+1)筆至第2i筆類比取樣電壓符合關鍵詞的第二音節「k」，語音辨識電路124接著可依據對應至「goo」的第三神經網路權重數據來判定第(2i+1)筆至第3i筆類比取樣電壓是否符合關鍵詞的第三音節「goo」。 If the speech recognition circuit 124 determines that the (i+1)-th to 2i-th analog sample voltages do not match the second syllable "k" of the keyword, the speech recognition circuit 124 may again determine, according to the first neural network weight data corresponding to "o", whether the (2i+1)-th to 3i-th analog sample voltages match the first syllable "o" of the keyword. If the speech recognition circuit 124 determines that the (i+1)-th to 2i-th analog sample voltages match the second syllable "k" of the keyword, the speech recognition circuit 124 may then determine, according to third neural network weight data corresponding to "goo", whether the (2i+1)-th to 3i-th analog sample voltages match the third syllable "goo" of the keyword.

若語音辨識電路124判定第(2i+1)筆至第3i筆類比取樣電壓未符合關鍵詞的第三音節「goo」，語音辨識電路124可再次依據對應至「o」的第一神經網路權重數據來判定第(3i+1)筆至第4i筆類比取樣電壓是否符合關鍵詞的第一音節「o」。若語音辨識電路124判定第(2i+1)筆至第3i筆類比取樣電壓符合關鍵詞的第三音節「goo」，語音辨識電路124接著可依據對應至「gle」的第四神經網路權重數據來判定第(3i+1)筆至第4i筆類比取樣電壓是否符合關鍵詞的第四音節「gle」。 If the speech recognition circuit 124 determines that the (2i+1)-th to 3i-th analog sample voltages do not match the third syllable "goo" of the keyword, the speech recognition circuit 124 may again determine, according to the first neural network weight data corresponding to "o", whether the (3i+1)-th to 4i-th analog sample voltages match the first syllable "o" of the keyword. If the speech recognition circuit 124 determines that the (2i+1)-th to 3i-th analog sample voltages match the third syllable "goo" of the keyword, the speech recognition circuit 124 may then determine, according to fourth neural network weight data corresponding to "gle", whether the (3i+1)-th to 4i-th analog sample voltages match the fourth syllable "gle" of the keyword.

若語音辨識電路124判定第(3i+1)筆至第4i筆類比取樣電壓未符合關鍵詞的第四音節「gle」，語音辨識電路124可再次依據對應至「o」的第一神經網路權重數據來判定第(4i+1)筆至第5i筆類比取樣電壓是否符合關鍵詞的第一音節「o」。若語音辨識電路124判定第(3i+1)筆至第4i筆類比取樣電壓符合關鍵詞的第四音節「gle」，語音辨識電路124可判定檢測到關鍵詞「ok！google」。 If the speech recognition circuit 124 determines that the (3i+1)-th to 4i-th analog sample voltages do not match the fourth syllable "gle" of the keyword, the speech recognition circuit 124 may again determine, according to the first neural network weight data corresponding to "o", whether the (4i+1)-th to 5i-th analog sample voltages match the first syllable "o" of the keyword. If the speech recognition circuit 124 determines that the (3i+1)-th to 4i-th analog sample voltages match the fourth syllable "gle" of the keyword, the speech recognition circuit 124 may determine that the keyword "ok! google" is detected.
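
The window-by-window procedure walked through above amounts to a small state machine, which may be sketched in Python as follows. This is only an illustration under stated assumptions: the class `SequentialKeywordDetector` and the helper `make_mean_matcher` are hypothetical names, and the mean-value matcher is a toy stand-in for the trained per-syllable neural network weight data, which in the embodiment is applied by the analog speech recognition circuit 124 rather than by software.

```python
from typing import Callable, List, Sequence

Window = Sequence[float]
SyllableMatcher = Callable[[Window], bool]


def make_mean_matcher(target: float, tol: float = 0.05) -> SyllableMatcher:
    """Toy stand-in for one syllable's trained weights (hypothetical)."""
    return lambda window: abs(sum(window) / len(window) - target) < tol


class SequentialKeywordDetector:
    """Tests each window of i samples against the currently expected syllable."""

    def __init__(self, matchers: List[SyllableMatcher]):
        if not matchers:
            raise ValueError("at least one syllable matcher is required")
        self.matchers = matchers
        self.stage = 0  # index of the syllable expected in the next window

    def feed(self, window: Window) -> bool:
        """Process one window; return True when the last syllable matches."""
        if self.matchers[self.stage](window):
            self.stage += 1
            if self.stage == len(self.matchers):
                self.stage = 0  # keyword detected; start over
                return True
        else:
            # Mismatch: the NEXT window is tested against the first syllable.
            self.stage = 0
        return False
```

For a four-syllable keyword, the detector would be built with four matchers in order ("o", "k", "goo", "gle"). As in the description, a window that fails at any stage is not re-tested; the following window is simply compared against the first syllable again.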

於一實施例中，若語音辨識電路124判定並未檢測到關鍵詞，訊號處理器120可維持操作於關鍵詞檢測模式中。相對的，若語音辨識電路124判定檢測到關鍵詞，訊號處理器120可從關鍵詞檢測模式切換為話語收音模式而致能高功耗元件121。 In one embodiment, if the speech recognition circuit 124 determines that the keyword is not detected, the signal processor 120 can remain in the keyword detection mode. In contrast, if the speech recognition circuit 124 determines that the keyword is detected, the signal processor 120 can switch from the keyword detection mode to the speech reception mode and enable the high power consumption component 121.

舉例而言，於一實施例中，語音辨識電路124可提供通知訊號給訊號處理器120中的電源控制電路，好讓電源控制電路決定是否將電池130的電力供應至高功耗元件121。由此可知，類比取樣電路122、類比式記憶體123，以及語音辨識電路124可於關鍵詞檢測模式中持續偵測使用者是否說出關鍵詞。當語音辨識電路124判定檢測到關鍵詞時，無線麥克風裝置100才會使用高功耗元件121來處理類比聲音訊號以及將經處理數位音訊數據傳輸至終端裝置200。 For example, in one embodiment, the speech recognition circuit 124 can provide a notification signal to a power control circuit in the signal processor 120 so that the power control circuit can decide whether to supply power from the battery 130 to the high power consumption component 121. It follows that the analog sampling circuit 122, the analog memory 123, and the speech recognition circuit 124 can continuously detect, in the keyword detection mode, whether the user utters the keyword. Only when the speech recognition circuit 124 determines that the keyword is detected does the wireless microphone device 100 use the high power consumption component 121 to process the analog sound signal and transmit the processed digital audio data to the terminal device 200.
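
The mode switch and power gating described above can be modeled, purely for illustration, as the following Python sketch. The names `Mode`, `SignalProcessorModel`, and `on_detection_result` are assumptions and do not come from the specification; the model only covers the transition from the keyword detection mode to the speech reception mode, since the text does not describe how the device returns to the keyword detection mode.

```python
from enum import Enum


class Mode(Enum):
    KEYWORD_DETECTION = "keyword_detection"  # low-power analog path only
    SPEECH_RECEPTION = "speech_reception"    # high power consumption component on


class SignalProcessorModel:
    """Toy model of the signal processor 120 and its power control decision."""

    def __init__(self) -> None:
        self.mode = Mode.KEYWORD_DETECTION
        self.high_power_enabled = False  # battery power withheld from component 121

    def on_detection_result(self, keyword_detected: bool) -> Mode:
        """Handle the notification signal from the keyword detector."""
        if self.mode is Mode.KEYWORD_DETECTION and keyword_detected:
            self.mode = Mode.SPEECH_RECEPTION
            self.high_power_enabled = True  # supply power to component 121
        # With no detection the mode is left unchanged.
        return self.mode
```

In this model, repeated calls with `keyword_detected=False` leave the processor in the keyword detection mode with the high power consumption component unpowered, which mirrors the power-saving behavior the embodiment relies on.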

圖4是依照本新型創作一實施例的語音助理系統的示意圖。圖5是依照本新型創作一實施例的語音助理系統的使用情境示意圖。請參照圖4與圖5，除了相似於圖1實施例的無線麥克風裝置100與終端裝置200之外，語音助理系統10可更包括耳機300。耳機300適於配戴於使用者的耳部，並可播放來自終端裝置200的音訊數據。 Fig. 4 is a schematic diagram of a voice assistant system according to an embodiment of the present utility model. Fig. 5 is a schematic diagram of a usage scenario of a voice assistant system according to an embodiment of the present utility model. Referring to Fig. 4 and Fig. 5, in addition to the wireless microphone device 100 and the terminal device 200, which are similar to those of the embodiment of Fig. 1, the voice assistant system 10 may further include an earphone 300. The earphone 300 is suitable for being worn on the user's ear and can play audio data from the terminal device 200.

於一實施例中，當使用者沒有意圖要使用語音助理程序時，即便使用者一直說話，但無線麥克風裝置100的訊號處理器120還是維持操作於關鍵詞檢測模式中，而不會浪費電力來進行數位音訊處理以及傳輸數據至終端裝置200。當使用者想要使用語音助理程序進行資料搜尋時，使用者可以極低音量先說出關鍵詞。反應於偵測到關鍵詞，無線麥克風裝置100中操作於關鍵詞檢測模式的訊號處理器120可切換為操作於話語收音模式而啟動高功耗元件121。 In one embodiment, when the user does not intend to use the voice assistant program, the signal processor 120 of the wireless microphone device 100 remains in the keyword detection mode even if the user keeps talking, and does not waste power on digital audio processing or on transmitting data to the terminal device 200. When the user wants to use the voice assistant program to search for information, the user can first speak the keyword at a very low volume. In response to detecting the keyword, the signal processor 120 of the wireless microphone device 100, which has been operating in the keyword detection mode, can switch to the speech reception mode and activate the high power consumption component 121.

接著，使用者可以極低音量說出詢問問題，此時，高功耗元件121已經被啟用來對類比聲音訊號進行音訊處理而產生經處理數位音訊數據。經處理數位音訊數據可發送至終端裝置200，致使終端裝置200的語音助理可依據經處理數位音訊數據進行語音辨識並執行資訊搜索。最後，終端裝置200可將使用者詢問問題的回答回傳至耳機300，並由耳機300播放回答給使用者。在此情況下，使用者可在不打擾他人或甚至是他人沒有察覺的情況下使用語音助理來查詢資料。 Then, the user can speak a question at a very low volume. At this point, the high power consumption component 121 has already been enabled to perform audio processing on the analog sound signal and generate processed digital audio data. The processed digital audio data can be sent to the terminal device 200, so that the voice assistant of the terminal device 200 can perform speech recognition and carry out an information search based on the processed digital audio data. Finally, the terminal device 200 can send the answer to the user's question to the earphone 300, which plays the answer to the user. In this way, the user can use the voice assistant to look up information without disturbing others, or even without others noticing.

圖6是依照本新型創作一實施例的無線麥克風裝置的示意圖。請參照圖6，相較於圖3實施例，於本實施例中，高功耗元件121可包括功率放大器121a、類比數位轉換器121b，以及數位信號處理器121c。功率放大器121a、類比數位轉換器121b，以及數位信號處理器121c用以根據麥克風模組110提供的類比聲音訊來產生經處理數位音訊數據。 Fig. 6 is a schematic diagram of a wireless microphone device according to an embodiment of the present utility model. Referring to Fig. 6, compared with the embodiment of Fig. 3, in this embodiment the high power consumption component 121 may include a power amplifier 121a, an analog-to-digital converter 121b, and a digital signal processor 121c. The power amplifier 121a, the analog-to-digital converter 121b, and the digital signal processor 121c are used to generate the processed digital audio data from the analog sound signal provided by the microphone module 110.

相對於類比取樣電路122、類比式記憶體123，以及語音辨識電路124，功率放大器121a、類比數位轉換器121b，以及數位信號處理器121c的操作需要消耗相對高的電力。然而，由於本新型創作實施例的功率放大器121a、類比數位轉換器121b，以及數位信號處理器121c可僅於話語收音模式中被啟動，因而使得無線麥克風裝置100的續航力可大幅提昇。 Compared with the analog sampling circuit 122, the analog memory 123, and the speech recognition circuit 124, the power amplifier 121a, the analog-to-digital converter 121b, and the digital signal processor 121c consume relatively high power when operating. However, because the power amplifier 121a, the analog-to-digital converter 121b, and the digital signal processor 121c of the embodiments of the present utility model may be activated only in the speech reception mode, the battery life of the wireless microphone device 100 can be greatly improved.

綜上所述，於本新型創作實施例中，在使用者沒有說出關鍵詞的情況下，無線麥克風裝置可維持操作於關鍵詞檢測模式，並利用功耗較低的類比電路來偵測使用者是否說出關鍵詞。反應於使用者說出關鍵詞，無線麥克風裝置才切換為操作於話語收音模式而啟用高功耗元件。接著，無線麥克風裝置可利用高功耗元件進行進行數位音訊處理而產生經處理音訊資料，接著將經處理音訊資料發送給終端裝置。基此，高功耗元件只會在需要時刻被啟動而消耗電力，因而使得無線麥克風裝置不會很快地將電池的電量使用完畢，從而大幅延長無線麥克風裝置的續航力。藉此，與此無線麥克風裝置搭配使用的語音助理程序的應用範圍可更加不受到限制，使用者可更隨心所欲地使用語音助理。 In summary, in the embodiments of the present utility model, as long as the user has not spoken the keyword, the wireless microphone device can remain in the keyword detection mode and use analog circuits with lower power consumption to detect whether the user speaks the keyword. Only in response to the user speaking the keyword does the wireless microphone device switch to the speech reception mode and enable the high power consumption component. The wireless microphone device can then use the high power consumption component to perform digital audio processing and generate processed audio data, and then send the processed audio data to the terminal device. Accordingly, the high power consumption component is activated and consumes power only when needed, so the wireless microphone device does not drain its battery quickly, which greatly extends the battery life of the wireless microphone device. As a result, the voice assistant program used with this wireless microphone device can be applied in a wider range of situations, and the user can use the voice assistant more freely.

最後應說明的是：以上各實施例僅用以說明本新型創作的技術方案，而非對其限制；儘管參照前述各實施例對本新型創作進行了詳細的說明，本領域的普通技術人員應當理解：其依然可以對前述各實施例所記載的技術方案進行修改，或者對其中部分或者全部技術特徵進行等同替換；而這些修改或者替換，並不使相應技術方案的本質脫離本新型創作各實施例技術方案的範圍。 Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present utility model, not to limit them. Although the present utility model has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present utility model.

10:語音助理系統 10: Voice assistant system

100:無線麥克風裝置 100: Wireless microphone device

200:終端裝置 200: terminal device

110:麥克風模組 110: Microphone module

111:膜片 111: diaphragm

120:訊號處理器 120: signal processor

130:電池 130: battery

121:高功耗元件 121: High-power components

Claims

A voice assistant system, comprising: a microphone module adapted to be worn on a user, which senses the voice in the user's throat through a diaphragm and generates an analog sound signal in response to the voice in the user's throat; and a signal processor operating in a speech reception mode or a keyword detection mode, wherein the power consumption of the signal processor operating in the speech reception mode is higher than the power consumption of the signal processor operating in the keyword detection mode, wherein, when the signal processor operates in the keyword detection mode, the signal processor performs keyword detection based on multiple analog sample voltages of the analog sound signal, and, in response to detecting a keyword in the keyword detection mode, the signal processor switches from the keyword detection mode to the speech reception mode.

The voice assistant system according to claim 1, wherein the microphone module is in contact with the skin of the user and is suitable for being worn on the throat or behind the ear of the user.

The voice assistant system according to claim 1, wherein the voice in the throat is a sound wave that cannot be heard by others.

The voice assistant system according to claim 1, wherein the signal processor performs the keyword detection on the multiple analog sample voltages of the analog sound signal based on an artificial neural network.

The voice assistant system according to claim 1, wherein the signal processor includes a high power consumption component, and the signal processor switches from the keyword detection mode to the speech reception mode to activate the high power consumption component.

The voice assistant system according to claim 5, wherein, in response to the keyword not being detected in the keyword detection mode, the signal processor remains in the keyword detection mode and disables the high power consumption component.

The voice assistant system according to claim 5, wherein, after switching to the speech reception mode, the signal processor uses the high power consumption component to perform audio processing on the analog sound signal to generate processed digital audio data.

The voice assistant system according to claim 5, wherein the high power consumption component includes an analog-to-digital converter, a digital signal processor, a power amplifier, or a combination thereof.

The voice assistant system according to claim 1, further comprising: a wireless transceiver, coupled to the signal processor, which establishes a wireless communication link with a terminal device so as to transmit the processed digital audio data generated by the signal processor operating in the speech reception mode to the terminal device.

The voice assistant system according to claim 1, wherein the signal processor includes: a speech recognition circuit that performs feature extraction on the analog sample voltages based on an artificial neural network to determine whether the keyword is detected.

The voice assistant system according to claim 10, wherein the signal processor further includes: an analog sampling circuit, coupled to the microphone module, which samples the analog sound signal to generate the multiple analog sample voltages; and an analog memory, coupled to the analog sampling circuit, which records the analog sample voltages.

The voice assistant system according to claim 11, wherein the analog memory includes a charge coupled device (CCD) memory or a phase-change memory (PCM).

The voice assistant system according to claim 10, wherein the speech recognition circuit determines, based on the artificial neural network, whether multiple first sample voltages among the analog sample voltages match a first syllable of the keyword, and wherein, in response to determining based on the artificial neural network that the first sample voltages among the analog sample voltages match the first syllable of the keyword, the speech recognition circuit determines, based on the artificial neural network, whether multiple second sample voltages among the analog sample voltages match a second syllable of the keyword.

The voice assistant system according to claim 13, wherein the speech recognition circuit uses first neural network weight data to determine whether the first sample voltages among the analog sample voltages match the first syllable of the keyword, and uses second neural network weight data to determine whether the second sample voltages among the analog sample voltages match the second syllable of the keyword.

The voice assistant system according to claim 10, wherein, when the speech recognition circuit determines based on the artificial neural network that the analog sample voltages match multiple syllables of the keyword in a specific order, the speech recognition circuit determines that the keyword is detected.

The voice assistant system according to claim 1, wherein the diaphragm is a pressure-sensitive membrane, and is used to generate vibration in response to sound produced in the throat.

The voice assistant system according to claim 1, further comprising another diaphragm, wherein the diaphragm is different in size from the other diaphragm.

A voice assistant system, comprising: a terminal device; a microphone module adapted to be worn on a user, which senses the voice in the user's throat through a diaphragm and generates an analog sound signal in response to the voice in the user's throat; and a signal processor operating in a speech reception mode or a keyword detection mode, wherein the power consumption of the signal processor operating in the speech reception mode is higher than the power consumption of the signal processor operating in the keyword detection mode, wherein, when the signal processor operates in the keyword detection mode, the signal processor performs keyword detection based on multiple analog sample voltages of the analog sound signal, in response to detecting a keyword in the keyword detection mode, the signal processor switches from the keyword detection mode to the speech reception mode, and, after switching to the speech reception mode, the signal processor performs audio processing on the analog sound signal to generate processed digital audio data, wherein the signal processor provides the processed digital audio data to a voice assistant program run by the terminal device.

The voice assistant system according to claim 18, wherein the microphone module is in contact with the skin of the user and is suitable for being worn on the throat or behind the ears of the user.

The voice assistant system according to claim 18, wherein the voice in the throat is a sound wave that cannot be heard by others.

The voice assistant system according to claim 18, wherein the signal processor performs the keyword detection on the multiple analog sample voltages of the analog sound signal based on an artificial neural network.

The voice assistant system according to claim 18, wherein the signal processor includes a high power consumption component, and the signal processor switches from the keyword detection mode to the speech reception mode to activate the high power consumption component.

The voice assistant system according to claim 22, wherein, in response to the keyword not being detected in the keyword detection mode, the signal processor remains in the keyword detection mode and disables the high power consumption component.

The voice assistant system according to claim 22, wherein, after switching to the speech reception mode, the signal processor uses the high power consumption component to perform the audio processing on the analog sound signal.

The voice assistant system according to claim 22, wherein the high power consumption component includes an analog-to-digital converter, a digital signal processor, a power amplifier, or a combination thereof.

The voice assistant system according to claim 18, further comprising a wireless transceiver, wherein the wireless transceiver is coupled to the signal processor and establishes a wireless communication link with the terminal device so as to transmit the processed digital audio data generated by the signal processor operating in the speech reception mode to the terminal device.

The voice assistant system according to claim 18, wherein the signal processor includes: a speech recognition circuit that performs feature extraction on the analog sample voltages based on an artificial neural network to determine whether the keyword is detected.

The voice assistant system according to claim 27, wherein the signal processor further includes: an analog sampling circuit, coupled to the microphone module, which samples the analog sound signal to generate the multiple analog sample voltages; and an analog memory, coupled to the analog sampling circuit, which records the analog sample voltages.

The voice assistant system according to claim 28, wherein the analog memory includes a charge coupled device (CCD) memory or a phase-change memory (PCM).

The voice assistant system according to claim 27, wherein the speech recognition circuit determines, based on the artificial neural network, whether multiple first sample voltages among the analog sample voltages match a first syllable of the keyword, and wherein, in response to determining based on the artificial neural network that the first sample voltages among the analog sample voltages match the first syllable of the keyword, the speech recognition circuit determines, based on the artificial neural network, whether multiple second sample voltages among the analog sample voltages match a second syllable of the keyword.

The voice assistant system according to claim 30, wherein the speech recognition circuit uses first neural network weight data to determine whether the first sample voltages among the analog sample voltages match the first syllable of the keyword, and uses second neural network weight data to determine whether the second sample voltages among the analog sample voltages match the second syllable of the keyword.

The voice assistant system according to claim 27, wherein, when the speech recognition circuit determines based on the artificial neural network that the analog sample voltages match multiple syllables of the keyword in a specific order, the speech recognition circuit determines that the keyword is detected.

The voice assistant system according to claim 18, wherein the diaphragm is a pressure-sensitive membrane, and is used to generate vibration in response to sound produced in the throat.

The voice assistant system according to claim 18, further comprising another diaphragm, wherein the diaphragm is different in size from the other diaphragm.