TW202422535A - Language data processing system and method and computer program product - Google Patents


Info

Publication number
TW202422535A
Authority
TW
Taiwan
Prior art keywords
intent
designated
processing unit
specified
intentions
Application number
TW111145456A
Other languages
Chinese (zh)
Other versions
TWI847393B (en)
Inventor
黃信橋
馬世英
沈書緯
Original Assignee
犀動智能科技股份有限公司
Application filed by 犀動智能科技股份有限公司
Priority to JP2023070061A (JP2024077568A)
Publication of TW202422535A
Application granted
Publication of TWI847393B


Abstract

A language data processing method implemented by a language data processing system comprises: (A) using a language processing model to identify, from speech-text data, multiple designated intents expressed by the speech-text data, and classifying each designated intent as either a clear designated intent or a fuzzy designated intent according to the vocabulary in the speech-text data, wherein each designated intent corresponds to one of a plurality of intent labels; and (B) determining N target designated intents from among the clear designated intents, and executing the control program corresponding to the intent label of each target designated intent, wherein N is an integer greater than or equal to 1.

Description

Language data processing system and method and computer program product

The present invention relates to a data processing system, and more particularly to a language data processing system suitable for processing a user's spoken content. The present invention also relates to a language data processing method suitable for processing a user's spoken content, and to a computer program product that enables an electronic device to implement the method.

With the development of natural language processing technology, the voice control functions of more and more electronic devices are no longer limited to preset voice commands; instead, users may describe their needs in a more colloquial way.

Given everyday speaking habits, expressing several requests at once in a single sentence is clearly convenient. However, such sentences tend to be long, and some of the requests may not be stated clearly enough, which hinders intent analysis in the prior art. How to better process multiple requests that a user expresses at once, building on natural language processing technology, is therefore the problem addressed by the present invention.

Accordingly, one object of the present invention is to provide a language data processing system that improves upon the prior art.

The language data processing system of the present invention comprises a processing unit and a storage unit electrically connected to the processing unit. The storage unit stores a language processing model implemented using machine learning technology. The language processing model includes a plurality of intent labels, and each intent label corresponds to a control program related to the operation of the processing unit. The processing unit is configured to: use the language processing model to identify, from speech-text data, multiple designated intents expressed by the speech-text data, and classify each designated intent as either a clear designated intent or a fuzzy designated intent according to the vocabulary in the speech-text data, wherein each designated intent corresponds to one of the intent labels; and determine N target designated intents from among the clear designated intents, and execute the control program corresponding to the intent label of each target designated intent, wherein N is an integer greater than or equal to 1.
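The overall flow of the system can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: `DesignatedIntent`, `process`, and the label strings are hypothetical names, and the sketch simply treats every clear designated intent as a target (so N equals the number of clear intents), leaving out the conflict and exclusivity rules described in later implementations.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DesignatedIntent:
    label: str                      # intent label matched by the model (hypothetical names)
    clear: bool                     # True: clear designated intent; False: fuzzy
    slots: dict[str, str] = field(default_factory=dict)  # key words found in the text


def process(intents: list[DesignatedIntent],
            control_programs: dict[str, Callable[[dict[str, str]], str]]) -> list[str]:
    """Step (B): act only on clear intents; fuzzy ones are left untouched."""
    targets = [i for i in intents if i.clear]   # in this sketch, N == len(targets)
    return [control_programs[t.label](t.slots) for t in targets]


programs = {
    "set_alarm": lambda s: f"alarm set for {s['time']}",
    "make_call": lambda s: f"calling room {s['room']}",
}
intents = [
    DesignatedIntent("set_alarm", clear=True, slots={"time": "08:00"}),
    DesignatedIntent("make_call", clear=False),  # e.g. no room number was said
]
print(process(intents, programs))  # ['alarm set for 08:00']
```

The fuzzy "make_call" intent does not block the clear "set_alarm" intent, which is the effect the description attributes to the invention.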

In some implementations of the system, the processing unit classifies each designated intent as clear or fuzzy as follows: it determines whether the speech-text data contains one or more key words that would allow it to execute the control program related to that designated intent. If such key words are present, the processing unit classifies the designated intent as a clear designated intent; if no key word is present, it classifies the designated intent as a fuzzy designated intent.
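A minimal sketch of the clear/fuzzy test follows. It assumes (the source does not say this) that each intent label needs certain slot types before its control program can run, and that a small lexicon maps key words to those slot types. `KEYWORD_SLOTS`, `REQUIRED_SLOTS`, and `classify` are hypothetical; a real system would rely on the language processing model rather than exact token lookup.

```python
import re

# Hypothetical lexicon: which slot type each key word can fill.
KEYWORD_SLOTS = {"tv": "appliance", "light": "appliance", "303": "room", "501": "room"}

# Hypothetical: slot types a label's control program needs before it can run.
REQUIRED_SLOTS = {"turn_on_appliance": {"appliance"}, "make_call": {"room"}}


def classify(label: str, text: str) -> tuple[bool, dict[str, str]]:
    """Return (is_clear, key_words) for one designated intent."""
    tokens = re.findall(r"\w+", text.lower())
    needed = REQUIRED_SLOTS[label]
    key_words = {KEYWORD_SLOTS[t]: t for t in tokens
                 if t in KEYWORD_SLOTS and KEYWORD_SLOTS[t] in needed}
    # Clear only if every slot the control program needs was found in the text.
    return needed.issubset(key_words), key_words


print(classify("make_call", "Call room 303, please"))  # (True, {'room': '303'})
print(classify("make_call", "Call the front desk"))    # (False, {})
```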

In some implementations of the system, after classifying each designated intent as clear or fuzzy, the processing unit is further configured to: when one or more of the designated intents are classified as fuzzy designated intents, for at least one of the fuzzy designated intents, use the language processing model to generate a query message corresponding to that fuzzy designated intent and cause an output module to output the query message; and, after the query message has been output, upon obtaining further speech-text data corresponding to another speech input, determine whether the meaning of the further speech-text data matches the fuzzy designated intent closely enough for the processing unit to execute the control program related to it, and, if it does, reclassify the fuzzy designated intent as a clear designated intent.
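The query-and-follow-up step might look like the following sketch. It assumes a fixed question template per label in place of model-generated query messages, and simple regular expressions in place of semantic matching; `MISSING_SLOT_QUESTION`, `SLOT_PATTERNS`, `query_for`, and `resolve` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DesignatedIntent:
    label: str
    clear: bool
    slots: dict[str, str] = field(default_factory=dict)


# Hypothetical: the slot each label is missing, and a fixed question for it.
MISSING_SLOT_QUESTION = {
    "make_call": ("room", "Which room would you like to call?"),
    "set_alarm": ("time", "What time should the alarm be set for?"),
}
# Hypothetical patterns that recognize a value for each slot in a follow-up reply.
SLOT_PATTERNS = {"room": r"\b(\d{3})\b", "time": r"\b(\d{1,2}:\d{2})\b"}


def query_for(intent: DesignatedIntent) -> str:
    """Stand-in for model M generating a query message for a fuzzy intent."""
    return MISSING_SLOT_QUESTION[intent.label][1]


def resolve(intent: DesignatedIntent, follow_up_text: str) -> bool:
    """Promote the fuzzy intent to clear if the follow-up supplies the missing slot."""
    slot, _ = MISSING_SLOT_QUESTION[intent.label]
    match = re.search(SLOT_PATTERNS[slot], follow_up_text)
    if match:
        intent.slots[slot] = match.group(1)
        intent.clear = True
    return intent.clear
```

An unmatched reply leaves the intent fuzzy, so the system could ask again or drop it; the source does not specify which.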

In some implementations of the system, the processing unit determines the N target designated intents from the clear designated intents as follows: it determines whether several of the clear designated intents conflict with one another and together form a group of conflicting intents; if such a group exists, the processing unit takes only a single clear designated intent from the group as one of the N target designated intents.
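The source does not define when intents conflict or which member of a conflict group is kept. The sketch below assumes, purely for illustration, that "turn on" and "turn off" intents conflict when they act on the same appliance, and that the last one spoken is kept (as if the user had corrected themselves). `ClearIntent`, `CONFLICT_GROUP`, and `drop_conflicts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClearIntent:
    label: str
    target: str = ""   # e.g. the appliance the intent acts on


# Hypothetical: labels in the same group conflict when they share a target.
CONFLICT_GROUP = {"turn_on_appliance": "power", "turn_off_appliance": "power"}


def drop_conflicts(intents: list[ClearIntent]) -> list[ClearIntent]:
    """Keep a single intent per conflict group (assumed policy: the last one spoken)."""
    kept: dict[tuple, int] = {}   # conflict key -> index of the intent kept so far
    for idx, intent in enumerate(intents):
        group = CONFLICT_GROUP.get(intent.label)
        # Intents outside any group get a unique key, so they are never dropped.
        key = (group, intent.target) if group else ("", idx)
        kept[key] = idx           # a later conflicting intent replaces an earlier one
    return [intents[i] for i in sorted(kept.values())]
```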

In some implementations of the system, one of the intent labels is set as an exclusive intent label, and the processing unit determines the N target designated intents as follows: it determines whether the intent label of any of the clear designated intents is the exclusive intent label; if so, the processing unit takes the clear designated intent corresponding to the exclusive intent label as the only target designated intent.
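A sketch of the exclusive-label rule. It also covers the variant, mentioned later in the embodiment, in which several exclusive labels are prioritized among themselves. `EXCLUSIVE_LABELS` and `select_targets` are hypothetical names, and "make_call" is used as the exclusive label only because the embodiment gives "make a call" as its example.

```python
# Hypothetical: exclusive labels, listed from highest to lowest priority.
EXCLUSIVE_LABELS = ("make_call",)


def select_targets(clear_labels: list[str],
                   exclusive_labels: tuple[str, ...] = EXCLUSIVE_LABELS) -> list[str]:
    """Return the target labels; an exclusive label, if present, becomes the only target."""
    for exclusive in exclusive_labels:
        if exclusive in clear_labels:
            return [exclusive]
    return list(clear_labels)


print(select_targets(["turn_on_appliance", "make_call", "play_music"]))  # ['make_call']
```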

In some implementations of the system, the intent labels have a defined order, and when the processing unit determines more than one target designated intent, it executes the corresponding control programs one at a time, following the order of the intent labels of those target designated intents.
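Ordered execution can be sketched as a stable sort on a per-label rank followed by sequential calls. The ranks in `LABEL_RANK` are hypothetical; they only respect the embodiment's example that turning on an appliance and looking up information come before playing music (the relative order of the first two is an assumption).

```python
from typing import Callable

# Hypothetical ranks: a lower rank runs earlier.
LABEL_RANK = {"turn_on_appliance": 0, "query_info": 1, "play_music": 2}


def execution_order(target_labels: list[str]) -> list[str]:
    # sorted() is stable, so targets with the same rank keep their spoken order.
    return sorted(target_labels, key=LABEL_RANK.__getitem__)


def run_in_order(target_labels: list[str],
                 programs: dict[str, Callable[[], str]]) -> list[str]:
    """Execute the control programs one at a time, in label order."""
    return [programs[label]() for label in execution_order(target_labels)]
```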

Another object of the present invention is to provide a language data processing method that improves upon the prior art.

The language data processing method of the present invention is implemented by a language data processing system that comprises a processing unit and a storage unit electrically connected to the processing unit. The storage unit stores a language processing model implemented using machine learning technology; the model includes a plurality of intent labels, and each intent label corresponds to a control program related to the operation of the processing unit. The method comprises: (A) the processing unit uses the language processing model to identify, from speech-text data, multiple designated intents expressed by the speech-text data, and classifies each designated intent as either a clear designated intent or a fuzzy designated intent according to the vocabulary in the speech-text data, wherein each designated intent corresponds to one of the intent labels; and (B) the processing unit determines N target designated intents from among the clear designated intents, and executes the control program corresponding to the intent label of each target designated intent, wherein N is an integer greater than or equal to 1.

In some implementations of the method, in step (A), the processing unit classifies each designated intent as clear or fuzzy as follows: it determines whether the speech-text data contains one or more key words that would allow it to execute the control program related to the designated intent. If such key words are present, the designated intent is classified as a clear designated intent; otherwise, it is classified as a fuzzy designated intent.

In some implementations of the method, the method further comprises, between steps (A) and (B): (C) when the processing unit classifies one or more of the designated intents as fuzzy designated intents, for at least one of the fuzzy designated intents, the processing unit uses the language processing model to generate a query message corresponding to that fuzzy designated intent and causes an output module to output the query message; and (D) after the query message has been output, when the processing unit obtains further speech-text data corresponding to another speech input, the processing unit determines whether the meaning of the further speech-text data matches the fuzzy designated intent closely enough to allow it to execute the control program related to that intent, and, if it does, reclassifies the fuzzy designated intent as a clear designated intent.

In some implementations of the method, in step (B), the processing unit determines the N target designated intents from the clear designated intents as follows: it determines whether several of the clear designated intents conflict with one another and together form a group of conflicting intents; if such a group exists, it takes only a single clear designated intent from the group as one of the N target designated intents.

In some implementations of the method, one of the intent labels is set as an exclusive intent label, and in step (B) the processing unit determines the N target designated intents as follows: it determines whether the intent label of any of the clear designated intents is the exclusive intent label; if so, it takes the clear designated intent corresponding to the exclusive intent label as the only target designated intent.

In some implementations of the method, the intent labels have a defined order, and in step (B), when more than one target designated intent is determined, the processing unit executes the corresponding control programs one at a time, following the order of the intent labels of those target designated intents.

A further object of the present invention is to provide a computer program product that improves upon the prior art.

The computer program product of the present invention includes a language processing model implemented using machine learning technology. The model includes a plurality of intent labels, and each intent label corresponds to a control program related to the operation of an electronic device. The computer program product is loaded and run by the electronic device so that the electronic device implements the language data processing method described in any of the foregoing implementations.

The effect of the present invention is as follows. After obtaining the speech-text data and identifying the designated intents in it, the language data processing system classifies each designated intent as clear or fuzzy according to the vocabulary of the speech-text data, and executes control programs only for the clear designated intents. Therefore, even if some of the designated intents expressed in the speech-text data are not stated clearly enough (that is, fuzzy designated intents exist), the system can still act on the intents that are stated clearly enough (the clear designated intents) and thereby meet the user's needs.

Before the invention is described in detail, it should be noted that, unless otherwise defined, "electrically connected" in this specification covers both a "wired electrical connection", in which electronic devices, apparatuses, or components are connected through conductive material, and a "wireless electrical connection", in which they exchange one-way or two-way wireless signals via wireless communication technology. It also covers both a "direct electrical connection", in which the devices, apparatuses, or components are connected to one another directly, and an "indirect electrical connection", in which they are connected through other devices, apparatuses, or components.

Referring to FIG. 1, an embodiment of the language data processing system 1 of the present invention is adapted to be electrically connected over a network to multiple user-end devices 5 (only one is shown in FIG. 1), so that each user-end device 5 can communicate with the language data processing system 1 over the network. For ease of description, the operation of this embodiment is explained below using only the user-end device 5 shown in FIG. 1.

In an application of this embodiment, the user-end device 5 includes a processing module 51, an input module 52 electrically connected to the processing module 51, and an output module 53 electrically connected to the processing module 51. The processing module 51 is a central processing unit; the input module 52 has at least a microphone electrically connected to the processing module 51; and the output module 53 has at least a speaker and a display electrically connected to the processing module 51. More specifically, the user-end device 5 in this application is a voice-controlled service device intended to be installed in a guest room of an accommodation facility for use by guests. It receives a guest's voice through the microphone of the input module 52, plays voice responses to the guest through the speaker of the output module 53, and shows information for the guest's reference on the display of the output module 53. In addition, the processing module 51 can output control commands wirelessly (for example, via Wi-Fi or Bluetooth) to control electronic equipment in the guest room, such as the air conditioner, telephone, television, and lamps.

Note that in other applications of this embodiment, the user-end device 5 may instead be a mobile electronic device (such as a mobile phone, tablet computer, or laptop computer), a desktop computer, or an Internet television; the language data processing system 1 is therefore not limited to serving voice-controlled service devices in guest rooms of accommodation facilities.

In this embodiment, the language data processing system 1 is a server and includes a processing unit 11 and a storage unit 12 electrically connected to the processing unit 11. Here, the processing unit 11 is a central processing unit capable of computing and processing data, and the storage unit 12 is a data storage device for storing digital data (for example, a hard disk). In other embodiments, however, the processing unit 11 may be a combination of multiple central processing units, and the storage unit 12 may be another type of computer-readable recording medium (for example, flash memory) or a combination of multiple computer-readable recording media of the same or different types. Likewise, in different embodiments, the language data processing system 1 may be implemented as another type of electronic device, such as a mobile phone, tablet computer, laptop computer, or desktop computer, or as multiple servers electrically connected to one another. The hardware implementation of the language data processing system 1 is therefore not limited to this embodiment.

In this embodiment, the storage unit 12 of the language data processing system 1 stores a language processing model M, which is implemented using machine learning technology and can be run by the processing unit 11, as well as a plurality of preset control programs P that the processing unit 11 runs to control the user-end device 5.

The language processing model M is itself a neural network capable of natural language understanding and generation. It includes a plurality of preset intent labels L, each of which captures the semantic features of one kind of user intent; that is, each intent label L represents one user intent. More specifically, after the intent labels L have been defined, the language processing model M is trained by machine learning using at least a plurality of sentence samples as training data. Each sentence sample is a sentence that expresses an intent in natural language, such as "help me call room 303", "set an alarm for 8 o'clock tomorrow morning", "help me turn on the TV", or "turn off the light in the bathroom". Preferably, some of the sentence samples express several different intents at once, such as "turn on the TV and the main light, and turn on the air conditioner too", "turn off the music, then call room 501", or "play a Jay Chou song, then check tomorrow's weather in Taipei for me", although the training data is not limited to these. Having been trained on such sentence samples, the language processing model M can perform semantic analysis on text data and, based on the intent labels L, identify one or more user intents expressed by the text data.
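The multi-intent training samples described above could be represented with multi-hot target vectors, one position per intent label. This encoding is an assumption for illustration; the source does not say how labels are encoded for training, the label names are hypothetical English stand-ins, and the labels attached to each sample are our own reading of the example sentences.

```python
# Hypothetical English stand-ins for the embodiment's intent labels L.
INTENT_LABELS = ["make_call", "set_alarm", "turn_on_appliance",
                 "turn_off_appliance", "query_info", "play_music"]

# Example sentences from the description, paired with the labels they appear to express.
TRAINING_SAMPLES = [
    ("help me call room 303", {"make_call"}),
    ("set an alarm for 8 o'clock tomorrow morning", {"set_alarm"}),
    ("turn on the TV and the main light, and turn on the air conditioner too",
     {"turn_on_appliance"}),
    ("play a Jay Chou song, then check tomorrow's weather in Taipei for me",
     {"play_music", "query_info"}),
]


def multi_hot(labels: set[str]) -> list[int]:
    """Encode a sample's labels as a 0/1 vector aligned with INTENT_LABELS."""
    unknown = labels - set(INTENT_LABELS)
    if unknown:
        raise ValueError(f"unknown intent labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in INTENT_LABELS]


targets = [multi_hot(labels) for _, labels in TRAINING_SAMPLES]
print(targets[3])  # [0, 0, 0, 0, 1, 1]
```

A vector with several ones lets a single sentence teach the model more than one intent at a time.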

Each intent label L included in the language processing model M represents one user intent through its semantic features. Because the language data processing system 1 in this embodiment serves a voice-controlled service device in a guest room of an accommodation facility (the user-end device 5), the intent labels L mainly relate to intents a guest may have in the room, such as "make a call", "set an alarm", "turn on an appliance", "turn off an appliance", "look up information", and "play music". Since the language data processing system 1 is not limited to this application, the intents represented by the intent labels L are not limited to these examples either.

In this embodiment, the intent labels L have a preset order. For example, the intent labels L representing "turn on an appliance" and "look up information" both precede the intent label L representing "play music". The purpose of this order is that if a user expresses all three requests ("turn on an appliance", "look up information", and "play music") at the same time, the "turn on an appliance" and "look up information" requests should be fulfilled first, and the "play music" request last. The order of the intent labels L can be freely designed and adjusted based on user experience or other empirical considerations; since the specific ordering is not the technical focus, it is not described further here.

Furthermore, in this embodiment, one of the intent labels L is set as an exclusive intent label L' (shown in FIG. 1), and the user intent it represents is treated as the top-priority intent, which must be fulfilled before all others. In this embodiment, the exclusive intent label L' is, for example, the intent label L representing the user intent "make a call". Which intent label L is set as the exclusive intent label L' can be chosen freely according to different considerations and needs, and in other embodiments several of the intent labels L may be set as multiple exclusive intent labels L' that are prioritized among themselves; the invention is not limited to the example of this embodiment.

In this embodiment, the control programs P stored in the storage unit 12 allow the processing unit 11 to communicate over the network with the processing module 51 of the user-end device 5, and thereby to control the user-end device 5 in different ways through the processing module 51. More specifically, each control program P causes the user-end device 5, under the control of the processing unit 11, to output particular data in a particular way: for example, by playing audio data, by displaying text and/or image data, or by sending control command data over a wired or wireless link to electronic equipment in the same guest room, such as the air conditioner, telephone, television, and lamps, to control their operation. Furthermore, in this embodiment, the control programs P correspond to the intent labels L of the language processing model M: each intent label L corresponds to at least one of the control programs P. For example, the intent label L representing "set an alarm" corresponds to a control program P that causes the user-end device 5 to configure its own built-in alarm function, and the intent label L representing "make a call" corresponds to another control program P that causes the user-end device 5 to send a voice call request to a calling device.
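The correspondence in which each intent label maps to at least one control program can be sketched as a dispatch table of lists. The command dictionaries, function names, and the choice to pair the call with an on-screen message are hypothetical; the source only states that each label corresponds to one or more control programs P.

```python
from typing import Callable

Command = dict[str, str]  # hypothetical message sent to the user-end device 5


def set_alarm(slots: dict[str, str]) -> Command:
    return {"action": "set_alarm", "time": slots["time"]}


def place_call(slots: dict[str, str]) -> Command:
    return {"action": "voice_call", "room": slots["room"]}


def show_call_screen(slots: dict[str, str]) -> Command:
    return {"action": "display", "text": f"Calling room {slots['room']}"}


# Each intent label maps to one or more control programs.
CONTROL_PROGRAMS: dict[str, list[Callable[[dict[str, str]], Command]]] = {
    "set_alarm": [set_alarm],
    "make_call": [place_call, show_call_screen],
}


def dispatch(label: str, slots: dict[str, str]) -> list[Command]:
    """Run every control program registered for the label, in registration order."""
    return [program(slots) for program in CONTROL_PROGRAMS[label]]
```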

Referring also to FIG. 2, the following describes in detail, by way of example, how the language data processing system 1 of this embodiment cooperates with the user-end device 5 to implement a language data processing method.

First, in step S1, the processing unit 11 obtains speech-text data corresponding to a speech input.

More specifically, in this embodiment, the speech input is a stream of user voice signals received by the processing module 51 of the user-end device 5 through the microphone of the input module 52; in other words, it is a sentence spoken by the user. Upon receiving the speech input, the processing module 51 immediately converts it with speech-to-text technology into speech-text data matching its content and transmits the speech-text data to the processing unit 11 of the language data processing system 1. In other embodiments, however, the processing module 51 may instead transmit the speech input to the processing unit 11 immediately upon receiving it, and the processing unit 11 then performs the speech-to-text conversion to produce the speech-text data. The way the processing unit 11 obtains the speech-text data is therefore not limited to this embodiment.
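Both ways of obtaining the speech-text data in step S1 can be sketched as one function that accepts either text already transcribed on the user-end device 5 or raw audio to be transcribed on the server. `DeviceMessage` and `obtain_speech_text` are hypothetical, and `transcribe` stands in for whatever speech-to-text engine is used; no specific engine is implied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeviceMessage:
    """Hypothetical payload from the user-end device 5: either text or raw audio."""
    text: Optional[str] = None
    audio: Optional[bytes] = None


def obtain_speech_text(msg: DeviceMessage, transcribe: Callable[[bytes], str]) -> str:
    if msg.text is not None:      # the device already ran speech-to-text
        return msg.text
    if msg.audio is not None:     # the server runs speech-to-text itself
        return transcribe(msg.audio)
    raise ValueError("message carries neither text nor audio")
```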

After the processing unit 11 obtains the speech-text data, the process proceeds to step S2.

In step S2, the processing unit 11 loads and runs the language processing model M and uses it to perform intent analysis on the speech-text data. In this embodiment, intent analysis has two parts:

- identifying the multiple designated intents that the speech-text data expresses; and
- classifying each designated intent as either an explicit designated intent or an ambiguous designated intent, based on the vocabulary the speech-text data contains.

Here "vocabulary" covers both single characters and words made up of several characters. The processing unit 11 recognizes intents according to the intent labels L in the language processing model M. Each designated intent it identifies therefore corresponds to one matching intent label L.

For each identified designated intent, the processing unit 11 decides whether the intent is explicit or ambiguous by checking the vocabulary of the speech-text data for key terms. A key term is one that semantically matches the designated intent and supplies enough information for the processing unit 11 to execute the control program P of that intent's label L.

- If one or more such key terms exist, the processing unit 11 can execute the corresponding control program P based on them, and it classifies the designated intent as explicit.
- If no such key term exists, the processing unit 11 cannot execute the corresponding control program P, and it classifies the designated intent as ambiguous.

For example, suppose one identified designated intent corresponds to the intent label L representing "make an outgoing call". The processing unit 11 checks whether any term in the speech-text data names a specific party to call. Such a term might be another room number in the same accommodation facility, a particular store or institution, or a telephone number.

- If such a term exists, the processing unit 11 takes it as the key term for this designated intent and classifies the intent as explicit.
- If not, the processing unit 11 concludes that the speech-text data contains no matching term and classifies the intent as ambiguous.

As another example, suppose another identified designated intent corresponds to the intent label L representing "set alarm". The processing unit 11 checks whether any term in the speech-text data gives a specific alarm time, for example "8:00 a.m." or "one hour from now".

- If such a term exists, the processing unit 11 takes it as the key term for this designated intent and classifies the intent as explicit.
- If not, the processing unit 11 concludes that the speech-text data contains no matching term and classifies the intent as ambiguous.

In this embodiment, each intent label L includes a required-information feature. This is the semantic feature of the information that the user intent represented by the label cannot do without. For example, the user intent "make an outgoing call" requires a specific party to call. The required-information feature of that intent label L is therefore the semantic feature identifying a specific party to call.

When running the language processing model M, the processing unit 11 looks up the required-information feature of each designated intent's label L. It then checks whether the speech-text data contains key terms matching that feature, and classifies the intent as explicit or ambiguous accordingly. In other embodiments, the language processing model M may instead learn this classification through machine learning, so the invention is not limited to this embodiment.
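The required-information rule can be sketched as follows. This sketch assumes the language processing model has already produced, for each designated intent, a label, a position in the utterance, and whatever key terms it extracted. The `REQUIRED_SLOTS` table and all label names are hypothetical stand-ins for the required-information features, and slot extraction itself is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical required-information features, one list per intent label.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "make_call": ["callee"],  # a specific party to call
    "set_alarm": ["time"],    # a specific alarm time
    "turn_on_tv": [],         # needs nothing further
}


@dataclass
class DesignatedIntent:
    label: str
    position: int                                        # order of appearance
    slots: Dict[str, str] = field(default_factory=dict)  # key terms found


def is_explicit(intent: DesignatedIntent) -> bool:
    """Explicit when every required-information feature has a matching key term."""
    return all(intent.slots.get(slot) for slot in REQUIRED_SLOTS[intent.label])


def classify(
    intents: List[DesignatedIntent],
) -> Tuple[List[DesignatedIntent], List[DesignatedIntent]]:
    """Split designated intents into (explicit, ambiguous) lists."""
    explicit = [i for i in intents if is_explicit(i)]
    ambiguous = [i for i in intents if not is_explicit(i)]
    return explicit, ambiguous
```

For example, "call Room 602 and set an alarm" yields one explicit intent (`make_call` with `callee`) and one ambiguous intent (`set_alarm` with no `time`).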

After the processing unit 11 has classified each designated intent as explicit or ambiguous, the process proceeds to step S3.

In step S3, the processing unit 11 determines whether any of the designated intents is ambiguous.

- If K of them are ambiguous, where K is an integer greater than or equal to 1, the process proceeds to step S4.
- If none of them is ambiguous, the process proceeds to step S6.

Step S4 follows step S3 when K ambiguous designated intents exist. For at least one of them (hereinafter "the ambiguous designated intent"), the processing unit 11 uses the language processing model M to generate a query message. The query message corresponds to that intent and prompts the user to supply the missing information verbally. The processing unit 11 sends the query message to the processing module 51 of the user device 5, which outputs it through the speaker of the output module 53.

Two example query messages:

- For the intent label L representing "make an outgoing call": "Who would you like to call?"
- For the intent label L representing "set alarm": "What time would you like the alarm set for?"

Query messages are not limited to these examples.

After the processing unit 11 has had the output module 53 of the user device 5 output the query message, the process proceeds to step S5.

Step S5 applies when the processing unit 11 receives further speech-text data, corresponding to a further speech input, within a waiting-for-response period. The period starts when the output module 53 outputs the query message, and might be, for example, five seconds. The processing unit 11 then checks whether the further speech-text data matches the ambiguous designated intent well enough to execute the control program P of that intent's label L. Concretely, it checks whether any term in the further speech-text data semantically matches the ambiguous designated intent and can therefore serve as its key term.

If the further speech-text data matches, the processing unit 11 reclassifies the ambiguous designated intent as explicit, using the key term found in the further speech-text data.

If the further speech-text data does not match, the processing unit 11 may, for example, generate another query message asking the user to supply the information verbally. It sends this message to the user device 5 for output through its speaker.

There is a preset maximum number of query messages per ambiguous designated intent. Once that maximum is reached and the intent still cannot be converted into an explicit one, the processing unit 11 stops processing the intent, that is, it discards (ignores) it.
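Steps S4 and S5 form a bounded re-prompting loop, sketched below. The sketch makes several assumptions:

- `ask(prompt, timeout)` is a hypothetical callback that outputs the query message and returns the transcribed reply, or `None` if nothing arrives within the waiting-for-response period.
- `extract(label, text)` is a hypothetical callback standing in for the model's key-term matching.
- A timeout is counted as one used query. The text does not say how a timeout should be handled.

```python
from typing import Callable, Dict, Optional

# Hypothetical query messages per intent label (step S4).
PROMPTS: Dict[str, str] = {
    "make_call": "Who would you like to call?",
    "set_alarm": "What time would you like the alarm set for?",
}


def clarify(
    label: str,
    ask: Callable[[str, float], Optional[str]],
    extract: Callable[[str, str], Optional[str]],
    max_queries: int = 2,
    wait_seconds: float = 5.0,
) -> Optional[str]:
    """Query until a reply yields a key term or the query limit is reached.

    Returns the key term (the intent becomes explicit) or None (the intent
    is discarded).
    """
    for _ in range(max_queries):
        reply = ask(PROMPTS[label], wait_seconds)
        if reply is None:
            continue  # no reply in time: this attempt is used up
        key_term = extract(label, reply)
        if key_term is not None:
            return key_term
    return None
```

Keeping `ask` and `extract` as parameters separates the device I/O and the model from the retry policy, so the policy can be tested with scripted replies.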

For example, suppose the ambiguous designated intent corresponds to the intent label L representing "make an outgoing call", and the query message is "Who would you like to call?" If the further speech-text data is "Call Room 602 for me", the processing unit 11 determines that "Room 602" semantically matches the intent and reclassifies the intent as explicit.

As another example, suppose the ambiguous designated intent corresponds to the intent label L representing "set alarm", and the query message is "What time would you like the alarm set for?" If the further speech-text data is "Let's make it 5:30 p.m.", the processing unit 11 determines that "5:30 p.m." semantically matches the intent and reclassifies the intent as explicit.

After the processing unit 11 has decided whether to reclassify the ambiguous designated intent as explicit, the process proceeds to step S6.

In step S6, the processing unit 11 uses the language processing model M to apply conflict elimination and screening to all explicit designated intents. The result is N target designated intents, where N is an integer greater than or equal to 1, so there may be one target designated intent or several. In this embodiment, the processing unit 11 may, for example, perform conflict elimination before screening, although the order is not limited to this.

In this embodiment, conflict elimination works as follows:

1. The processing unit 11 uses the language processing model M to check whether several of the explicit designated intents conflict with one another. If they do, it treats the mutually conflicting intents together as a group of conflicting intents.
2. For each such group, the processing unit 11 selects a single explicit designated intent as a candidate designated intent and discards the other intents in that group.

In one implementation of this embodiment, the candidate is the intent in the group that the speech-text data expresses first, and the intents expressed later are discarded. The selection rule is not limited to this.

Explicit designated intents "conflict" with one another in either of two cases:

- they contradict each other semantically; or
- their control programs P should not be executed by the processing unit 11 at the same time.

Semantic contradiction is detected with the natural language understanding capability of the language processing model M. Which control programs P must not run at the same time is preset. For example, all control programs P that make the user device 5 play sound may be designated as not to be run at the same time.

For example, suppose the speech-text data is "Turn the music up, then turn it down, and stop playing songs." The processing unit 11 identifies three semantically contradictory explicit designated intents: "increase music volume", "decrease music volume" and "stop playing music". The speech-text data expresses "increase music volume" first. The processing unit 11 therefore takes that intent as the candidate designated intent and discards the two intents expressed later.

As another example, suppose the speech-text data is "Play the news and some music for me." The processing unit 11 identifies two explicit designated intents, "play news" and "play music". Both instruct the user device 5 to play sound, but playing news and music at the same time would spoil the listening experience. Their two control programs P therefore should not be executed at the same time.

In this example, the processing unit 11 determines that "play news" and "play music" conflict and treats them as a group of conflicting intents. It might, for example, take "play news", which is expressed first, as the candidate designated intent and discard "play music", which is expressed later.
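The "keep the earliest-expressed intent per conflict group" rule can be sketched as follows. This is a simplified sketch under stated assumptions:

- A hypothetical lookup table, `CONFLICT_GROUP`, stands in for both the model's semantic-contradiction check and the preset list of control programs that must not run together.
- Labels that share a group id are treated as conflicting with one another.
- Intents that belong to no group pass through unchanged.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical conflict table: labels sharing a group id conflict.
CONFLICT_GROUP: Dict[str, Optional[str]] = {
    "volume_up": "music_control",
    "volume_down": "music_control",
    "stop_music": "music_control",
    "play_news": "audio_output",
    "play_music": "audio_output",
    "turn_on_tv": None,
}

Intent = Tuple[str, int]  # (intent label, position in the utterance)


def eliminate_conflicts(explicit: List[Intent]) -> List[Intent]:
    """Return candidates: the earliest member of each conflict group plus
    every intent that belongs to no group, in utterance order."""
    candidates: List[Intent] = []
    seen: Set[str] = set()
    for label, pos in sorted(explicit, key=lambda it: it[1]):
        group = CONFLICT_GROUP.get(label)
        if group is None:
            candidates.append((label, pos))
        elif group not in seen:
            seen.add(group)
            candidates.append((label, pos))
        # otherwise: a later member of an already-represented group is dropped
    return candidates
```

A flat table like this cannot express a label that belongs to two groups (for instance, `stop_music` also conflicts with `play_music`). A fuller implementation would need a pairwise conflict relation.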

Conflict elimination is complete once the processing unit 11 has selected a candidate designated intent from each group of conflicting intents. Explicit designated intents that conflict with no other intent are carried forward as candidate designated intents as well.

In this embodiment, screening works as follows. The processing unit 11 checks whether any candidate designated intent corresponds to the exclusive intent label L’.

- If one does, that candidate becomes the sole target designated intent, and all other candidates are discarded.
- If none does, every candidate designated intent becomes a target designated intent.

For example, suppose the exclusive intent label L’ represents the user intent "make an outgoing call", and the speech-text data is "Call the front desk for me, and turn on the TV and the air conditioner." The processing unit 11 identifies three explicit designated intents: "make an outgoing call", "turn on TV" and "turn on air conditioner". Screening keeps only "make an outgoing call" as the target designated intent and discards the other two.
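The screening rule reduces to a few lines of code. This minimal sketch represents candidates as label strings in utterance order and assumes a single hypothetical exclusive label, `make_call`, matching the text's example.

```python
from typing import List

# Hypothetical exclusive intent label L' (the text's example: an outgoing call).
EXCLUSIVE_LABEL = "make_call"


def screen(candidates: List[str]) -> List[str]:
    """Screening: an exclusive candidate becomes the sole target;
    otherwise every candidate becomes a target."""
    if EXCLUSIVE_LABEL in candidates:
        return [EXCLUSIVE_LABEL]
    return list(candidates)
```

With the front-desk example, `screen(["make_call", "turn_on_tv", "turn_on_ac"])` keeps only the call.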

In another embodiment, the processing unit 11 may perform only conflict elimination in step S6, without screening. In that embodiment:

- If one or more groups of conflicting intents exist, the processing unit 11 selects a single explicit designated intent from each group and uses it directly as a target designated intent.
- If the explicit designated intents do not conflict with one another, the processing unit 11 uses all of them directly as the N target designated intents.

After the processing unit 11 has determined the N target designated intents from the explicit designated intents, the process proceeds to step S7.

In step S7, the processing unit 11 executes the control program P corresponding to the intent label L of each target designated intent. Every target designated intent is necessarily explicit. Any ambiguous designated intent still remaining after step S5 is therefore effectively discarded (ignored), and this does not affect the processing of the explicit designated intents.

The intent labels L in the language processing model M also have a defined order. When there are several target designated intents (N greater than 1), the processing unit 11 in this embodiment executes their control programs P one at a time, following the order of their intent labels L. The execution order is not limited to this.

For example, suppose the speech-text data is "Play the Titanic theme song, then check tomorrow's weather for me, and turn on the main light." From this, the processing unit 11 might determine three target designated intents: "play music", "query weather" and "turn on bedroom light". Following the order of their three intent labels L, the processing unit 11 might execute the control programs P as follows:

1. "Turn on bedroom light": the processing module 51 of the user device 5 switches on the guest-room lamp.
2. "Query weather": the processing module 51 has the display of the output module 53 show the weather data to the user.
3. "Play music": the processing module 51 has the speaker of the output module 53 play a music media file.
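Ordered execution can be sketched by sorting the targets on their labels' positions. This sketch assumes a hypothetical `LABEL_ORDER` ranking chosen to reproduce the example above. The stand-in control programs only return a description of what the device would do.

```python
from typing import Callable, Dict, List

# Hypothetical label ordering: a lower number runs first.
LABEL_ORDER: Dict[str, int] = {
    "turn_on_bedroom_light": 1,
    "query_weather": 2,
    "play_music": 3,
}

# Hypothetical control programs that describe the device action.
PROGRAMS: Dict[str, Callable[[], str]] = {
    "turn_on_bedroom_light": lambda: "lamp on",
    "query_weather": lambda: "show weather",
    "play_music": lambda: "play audio file",
}


def execute_in_order(targets: List[str]) -> List[str]:
    """Run each target's control program in the order of its intent label."""
    ordered = sorted(targets, key=lambda label: LABEL_ORDER[label])
    return [PROGRAMS[label]() for label in ordered]
```

The order in which the user spoke is ignored here. Only the label ordering decides, which is why the lamp comes on before the music plays.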

The language data processing method of this embodiment ends after the processing unit 11 has executed the control program P of each target designated intent's label L.

The above illustrates how the language data processing system 1 of this embodiment implements the language data processing method.

If the processing unit 11 classifies several designated intents as ambiguous in step S2, it can choose in different ways which one to query about in step S4:

- It may generate and output the query message for the ambiguous intent that the speech-text data expresses first, and then decide in step S5 whether to reclassify that intent as explicit.
- Alternatively, it may query about the ambiguous intent whose intent label L comes first in the ordering of the intent labels L.

The number of ambiguous intents queried can also vary. As shown in FIG. 2, the processing unit 11 may query about only one ambiguous intent. In other embodiments, it may repeat steps S4 and S5 several times to query about all of the ambiguous intents, or several of them, deciding for each whether to reclassify it as explicit.
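The two selection strategies can be sketched as follows. This sketch assumes ambiguous intents are `(label, position)` pairs and that `LABEL_ORDER` is a hypothetical label ranking in which a lower value means higher priority. The strategy names `"position"` and `"label_order"` are illustrative only.

```python
from typing import Dict, List, Tuple

Intent = Tuple[str, int]  # (intent label, position in the utterance)

# Hypothetical label ordering: a lower value means higher priority.
LABEL_ORDER: Dict[str, int] = {"set_alarm": 1, "make_call": 2}


def pick_ambiguous(ambiguous: List[Intent], by: str = "position") -> Intent:
    """Choose which ambiguous intent to query about first."""
    if by == "position":       # the intent expressed first in the utterance
        return min(ambiguous, key=lambda it: it[1])
    if by == "label_order":    # the intent whose label ranks first
        return min(ambiguous, key=lambda it: LABEL_ORDER[it[0]])
    raise ValueError(f"unknown strategy: {by}")
```

Querying about several ambiguous intents amounts to calling this repeatedly on the remaining intents, one S4/S5 round each.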

Steps S1 to S7 of this embodiment and the flowchart of FIG. 2 illustrate only one way of implementing the language data processing method of the invention. Steps S1 to S7 may be merged, split or reordered. The resulting process still implements the language data processing method of the invention, provided it achieves substantially the same effect in substantially the same way as this embodiment. Steps S1 to S7 and the flowchart of FIG. 2 therefore do not limit the scope of the invention.

The invention also provides an embodiment of a computer program product. The product is an application that includes the language processing model M shown in FIG. 1. It can be stored on a computer-readable recording medium and loaded and run by an electronic device such as a mobile phone, tablet computer, laptop or desktop computer. When the electronic device runs it, the product makes the device act as a language data processing system of the invention, such as the language data processing system 1 shown in FIG. 1. The device thereby implements the language data processing method of the invention.

In summary, by implementing the language data processing method, the language data processing system 1 offers three benefits.

First, after obtaining the speech-text data and identifying its designated intents, the system classifies each intent as explicit or ambiguous based on the vocabulary of the speech-text data. It executes the relevant control programs P based only on the explicit designated intents. Some intents in the speech-text data may be unclear (ambiguous), but the system can still act on the intents that are stated clearly enough and meet the user's needs.

Second, for ambiguous designated intents, the system generates and outputs query messages that prompt the user to state the missing details verbally. It then uses the speech-text data it receives next to decide whether each ambiguous intent can become explicit. Even if the user describes some needs unclearly at first, the system can clarify them by asking.

Third, through conflict elimination, the system actively removes designated intents that conflict with one another. Even if the user's requests contradict each other, the system can still handle at least one of them instead of failing because of the contradiction.

The language data processing system 1 therefore handles several requests spoken in a single utterance more completely and in finer detail than the prior art. This is a significant improvement, and the object of the invention is achieved.

However, the above describes only embodiments of the present invention and shall not limit the scope of its implementation. All simple equivalent changes and modifications made according to the claims and the specification of the present invention remain within the scope of this patent.

1: Language data processing system
11: Processing unit
12: Storage unit
M: Language processing model
L: Intent label
L’: Exclusive intent label
P: Control program
5: User-end device
51: Processing module
52: Input module
53: Output module
S1 to S7: Steps

Other features and effects of the present invention will become apparent in the embodiments described with reference to the drawings, in which:
FIG. 1 is a block diagram exemplarily showing an embodiment of the language data processing system of the present invention, together with a user-end device suitable for use with the embodiment; and
FIG. 2 is a flowchart exemplarily illustrating how the embodiment cooperates with the user-end device to implement a language data processing method.


Claims (13)

1. A language data processing system, comprising: a processing unit; and a storage unit electrically connected to the processing unit and storing a language processing model implemented using machine learning technology, the language processing model including a plurality of intent labels, each intent label corresponding to a control program related to the operation of the processing unit; wherein the processing unit is configured to: use the language processing model to identify, from speech-text data, a plurality of designated intents expressed by the speech-text data, and determine, according to the vocabulary in the speech-text data, each designated intent to be one of a clear designated intent and a fuzzy designated intent, wherein each designated intent corresponds to one of the intent labels; and determine N target designated intents from among the clear designated intents, and execute the control program corresponding to the intent label of each target designated intent, where N is an integer greater than or equal to 1.
2. The language data processing system of claim 1, wherein, for each designated intent, the processing unit determines whether the designated intent is a clear designated intent or a fuzzy designated intent by determining whether the speech-text data contains one or more key terms that enable the processing unit to execute the control program related to that designated intent; if the speech-text data contains such key terms, the processing unit determines the designated intent to be a clear designated intent, and if it contains no such key terms, the processing unit determines the designated intent to be a fuzzy designated intent.
3. The language data processing system of claim 1, wherein, after determining each designated intent to be a clear designated intent or a fuzzy designated intent, the processing unit is further configured to: when one or more of the designated intents are determined to be fuzzy designated intents, for at least one of the fuzzy designated intents, use the language processing model to generate an inquiry message corresponding to that fuzzy designated intent and cause the inquiry message to be output by an output module; and, after the inquiry message has been output, upon obtaining further speech-text data corresponding to a further speech input, determine whether the semantics of the further speech-text data match the fuzzy designated intent such that the processing unit can execute the control program related to the fuzzy designated intent, and, if they match, re-determine the fuzzy designated intent to be a clear designated intent.
4. The language data processing system of claim 1, wherein the processing unit determines the N target designated intents from among the clear designated intents by determining whether the clear designated intents include a plurality of clear designated intents that conflict with one another and together form a group of conflicting intents, and, if such a group exists, using only a single clear designated intent from that group as one of the N target designated intents.
5. The language data processing system of claim 1, wherein one of the intent labels is set as an exclusive intent label, and the processing unit determines the N target designated intents from among the clear designated intents by determining whether the intent label of any one of the clear designated intents is the exclusive intent label, and, if so, using the clear designated intent corresponding to the exclusive intent label as the sole target designated intent.
6. The language data processing system of claim 1, wherein the intent labels have a defined order, and, when the processing unit determines a plurality of target designated intents, the processing unit executes the control programs corresponding to the intent labels of those target designated intents one by one, in the order of those intent labels.
7. A language data processing method implemented by a language data processing system, the system including a processing unit and a storage unit electrically connected to the processing unit, the storage unit storing a language processing model implemented using machine learning technology, the language processing model including a plurality of intent labels, each intent label corresponding to a control program related to the operation of the processing unit; the method comprising: (A) the processing unit using the language processing model to identify, from speech-text data, a plurality of designated intents expressed by the speech-text data, and determining, according to the vocabulary in the speech-text data, each designated intent to be one of a clear designated intent and a fuzzy designated intent, wherein each designated intent corresponds to one of the intent labels; and (B) the processing unit determining N target designated intents from among the clear designated intents, and executing the control program corresponding to the intent label of each target designated intent, where N is an integer greater than or equal to 1.
8. The language data processing method of claim 7, wherein, in step (A), for each designated intent, the processing unit determines whether the designated intent is a clear designated intent or a fuzzy designated intent by determining whether the speech-text data contains one or more key terms that enable the processing unit to execute the control program related to that designated intent; if the speech-text data contains such key terms, the processing unit determines the designated intent to be a clear designated intent, and if it contains no such key terms, the processing unit determines the designated intent to be a fuzzy designated intent.
9. The language data processing method of claim 7, further comprising, between steps (A) and (B): (C) when one or more of the designated intents are determined by the processing unit to be fuzzy designated intents, for at least one of the fuzzy designated intents, the processing unit using the language processing model to generate an inquiry message corresponding to that fuzzy designated intent and causing the inquiry message to be output by an output module; and (D) after the inquiry message has been output, upon obtaining further speech-text data corresponding to a further speech input, the processing unit determining whether the semantics of the further speech-text data match the fuzzy designated intent such that the processing unit can execute the control program related to the fuzzy designated intent, and, if they match, re-determining the fuzzy designated intent to be a clear designated intent.
10. The language data processing method of claim 7, wherein, in step (B), the processing unit determines the N target designated intents from among the clear designated intents by determining whether the clear designated intents include a plurality of clear designated intents that conflict with one another and together form a group of conflicting intents, and, if such a group exists, using only a single clear designated intent from that group as one of the N target designated intents.
11. The language data processing method of claim 7, wherein one of the intent labels is set as an exclusive intent label, and, in step (B), the processing unit determines the N target designated intents from among the clear designated intents by determining whether the intent label of any one of the clear designated intents is the exclusive intent label, and, if so, using the clear designated intent corresponding to the exclusive intent label as the sole target designated intent.
12. The language data processing method of claim 7, wherein the intent labels have a defined order, and, in step (B), when the processing unit determines a plurality of target designated intents, the processing unit executes the control programs corresponding to the intent labels of those target designated intents one by one, in the order of those intent labels.
13. A computer program product comprising a language processing model implemented using machine learning technology, the language processing model including a plurality of intent labels, each intent label corresponding to a control program related to the operation of an electronic device, wherein the computer program product, when loaded and run by the electronic device, causes the electronic device to perform the language data processing method of any one of claims 7 to 12.
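The target-selection rules in claims 4 to 6 (and their method counterparts, claims 10 to 12) can be sketched as below. This is a hedged illustration under several assumptions, not the claimed implementation. The label names, conflict groups, and label order are hypothetical. The exclusive label is checked before conflict elimination. Because the claims do not say which member of a conflict group survives, the sketch keeps the most recently mentioned one. Labels missing from the predefined order are executed last.

```python
from typing import Callable

# Hypothetical configuration; none of these label names come from the patent.
EXCLUSIVE_LABEL = "emergency_call"                      # plays the role of label L'
CONFLICT_GROUPS: list[set[str]] = [{"volume_up", "volume_down"}, {"ac_on", "ac_off"}]
LABEL_ORDER = ["emergency_call", "ac_on", "ac_off", "volume_up",
               "volume_down", "play_music", "navigate"]
_RANK = {label: i for i, label in enumerate(LABEL_ORDER)}


def select_targets(clear_labels: list[str]) -> list[str]:
    """Choose the target intents from the clear intents (given in utterance order)."""
    if not clear_labels:
        return []  # no clear intent, so nothing to execute yet
    # Exclusive label: its intent becomes the only target.
    if EXCLUSIVE_LABEL in clear_labels:
        return [EXCLUSIVE_LABEL]
    targets: list[str] = []
    for label in clear_labels:
        group = next((g for g in CONFLICT_GROUPS if label in g), None)
        if group is not None:
            # Conflict elimination: keep one member per group (assumed: latest mention wins).
            targets = [t for t in targets if t not in group]
        if label not in targets:
            targets.append(label)
    # Predefined label order; unknown labels go last, keeping utterance order (stable sort).
    return sorted(targets, key=lambda lbl: _RANK.get(lbl, len(LABEL_ORDER)))


def execute(clear_labels: list[str], programs: dict[str, Callable[[], str]]) -> list[str]:
    """Run each target's control program one by one and collect the results."""
    return [programs[label]() for label in select_targets(clear_labels)]


if __name__ == "__main__":
    programs = {lbl: (lambda lbl=lbl: f"ran {lbl}") for lbl in LABEL_ORDER}
    print(execute(["navigate", "volume_up", "ac_on", "volume_down"], programs))
```

Keeping the three rules as separate steps mirrors the separate dependent claims. A real system might instead let the language processing model decide which conflicting intent to keep.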
TW111145456A 2022-11-28 2022-11-28 Language data processing system and method and computer program product TWI847393B (en)

Priority Applications (1)

Application number: JP2023070061A (published as JP2024077568A); priority date: 2022-11-28; filing date: 2023-04-21; title: Language data processing system, language data processing method, and computer program

Publications (2)

TW202422535A, published 2024-06-01
TWI847393B, published 2024-07-01
