TWI767498B - Cross-channel artificial intelligence dialogue platform integrating machine learning and operation method thereof - Google Patents

Cross-channel artificial intelligence dialogue platform integrating machine learning and operation method thereof

Info

Publication number
TWI767498B
TWI767498B (application TW110100436A)
Authority
TW
Taiwan
Prior art keywords
signal
voice
internal server
text
interface
Prior art date
Application number
TW110100436A
Other languages
Chinese (zh)
Other versions
TW202134933A (en)
Inventor
江哲宇
Original Assignee
華南商業銀行股份有限公司 (Hua Nan Commercial Bank, Ltd.)
Priority date
Filing date
Publication date
Application filed by 華南商業銀行股份有限公司
Priority to TW110100436A priority Critical patent/TWI767498B/en
Publication of TW202134933A publication Critical patent/TW202134933A/en
Application granted granted Critical
Publication of TWI767498B publication Critical patent/TWI767498B/en

Landscapes

  • Massaging Devices (AREA)
  • Machine Translation (AREA)
  • Feedback Control In General (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A cross-channel artificial intelligence dialogue platform comprises three internal servers. The voice input interface of the first internal server receives the first voice signal. The personal-data-hiding module deletes the personal data from the first voice signal to generate the second voice signal. The speech-to-text interface and the semantic recognition module of the first internal server respectively convert the second voice signal into the first text signal and derive the intent signal from the first text signal. The dialogue module and the application module respectively generate the response signal and the control instruction according to the intent signal. The text-to-speech interface of the first internal server converts the response signal into the third voice signal. The output interface outputs the third voice signal and the control instruction.

Description

A cross-channel AI conversational platform integrating machine learning and its operation method

The present invention relates to a conversational platform and an operation method of the conversational platform, and in particular to a cross-channel artificial intelligence conversational platform and its operation method.

With the spread of various digital marketing channels, customers who run into any problem with a transaction usually expect an immediate reply.

However, for financial institutions that provide customer service, increasing the number of customer-service personnel inevitably leads to a significant increase in labor costs. In addition, training an excellent customer-service agent takes time: during a sudden influx of customers, the agents who can calmly handle a wide range of questions are often overwhelmed, while newly hired agents may not yet be able to answer the full variety of customer questions. Consequently, public evaluation of a financial institution that cannot provide good service drops significantly, which in turn affects the public's trust in, and willingness to participate in, the institution's other offerings.

In view of this, the present invention proposes a cross-channel artificial intelligence (AI) conversational platform. The channels include digital channels, customer service centers, business units, and the like. By introducing a speech recognition system, an AI conversational system, and an AI conversational back end, combined with a new generation of conversational AI technologies, including natural language processing (NLP), a dynamic learning mechanism, multi-turn situational dialogue design, and a dynamic information collection mechanism, a customer dialogue analysis back end is established, thereby enhancing the user experience and market influence of the digital channels.

A cross-channel artificial intelligence conversational platform according to an embodiment of the present invention includes: a first internal server, including a voice input interface, a speech-to-text interface, a semantic recognition interface, a text-to-speech interface, and an output interface, wherein the voice input interface is used to receive a first voice signal; a second internal server, communicatively connected to the first internal server, including a customer audio database for storing a plurality of audio files whose contents respectively correspond to a plurality of pieces of personal data, and a personal-data-hiding module electrically connected to the customer audio database, the personal-data-hiding module being used to divide the first voice signal into a plurality of voice segments and, when the personal-data-hiding module determines that any of the voice segments matches any of the audio files, to delete the audio information corresponding to that voice segment from the first voice signal and return the first voice signal, with the audio data of that voice segment deleted, to the first internal server as a second voice signal; wherein the speech-to-text interface of the first internal server is used to generate a first text signal according to the second voice signal, and the semantic recognition interface is used to generate an intent signal according to the first text signal; and a third internal server, communicatively connected to the first internal server, including a dialogue module for selectively generating a reply signal according to the intent signal, and an application module for generating a control instruction corresponding to the intent signal; wherein the text-to-speech interface of the first internal server is used to generate a third voice signal according to the reply signal, and the output interface of the first internal server is used to output the third voice signal and the control instruction.

An operation method of a cross-channel artificial intelligence conversational platform according to an embodiment of the present invention includes: receiving a first voice signal with a voice input interface of a first internal server; storing a plurality of audio files with a customer audio database of a second internal server, the contents of the audio files respectively corresponding to a plurality of pieces of personal data, wherein the second internal server is communicatively connected to the first internal server; dividing the first voice signal into a plurality of voice segments with a personal-data-hiding module of the second internal server, wherein the personal-data-hiding module is electrically connected to the customer audio database, and, when the personal-data-hiding module determines that any of the voice segments matches any of the audio files, deleting the audio information corresponding to that voice segment from the first voice signal with the personal-data-hiding module; returning a second voice signal from the personal-data-hiding module to a speech-to-text interface of the first internal server, wherein the second voice signal is the first voice signal with the audio data of that voice segment deleted; generating a first text signal with the speech-to-text interface according to the second voice signal; generating an intent signal with a semantic recognition interface of the first internal server according to the first text signal; generating a reply signal with a dialogue module of a third internal server according to the intent signal, wherein the third internal server is communicatively connected to the first internal server; generating a third voice signal with a text-to-speech interface of the first internal server according to the reply signal; generating a control instruction with an application module of the third internal server according to the intent signal; and outputting the third voice signal and the control instruction with an output interface of the first internal server.

The above description of the present disclosure and the following description of the embodiments are intended to demonstrate and explain the spirit and principles of the present invention, and to provide further explanation of the scope of the claims of the present invention.

The detailed features and advantages of the present invention are described in the following embodiments in sufficient detail to enable any person skilled in the relevant art to understand and implement the technical content of the present invention; based on the content disclosed in this specification, the claims, and the drawings, any person skilled in the relevant art can readily understand the objects and advantages of the present invention. The following embodiments further illustrate the aspects of the present invention in detail, but do not limit the scope of the present invention in any respect.

Please refer to FIG. 1, which is an architecture diagram of a cross-channel AI conversational platform 100 according to an embodiment of the present invention. The cross-channel AI conversational platform 100 includes a first internal server 2, a second internal server 4, and a third internal server 6. As shown in FIG. 1, the second internal server 4 and the third internal server 6 are each communicatively connected to the first internal server 2. In addition, the components of the first internal server 2 are respectively communicatively connected to a client device 91, a first external server 93, a second external server 95, and a third external server 97.

In practice, the first internal server 2, the second internal server 4, and the third internal server 6 are, for example, blade servers, rack servers, or pedestal servers deployed in the computer room of a financial institution; the present invention does not limit the hardware types of the first, second, and third internal servers 2, 4, and 6.

The first internal server 2, the second internal server 4, and the third internal server 6 each have memory for implementing the functions described below. The memory may be, for example, random access memory, read-only memory, or flash memory. In one embodiment, the first internal server 2, the second internal server 4, and the third internal server 6 further include communication devices supporting wired networks, wireless networks, mobile networks, and/or wireless communication. In one embodiment, the first internal server 2, the second internal server 4, and the third internal server 6 each include a processing circuit that executes the functions described below. The processing circuit is, for example, a microcontroller, a microprocessor, a processor, a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a digital logic circuit, a field programmable gate array (FPGA), and/or another hardware component with computing capability; the present invention does not limit the hardware type of the processing circuit.

Please continue to refer to FIG. 1. The first internal server 2 includes a voice input interface 21, a speech-to-text interface 23, a semantic recognition interface 25, a text-to-speech interface 27, and an output interface 29. The voice input interface 21 is communicatively connected to the client device 91. The client device 91 is, for example, a smartphone or tablet on which the user has installed a mobile banking app, or a smart speaker or service robot at a smart branch counter; from the user's point of view, the conversation is held with the client device 91. In practice, the sound produced by the user is picked up by a receiver (for example, a microphone) of the client device 91 to generate the first voice signal, and a communication element of the client device 91 then sends this first voice signal to the voice input interface 21 of the first internal server 2. In short, when the user needs to perform a financial operation, the user can simply speak to the client device 91 to produce the first voice signal, which is then sent to the voice input interface 21 for processing.

Please refer first to the second internal server 4 in FIG. 1, which includes a customer audio database 41 and a personal-data-hiding module 43 that are electrically connected to each other. The customer audio database 41 stores a plurality of audio files whose contents respectively correspond to pieces of personal data. In practice, the second internal server may further include a dynamic information learning module that, for example, uses the financial institution's human customer-service call recordings as training data and, by machine learning, automatically identifies the audio segments in those recordings that belong to customers' personal data and stores these segments in the customer audio database 41. The dynamic information learning module may further update the records in the customer audio database according to each first voice signal obtained from the voice input interface 21; the present invention is not limited in this respect.
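
The patent does not specify how the dynamic information learning module identifies personal-data segments, so the following is only a minimal sketch of the idea: transcripts of customer-service recordings are scanned for spans that look like personal data, and the matching audio clips are stored in the customer audio database. The regular-expression tagger stands in for the machine-learned classifier, and every name here (AudioClip, mine_personal_data_clips, the PII patterns) is illustrative rather than taken from the patent.

    import re
    from dataclasses import dataclass

    # Hypothetical record: a clip of call audio plus its transcript and time span.
    @dataclass
    class AudioClip:
        call_id: str
        start_s: float
        end_s: float
        transcript: str

    # Stand-in for the learned classifier: patterns that look like national ID,
    # mobile phone, or bank account numbers.
    PII_PATTERNS = [
        re.compile(r"[A-Z][12]\d{8}"),   # national ID
        re.compile(r"09\d{8}"),          # mobile phone
        re.compile(r"\d{10,16}"),        # bank account number
    ]

    def mine_personal_data_clips(clips):
        """Return the clips whose transcript contains something that looks like personal data."""
        return [c for c in clips if any(p.search(c.transcript) for p in PII_PATTERNS)]

    def store_in_customer_audio_database(db, flagged_clips):
        """Append flagged clips to the (here: in-memory) customer audio database 41."""
        db.extend(flagged_clips)
        return db

    clips = [AudioClip("call-001", 12.0, 14.5, "my account number is 1234567890123"),
             AudioClip("call-001", 20.0, 21.0, "I want to check my balance")]
    database = store_in_customer_audio_database([], mine_personal_data_clips(clips))
    print(database)   # only the first clip is kept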

Please continue to refer to the second internal server 4 in FIG. 1. The personal-data-hiding module 43 is electrically connected to the customer audio database 41 and communicatively connected to the speech-to-text interface 23 of the first internal server 2. The personal-data-hiding module 43 divides the first voice signal into a plurality of voice segments, and when it determines that any of these segments matches any of the audio files stored in the customer audio database 41, it deletes the audio information corresponding to that segment from the first voice signal and returns the first voice signal, with the audio data of that segment removed, to the speech-to-text interface 23 of the first internal server 2 as the second voice signal. For the comparison performed by the personal-data-hiding module 43, a fuzzy matching algorithm may be used, for example. In addition, when a matched piece of the user's personal data has been split across several voice segments, the personal-data-hiding module 43 reassembles those segments so as to extract the complete audio data belonging to that piece of personal data. Through this processing mechanism, the user's private data can be confined to the first internal server 2 and the second internal server 4 located in the financial institution's computer room, so that the user's personal data is not leaked to the Internet during the subsequent speech recognition.
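
As a rough illustration of the redaction flow just described (divide the first voice signal into segments, fuzzily compare each segment against the stored audio files, delete the matches, and return what remains as the second voice signal), here is a NumPy sketch. The fixed 0.1-second windowing, the energy-envelope fingerprint, and the similarity threshold are assumptions standing in for the unspecified segmentation and fuzzy-matching algorithms.

    import numpy as np

    FRAME = 1600           # assumed segment length: 0.1 s at a 16 kHz sampling rate
    SIM_THRESHOLD = 0.90   # assumed fuzzy-match threshold

    def envelope(segment, bins=16):
        """Coarse energy envelope used as a cheap acoustic fingerprint."""
        if len(segment) < bins:
            return np.zeros(bins)
        seg = segment[: len(segment) // bins * bins].reshape(bins, -1)
        env = np.sqrt((seg ** 2).mean(axis=1))
        norm = np.linalg.norm(env)
        return env / norm if norm > 0 else env

    def matches_personal_data(segment, pii_audio_files):
        """Fuzzy comparison of one segment against the stored personal-data audio files."""
        e = envelope(segment)
        for pii in pii_audio_files:
            for off in range(0, max(1, len(pii) - FRAME + 1), FRAME):
                if float(np.dot(e, envelope(pii[off:off + FRAME]))) >= SIM_THRESHOLD:
                    return True
        return False

    def hide_personal_data(first_voice_signal, pii_audio_files):
        """Return the second voice signal: the first voice signal with matching segments removed."""
        kept = [first_voice_signal[s:s + FRAME]
                for s in range(0, len(first_voice_signal), FRAME)
                if not matches_personal_data(first_voice_signal[s:s + FRAME], pii_audio_files)]
        return np.concatenate(kept) if kept else np.zeros(0, dtype=first_voice_signal.dtype)

Because the deletion happens before anything is forwarded to the speech-to-text interface, only redacted audio ever leaves the internal servers, which is the point of the design.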

Please refer to FIG. 1. The speech-to-text interface 23 of the first internal server 2 is communicatively connected to the personal-data-hiding module 43 of the second internal server 4 and to the first external server 93, and the semantic recognition interface 25 is communicatively connected to the second external server 95. The speech-to-text interface 23 generates the first text signal according to the second voice signal, and the semantic recognition interface 25 generates an intent signal according to the first text signal. In other words, the speech-to-text interface 23 converts the audio data, from which the user's personal data has already been removed, into text, and the semantic recognition interface 25 then interprets the user's intent from that text. For example, when the first text signal is "I want to transfer one thousand dollars" (我要轉帳一仟元), the semantic recognition interface 25 can derive the two intents that the user wants to make a transfer and that the transfer amount is one thousand dollars. In practice, the speech-to-text interface 23 and the semantic recognition interface 25 are, for example, application programming interfaces (APIs); the first external server 93 is, for example, a Google Cloud speech-to-text (STT) external server, and the second external server 95 is, for example, an IBM Watson external server, which provides various Watson cognitive computing services, including a natural language processing (NLP) service for determining customer intent, and can improve semantic-understanding accuracy through a pattern-based machine learning mechanism.
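
The patent delegates transcription to the Google Cloud STT server and intent analysis to the IBM Watson server, so no concrete API is prescribed by the description itself. The sketch below therefore only illustrates the shape of the data flow at this stage: the first text signal goes in, and an intent signal (an intent name plus extracted slots) comes out. The IntentSignal structure and the keyword rules are illustrative stand-ins for the external NLP service.

    import re
    from dataclasses import dataclass, field

    @dataclass
    class IntentSignal:                      # illustrative shape of the "intent signal"
        intent: str
        slots: dict = field(default_factory=dict)

    # Keyword rules standing in for the external NLP service.
    INTENT_RULES = {
        "transfer": ("轉帳", "transfer"),
        "balance_inquiry": ("餘額", "balance"),
    }

    def semantic_recognition(first_text_signal: str) -> IntentSignal:
        """Map the first text signal to an intent signal (intent name plus slots)."""
        for intent, keywords in INTENT_RULES.items():
            if any(k in first_text_signal for k in keywords):
                slots = {}
                m = re.search(r"(\d+)\s*元", first_text_signal)
                if m:
                    slots["amount"] = int(m.group(1))
                elif "一仟元" in first_text_signal or "一千元" in first_text_signal:
                    slots["amount"] = 1000   # the "one thousand dollars" example above
                return IntentSignal(intent, slots)
        return IntentSignal("unknown")

    print(semantic_recognition("我要轉帳一仟元"))
    # IntentSignal(intent='transfer', slots={'amount': 1000})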

Please refer to FIG. 1. The third internal server 6 includes a dialogue module 61 and an application module 63 that are communicatively connected to the semantic recognition interface 25. The dialogue module 61 selectively generates a reply signal according to the intent signal, and the application module 63 generates a control instruction corresponding to the intent signal. In practice, the dialogue module 61 of the third internal server 6 can provide a dynamic learning mechanism through a machine learning model, which greatly improves maintenance efficiency. The dialogue module 61 also has a multi-turn situational dialogue design. In practice, the human customer-service records to be analyzed are, for example, first used to train a dialogue analysis model on the Watson platform, and the trained model is then stored in the database of the dialogue module 61, so that the dialogue module 61 can provide interactive situational dialogue with context continuity. For example, when the user says, "I want to transfer one thousand dollars," the dialogue module 61 not only receives from the semantic recognition interface 25 an intent signal containing the two intents that the user wants to make a transfer and that the transfer amount is one thousand dollars, but can also issue reply signals to the user, such as asking for the account to which the money should be transferred and asking which of the user's accounts should be used for the transfer, so that the mobile banking app running on the client device 91 can collect enough information to complete the subsequent transfer operation. In addition, the dialogue module 61 has a dynamic information collection mechanism that allows parameters to be set quickly for rapid deployment, improving the user experience. When the dialogue module 61 cannot recognize the user's intent signal, it can transfer the conversation to the human customer-service system, where an online customer-service agent responds to the user's question.
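
To make the multi-turn behaviour of the dialogue module 61 concrete, the following is a minimal slot-filling sketch: for a transfer intent it keeps asking for whichever slots are still missing (amount, destination account, source account), and for an intent it cannot handle it hands the conversation to the human customer-service system. The slot schema and prompts are assumptions; the patent itself trains the actual dialogue analysis model on the Watson platform.

    REQUIRED_SLOTS = {"transfer": ["amount", "to_account", "from_account"]}  # assumed slot schema

    PROMPTS = {
        "amount": "How much would you like to transfer?",
        "to_account": "Which account should the money be transferred to?",
        "from_account": "Which of your accounts should the transfer be made from?",
    }

    def dialogue_module(intent_signal, context):
        """Return (reply_signal, finished). `context` carries slots across turns."""
        if intent_signal["intent"] not in REQUIRED_SLOTS:
            # Unrecognized intent: hand the conversation over to a human agent.
            return "Transferring you to a customer-service agent.", True
        context.update(intent_signal.get("slots", {}))
        for slot in REQUIRED_SLOTS[intent_signal["intent"]]:
            if slot not in context:
                return PROMPTS[slot], False      # ask for the next missing slot
        return "Please confirm the transfer details.", True

    # First turn of the running example: the amount is known, the accounts are not.
    context = {}
    reply, finished = dialogue_module({"intent": "transfer", "slots": {"amount": 1000}}, context)
    print(reply)   # Which account should the money be transferred to?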

Please continue to refer to FIG. 1. The text-to-speech interface 27 of the first internal server 2 is communicatively connected to the dialogue module 61 of the third internal server 6 and to the third external server 97. The text-to-speech interface 27 generates the third voice signal according to the reply signal; in other words, it converts the reply produced by the dialogue module 61 into computer-generated speech that the user can understand, which is then output by the output interface 29 of the first internal server 2 to the client device 91 so that the speaker of the client device 91 can play the third voice signal for the user. In practice, the third external server 97 is, for example, the Industrial Technology Research Institute (ITRI) text-to-speech web service external server, which provides a text-to-speech (TTS) web service over the SOAP (Simple Object Access Protocol) protocol and converts the input text into speech for output. It should be noted that although, in the embodiment above, the first internal server 2 connects to the first, second, and third external servers 93, 95, and 97 as cloud services over the Internet, in another embodiment the financial institution may instead purchase its own servers with speech-text conversion and semantic-understanding functions and set them up in its local computer room; the present invention does not require the first to third external servers 93-97 to be reached through the cloud in order to achieve the above functions.
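
Because the text-to-speech interface 27 is described only as a client of a SOAP web service, the sketch below shows one generic way such a call might look using the zeep SOAP library. The WSDL URL, operation name, and parameters are placeholders, not the actual ITRI TTS API.

    from zeep import Client   # general-purpose SOAP client library

    # Placeholder WSDL; the real ITRI TTS endpoint, operation name, and
    # parameters are not reproduced here and would need to be substituted.
    WSDL_URL = "https://tts.example.org/TTSService?wsdl"

    def text_to_speech(reply_signal: str) -> bytes:
        """Send the reply signal to a SOAP TTS web service and return synthesized audio."""
        client = Client(WSDL_URL)
        # Hypothetical operation that converts text to speech and returns audio bytes.
        return client.service.ConvertText(text=reply_signal, speaker="default", rate=1.0)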

Please continue to refer to FIG. 1. The output interface 29 of the first internal server 2 is communicatively connected to the application module 63 of the third internal server 6 and to the client device 91. Besides outputting the third voice signal, the output interface 29 also outputs the control instruction generated by the application module 63. The control instruction is, for example, an instruction that directs the mobile banking app to complete a transfer operation.

Based on the above description of the cross-channel AI conversational platform 100, the platform 100 can in practice be communicatively connected, as needed, to the user's smartphone or to a smart speaker at a smart branch counter. In this way, the user can complete the desired financial transaction simply by conversing with the client device 91.

Please refer to FIG. 1 and FIG. 2 together. FIG. 2 illustrates an operation method of the cross-channel AI conversational platform according to an embodiment of the present invention, applicable to the aforementioned cross-channel AI conversational platform 100. In step S11, the voice input interface 21 receives the first voice signal; specifically, the client device 91 transmits the user's voice by wired or wireless communication, and the voice input interface 21 of the first internal server 2 receives it. In step S21, the personal-data-hiding module 43 of the second internal server 4 divides the first voice signal into a plurality of voice segments. In step S23, the personal-data-hiding module 43 compares the voice segments with the audio files in the customer audio database 41. In step S25, the personal-data-hiding module 43 determines whether any voice segment matches any audio file; if so, step S27 is performed, otherwise the method returns to step S23. In step S27, the personal-data-hiding module 43 returns the second voice signal, which is the first voice signal with the audio data of the matched voice segment (for example, the waveform representing the user's personal data) deleted. In step S13, the speech-to-text interface 23 of the first internal server 2 generates the first text signal according to the second voice signal; specifically, it converts the audio data that no longer contains the user's personal data into text data. In step S15, the semantic recognition interface 25 of the first internal server 2 generates the intent signal according to the first text signal, for example by having the text analyzed by a cloud-based semantic-understanding server to obtain the user's intent. In step S31, the dialogue module 61 of the third internal server 6 generates the reply signal according to the intent signal; in other words, the dialogue module 61 identifies the user's intent so as to give a personalized service or response. In practice, if the customer's question is outside the scope of the dialogue module 61, the dialogue module 61 can transfer the conversation to the human customer-service system for follow-up service. In step S17, the text-to-speech interface 27 of the first internal server 2 generates the third voice signal according to the reply signal, converting the system's reply into computer speech the user can hear. In step S33, the application module 63 generates the control instruction according to the intent signal; the control instruction is used to perform, on the client device, the operation corresponding to the user's speech. In step S35, the output interface 29 outputs the third voice signal and the control instruction to the client device 91, for example playing the system's reply to the user, carrying on the dialogue to obtain any further parameters needed for the operation the user wants to perform, or executing the control instruction to complete the financial transaction the user requested.
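
Read end to end, steps S11 through S35 form a single pipeline. The sketch below strings the stages together in the order of FIG. 2; every function is a trivial stub standing in for the corresponding interface or module described above, and all names are illustrative only.

    # Tiny stubs standing in for the interfaces and modules described above.
    def hide_personal_data(voice, pii_files):      # steps S21-S27 (second internal server)
        return voice
    def speech_to_text(voice):                     # step S13 (external STT server)
        return "我要轉帳一仟元"
    def semantic_recognition(text):                # step S15 (external NLP server)
        return {"intent": "transfer", "slots": {"amount": 1000}}
    def dialogue_module(intent, context):          # step S31 (dialogue module 61)
        return "Which account should the money be transferred to?", False
    def text_to_speech(reply):                     # step S17 (external TTS server)
        return b"<synthesized audio>"
    def application_module(intent, context):       # step S33 (application module 63)
        return {"action": intent["intent"], **intent["slots"]}

    def handle_user_utterance(first_voice_signal, pii_audio_files, context):
        """One pass through steps S11-S35 for a single user utterance."""
        second_voice_signal = hide_personal_data(first_voice_signal, pii_audio_files)
        first_text_signal = speech_to_text(second_voice_signal)
        intent_signal = semantic_recognition(first_text_signal)
        reply_signal, finished = dialogue_module(intent_signal, context)
        third_voice_signal = text_to_speech(reply_signal)
        control_instruction = application_module(intent_signal, context) if finished else None
        # Step S35: the output interface returns both to the client device 91.
        return third_voice_signal, control_instruction

    print(handle_user_utterance(b"raw audio", [], {}))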

In summary, the cross-channel artificial intelligence conversational platform disclosed in the present invention completes financial transactions through dialogue with the user, giving customers the best possible experience and service, while preventing the user's private personal information from being leaked to the cloud and thereby protecting the security of the user's personal data. In addition, by connecting the cross-channel AI conversational platform to a mobile banking app, a smart branch counter, or a smart personal wealth-management service, financial institutions can also reduce the manpower and time otherwise spent hiring and training additional staff to provide these financial services.

Although the present invention has been disclosed by way of the foregoing embodiments, they are not intended to limit the present invention. Any modification or refinement made without departing from the spirit and scope of the present invention falls within the scope of patent protection of the present invention. For the scope of protection defined by the present invention, please refer to the appended claims.

100: cross-channel AI conversational platform; 2: first internal server; 21: voice input interface; 23: speech-to-text interface; 25: semantic recognition interface; 27: text-to-speech interface; 29: output interface; 4: second internal server; 41: customer audio database; 43: personal-data-hiding module; 6: third internal server; 61: dialogue module; 63: application module; 91: client device; 93: first external server; 95: second external server; 97: third external server; S11-S35: steps

FIG. 1 is an architecture diagram of a cross-channel AI conversational platform according to an embodiment of the present invention. FIG. 2 is a flowchart of an operation method of the cross-channel AI conversational platform according to an embodiment of the present invention.


Claims (3)

1. A cross-channel artificial intelligence conversational platform integrating machine learning, comprising: a first internal server, including a voice input interface, a speech-to-text interface, a semantic recognition interface, a text-to-speech interface, and an output interface, wherein the voice input interface is used to receive a first voice signal; a second internal server, communicatively connected to the first internal server, including: a customer audio database for storing a plurality of audio files, the contents of the audio files respectively corresponding to a plurality of pieces of personal data; and a personal-data-hiding module electrically connected to the customer audio database, the personal-data-hiding module being used to divide the first voice signal into a plurality of voice segments, and, when the personal-data-hiding module determines that any of the voice segments matches any of the audio files, the personal-data-hiding module deletes the audio information corresponding to that voice segment from the first voice signal and returns the first voice signal, with the audio data of that voice segment deleted, to the first internal server as a second voice signal; wherein the speech-to-text interface of the first internal server is used to generate a first text signal according to the second voice signal, and the semantic recognition interface is used to generate an intent signal according to the first text signal; and a third internal server, communicatively connected to the first internal server, including: a dialogue module for selectively generating a reply signal according to the intent signal; and an application module for generating a control instruction corresponding to the intent signal; wherein the text-to-speech interface of the first internal server is used to generate a third voice signal according to the reply signal, and the output interface of the first internal server is used to output the third voice signal and the control instruction; wherein the first internal server, the second internal server, and the third internal server are blade servers, rack servers, or pedestal servers; the speech-to-text interface is communicatively connected to a Google Cloud speech-to-text external server, the semantic recognition interface is communicatively connected to an IBM Watson external server, and the text-to-speech interface is communicatively connected to an ITRI text-to-speech web service external server; and the second internal server further includes a dynamic information learning module for extracting, according to a customer-service recording and in a machine learning manner, the audio files associated with the pieces of personal data from the customer-service recording, and storing these audio files in the customer audio database.

2. The cross-channel artificial intelligence conversational platform integrating machine learning according to claim 1, wherein when the personal-data-hiding module determines that a plurality of voice segments in the first voice signal respectively match a plurality of audio files in the customer audio database, the personal-data-hiding module is further used to recombine those voice segments so as to extract a complete piece of personal data among the pieces of personal data.

3. An operation method of a cross-channel artificial intelligence conversational platform integrating machine learning, comprising: receiving a first voice signal with a voice input interface of a first internal server; dividing the first voice signal into a plurality of voice segments with a personal-data-hiding module of a second internal server, wherein the personal-data-hiding module is electrically connected to a customer audio database of the second internal server, and, when the personal-data-hiding module determines that any of the voice segments matches any of a plurality of audio files stored in the customer audio database, deleting the audio information corresponding to that voice segment from the first voice signal with the personal-data-hiding module, wherein the second internal server is communicatively connected to the first internal server and the contents of the audio files respectively correspond to a plurality of pieces of personal data; returning a second voice signal from the personal-data-hiding module to a speech-to-text interface of the first internal server, wherein the second voice signal is the first voice signal with the audio data of that voice segment deleted; generating a first text signal with the speech-to-text interface according to the second voice signal; generating an intent signal with a semantic recognition interface of the first internal server according to the first text signal; generating a reply signal with a dialogue module of a third internal server according to the intent signal, wherein the third internal server is communicatively connected to the first internal server; generating a third voice signal with a text-to-speech interface of the first internal server according to the reply signal; generating a control instruction with an application module of the third internal server according to the intent signal; and outputting the third voice signal and the control instruction with an output interface of the first internal server; wherein the first internal server, the second internal server, and the third internal server are blade servers, rack servers, or pedestal servers; the speech-to-text interface is communicatively connected to a Google Cloud speech-to-text external server, the semantic recognition interface is communicatively connected to an IBM Watson external server, and the text-to-speech interface is communicatively connected to an ITRI text-to-speech web service external server; and before the first voice signal is received, the method further comprises obtaining, with a dynamic information learning module of the second internal server and in a machine learning manner, the audio files associated with the pieces of personal data from a customer-service recording, and storing these audio files in the customer audio database with the dynamic information learning module.
TW110100436A 2019-02-13 2019-02-13 Cross-channel artificial intelligence dialogue platform integrating machine learning and operation method thereof TWI767498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110100436A TWI767498B (en) 2019-02-13 2019-02-13 Cross-channel artificial intelligence dialogue platform integrating machine learning and operation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110100436A TWI767498B (en) 2019-02-13 2019-02-13 Cross-channel artificial intelligence dialogue platform integrating machine learning and operation method thereof

Publications (2)

Publication Number Publication Date
TW202134933A TW202134933A (en) 2021-09-16
TWI767498B true TWI767498B (en) 2022-06-11

Family

ID=78777352

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110100436A TWI767498B (en) 2019-02-13 2019-02-13 Cross-channel artificial intelligence dialogue platform integrating machine learning and operation method thereof

Country Status (1)

Country Link
TW (1) TWI767498B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106155992A (en) * 2015-03-24 2016-11-23 中兴通讯股份有限公司 Voice and/or the filter method of character information, device and terminal
TW201725580A (en) * 2015-12-31 2017-07-16 Beijing Sogou Technology Development Co Ltd Speech input method and terminal device
CN106128453A (en) * 2016-08-30 2016-11-16 深圳市容大数字技术有限公司 The Intelligent Recognition voice auto-answer method of a kind of robot and robot
CN107180631A (en) * 2017-05-24 2017-09-19 刘平舟 Voice interaction method and device

Also Published As

Publication number Publication date
TW202134933A (en) 2021-09-16

Similar Documents

Publication Publication Date Title
Warden Speech commands: A dataset for limited-vocabulary speech recognition
US11417343B2 (en) Automatic speaker identification in calls using multiple speaker-identification parameters
CN105489221B (en) A kind of audio recognition method and device
US9542956B1 (en) Systems and methods for responding to human spoken audio
CN104509065B (en) Human interaction proof is used as using the ability of speaking
WO2020253509A1 (en) Situation- and emotion-oriented chinese speech synthesis method, device, and storage medium
US20220076674A1 (en) Cross-device voiceprint recognition
US20180342250A1 (en) Automatic speaker identification in calls
WO2019000832A1 (en) Method and apparatus for voiceprint creation and registration
US11514919B1 (en) Voice synthesis for virtual agents
US10970898B2 (en) Virtual-reality based interactive audience simulation
KR20160030168A (en) Voice recognition method, apparatus, and system
CN110689261A (en) Service quality evaluation product customization platform and method
JP7568851B2 (en) Filtering other speakers' voices from calls and audio messages
CN110933225B (en) Call information acquisition method and device, storage medium and electronic equipment
CN107430851A (en) Speech suggestion device, speech reminding method and program
CN110428825A (en) Ignore the trigger word in streaming media contents
US20220201121A1 (en) System, method and apparatus for conversational guidance
TWM578858U (en) Cross-channel artificial intelligence dialogue platform
CN112860213B (en) Audio processing method and device, storage medium and electronic equipment
KR102226427B1 (en) Apparatus for determining title of user, system including the same, terminal and method for the same
KR102284912B1 (en) Method and appratus for providing counseling service
JP5707346B2 (en) Information providing apparatus, program thereof, and information providing system
TWI767498B (en) Cross-channel artificial intelligence dialogue platform integrating machine learning and operation method thereof
TWI767499B (en) Cross-channel artificial intelligence dialogue platform integrating online custom service system and its operation method