TW202030626A - Cross-channel artificial intelligence dialogue platform and operation method thereof


Info

Publication number
TW202030626A
Authority
TW
Taiwan
Prior art keywords
voice
signal
internal server
text
module
Application number
TW108104841A
Other languages
Chinese (zh)
Other versions
TWI739067B (en)
Inventor
江哲宇
Original Assignee
華南商業銀行股份有限公司 (Hua Nan Commercial Bank, Ltd.)
Application filed by 華南商業銀行股份有限公司
Priority to TW108104841A
Publication of TW202030626A
Application granted
Publication of TWI739067B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A cross-channel artificial intelligence dialogue platform comprises three internal servers. A voice input interface of the first internal server receives a first voice signal. A personal-data-hiding module deletes the personal data from the first voice signal to generate a second voice signal. A speech-to-text interface and a semantic recognition interface of the first internal server respectively convert the second voice signal into a first text signal and obtain an intent signal from the first text signal. A dialogue module and an application module respectively generate a response signal and a control instruction according to the intent signal. A text-to-speech interface of the first internal server converts the response signal into a third voice signal. An output interface outputs the third voice signal and the control instruction.

Description

Cross-channel artificial intelligence dialogue platform and operation method thereof

The invention relates to a dialogue platform and an operation method thereof, and in particular to a cross-channel artificial intelligence dialogue platform and its operation method.

With the spread of digital marketing channels, customers who run into a problem with a transaction usually expect a reply as soon as possible.

However, for a financial institution that provides customer service, increasing the number of customer service staff inevitably raises labor costs substantially. In addition, training a skilled customer service representative takes time. When a large number of customers arrive suddenly, the experienced staff who can calmly handle every kind of problem are often overwhelmed, while newly hired staff may not be able to answer the full range of customer questions. A financial institution that cannot provide good service will see its public reputation drop significantly, which in turn reduces customers' trust in, and willingness to use, the institution's other services.

In view of this, the present invention proposes a cross-channel artificial intelligence (AI) dialogue platform. The channels include digital channels, customer service centers, and business units (branches). The invention introduces a speech recognition system, an AI dialogue system, and an AI dialogue backend, and combines them with new-generation conversational AI technologies, including natural language processing (NLP), a dynamic learning mechanism, multi-turn contextual dialogue design, and a dynamic information collection mechanism, to build a customer dialogue analysis backend that improves the user experience and market impact of the digital channels.

A cross-channel artificial intelligence dialogue platform according to an embodiment of the present invention includes a first internal server, a second internal server, and a third internal server. The first internal server includes a voice input interface, a speech-to-text interface, a semantic recognition interface, a text-to-speech interface, and an output interface, wherein the voice input interface receives a first voice signal. The second internal server is communicatively connected to the first internal server and includes: a customer audio database that stores a plurality of audio files whose contents respectively correspond to a plurality of items of personal data; and a personal-data-hiding module electrically connected to the customer audio database. The personal-data-hiding module divides the first voice signal into a plurality of voice segments, and when it determines that any one of the voice segments matches any one of the audio files, it deletes the audio information corresponding to that voice segment from the first voice signal and returns the first voice signal, with the audio data of that voice segment deleted, to the first internal server as a second voice signal. The speech-to-text interface of the first internal server generates a first text signal according to the second voice signal, and the semantic recognition interface generates an intent signal according to the first text signal. The third internal server is communicatively connected to the first internal server and includes: a dialogue module that selectively generates a reply signal according to the intent signal; and an application module that generates a control instruction corresponding to the intent signal. The text-to-speech interface of the first internal server generates a third voice signal according to the reply signal, and the output interface of the first internal server outputs the third voice signal and the control instruction.

An operation method of a cross-channel artificial intelligence dialogue platform according to an embodiment of the present invention includes: receiving a first voice signal with a voice input interface of a first internal server; storing a plurality of audio files in a customer audio database of a second internal server, the contents of the audio files respectively corresponding to a plurality of items of personal data, wherein the second internal server is communicatively connected to the first internal server; dividing the first voice signal into a plurality of voice segments with a personal-data-hiding module of the second internal server, wherein the personal-data-hiding module is electrically connected to the customer audio database, and, when the personal-data-hiding module determines that any one of the voice segments matches any one of the audio files, deleting the audio information corresponding to that voice segment from the first voice signal with the personal-data-hiding module; returning, with the personal-data-hiding module, a second voice signal to a speech-to-text interface of the first internal server, wherein the second voice signal is the first voice signal with the audio data of that voice segment deleted; generating a first text signal according to the second voice signal with the speech-to-text interface; generating an intent signal according to the first text signal with a semantic recognition interface of the first internal server; generating a reply signal according to the intent signal with a dialogue module of a third internal server, wherein the third internal server is communicatively connected to the first internal server; generating a third voice signal according to the reply signal with a text-to-speech interface of the first internal server; generating a control instruction according to the intent signal with an application module of the third internal server; and outputting the third voice signal and the control instruction with an output interface of the first internal server.

The above description of the disclosure and the following description of the embodiments are intended to demonstrate and explain the spirit and principles of the present invention and to further explain the scope of its claims.

The detailed features and advantages of the present invention are described in the following embodiments in sufficient detail to enable anyone skilled in the relevant art to understand and implement its technical content. From the content disclosed in this specification, the claims, and the drawings, anyone skilled in the relevant art can readily understand the related objectives and advantages of the invention. The following embodiments further illustrate aspects of the invention but do not limit its scope in any way.

Please refer to FIG. 1, which shows an architecture diagram of a cross-channel artificial intelligence dialogue platform 100 according to an embodiment of the present invention. The platform 100 includes a first internal server 2, a second internal server 4, and a third internal server 6. As shown in FIG. 1, the second internal server 4 and the third internal server 6 are each communicatively connected to the first internal server 2. In addition, components of the first internal server 2 are communicatively connected to a client device 91, a first external server 93, a second external server 95, and a third external server 97, respectively.

In practice, the first, second, and third internal servers 2, 4, and 6 are, for example, blade servers, rack servers, or pedestal (tower) servers deployed in a financial institution's computer room; the present invention does not limit their hardware type.

Each of the first, second, and third internal servers 2, 4, and 6 has memory for implementing the functions described below; the memory may be, for example, random access memory, read-only memory, or flash memory. In one embodiment, the internal servers 2, 4, and 6 further include communication devices that support wired networks, wireless networks, mobile networks, and/or other wireless communication. In one embodiment, each of the internal servers 2, 4, and 6 includes a processing circuit that performs the functions described below. The processing circuit is, for example, a microcontroller, a microprocessor, a processor, a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a digital logic circuit, a field-programmable gate array (FPGA), and/or another hardware component with computing capability; the present invention does not limit the hardware type of the processing circuit.

Please continue to refer to FIG. 1. The first internal server 2 includes a voice input interface 21, a speech-to-text interface 23, a semantic recognition interface 25, a text-to-speech interface 27, and an output interface 29. The voice input interface 21 is communicatively connected to the client device 91. The client device 91 is, for example, a smartphone or tablet on which the user has installed a mobile banking app, or a smart speaker or smart robot at a smart-branch counter; from the user's point of view, the user simply talks to the client device 91. In practice, a sound pickup (for example, a microphone) of the client device 91 captures the user's voice and produces the first voice signal, and a communication component of the client device 91 sends the first voice signal to the voice input interface 21 of the first internal server 2. In short, when the user needs to perform a financial operation, the user can speak directly to the client device 91, and the resulting first voice signal is sent to the voice input interface 21 for processing.

Please first refer to the second internal server 4 in FIG. 1, which includes a customer audio database 41 and a personal-data-hiding module 43 that are electrically connected to each other. The customer audio database 41 stores a plurality of audio files whose contents respectively correspond to a plurality of items of personal data. In practice, the second internal server may further include a dynamic information learning module. For example, the dynamic information learning module is trained in advance on recordings of the financial institution's human customer service calls, uses machine learning to automatically identify the audio fragments in those recordings that contain customers' personal data, and stores those fragments in the customer audio database 41. The dynamic information learning module may also update the records in the customer audio database 41 with each first voice signal received through the voice input interface 21; the present invention does not limit this.

Please continue to refer to the second internal server 4 in FIG. 1. The personal-data-hiding module 43 is electrically connected to the customer audio database 41 and communicatively connected to the speech-to-text interface 23 of the first internal server 2. The personal-data-hiding module 43 divides the first voice signal into a plurality of voice segments. When it determines that any of these voice segments matches any of the audio files stored in the customer audio database 41, it deletes the audio information corresponding to that voice segment from the first voice signal and returns the first voice signal, with that segment's audio data deleted, to the speech-to-text interface 23 of the first internal server 2 as the second voice signal. The comparison may use, for example, a fuzzy matching algorithm. In addition, when a matched item of the user's personal data is split across several voice segments, the personal-data-hiding module 43 reassembles those segments to extract the complete audio data of that item. Through this processing mechanism, the user's private data is confined to the first internal server 2 and the second internal server 4 located in the financial institution's computer room, so the user's personal data is not leaked to the Internet during the subsequent speech recognition.
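The segment-and-delete step performed by the personal-data-hiding module 43 can be illustrated with a minimal Python sketch. It is not the patented implementation: the specification only names "a fuzzy matching algorithm", so this sketch assumes raw sample lists, fixed-length segments, and cosine similarity above a threshold as the matching rule. The function names and the 0.95 threshold are hypothetical; a production system would more likely compare acoustic features (for example, spectral features) than raw samples.

```python
import math
from typing import List, Sequence

Signal = List[float]  # mono audio samples (assumed representation)


def split_segments(signal: Sequence[float], seg_len: int) -> List[Signal]:
    """Divide a signal into consecutive fixed-length segments (the last may be shorter)."""
    return [list(signal[i:i + seg_len]) for i in range(0, len(signal), seg_len)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length windows; 0.0 if lengths differ or either is silent."""
    if len(a) != len(b) or len(a) == 0:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hide_personal_data(first_signal: Sequence[float],
                       stored_audio: Sequence[Sequence[float]],
                       seg_len: int,
                       threshold: float = 0.95) -> Signal:
    """Return a 'second voice signal': the input with every matching segment deleted.

    Stored audio files (hypothetical templates of personal data) are cut into the
    same segment length, so each input segment is compared window by window.
    """
    templates = [w for audio in stored_audio for w in split_segments(audio, seg_len)]
    second_signal: Signal = []
    for segment in split_segments(first_signal, seg_len):
        matched = any(cosine_similarity(segment, t) >= threshold for t in templates)
        if not matched:
            second_signal.extend(segment)  # keep only segments with no personal-data match
    return second_signal


if __name__ == "__main__":
    first = [1, 2, 3, 4, 0.5, -1, 2, 0, 5, -5, 5, -5]
    stored = [[2, 4, 6, 8]]  # toy stand-in for a stored recording of an ID number
    print(hide_personal_data(first, stored, seg_len=4))
    # → [0.5, -1, 2, 0, 5, -5, 5, -5]
```

Deleting (rather than silencing) the matched samples follows the description's wording; the first segment is removed because it is a scaled copy of the stored template.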

Please refer to FIG. 1. The speech-to-text interface 23 of the first internal server 2 is communicatively connected to the personal-data-hiding module 43 of the second internal server 4 and to the first external server 93, and the semantic recognition interface 25 is communicatively connected to the second external server 95. The speech-to-text interface 23 generates the first text signal from the second voice signal, and the semantic recognition interface 25 generates an intent signal from the first text signal. In other words, the speech-to-text interface 23 converts voice data from which the user's personal data has been removed into text, and the semantic recognition interface 25 then interprets the user's intent from that text. For example, when the first text signal is "I want to transfer 1,000 yuan", the semantic recognition interface 25 can infer two intents: "the user wants to make a transfer" and "the transfer amount is 1,000 yuan". In practice, the speech-to-text interface 23 and the semantic recognition interface 25 are, for example, application programming interfaces (APIs). The first external server 93 is, for example, a Google Cloud Speech-to-Text (STT) server, and the second external server 95 is, for example, an IBM Watson server that provides various Watson cognitive computing services, including a natural language processing (NLP) service for determining customer intent, which uses a pattern-based machine learning mechanism to improve the accuracy of semantic understanding.
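The patent delegates intent recognition to an external NLP service (IBM Watson in the example). As a purely illustrative, self-contained stand-in, the sketch below uses one regular expression to turn the example sentence into an intent plus an amount slot. The `IntentSignal` structure, the `recognize_intent` function, and the slot name `amount` are assumptions; a real deployment would call the NLP service rather than hand-written rules.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IntentSignal:
    intent: Optional[str]                      # e.g. "transfer"; None if unrecognized
    slots: Dict[str, int] = field(default_factory=dict)


# Matches "transfer", optionally followed by an amount such as "1,000" and a currency word.
_TRANSFER = re.compile(
    r"\btransfer\b(?:\s+(?P<amount>\d[\d,]*)\s*(?:yuan|NTD)?)?",
    re.IGNORECASE,
)


def recognize_intent(text: str) -> IntentSignal:
    """Toy rule-based intent recognizer for the transfer example."""
    match = _TRANSFER.search(text)
    if match is None:
        return IntentSignal(intent=None)
    slots: Dict[str, int] = {}
    if match.group("amount"):
        slots["amount"] = int(match.group("amount").replace(",", ""))
    return IntentSignal(intent="transfer", slots=slots)


if __name__ == "__main__":
    print(recognize_intent("I want to transfer 1,000 yuan"))
    # → IntentSignal(intent='transfer', slots={'amount': 1000})
```

A sentence such as "Please transfer money" yields the transfer intent with no amount, which is exactly the case where the dialogue module must ask a follow-up question.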

Please refer to FIG. 1. The third internal server 6 includes a dialogue module 61 and an application module 63, both communicatively connected to the semantic recognition interface 25. The dialogue module 61 selectively generates a reply signal according to the intent signal, and the application module 63 generates a control instruction corresponding to the intent signal. In practice, the dialogue module 61 of the third internal server 6 can provide a dynamic learning mechanism through a machine learning model, which greatly improves maintenance efficiency. The dialogue module 61 also supports multi-turn contextual dialogue. For example, a dialogue analysis model is trained in advance on the Watson platform from the human customer service records to be analyzed and is stored in a database of the dialogue module 61, so that the dialogue module 61 can conduct interactive, context-aware dialogues that remain coherent across turns. For example, when the user says "I want to transfer 1,000 yuan", the dialogue module 61 not only receives from the semantic recognition interface 25 an intent signal containing the two intents "the user wants to make a transfer" and "the transfer amount is 1,000 yuan", but can also respond with reply signals such as asking for the payee's account number and asking which of the user's accounts the money should come from, so that the mobile banking app running on the client device 91 can collect enough information to complete the transfer. In addition, the dialogue module 61 has a dynamic information collection mechanism whose parameters can be configured quickly for rapid deployment, improving the user experience. When the dialogue module 61 cannot recognize the user's intent, it can transfer the conversation to a human customer service system, where an online agent answers the user's question.
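The multi-turn behavior described above (keeping context across turns, asking for the payee account and the source account, and handing off to a human agent when the intent is not recognized) resembles a slot-filling dialogue manager. The minimal sketch below assumes that design; the slot names, prompt texts, `DialogueState` class, and `next_reply` function are hypothetical and are not taken from the patent or from the Watson platform.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical slot schema: the fields each intent needs before it can be executed.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "transfer": ["amount", "payee_account", "source_account"],
}

# Hypothetical prompts used as reply signals while a slot is still missing.
PROMPTS: Dict[str, str] = {
    "amount": "How much would you like to transfer?",
    "payee_account": "Which account would you like to transfer to?",
    "source_account": "Which of your accounts should the transfer come from?",
}

HANDOFF = "Let me connect you to a customer service agent."


class DialogueState:
    """Context carried across the turns of one conversation."""

    def __init__(self) -> None:
        self.intent: Optional[str] = None
        self.slots: Dict[str, object] = {}


def next_reply(state: DialogueState, intent: Optional[str],
               slots: Dict[str, object]) -> Tuple[str, bool]:
    """Merge one turn into the context and return (reply text, ready_to_execute)."""
    if intent is not None and intent != state.intent:
        state.intent = intent      # a new intent starts a fresh set of slots
        state.slots = {}
    if state.intent not in REQUIRED_SLOTS:
        return HANDOFF, False      # unrecognized intent: hand off to a human agent
    state.slots.update(slots)
    for name in REQUIRED_SLOTS[state.intent]:
        if name not in state.slots:
            return PROMPTS[name], False  # ask for the first missing slot
    return "Please confirm the transfer details.", True


if __name__ == "__main__":
    state = DialogueState()
    print(next_reply(state, "transfer", {"amount": 1000}))
    # → ('Which account would you like to transfer to?', False)
    print(next_reply(state, None, {"payee_account": "123-456"}))
    # → ('Which of your accounts should the transfer come from?', False)
    print(next_reply(state, None, {"source_account": "789-012"}))
    # → ('Please confirm the transfer details.', True)
```

Follow-up turns pass `intent=None`, so the stored context decides which question comes next; this is one simple way to get the context coherence the description mentions.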

Please continue to refer to FIG. 1. The text-to-speech interface 27 of the first internal server 2 is communicatively connected to the dialogue module 61 of the third internal server 6 and to the third external server 97. The text-to-speech interface 27 generates the third voice signal from the reply signal; in other words, it converts the reply produced by the dialogue module 61 into synthesized speech that the user can understand. The output interface 29 of the first internal server 2 then sends the third voice signal to the client device 91, whose loudspeaker plays it for the user. In practice, the third external server 97 is, for example, the text-to-speech web service server of the Industrial Technology Research Institute (ITRI), which provides a text-to-speech (TTS) web service over SOAP (Simple Object Access Protocol) and converts input text into speech output. It should be noted that although, in the above embodiment, the first internal server 2 connects over the Internet to the cloud services behind the first, second, and third external servers 93, 95, and 97, in another embodiment the financial institution may instead purchase servers with text/speech conversion and semantic understanding functions and install them in its own on-premises computer room. The present invention does not require the first to third external servers 93 to 97 to be connected to the cloud in order to achieve the above functions.
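The request to the third external server 97 is described only as a SOAP web service. The sketch below shows how a SOAP 1.1 request envelope could be built with Python's standard `xml.etree.ElementTree`. The envelope namespace is the standard SOAP 1.1 one, but the service namespace `urn:example:tts`, the operation `ConvertText`, and its `text`/`voice` parameters are hypothetical placeholders, because the patent does not give the actual service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:tts"                          # hypothetical: take from the real WSDL
OPERATION = "ConvertText"                               # hypothetical operation name

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("tts", SERVICE_NS)


def build_tts_request(text: str, voice: str = "default") -> bytes:
    """Wrap the reply text in a SOAP 1.1 envelope for a (hypothetical) TTS operation."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{OPERATION}")
    ET.SubElement(call, f"{{{SERVICE_NS}}}text").text = text
    ET.SubElement(call, f"{{{SERVICE_NS}}}voice").text = voice
    # UTF-8 so that non-ASCII (e.g. Chinese) reply text survives serialization.
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_tts_request("Which account would you like to transfer to?").decode("utf-8"))
```

The resulting bytes would then be POSTed to the service endpoint with a `Content-Type: text/xml; charset=utf-8` header and, for SOAP 1.1, a `SOAPAction` header whose value is defined by the service's WSDL.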

Please continue to refer to FIG. 1. The output interface 29 of the first internal server 2 is communicatively connected to the application module 63 of the third internal server 6 and to the client device 91. Besides the third voice signal, the output interface 29 also outputs the control instruction generated by the application module 63, for example an instruction that directs the mobile banking app to complete a transfer.
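The patent does not specify the format of the control instruction or how the output interface 29 packages it together with the third voice signal. One possible arrangement, sketched below purely as an assumption, is for the application module to emit a small JSON-serializable command once all required transfer details are known, and for the output interface to send the synthesized audio (Base64-encoded) and that command in one JSON message. The field names, the `TRANSFER` action, and both function names are hypothetical.

```python
import base64
import json
from typing import Dict, Optional

# Hypothetical: details a transfer command needs before the app may execute it.
TRANSFER_FIELDS = ("amount", "source_account", "payee_account")


def make_control_instruction(intent: Optional[str],
                             slots: Dict[str, object]) -> Optional[Dict[str, object]]:
    """Application module (sketch): map a complete intent to a client-side command."""
    if intent == "transfer" and all(f in slots for f in TRANSFER_FIELDS):
        return {
            "action": "TRANSFER",
            "amount": slots["amount"],
            "from": slots["source_account"],
            "to": slots["payee_account"],
        }
    return None  # nothing to execute yet; only the voice reply is sent


def build_output_message(third_voice_signal: bytes,
                         control_instruction: Optional[Dict[str, object]]) -> str:
    """Output interface (sketch): bundle synthesized audio and command for the client."""
    return json.dumps({
        "audio_b64": base64.b64encode(third_voice_signal).decode("ascii"),
        "control": control_instruction,
    })
```

Returning `None` while details are missing lets the same message format carry either a voice-only reply (mid-dialogue) or a reply plus an executable command.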

Based on the above, in practice the cross-channel artificial intelligence dialogue platform 100 can be communicatively connected, as needed, to a user's smartphone or to a smart speaker at a smart-branch counter, so that the user can complete the desired financial transaction by talking with the client device 91.

Please refer to FIG. 1 and FIG. 2 together. FIG. 2 shows an operation method of a cross-channel artificial intelligence dialogue platform according to an embodiment of the present invention, applicable to the platform 100 described above. In step S11, the voice input interface 21 receives the first voice signal: the client device 91 transmits the user's voice over a wired or wireless connection, and the voice input interface 21 of the first internal server 2 receives it. In step S21, the personal-data-hiding module 43 of the second internal server 4 divides the first voice signal into a plurality of voice segments. In step S23, the personal-data-hiding module 43 compares the voice segments with the audio files in the customer audio database 41. In step S25, the personal-data-hiding module 43 determines whether any voice segment matches any audio file; if so, the flow proceeds to step S27, and otherwise it returns to step S23. In step S27, the personal-data-hiding module 43 returns the second voice signal, which is the first voice signal with the matched segment's audio data (for example, the waveform representing the user's personal data) deleted. In step S13, the speech-to-text interface 23 of the first internal server 2 generates the first text signal from the second voice signal; that is, it converts audio data that no longer contains the user's personal data into text. In step S15, the semantic recognition interface 25 of the first internal server 2 generates the intent signal from the first text signal, for example by having a cloud-based semantic understanding server analyze the text to obtain the user's intent. In step S31, the dialogue module 61 of the third internal server 6 generates the reply signal according to the intent signal; in other words, the dialogue module 61 uses the user's intent identified in the intent signal to provide a personalized service or response. In practice, if the customer's question falls outside the scope the dialogue module 61 can handle, the dialogue module 61 can transfer the conversation to the human customer service system for follow-up. In step S17, the text-to-speech interface 27 of the first internal server 2 generates the third voice signal from the reply signal, converting the system's reply into synthesized speech the user can hear. In step S33, the application module 63 generates the control instruction according to the intent signal; the control instruction is used to perform, on the client device, the operation corresponding to the user's speech. In step S35, the output interface 29 outputs the third voice signal and the control instruction to the client device 91, for example playing the system's reply to the user to continue the dialogue and collect any other parameters the requested operation needs, or executing the control instruction to complete the financial transaction the user requested.
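The order of steps S11 to S35 can be summarized as a single request-handling function. In the minimal sketch below, each component is passed in as a plain callable, an assumption made so the flow can be shown without any real STT, NLP, or TTS service; `handle_turn`, `PlatformOutput`, and the parameter names are hypothetical. The point it illustrates is that only the redacted second voice signal, never the first voice signal, reaches the speech-to-text step that may call an external server.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PlatformOutput:
    third_voice_signal: bytes              # synthesized reply sent to the client
    control_instruction: Optional[dict]    # command for the client app, if any


def handle_turn(first_voice_signal: bytes,
                hide_personal_data: Callable[[bytes], bytes],
                speech_to_text: Callable[[bytes], str],
                recognize_intent: Callable[[str], Any],
                dialogue: Callable[[Any], str],
                text_to_speech: Callable[[str], bytes],
                application: Callable[[Any], Optional[dict]]) -> PlatformOutput:
    """One dialogue turn in the step order of FIG. 2 (S11 is receiving the input)."""
    second_voice_signal = hide_personal_data(first_voice_signal)   # S21-S27, second internal server
    first_text_signal = speech_to_text(second_voice_signal)         # S13, may use external server 93
    intent_signal = recognize_intent(first_text_signal)             # S15, may use external server 95
    reply_signal = dialogue(intent_signal)                          # S31, third internal server
    third_voice_signal = text_to_speech(reply_signal)               # S17, may use external server 97
    control_instruction = application(intent_signal)                # S33, third internal server
    return PlatformOutput(third_voice_signal, control_instruction)  # S35, output interface 29
```

Injecting the components as callables also makes it easy to swap cloud services for the on-premises servers mentioned above without changing the flow.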

In summary, the cross-channel artificial intelligence dialogue platform disclosed by the present invention lets users complete financial transactions through dialogue, giving customers the best possible experience and service, while preventing users' private information from leaking to the cloud and thus protecting their personal data. In addition, by connecting the platform to mobile banking apps, smart-branch counters, or smart personal wealth-management services, a financial institution can reduce the labor and time costs of hiring and training additional staff to provide these financial services.

Although the present invention is disclosed by the foregoing embodiments, they are not intended to limit the invention. Changes and modifications made without departing from the spirit and scope of the invention fall within its scope of patent protection. For the scope of protection defined by the invention, please refer to the appended claims.

100: Cross-channel artificial intelligence dialogue platform
2: First internal server
21: Voice input interface
23: Speech-to-text interface
25: Semantic recognition interface
27: Text-to-speech interface
29: Output interface
4: Second internal server
41: Customer audio database
43: Personal-data hiding module
6: Third internal server
61: Dialogue module
63: Application module
91: Client device
93: First external server
95: Second external server
97: Third external server
S11~S35: Steps

FIG. 1 is an architecture diagram of a cross-channel artificial intelligence dialogue platform according to an embodiment of the present invention.
FIG. 2 is a flowchart of an operating method of the cross-channel artificial intelligence dialogue platform according to an embodiment of the present invention.


Claims (7)

1. A cross-channel artificial intelligence dialogue platform, comprising: a first internal server, comprising a voice input interface, a speech-to-text interface, a semantic recognition interface, a text-to-speech interface, and an output interface, wherein the voice input interface is configured to receive a first voice signal; a second internal server, communicatively connected to the first internal server, comprising: a customer audio database configured to store a plurality of audio files whose contents respectively correspond to a plurality of pieces of personal data; and a personal-data hiding module electrically connected to the customer audio database, the personal-data hiding module being configured to divide the first voice signal into a plurality of voice segments, wherein when the personal-data hiding module determines that any one of the voice segments matches any one of the audio files, the personal-data hiding module deletes the audio information corresponding to that voice segment from the first voice signal and returns the first voice signal, with the audio data of that voice segment deleted, to the first internal server as a second voice signal; wherein the speech-to-text interface of the first internal server is configured to generate a first text signal according to the second voice signal, and the semantic recognition interface is configured to generate an intention signal according to the first text signal; and a third internal server, communicatively connected to the first internal server, comprising: a dialogue module configured to selectively generate a reply signal according to the intention signal; and an application module configured to generate a control command corresponding to the intention signal; wherein the text-to-speech interface of the first internal server is configured to generate a third voice signal according to the reply signal, and the output interface of the first internal server is configured to output the third voice signal and the control command.

2. The cross-channel artificial intelligence dialogue platform according to claim 1, wherein the speech-to-text interface is communicatively connected to a Google Cloud Speech-to-Text external server, the semantic recognition interface is communicatively connected to an IBM Watson external server, and the text-to-speech interface is communicatively connected to an external server of the ITRI (Industrial Technology Research Institute) text-to-speech web service.

3. The cross-channel artificial intelligence dialogue platform according to claim 1, wherein the dialogue module is further communicatively connected to an online customer service system, and when the dialogue module cannot recognize the intention signal, the dialogue module forwards the intention signal to the online customer service system.

4. The cross-channel artificial intelligence dialogue platform according to claim 1, wherein the second internal server further comprises a dynamic information learning module configured to extract, by machine learning, the audio files associated with the personal data from customer service call recordings, and to store these audio files in the customer audio database.

5. The cross-channel artificial intelligence dialogue platform according to claim 1, wherein when the personal-data hiding module determines that a plurality of voice segments in the first voice signal respectively match a plurality of audio files in the customer audio database, the personal-data hiding module is further configured to reassemble those voice segments to retrieve one complete piece of the personal data.

6. An operating method of a cross-channel artificial intelligence dialogue platform, comprising: receiving a first voice signal through a voice input interface of a first internal server; dividing, by a personal-data hiding module of a second internal server, the first voice signal into a plurality of voice segments, wherein the personal-data hiding module is electrically connected to a customer audio database of the second internal server, and when the personal-data hiding module determines that any one of the voice segments matches any one of a plurality of audio files stored in the customer audio database, deleting, by the personal-data hiding module, the audio information corresponding to that voice segment from the first voice signal, wherein the second internal server is communicatively connected to the first internal server; returning, by the personal-data hiding module, a second voice signal to a speech-to-text interface of the first internal server, wherein the second voice signal is the first voice signal with the audio data of the voice segment deleted; generating, by the speech-to-text interface, a first text signal according to the second voice signal; generating, by a semantic recognition interface of the first internal server, an intention signal according to the first text signal; generating, by a dialogue module of a third internal server, a reply signal according to the intention signal, wherein the third internal server is communicatively connected to the first internal server; generating, by a text-to-speech interface of the first internal server, a third voice signal according to the reply signal; generating, by an application module of the third internal server, a control command according to the intention signal; and outputting, by an output interface of the first internal server, the third voice signal and the control command.

7. The operating method of the cross-channel artificial intelligence dialogue platform according to claim 6, further comprising, before receiving the first voice signal: obtaining, by a dynamic information learning module of the second internal server, the audio files associated with the personal data from customer service call recordings by machine learning; and storing, by the dynamic information learning module, these audio files in the customer audio database.
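Claim 5 covers a piece of personal data that arrives in parts: several voice segments each match a different stored audio file, as when an account number is spoken in separate chunks. The module recovers the complete value by reassembling those segments. The sketch below shows one way to do this. It assumes each stored audio file carries a hypothetical label naming the personal-data field it belongs to and the text it represents. The patent does not describe such labels, so these names are illustrative only. Matched segments are rejoined in their original time order, grouped per field.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AudioFileLabel:
    """Hypothetical label for a stored audio file (not specified in the patent)."""
    field: str   # personal-data field, e.g. "account"
    text: str    # the text this audio clip represents


def reassemble_personal_data(
    matches: List[Tuple[int, str]],
    labels: Dict[str, AudioFileLabel],
) -> Dict[str, str]:
    """matches: (segment_index, audio_file_id) for every segment that matched the
    customer audio database. Returns the complete value per personal-data field."""
    complete: Dict[str, str] = {}
    # Sorting by segment index restores the order in which the parts were spoken.
    for _, file_id in sorted(matches):
        label = labels[file_id]
        complete[label.field] = complete.get(label.field, "") + label.text
    return complete
```

With labels `"a" -> ("account", "1234")` and `"b" -> ("account", "5678")`, the matches `[(5, "b"), (2, "a")]` reassemble into the complete account value `"12345678"`.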
TW108104841A 2019-02-13 2019-02-13 Cross-channel artificial intelligence dialogue platform and operation method thereof TWI739067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW108104841A TWI739067B (en) 2019-02-13 2019-02-13 Cross-channel artificial intelligence dialogue platform and operation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW108104841A TWI739067B (en) 2019-02-13 2019-02-13 Cross-channel artificial intelligence dialogue platform and operation method thereof

Publications (2)

Publication Number Publication Date
TW202030626A 2020-08-16
TWI739067B 2021-09-11

Family

ID=73002800

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108104841A TWI739067B (en) 2019-02-13 2019-02-13 Cross-channel artificial intelligence dialogue platform and operation method thereof

Country Status (1)

Country Link
TW (1) TWI739067B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708858A (en) * 2012-06-27 2012-10-03 厦门思德电子科技有限公司 Voice bank realization voice recognition system and method based on organizing way
CN104700836B (en) * 2013-12-10 2019-01-29 阿里巴巴集团控股有限公司 A kind of audio recognition method and system
CN106155992A (en) * 2015-03-24 2016-11-23 中兴通讯股份有限公司 Voice and/or the filter method of character information, device and terminal
CN106128453A (en) * 2016-08-30 2016-11-16 深圳市容大数字技术有限公司 The Intelligent Recognition voice auto-answer method of a kind of robot and robot
CN107180631A (en) * 2017-05-24 2017-09-19 刘平舟 Voice interaction method and device
TWM578858U (en) * 2019-02-13 2019-06-01 華南商業銀行股份有限公司 Cross-channel artificial intelligence dialogue platform

Also Published As

Publication number Publication date
TWI739067B (en) 2021-09-11
