TWI778477B - Interaction methods, apparatuses thereof, electronic devices and computer readable storage media - Google Patents

Interaction methods, apparatuses thereof, electronic devices and computer readable storage media

Info

Publication number
TWI778477B
TWI778477B (application TW109145727A)
Authority
TW
Taiwan
Prior art keywords
response
interactive object
client
content
text
Prior art date
Application number
TW109145727A
Other languages
Chinese (zh)
Other versions
TW202132967A (en)
Inventor
張子隆
孫林
路露
Original Assignee
大陸商北京市商湯科技開發有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大陸商北京市商湯科技開發有限公司
Publication of TW202132967A
Application granted
Publication of TWI778477B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Steroid Compounds (AREA)
  • Pressure Welding/Diffusion-Bonding (AREA)

Abstract

The present disclosure provides interaction methods, apparatuses thereof, electronic devices and computer readable storage media. A method includes: receiving a first message from a client; obtaining, based on an instruction content included in the first message, driving data matching the instruction content; controlling a display interface of the client to play a response animation of an interactive object by using the driving data.

Description

Interaction method, apparatus, electronic device and storage medium

The present disclosure relates to the field of computer technology, and in particular to an interaction method, an apparatus, an electronic device, and a storage medium.

With the rapid development of the Internet, live streaming has become an important way of disseminating information. Because different viewers watch online live streams at different times, a human anchor cannot stream 24 hours a day to meet every viewer's needs. Using a digital human for live streaming can solve this problem; however, the technology for interaction between a digital-human anchor and the audience still needs research and development.

According to an aspect of the present disclosure, an interaction method is provided. The method includes: receiving a first message from a client; obtaining, based on indication content included in the first message, driving data matching the indication content; and using the driving data to control a display interface of the client to play a response animation of an interactive object.

With reference to any implementation provided by the present disclosure, obtaining the driving data matching the indication content based on the indication content included in the first message includes: obtaining response content for the indication content, the response content including a response text; and obtaining, based on at least one target text contained in the response text, control parameters of a preset action of the interactive object that matches the target text.
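As an illustration of the target-text matching step just described, the following Python sketch scans a response text for preset target texts and collects the control parameters of the matching preset actions. The table contents, action names and parameter values are illustrative assumptions; the patent does not prescribe a concrete data format.

```python
# Hypothetical lookup table mapping target texts to control parameters
# of preset actions of the interactive object (names are illustrative).
SET_ACTION_PARAMS = {
    "hello": {"action": "wave", "duration_s": 1.2},
    "thanks": {"action": "bow", "duration_s": 0.8},
}

def match_set_actions(response_text: str) -> list:
    """Scan the response text for known target texts and collect the
    control parameters of the matching preset actions."""
    return [params for target, params in SET_ACTION_PARAMS.items()
            if target in response_text]

matched = match_set_actions("hello and thanks for watching")
```

In practice the matching could be more elaborate (tokenized or fuzzy), but the step stays the same: target texts in the response text select preset-action control parameters.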

With reference to any implementation provided by the present disclosure, obtaining the driving data matching the indication content based on the indication content included in the first message includes: obtaining response content for the indication content, the response content including a phoneme sequence; and obtaining control parameters of the interactive object that match the phoneme sequence.

With reference to any implementation provided by the present disclosure, the control parameters of the interactive object include a pose control vector of at least one local region, and obtaining the control parameters of the interactive object matching the phoneme sequence includes: performing feature encoding on the phoneme sequence to obtain a first encoding sequence corresponding to the phoneme sequence; obtaining, from the first encoding sequence, a feature code corresponding to at least one phoneme; and obtaining a pose control vector of at least one local region of the interactive object corresponding to the feature code.
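The phoneme-driven pipeline just described (phoneme sequence, first encoding sequence, per-phoneme feature code, pose control vector of a local region) can be sketched as follows. The toy phoneme inventory, the encodings, and the mouth-openness mapping are all illustrative assumptions; an actual implementation would typically use a trained model for each stage.

```python
# Toy phoneme inventory; real systems use a full phone set.
PHONEME_CODES = {"a": 0, "o": 1, "m": 2}

def encode_phonemes(phonemes):
    """Feature-encode the phoneme sequence into a first encoding sequence."""
    return [PHONEME_CODES[p] for p in phonemes]

def pose_control_vector(code):
    """Map one phoneme's feature code to a pose control vector for a
    local region of the interactive object (here: 2-D mouth openness,
    values chosen purely for illustration)."""
    openness = {0: 0.9, 1: 0.7, 2: 0.1}[code]
    return [openness, 1.0 - openness]

coding_sequence = encode_phonemes(["m", "a", "o"])
pose_vectors = [pose_control_vector(c) for c in coding_sequence]
```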

With reference to any implementation provided by the present disclosure, the method further includes: sending, to the client, indication information including the response content, so that the client displays the response content based on the indication information.

With reference to any implementation provided by the present disclosure, using the driving data to control the client to play the response animation of the interactive object on the display interface includes: sending the driving data of the interactive object to the client, so that the client generates a response animation according to the driving data, and controlling the client to play the response animation on the display interface; or adjusting virtual model parameters of the interactive object based on the driving data, generating the response animation of the interactive object with a rendering engine based on the adjusted virtual model parameters, and sending the response animation to the client.
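The two delivery modes just described can be sketched as follows: either the driving data is forwarded so the client generates the response animation itself, or the server adjusts the virtual model parameters, renders, and ships the finished animation. Every function body here is a stand-in; the patent does not specify a rendering engine or message format.

```python
def respond(driving_data: dict, render_on_server: bool) -> dict:
    if not render_on_server:
        # Mode 1: forward the driving data; the client generates and
        # plays the response animation.
        return {"type": "driving_data", "payload": driving_data}
    # Mode 2: adjust the virtual model parameters from the driving
    # data, render the animation server-side, and send the result.
    model_params = {"mouth_openness": driving_data.get("mouth", 0.0)}
    # Stand-in for invoking an actual rendering engine.
    animation = f"rendered({model_params['mouth_openness']})"
    return {"type": "animation", "payload": animation}

msg1 = respond({"mouth": 0.5}, render_on_server=False)
msg2 = respond({"mouth": 0.5}, render_on_server=True)
```

The trade-off is the usual one: mode 1 saves server compute and bandwidth but requires a capable client; mode 2 works on thin clients at the cost of server-side rendering.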

According to an aspect of the present disclosure, an interaction method is provided. The method includes: sending, in response to a user input operation at a client, a first message including indication content to a server; and playing a response animation of the interactive object on the display interface of the client based on a second message with which the server responds to the first message.

With reference to any implementation provided by the present disclosure, the indication content includes text content, and the method further includes: displaying the text content at the client, and/or playing an audio file corresponding to the text content.

With reference to any implementation provided by the present disclosure, displaying the text content at the client includes: generating bullet-screen (danmaku) information for the text content; and displaying the bullet-screen information on the display interface of the client.
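A minimal sketch of the bullet-screen step, assuming a simple JSON-like payload; all field names are hypothetical, since the patent does not define a bullet-screen format.

```python
def make_danmaku(text: str, timestamp_ms: int) -> dict:
    """Wrap the user's text content as bullet-screen information for
    the client's display interface (illustrative fields only)."""
    return {
        "kind": "danmaku",
        "text": text,
        "t_ms": timestamp_ms,
        "scroll": "right_to_left",
    }

info = make_danmaku("hello anchor", 12_000)
```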

With reference to any implementation provided by the present disclosure, the second message includes a response text for the indication content, and the method further includes: displaying the response text on the display interface of the client, and/or determining and playing an audio file corresponding to the response text.

With reference to any implementation provided by the present disclosure, the second message includes driving data of the interactive object, and playing the response animation of the interactive object on the display interface of the client based on the second message with which the server responds to the first message includes: adjusting virtual model parameters of the interactive object based on the driving data; and generating, based on the adjusted virtual model parameters, the response animation of the interactive object with a rendering engine and displaying it on the display interface of the client; where the driving data includes control parameters for the interactive object that match the phoneme sequence corresponding to the response text, and/or control parameters for a preset action of the interactive object that match at least one target text contained in the response text.

With reference to any implementation provided by the present disclosure, the second message includes a response animation made by the interactive object to the indication content.

With reference to any implementation provided by the present disclosure, the user input operation includes the user making a corresponding body pose following a body-movement guide picture displayed on the display interface; in response to the user input operation from the client, a user behavior image including the body pose is acquired, body pose information in the user behavior image is recognized, and based on the body pose information, the interactive object displayed on the display interface is driven to respond.

With reference to any implementation provided by the present disclosure, driving the interactive object displayed on the display interface to respond based on the body pose information includes: determining a matching degree between the body pose information and the body pose in the body-movement guide picture; and driving the interactive object displayed on the display interface to respond based on the matching degree.

With reference to any implementation provided by the present disclosure, driving the interactive object to respond based on the matching degree includes: when the matching degree reaches a set condition, instructing the interactive object displayed on the display interface to make a first response, where the first response includes a body movement and/or a voice prompt indicating that the pose is acceptable, and displaying the next body-movement guide picture; and when the matching degree does not reach the set condition, instructing the interactive object displayed on the display interface to make a second response, where the second response includes a body movement and/or a voice prompt indicating that the pose is not acceptable, and keeping the current body-movement guide picture displayed.
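The matching-degree branch just described might look like the following sketch. The similarity metric (one minus the mean absolute joint error) and the 0.8 threshold are illustrative assumptions standing in for the unspecified "set condition".

```python
def pose_match(user_pose, target_pose):
    """Matching degree as 1 - mean absolute joint error (toy metric)."""
    errs = [abs(u - t) for u, t in zip(user_pose, target_pose)]
    return 1.0 - sum(errs) / len(errs)

def drive_response(user_pose, target_pose, threshold=0.8):
    if pose_match(user_pose, target_pose) >= threshold:
        # First response: "pose OK" movement/voice prompt, advance to
        # the next body-movement guide picture.
        return {"response": "first", "next_frame": True}
    # Second response: "pose not OK" prompt, keep the current picture.
    return {"response": "second", "next_frame": False}

ok = drive_response([0.5, 0.5], [0.5, 0.6])
retry = drive_response([0.1, 0.9], [0.9, 0.1])
```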

According to an aspect of the present disclosure, an interaction apparatus is provided. The apparatus includes: a receiving unit configured to receive a first message from a client; an obtaining unit configured to obtain, based on indication content included in the first message, driving data matching the indication content; and a driving unit configured to use the driving data to control the display interface of the client to play a response animation of the interactive object.

With reference to any implementation provided by the present disclosure, the obtaining unit is specifically configured to: obtain response content for the indication content, the response content including a response text; and obtain, based on at least one target text contained in the response text, control parameters of a preset action of the interactive object that matches the target text.

With reference to any implementation provided by the present disclosure, the obtaining unit is configured to: obtain response content for the indication content, the response content including a phoneme sequence; and obtain control parameters of the interactive object matching the phoneme sequence.

With reference to any implementation provided by the present disclosure, the control parameters of the interactive object include a pose control vector of at least one local region, and when obtaining the control parameters of the interactive object matching the phoneme sequence, the obtaining unit is configured to: perform feature encoding on the phoneme sequence to obtain a first encoding sequence corresponding to the phoneme sequence; obtain, from the first encoding sequence, a feature code corresponding to at least one phoneme; and obtain a pose control vector of at least one local region of the interactive object corresponding to the feature code.

With reference to any implementation provided by the present disclosure, the apparatus further includes a sending unit configured to send, to the client, indication information including the response content for the indication content, so that the client displays the response content based on the indication information.

With reference to any implementation provided by the present disclosure, the driving unit is configured to: send the driving data of the interactive object to the client, so that the client generates a response animation according to the driving data, and control the client to play the response animation on the display interface; or adjust two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data, generate the response animation of the interactive object with a rendering engine based on the adjusted two-dimensional or three-dimensional virtual model parameters, and send the response animation to the client.

According to an aspect of the present disclosure, an interaction apparatus is provided. The apparatus includes: a sending unit configured to send, in response to a user input operation at a client, a first message including indication content to a server; and a playing unit configured to play a response animation of the interactive object on the display interface of the client based on a second message with which the server responds to the first message.

With reference to any implementation provided by the present disclosure, the indication content includes text content, and the apparatus further includes a first display unit configured to display the text content on the display interface of the client, and/or determine and play an audio file corresponding to the text content.

With reference to any implementation provided by the present disclosure, when displaying the text content at the client, the first display unit is specifically configured to: generate bullet-screen information for the text content; and display the bullet-screen information on the display interface of the client.

With reference to any implementation provided by the present disclosure, the second message includes a response text for the indication content, and the apparatus further includes a second display unit configured to display the response text on the display interface of the client, and/or determine and play an audio file corresponding to the response text.

With reference to any implementation provided by the present disclosure, the second message includes driving data of the interactive object, and the playing unit is configured to: adjust virtual model parameters of the interactive object based on the driving data; and generate, based on the adjusted virtual model parameters, the response animation of the interactive object with a rendering engine and display it on the display interface of the client; where the driving data includes control parameters for the interactive object that match the phoneme sequence corresponding to the response text for the indication content, and/or control parameters for a preset action of the interactive object that match at least one target text contained in the response text.

With reference to any implementation provided by the present disclosure, the second message includes a response animation made by the interactive object to the indication content.

With reference to any implementation provided by the present disclosure, the user input operation includes the user making a corresponding body pose following the body-movement guide picture displayed on the display interface, and the generating unit is configured to: acquire a user behavior image including the body pose; recognize body pose information in the user behavior image; and drive, based on the body pose information, the interactive object displayed on the display interface to respond.

With reference to any implementation provided by the present disclosure, the generating unit is specifically configured to: determine a matching degree between the body pose information and the body pose in the body-movement guide picture; and drive, based on the matching degree, the interactive object displayed on the display interface to respond.

With reference to any implementation provided by the present disclosure, the generating unit is specifically configured to: when the matching degree reaches a set condition, instruct the interactive object displayed on the display interface to make a first response, where the first response includes a body movement and/or a voice prompt indicating that the pose is acceptable, and display the next body-movement guide picture; and when the matching degree does not reach the set condition, instruct the interactive object displayed on the display interface to make a second response, where the second response includes a body movement and/or a voice prompt indicating that the pose is not acceptable, and keep the current body-movement guide picture displayed.

According to an aspect of the present disclosure, an electronic device is provided. The device includes a memory and a processor, where the memory is configured to store computer instructions executable on the processor, and the processor is configured to implement the interaction method provided by any implementation of the present disclosure when executing the computer instructions.

According to an aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the interaction method provided by any implementation of the present disclosure is implemented.

Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

In this document, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the term "at least one" here means any one of multiple items, or any combination of at least two of them; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the set consisting of A, B and C.

With a digital human as the anchor, live streams can run in any time slot, even uninterrupted around the clock, meeting different viewers' needs for when to watch. As the object the users interact with during a live stream, however, the digital human raises problems that urgently need solving: how to give timely feedback to the questions users raise, and how to interact with users in a vivid and natural way.

In view of this, the present disclosure provides an interaction solution that can be applied to any scenario involving interaction with a virtual interactive object, such as online live streaming.

The interaction method proposed in the embodiments of the present disclosure can be applied to a terminal device or a server. The terminal device may be, for example, an electronic device with a client installed, such as a mobile phone or a tablet computer; the present disclosure does not limit the form of the terminal device. The client is, for example, a live-video client, including a live-streaming client, a somatosensory interaction client, and so on. The server may be any server capable of providing the processing capability for the interactive object.

The interactive object may be any object capable of interacting with a user: a virtual character, a virtual animal, a virtual item, a cartoon figure, or any other virtual figure able to perform interactive functions. The interactive object may be built on a two-dimensional or a three-dimensional virtual model, and is obtained by rendering that two-dimensional or three-dimensional virtual model. The user may be a real person, a robot, or another intelligent device. The interaction between the interactive object and the user may be active or passive.

For example, in a live-video scenario, the display interface of the client can show an animation of the interactive object, and the user can perform input operations in the client on the terminal device, such as entering text, entering voice, triggering an action, or pressing a key, to interact with the interactive object.

FIG. 1 is a flowchart of an interaction method according to at least one embodiment of the present disclosure; the method can be applied on the server side. As shown in FIG. 1, the method includes steps 101 to 103.

In step 101, a first message from the client is received.

For example, the indication content carried in the first message may include information the user enters through an input operation at the client; the user's input operation includes entering text, entering voice, triggering an action, pressing a key, and so on. The entered information can be sent by the client to the server; alternatively, when the client sends the entered information to the server, that information can be displayed directly at the client. The form of the indication content carried in the first message includes, but is not limited to, text, voice, images (for example, expressions or action images), video, and so on. The concrete form of the first message depends on the application scenario. For example, in a live-video scenario, the client may be one that supports watching live video, and the first message may be sent after the client captures the text content the user enters on the display interface; the indication content is then, for example, that text content, which can be shown on the display interface as a bullet-screen comment. As another example, in a somatosensory interaction scenario, the first message may be sent after the client captures a user behavior image; the indication content is then, for example, the captured user behavior image. Of course, in a concrete implementation the present disclosure limits neither the sending mechanism of the first message nor the form of the indication content it carries.
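A hedged sketch of the first message, assuming a simple tagged container for the indication content (text, voice, image such as a user behavior image, or video). The patent does not define a wire format, so the field layout below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FirstMessage:
    # One of "text", "voice", "image", "video" (illustrative tag set).
    content_type: str
    # Text content for "text"; raw bytes for the other media forms.
    content: object

text_msg = FirstMessage("text", "when is the next stream?")
image_msg = FirstMessage("image", b"<user behavior image bytes>")
```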

在步驟102中,基於所述第一消息包括的指示內容,獲取與所述指示內容匹配的驅動數據。In step 102, based on the indication content included in the first message, drive data matching the indication content is acquired.

範例性的，所述驅動數據包括聲音驅動數據、表情驅動數據、動作驅動數據中的一項或多項。一種實施方式中，所述驅動數據可以是預先儲存在伺服器或者其他關聯的業務伺服器中的，在接收到來自客戶端的第一消息後，可以根據所述指示內容在所述伺服器或其他關聯的業務伺服器中進行檢索，以獲得與所述指示內容匹配的驅動數據。另一種實施方式中，所述驅動數據可以是根據所述指示內容生成的，比如通過將所述指示內容輸入到預先訓練好的深度學習模型中，以預測得到與該指示內容對應的驅動數據。Exemplarily, the driving data includes one or more of voice driving data, expression driving data, and action driving data. In one implementation, the driving data may be pre-stored in the server or in another associated service server; after the first message from the client is received, a search may be performed in the server or the other associated service server according to the indication content to obtain the driving data matching the indication content. In another implementation, the driving data may be generated according to the indication content, for example, by inputting the indication content into a pre-trained deep learning model to predict the driving data corresponding to the indication content.
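As a purely illustrative sketch (not part of the disclosed embodiments), the two ways of obtaining driving data described above — retrieval from a pre-stored collection, with model-based generation as a fallback — could be outlined as follows. All names, keys, and stored values here are hypothetical:

```python
# Hypothetical driving-data store on the server; keys are indication contents.
DRIVE_DATA_STORE = {
    "如何洗手": {"voice": "wash_hands.pcm", "action": "demo_wash_hands"},
}

def generate_drive_data(indication: str) -> dict:
    # Stand-in for a pre-trained deep-learning model that predicts
    # driving data from the indication content (illustrative only).
    return {"voice": f"tts:{indication}", "action": "idle"}

def get_drive_data(indication: str) -> dict:
    # Step 102: prefer pre-stored driving data; fall back to generation.
    stored = DRIVE_DATA_STORE.get(indication)
    return stored if stored is not None else generate_drive_data(indication)

print(get_drive_data("如何洗手")["action"])  # demo_wash_hands
print(get_drive_data("你好")["voice"])       # tts:你好
```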

在步驟103中,利用所述驅動數據,控制所述客戶端的顯示介面播放所述互動物件的回應動畫。In step 103, using the driving data, the display interface of the client is controlled to play the response animation of the interactive object.

在本公開實施例中,所述互動物件為對虛擬模型諸如二維或三維虛擬模型渲染得到的。所述虛擬模型可以是自定義生成的,也可以對一角色的圖像或視訊進行轉換而得到的。本公開實施例對於虛擬模型的生成方式不進行限制。In the embodiment of the present disclosure, the interactive object is obtained by rendering a virtual model, such as a two-dimensional or three-dimensional virtual model. The virtual model may be custom generated or obtained by converting an image or video of a character. The embodiments of the present disclosure do not limit the manner of generating the virtual model.

所述回應動畫可以根據所述驅動數據生成，通過控制客戶端的顯示介面，例如視訊直播介面，播放所述互動物件的回應動畫，能夠顯示所述互動物件對於來自客戶端的第一消息的回應，該回應包括輸出一段語言，和/或做出一些動作、表情等等。The response animation can be generated according to the driving data. By controlling the display interface of the client, such as a live video interface, to play the response animation of the interactive object, the response of the interactive object to the first message from the client can be displayed. The response includes outputting a piece of speech and/or making certain actions, expressions, and the like.

在本公開實施例中，伺服器接收來自客戶端的第一消息，並根據所述第一消息所包含的指示內容來獲取匹配的驅動數據，並利用所述驅動數據來控制客戶端的顯示介面中播放所述互動物件的回應動畫，顯示互動物件的回應，使互動物件可以對於使用者的指示內容進行及時反饋，實現與使用者的及時互動。In the embodiment of the present disclosure, the server receives the first message from the client, obtains matching driving data according to the indication content contained in the first message, and uses the driving data to control the display interface of the client to play the response animation of the interactive object and show the interactive object's response, so that the interactive object can give timely feedback on the user's indication content and realize timely interaction with the user.

圖2為本公開至少一個實施例所提出的互動方法應用於直播過程的範例性說明。如圖2所示，所述互動物件為具有醫生形象的三維虛擬人物。在客戶端的顯示介面中可顯示所述三維虛擬人物作為主播進行直播的過程，客戶端上的使用者可以通過在顯示介面中輸入指示內容，以發送攜帶指示內容的第一消息，相應地，伺服器在接收來自客戶端的第一消息後，可以識別到指示內容，比如為“如何洗手”，進而可根據該指示內容獲取匹配的驅動數據，根據所述驅動數據，可以控制所述客戶端顯示該三維虛擬人物對於“如何洗手”這一指示內容的回應。例如，控制該三維虛擬人物輸出與“如何洗手”相對應的語音，並同時做出與輸出的語音相匹配的動作和/或表情。FIG. 2 is an exemplary illustration of applying the interaction method proposed by at least one embodiment of the present disclosure to a live streaming process. As shown in FIG. 2, the interactive object is a three-dimensional virtual character with the image of a doctor. The display interface of the client can show the process of the three-dimensional virtual character hosting a live stream, and the user on the client can input indication content on the display interface to send a first message carrying the indication content. Correspondingly, after receiving the first message from the client, the server can identify the indication content, for example "how to wash hands", then obtain matching driving data according to the indication content, and, according to the driving data, control the client to display the three-dimensional virtual character's response to the indication "how to wash hands". For example, the three-dimensional virtual character is controlled to output the voice corresponding to "how to wash hands" while making actions and/or expressions matching the output voice.

在一些實施例中,所述指示內容包括文本內容。可以根據如下方式獲取針對指示內容的應答內容:基於自然語言處理(Natural Language Processing,NLP)算法,識別所述文本內容所表達的語言意圖,並獲取與所述語言意圖匹配的應答內容。In some embodiments, the indicative content includes textual content. The response content for the indicated content may be acquired in the following manner: based on a natural language processing (Natural Language Processing, NLP) algorithm, the language intent expressed by the text content is identified, and the response content matching the language intent is acquired.

在一些實施例中，可以利用預先訓練的用於自然語言處理的神經網路模型對所述文本內容進行處理，例如卷積神經網路(Convolutional Neural Networks，CNN)、循環神經網路(Recurrent Neural Network，RNN)、長短期記憶網路(Long Short-Term Memory network，LSTM)等等。通過將所述第一消息包括的文本內容輸入至上述神經網路模型，對文本內容所表徵的語言意圖進行分類，從而確定所述文本內容所表達的語言意圖類別。In some embodiments, the text content may be processed using a pre-trained neural network model for natural language processing, such as a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a Long Short-Term Memory network (LSTM), and so on. By inputting the text content included in the first message into the above neural network model and classifying the language intent represented by the text content, the category of the language intent expressed by the text content is determined.

由於第一消息所包括的文本內容可能包含了多層的含義，通過利用自然語言處理算法，可以識別出使用者實際想表達的意圖，從而能夠直接反饋所述使用者真正想獲取的內容，提升了使用者的互動體驗。Since the text content included in the first message may contain multiple layers of meaning, the natural language processing algorithm can identify the intent that the user actually wants to express, so that the content the user really wants to obtain can be fed back directly, improving the user's interactive experience.

在一些實施例中，可以根據所述語言意圖，從預設的數據庫中查找與所述語言意圖匹配的、符合所述語言意圖的應答內容，進一步地，伺服器可以基於所述應答內容，生成用於使所述互動物件表達所述應答內容的驅動數據。其中，所述數據庫可以部署在伺服器中，也可以部署在雲端，本公開對此不進行限制。In some embodiments, response content that matches and conforms to the language intent may be looked up in a preset database according to the language intent. Further, the server may generate, based on the response content, driving data for making the interactive object express the response content. The database may be deployed in the server or in the cloud, which is not limited in the present disclosure.

在識別出語言意圖的情況下,伺服器可以從所述文本內容中提取與所述語言意圖相關的參數,也即實體。例如可以通過系統分詞、資訊抽取等方式確定實體。在所述語言意圖分類所對應的數據中,通過實體可以進一步確定符合所述語言意圖的應答文本。本領域技術人員應當理解,以上方式僅用於範例,也可以利用其他方式獲得與所述語言意圖匹配的應答文本,本公開對此不進行限制。In the case of identifying the linguistic intent, the server may extract parameters, ie entities, related to the linguistic intent from the textual content. For example, entities can be determined by means of systematic word segmentation, information extraction, etc. In the data corresponding to the language intent classification, the entity can further determine the response text that conforms to the language intent. Those skilled in the art should understand that the above manner is only used for example, and other manners may also be used to obtain the response text matching the language intent, which is not limited in the present disclosure.
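A minimal, purely illustrative sketch of the intent-plus-entity flow described above (a keyword rule stands in for the trained CNN/RNN/LSTM classifier; all intent labels, entities, and response texts are made up for this example):

```python
# Hypothetical (intent, entity) -> response-text table.
RESPONSES = {
    ("ask_howto", "洗手"): "先用清水濕潤雙手，再使用洗手液。",
}

def classify_intent(text: str) -> str:
    # Stand-in for a trained intent classifier.
    return "ask_howto" if text.startswith("如何") else "chitchat"

def extract_entity(text: str, intent: str) -> str:
    # Stand-in for word segmentation / information extraction.
    return text[2:] if intent == "ask_howto" else ""

def get_response_text(text: str) -> str:
    intent = classify_intent(text)
    entity = extract_entity(text, intent)
    return RESPONSES.get((intent, entity), "抱歉，我沒有理解您的意思。")

print(get_response_text("如何洗手"))  # 先用清水濕潤雙手，再使用洗手液。
```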

在一些實施例中,伺服器可以根據所述應答內容生成語音驅動數據,所述語音驅動數據例如包括所述應答內容所包含的應答文本對應的音素序列。通過生成所述音素序列對應的語音,並控制所述客戶端輸出所述語音,可以使所述互動物件輸出表達所述應答文本所表徵的內容的語音。In some embodiments, the server may generate voice-driven data according to the response content, where the voice-driven data includes, for example, a phoneme sequence corresponding to the response text included in the response content. By generating the voice corresponding to the phoneme sequence and controlling the client to output the voice, the interactive object can be made to output the voice that expresses the content represented by the response text.

在一些實施例中,伺服器可以根據所述應答內容生成動作驅動數據,以使所述互動物件做出表達所述應答內容的動作。In some embodiments, the server may generate action-driven data according to the response content, so that the interactive object performs an action expressing the response content.

在一個範例中，在應答內容包括應答文本的情況下，可以利用以下方式根據所述應答內容生成動作驅動數據：基於所述應答文本中所包含的至少一個目標文本，獲取與所述目標文本匹配的互動物件的設定動作的控制參數。In an example, in the case where the response content includes a response text, the action driving data may be generated according to the response content in the following manner: based on at least one target text included in the response text, obtaining control parameters of a set action of the interactive object that match the target text.

所述目標文本可以是設置的關鍵字、關鍵詞、關鍵句等等。以關鍵詞為“洗手”為例，在所述應答文本中包含了“洗手”的情況下，則可以確定應答文本中包含了目標文本。可以預先為每一個目標文本設置匹配的設定動作，而每個設定動作可以通過一組控制參數序列來實現，例如多個骨骼點的位移形成一組控制參數，利用多組控制參數形成的控制參數序列來調整所述互動物件的模型參數，可以使互動物件做出所述設定動作。The target text may be a preset key character, key word, key sentence, and so on. Taking the key word "washing hands" as an example, if the response text contains "washing hands", it can be determined that the response text contains the target text. A matching set action can be configured in advance for each target text, and each set action can be realized by a sequence of control parameters. For example, the displacements of multiple skeleton points form one set of control parameters, and the control parameter sequence formed by multiple sets of control parameters is used to adjust the model parameters of the interactive object, so that the interactive object performs the set action.
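The keyword-to-action matching above can be sketched as follows. This is an assumption-laden illustration: the keywords and the control-parameter sequences (here, toy 2-value "bone displacement" vectors) are invented for the example:

```python
# Hypothetical library: target text -> sequence of control-parameter sets
# (each inner list is one toy set of skeleton-point displacements).
ACTION_LIBRARY = {
    "洗手": [[0.0, 0.1], [0.3, 0.5], [0.1, 0.0]],
    "微笑": [[0.2, 0.2]],
}

def match_set_actions(response_text: str):
    # Return every (target text, control-parameter sequence) pair whose
    # target text occurs in the response text.
    return [(kw, params) for kw, params in ACTION_LIBRARY.items()
            if kw in response_text]

matched = match_set_actions("大家要勤洗手")
print(matched[0][0])  # 洗手
```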

在本公開實施例中,通過使互動物件以動作的形式來對第一消息進行回應,使使用者能夠獲得對於第一消息的直觀、生動的回應,提升了使用者的互動體驗。In the embodiment of the present disclosure, by making the interactive object respond to the first message in the form of action, the user can obtain an intuitive and vivid response to the first message, and the interactive experience of the user is improved.

在一些實施例中，可以確定所述目標文本對應的語音資訊；獲取輸出所述語音資訊的時間資訊；根據所述時間資訊確定所述目標文本對應的設定動作的執行時間；根據所述執行時間，以所述目標文本對應的控制參數控制所述互動物件執行所述設定動作。In some embodiments, the voice information corresponding to the target text can be determined; the time information for outputting the voice information is obtained; the execution time of the set action corresponding to the target text is determined according to the time information; and, according to the execution time, the interactive object is controlled, with the control parameters corresponding to the target text, to perform the set action.

在根據所述應答文本對應的音素序列控制所述客戶端輸出語音的情況下，可以確定輸出所述目標文本所對應的語音的時間資訊，例如開始輸出所述目標文本對應的語音的時間、結束輸出的時間以及持續時間。可以根據所述時間資訊確定所述目標文本對應的設定動作的執行時間，在所述執行時間內，或者在執行時間的一定範圍內，以所述目標文本對應的控制參數控制所述互動物件執行所述設定動作。In the case where the client is controlled to output speech according to the phoneme sequence corresponding to the response text, the time information of outputting the speech corresponding to the target text can be determined, such as the time at which output of the speech corresponding to the target text starts, the time at which it ends, and its duration. The execution time of the set action corresponding to the target text can be determined according to the time information, and within the execution time, or within a certain range of the execution time, the interactive object is controlled, with the control parameters corresponding to the target text, to perform the set action.
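The timing alignment described above reduces to a small scheduling rule: the action for a target text runs over the same interval as that text's speech. A hedged sketch, with all times in seconds and purely illustrative values:

```python
def schedule_action(speech_start: float, speech_end: float) -> dict:
    # Schedule the set action over the same interval as the speech output
    # for the target text, so motion and audio stay in sync.
    return {"start": speech_start, "duration": speech_end - speech_start}

# E.g. the speech for "洗手" plays from t=2.0s to t=3.5s.
slot = schedule_action(2.0, 3.5)
print(slot)  # {'start': 2.0, 'duration': 1.5}
```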

在本公開實施例中，對於每個目標文本，輸出對應的語音的持續時間，與根據對應的控制參數控制動作的持續時間，是一致的或者相近的，以使互動物件輸出目標文本所對應的語音與進行動作的時間是匹配的，從而使互動物件的語音和動作同步、協調，使使用者產生所述互動物件在直播過程中做出回應的感覺，提高了使用者在直播過程中與主播進行互動的體驗。In the embodiment of the present disclosure, for each target text, the duration of outputting the corresponding speech is consistent with, or close to, the duration of the action controlled according to the corresponding control parameters, so that the time at which the interactive object outputs the speech corresponding to the target text matches the time at which it performs the action. The speech and actions of the interactive object are thereby synchronized and coordinated, giving the user the feeling that the interactive object is responding during the live stream, and improving the user's experience of interacting with the host during the live stream.

在一些實施例中，可以根據所述應答文本生成姿態驅動數據，以使所述客戶端顯示與應答文本對應的語音相匹配的所述互動物件的姿態，例如做出相應的表情和動作。In some embodiments, gesture driving data may be generated according to the response text, so that the client displays gestures of the interactive object that match the speech corresponding to the response text, for example, making corresponding expressions and actions.

在一個範例中，應答內容還可以包括音素序列，或者，在應答內容包括應答文本的情況下，也可以提取應答文本對應的音素序列，在獲取到包括音素序列的應答內容後，可以獲取與所述音素序列匹配的用於所述互動物件的控制參數。其中，所述互動物件的控制參數包括至少一個局部區域的姿態控制向量，所述獲取與所述音素序列匹配的用於互動物件的控制參數，包括：對所述音素序列進行特徵編碼，獲得所述音素序列對應的第一編碼序列；根據所述第一編碼序列，獲取至少一個音素對應的特徵編碼；獲取所述特徵編碼對應的所述互動物件的至少一個局部區域的姿態控制向量。In an example, the response content may also include a phoneme sequence; or, in the case where the response content includes a response text, the phoneme sequence corresponding to the response text may be extracted. After the response content including the phoneme sequence is obtained, control parameters for the interactive object that match the phoneme sequence can be obtained. The control parameters of the interactive object include a gesture control vector of at least one local area, and obtaining the control parameters for the interactive object that match the phoneme sequence includes: performing feature encoding on the phoneme sequence to obtain a first coding sequence corresponding to the phoneme sequence; obtaining, according to the first coding sequence, a feature code corresponding to at least one phoneme; and obtaining the gesture control vector of at least one local area of the interactive object corresponding to the feature code.

在一些實施例中，通過控制客戶端播放所述應答文本對應的語音，並使客戶端顯示與所述語音相匹配的所述互動物件的姿態的回應動畫，使得所述互動物件的回應更加擬人化，更加生動、自然，提升了使用者的互動體驗。In some embodiments, by controlling the client to play the speech corresponding to the response text and to display a response animation of the interactive object's gestures matching the speech, the response of the interactive object is made more anthropomorphic, vivid, and natural, which improves the user's interactive experience.

在所述互動物件的控制參數包括至少一個局部區域的姿態控制向量的實施例中,可以通過以下方式獲得姿態控制向量。In an embodiment where the control parameter of the interactive object includes a gesture control vector of at least one local area, the gesture control vector can be obtained in the following manner.

首先,對所述應答文本對應的音素序列進行特徵編碼,獲得所述音素序列對應的編碼序列。此處,為了與後續提到的編碼序列進行區分,將所述文本數據的音素序列對應的編碼序列稱為第一編碼序列。First, feature encoding is performed on the phoneme sequence corresponding to the response text to obtain a coding sequence corresponding to the phoneme sequence. Here, in order to distinguish it from the coding sequence mentioned later, the coding sequence corresponding to the phoneme sequence of the text data is referred to as the first coding sequence.

針對所述音素序列包含的多種音素,生成每種音素對應的子編碼序列。For multiple phonemes included in the phoneme sequence, a subcoding sequence corresponding to each phoneme is generated.

在一個範例中，檢測各時間點上是否對應有第一音素，所述第一音素為所述多種音素中的任一種；將有所述第一音素對應的時間點上的編碼值設置為第一數值，將沒有所述第一音素的時間點上的編碼值設置為第二數值，在對各個時間點上的編碼值進行賦值之後可得到第一音素對應的子編碼序列。例如，可以將有所述第一音素的時間點上的編碼值設置為1，在沒有所述第一音素的時間點上的編碼值設置為0。本領域技術人員應當理解，上述編碼值的設置僅為範例，也可以將編碼值設置為其他值，本公開對此不進行限制。In an example, whether a first phoneme corresponds to each time point is detected, the first phoneme being any one of the multiple phonemes. The code value at a time point where the first phoneme is present is set to a first value, and the code value at a time point where the first phoneme is absent is set to a second value; after the code values at the respective time points are assigned, the sub-coding sequence corresponding to the first phoneme is obtained. For example, the code value may be set to 1 at time points where the first phoneme is present and to 0 at time points where it is absent. Those skilled in the art should understand that the above setting of code values is only an example, and the code values may also be set to other values, which is not limited in the present disclosure.

之後,根據所述多種音素分別對應的子編碼序列,獲得所述音素序列對應的第一編碼序列。After that, a first coding sequence corresponding to the phoneme sequence is obtained according to the sub-coding sequences corresponding to the plurality of phonemes respectively.
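The construction of the sub-coding sequences and the first coding sequence can be sketched directly from the description above (a hedged illustration; the phoneme timeline and time granularity are invented for the example):

```python
def first_coding_sequence(timeline):
    # timeline: which phoneme sounds at each time step, e.g. ["j", "j", "i1", ...].
    # For each phoneme kind, build a 0/1 sub-coding sequence: 1 at time steps
    # where that phoneme is present, 0 elsewhere. The set of sub-sequences
    # forms the first coding sequence.
    phonemes = sorted(set(timeline))
    return {p: [1 if t == p else 0 for t in timeline] for p in phonemes}

seq = first_coding_sequence(["j", "j", "i1", "i1", "j", "ie4"])
print(seq["j"])   # [1, 1, 0, 0, 1, 0]
print(seq["i1"])  # [0, 0, 1, 1, 0, 0]
```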

在一個範例中，對於第一音素對應的子編碼序列，可利用高斯濾波器對所述第一音素在時間上的連續值進行高斯卷積操作，以對特徵編碼所對應的矩陣進行濾波，平滑每一個音素轉換時嘴部區域過渡的動作。In an example, for the sub-coding sequence corresponding to the first phoneme, a Gaussian filter may be used to perform a Gaussian convolution operation on the temporally continuous values of the first phoneme, so as to filter the matrix corresponding to the feature codes and smooth the transition of the mouth area at each phoneme change.

圖3繪示了本公開至少一個實施例提出的獲得姿態控制向量的方法流程圖。如圖3所示，音素序列310含音素j、i1、j、ie4(為簡潔起見，只繪示部分音素)，針對每種音素j、i1、ie4分別獲得與上述各音素分別對應的子編碼序列321、322、323。在各個子編碼序列中，在有所述音素的時間(圖3中以秒(s)為時間單位)上對應的編碼值為第一數值(例如為1)，在沒有所述音素的時間上對應的編碼值為第二數值(例如為0)。以子編碼序列321為例，在音素序列310中有音素j的時間上，子編碼序列321的值為第一數值，在沒有音素j的時間上，子編碼序列321的值為第二數值。所有子編碼序列構成第一編碼序列320。FIG. 3 is a flowchart of a method for obtaining gesture control vectors proposed by at least one embodiment of the present disclosure. As shown in FIG. 3, the phoneme sequence 310 contains phonemes j, i1, j, and ie4 (for brevity, only some phonemes are shown), and for each kind of phoneme j, i1, and ie4, the corresponding sub-coding sequences 321, 322, and 323 are obtained respectively. In each sub-coding sequence, the code value at times when the phoneme is present (with seconds (s) as the time unit in FIG. 3) is a first value (for example, 1), and the code value at times when the phoneme is absent is a second value (for example, 0). Taking the sub-coding sequence 321 as an example, at times when phoneme j is present in the phoneme sequence 310, the value of the sub-coding sequence 321 is the first value, and at times when phoneme j is absent, its value is the second value. All the sub-coding sequences constitute the first coding sequence 320.

接下來,根據所述第一編碼序列,獲取至少一個音素對應的特徵編碼。Next, according to the first coding sequence, a feature code corresponding to at least one phoneme is acquired.

根據音素j、i1、ie4分別對應的子編碼序列321、322、323的編碼值,以及該三個子編碼序列中對應的音素的持續時間,也即在子編碼序列321中j的持續時間、在子編碼序列322中i1的持續時間、在子編碼序列323中ie4的持續時間,可以獲得子編碼序列321、322、323的特徵資訊。According to the coding values of the sub-coding sequences 321, 322, 323 corresponding to phonemes j, i1, and ie4 respectively, and the durations of the corresponding phonemes in the three sub-coding sequences, that is, the duration of j in the sub-coding sequence 321, The duration of i1 in the sub-coding sequence 322 and the duration of ie4 in the sub-coding sequence 323 can obtain the characteristic information of the sub-coding sequences 321 , 322 and 323 .

在一個範例中，可以利用高斯濾波器對子編碼序列321、322、323中的音素j、i1、ie4在時間上的連續值分別進行高斯卷積操作，以對特徵編碼進行平滑，得到平滑後的第一編碼序列330。也即，通過高斯濾波器對於音素的0-1的時間上的連續值進行高斯卷積操作，使得各個編碼序列中編碼值從第二數值到第一數值或者從第一數值到第二數值的變化階段變得平滑。例如，編碼序列的值除了0和1也呈現出中間狀態的值，例如0.2、0.3等等，而根據這些中間狀態的值所獲取的姿態控制向量，使得互動人物的動作過渡、表情變化更加平緩、自然，提高了目標物件的互動體驗。In an example, a Gaussian filter may be used to perform Gaussian convolution operations on the temporally continuous values of phonemes j, i1, and ie4 in the sub-coding sequences 321, 322, and 323 respectively, so as to smooth the feature codes and obtain the smoothed first coding sequence 330. That is, performing a Gaussian convolution operation on the temporally continuous 0-1 values of each phoneme through the Gaussian filter makes the transition stages of the code values in each coding sequence, from the second value to the first value or from the first value to the second value, become smooth. For example, the coding sequence then takes intermediate values besides 0 and 1, such as 0.2, 0.3, and so on, and the gesture control vectors obtained according to these intermediate values make the action transitions and expression changes of the interactive character gentler and more natural, improving the interactive experience of the target object.
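The Gaussian smoothing step above can be sketched in plain Python (kernel radius and sigma are illustrative choices, not values from the disclosure). After filtering, a 0/1 sub-coding sequence takes intermediate values near each transition:

```python
import math

def gaussian_kernel(radius: int, sigma: float):
    # Discrete, normalized Gaussian kernel of length 2*radius + 1.
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(seq, radius: int = 1, sigma: float = 1.0):
    # 1-D Gaussian convolution; out-of-range neighbours are simply skipped.
    k = gaussian_kernel(radius, sigma)
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(k):
            idx = i + j - radius
            if 0 <= idx < len(seq):
                acc += w * seq[idx]
        out.append(acc)
    return out

smoothed = smooth([0, 0, 1, 1, 0, 0])
# The step from 0 to 1 now passes through an intermediate value:
print(0.0 < smoothed[1] < 1.0)  # True
```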

在一些實施例中,可以通過在所述第一編碼序列上進行滑動視窗的方式獲取至少一個音素對應的特徵編碼。其中,所述第一編碼序列可以是經過高斯卷積操作後的編碼序列。In some embodiments, the feature code corresponding to at least one phoneme may be obtained by performing a sliding window on the first code sequence. Wherein, the first coding sequence may be a coding sequence after Gaussian convolution operation.

以設定長度的時間視窗和設定步長，對所述編碼序列進行滑動視窗操作，將所述時間視窗內的特徵編碼作為所對應的至少一個音素的特徵編碼，在完成滑動視窗後，根據得到的多個特徵編碼，則可以獲得第二編碼序列。如圖3所示，通過在第一編碼序列320或者平滑後的第一編碼序列330上，滑動設定長度的時間視窗，分別獲得特徵編碼1、特徵編碼2、特徵編碼3，以此類推，在遍歷第一編碼序列後，獲得特徵編碼1、2、3、…、M，從而得到了第二編碼序列340。其中，M為正整數，其數值根據第一編碼序列的長度、時間視窗的長度以及時間視窗滑動的步長確定。With a time window of a set length and a set stride, a sliding-window operation is performed on the coding sequence, and the feature codes within the time window are taken as the feature code of the corresponding at least one phoneme. After the sliding window is completed, the second coding sequence can be obtained from the multiple feature codes thus obtained. As shown in FIG. 3, by sliding a time window of the set length over the first coding sequence 320 or the smoothed first coding sequence 330, feature code 1, feature code 2, feature code 3, and so on are obtained successively. After the first coding sequence is traversed, feature codes 1, 2, 3, ..., M are obtained, thereby yielding the second coding sequence 340, where M is a positive integer whose value is determined according to the length of the first coding sequence, the length of the time window, and the stride of the sliding time window.
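The sliding-window step can be sketched as follows; window length and stride are illustrative, and M falls out of the sequence length, window length, and stride exactly as stated above:

```python
def sliding_windows(seq, window: int, stride: int):
    # Slide a fixed-length window over the coding sequence with the given
    # stride; each window's contents form one feature code, and the list of
    # all M feature codes is the second coding sequence.
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]

second = sliding_windows([0, 0, 1, 1, 0, 0], window=3, stride=1)
print(len(second))  # M = 4
print(second[0])    # [0, 0, 1]
```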

根據特徵編碼1、2、3、…、M,分別可以獲得相應的姿態控制向量1、2、3、…、M,從而獲得姿態控制向量的序列350。According to the feature codes 1, 2, 3, ..., M, corresponding attitude control vectors 1, 2, 3, ..., M can be obtained respectively, thereby obtaining a sequence 350 of attitude control vectors.

姿態控制向量的序列350與第二編碼序列340在時間上是對齊的，由於所述第二編碼序列中的每個編碼特徵是根據音素序列中的至少一個音素獲得的，因此姿態控制向量的序列350中的每個特徵向量同樣是根據音素序列中的至少一個音素獲得的。在播放文本數據所對應的音素序列的同時，根據所述姿態控制向量的序列驅動所述互動物件做出動作，即能夠實現驅動互動物件發出文本內容所對應的聲音的同時，做出與聲音同步的動作，給目標物件以所述互動物件正在說話的感覺，提升了目標物件的互動體驗。The sequence 350 of gesture control vectors is temporally aligned with the second coding sequence 340. Since each encoded feature in the second coding sequence is obtained from at least one phoneme in the phoneme sequence, each feature vector in the sequence 350 of gesture control vectors is likewise obtained from at least one phoneme in the phoneme sequence. By driving the interactive object to act according to the sequence of gesture control vectors while the phoneme sequence corresponding to the text data is played, the interactive object can be driven to make actions synchronized with the sound while it emits the sound corresponding to the text content, giving the target object the feeling that the interactive object is talking and improving the target object's interactive experience.

假設在第一個時間視窗的設定時刻開始輸出編碼特徵，可以將在所述設定時刻之前的姿態控制向量設置為默認值，也即在剛開始播放音素序列時，使所述互動物件做出默認的動作，在所述設定時刻之後開始利用根據第一編碼序列所得到的姿態控制向量的序列驅動所述互動物件做出動作。以圖3為例，在t0時刻開始輸出特徵編碼1，在t0時刻之前對應的是默認姿態控制向量。Assuming that output of the encoded features starts at a set moment within the first time window, the gesture control vectors before the set moment can be set to default values; that is, when playback of the phoneme sequence has just started, the interactive object makes a default action, and after the set moment the sequence of gesture control vectors obtained from the first coding sequence starts to be used to drive the interactive object to act. Taking FIG. 3 as an example, output of feature code 1 starts at time t0, and before time t0 the default gesture control vector applies.

在一些實施例中，在所述音素序列中音素之間的時間間隔大於設定閾值的情況下，根據所述局部區域的設定姿態控制向量，驅動所述互動物件做出動作。也即，在互動人物說話停頓較長的時候，則驅動互動物件做出設定的動作。例如，在輸出的聲音停頓較大時，可以使互動人物做出微笑的表情，或者做出身體微微的擺動，以避免在停頓較長時互動人物面無表情地直立，使得互動物件說話的過程自然、流暢，提高目標物件的互動感受。In some embodiments, when the time interval between phonemes in the phoneme sequence is greater than a set threshold, the interactive object is driven to act according to the set gesture control vector of the local area. That is, when there is a long pause in the interactive character's speech, the interactive object is driven to perform a set action. For example, when there is a long pause in the output sound, the interactive character can be made to smile or sway its body slightly, so as to avoid the interactive character standing upright with a blank expression during a long pause. This makes the process of the interactive object speaking natural and smooth, improving the target object's interactive experience.
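The pause rule above amounts to a simple threshold check on the gap between consecutive phonemes; a hedged sketch (threshold value and pose name are invented for illustration):

```python
DEFAULT_POSE = "smile"  # hypothetical preset gesture for long pauses

def pose_for_gap(prev_end: float, next_start: float, threshold: float = 0.8):
    # If the gap between two consecutive phonemes exceeds the set threshold,
    # fall back to the preset gesture control vector; otherwise keep using
    # the speech-driven vectors (signalled here by None).
    return DEFAULT_POSE if (next_start - prev_end) > threshold else None

print(pose_for_gap(1.0, 2.5))  # smile
print(pose_for_gap(1.0, 1.2))  # None
```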

在一些實施例中，對於所述應答文本中所包含的至少一個目標文本，獲取與所述至少一個目標文本匹配的用於互動物件的設定動作的控制參數，來驅動所述互動物件執行所述設定動作；對於所述至少一個目標文本以外的應答內容，可以根據所述應答內容所對應的音素來獲取所述互動物件的控制參數，從而驅動所述互動物件做出與所述應答內容的發音相匹配的姿態，例如表情和動作。In some embodiments, for at least one target text included in the response text, control parameters for a set action of the interactive object that match the at least one target text are obtained to drive the interactive object to perform the set action; for response content other than the at least one target text, the control parameters of the interactive object may be obtained according to the phonemes corresponding to the response content, so as to drive the interactive object to make gestures, such as expressions and actions, that match the pronunciation of the response content.

以圖2所示的直播過程為例,在所接收的第一消息包含文本內容“如何洗手”的情況下,通過自然語言處理算法,可以識別出使用者的語言意圖是“諮詢如何洗手”。通過在預設的數據庫中進行檢索,可以獲得符合回答如何洗手的內容,並將該內容作為應答文本。通過根據所述應答文本生成動作驅動數據、聲音驅動數據、姿態驅動數據,可以使所述互動物件在通過語音回答“如何洗手”這一問題的同時,做出與發音相匹配的表情、動作,並同時用肢體動作來演示如何洗手。Taking the live broadcast process shown in Figure 2 as an example, when the received first message contains the text content "how to wash hands", through the natural language processing algorithm, it can be recognized that the user's language intention is "consult how to wash hands". By retrieving in the preset database, the content corresponding to the answer on how to wash hands can be obtained, and the content can be used as the answer text. By generating action-driven data, voice-driven data, and gesture-driven data according to the response text, the interactive object can make expressions and actions that match the pronunciation while answering the question "how to wash hands" through voice. And at the same time use body movements to demonstrate how to wash hands.

在一些實施例中,還可以向所述客戶端發送包括所述應答文本的指示資訊,以使所述客戶端基於所述指示資訊顯示所述應答文本。In some embodiments, instruction information including the response text may also be sent to the client, so that the client displays the response text based on the instruction information.

例如，對於回應“如何洗手”這一問題的應答文本，可以通過將包含所述應答文本的指示資訊發送至客戶端，以在所述客戶端上以文本的形式顯示所述指示資訊，使使用者能夠更加準確地接收到互動物件所傳達的資訊。For example, for the response text answering the question "how to wash hands", the indication information containing the response text may be sent to the client so that the indication information is displayed on the client in text form, enabling the user to receive the information conveyed by the interactive object more accurately.

在一些實施例中，所述互動物件對應的虛擬模型(虛擬模型既可以是二維虛擬模型也可以是三維虛擬模型)可以儲存於客戶端。在這種情況下，可以將所述互動物件的驅動數據發送至所述客戶端，以使所述客戶端根據驅動數據生成回應動畫；控制所述客戶端播放所述回應動畫。例如，可以控制所述客戶端根據所述驅動數據所包含的控制參數來調整所述互動物件的虛擬模型參數；並基於調整後的虛擬模型參數，利用渲染引擎生成所述互動物件的回應動畫，並播放所述回應動畫來對所述第一消息進行回應。在虛擬模型為二維虛擬模型的情況下，虛擬模型參數為二維虛擬模型參數，在虛擬模型為三維虛擬模型的情況下，虛擬模型參數為三維虛擬模型參數。又例如，伺服器可以基於驅動數據，確定用於控制互動物件的回應方式的控制指令，並向客戶端發送所述控制指令，以使所述客戶端基於所述控制指令顯示進行回應的互動物件的畫面。In some embodiments, the virtual model corresponding to the interactive object (the virtual model may be either a two-dimensional or a three-dimensional virtual model) may be stored on the client. In this case, the driving data of the interactive object may be sent to the client, so that the client generates a response animation according to the driving data, and the client is controlled to play the response animation. For example, the client may be controlled to adjust the virtual model parameters of the interactive object according to the control parameters included in the driving data, and, based on the adjusted virtual model parameters, to generate the response animation of the interactive object with a rendering engine and play the response animation in response to the first message. When the virtual model is a two-dimensional virtual model, the virtual model parameters are two-dimensional virtual model parameters; when the virtual model is a three-dimensional virtual model, the virtual model parameters are three-dimensional virtual model parameters. As another example, the server may determine, based on the driving data, a control instruction for controlling the response mode of the interactive object, and send the control instruction to the client, so that the client displays a picture of the responding interactive object based on the control instruction.

在互動物件的虛擬模型的數據量較小，對於客戶端的性能佔用不高的情況下，可以通過將所述驅動數據發送至所述客戶端，使所述客戶端根據所述驅動數據生成回應動畫，從而可以方便靈活地顯示進行回應的互動物件的畫面。In the case where the data volume of the interactive object's virtual model is small and its performance footprint on the client is low, the driving data can be sent to the client so that the client generates the response animation according to the driving data, whereby the picture of the responding interactive object can be displayed conveniently and flexibly.

在一些實施例中，所述互動物件對應的虛擬模型儲存於伺服器端或雲端。在這種情況下，可以基於所述驅動數據，調整所述互動物件的虛擬模型參數；基於調整後的虛擬模型參數，利用渲染引擎生成所述互動物件的回應動畫，並向所述客戶端發送所述回應動畫，所述回應動畫中顯示所述互動物件的動作或表情。通過將所述回應動畫發送至客戶端來實現所述互動物件的回應，可以避免客戶端進行渲染導致的卡頓，並且能夠在客戶端顯示高質量的回應動畫，提升了使用者的互動體驗。In some embodiments, the virtual model corresponding to the interactive object is stored on the server side or in the cloud. In this case, the virtual model parameters of the interactive object can be adjusted based on the driving data; based on the adjusted virtual model parameters, a rendering engine is used to generate the response animation of the interactive object, which is then sent to the client. The response animation shows the actions or expressions of the interactive object. Realizing the interactive object's response by sending the response animation to the client avoids stuttering caused by rendering on the client, and allows a high-quality response animation to be displayed on the client, improving the user's interactive experience.

圖4繪示根據本公開至少一個實施例的另一種互動方法的流程圖。該互動方法可應用於客戶端。所述方法包括步驟401~402。FIG. 4 is a flowchart illustrating another interaction method according to at least one embodiment of the present disclosure. This interactive method can be applied to the client. The method includes steps 401-402.

在步驟401中,響應於來自客戶端的使用者輸入操作,向伺服器發送包括指示內容的第一消息。In step 401, in response to a user input operation from the client, a first message including the indication content is sent to the server.

範例性的，使用者輸入操作包括輸入文本操作、輸入語音操作、動作觸發操作、按鍵觸發操作等等，響應於所述使用者輸入操作，向伺服器發送第一消息，第一消息中攜帶的指示內容包括但不限於文本、語音、圖像(例如表情、動作圖像)、視訊等中的一種或多種。例如，在視訊直播場景下，所述客戶端可以是支持觀看視訊直播功能的客戶端，所述第一消息可以在客戶端採集到使用者在顯示介面輸入文本內容後發送出去，第一消息攜帶的指示內容例如為輸入的文本內容，且該指示內容可以通過彈幕的形式顯示在顯示介面中。又例如，在體感互動場景下，所述第一消息可以在客戶端採集到使用者行為圖像後發送出去，第一消息攜帶的指示內容例如為採集的使用者行為圖像。當然，具體實施中本公開對第一消息的發送機制以及第一消息中攜帶的指示內容的形式並不進行限制。Exemplarily, the user input operation includes a text input operation, a voice input operation, an action-trigger operation, a key-trigger operation, and the like. In response to the user input operation, a first message is sent to the server, and the indication content carried in the first message includes, but is not limited to, one or more of text, voice, images (e.g., expressions, motion images), video, and so on. For example, in a live video streaming scenario, the client may be a client that supports watching live video, and the first message may be sent after the client captures the text content entered by the user on the display interface; the indication content carried in the first message is, for example, the entered text content, and the indication content may be displayed on the display interface in the form of a bullet comment. As another example, in a somatosensory interaction scenario, the first message may be sent after the client captures an image of the user's behavior; the indication content carried in the first message is, for example, the captured user-behavior image. Of course, in specific implementations, the present disclosure does not limit the sending mechanism of the first message or the form of the indication content carried in it.

In step 402, based on a second message with which the server responds to the first message, a response animation of the interactive object is played on the display interface of the client.

The second message is generated by the server in response to the indication content included in the first message, and is used for causing the client to display the interactive object making a response to the indication content.

In the embodiments of the present disclosure, the interactive object is obtained by rendering a virtual model, such as a two-dimensional or three-dimensional virtual model. The virtual model may be custom-generated, or may be obtained by converting an image or video of a character. The embodiments of the present disclosure do not limit the manner of generating the virtual model.

In the embodiments of the present disclosure, a first message including indication content is sent to the server according to a user input operation, and the interactive object's response to the indication content is displayed in the client based on the second message with which the server responds to the first message. In this way, the interactive object can give timely feedback on the user's indication content, realizing timely interaction with the user.

In some embodiments, the indication content includes text content, and the method further includes: displaying the text content on the display interface of the client, and/or determining and playing an audio file corresponding to the text content. That is, the text content entered by the user may be displayed on the client; the audio file corresponding to the text content may also be played on the client, outputting the speech corresponding to the text content.

In some embodiments, displaying the text content in the client includes: generating bullet screen information of the text content, and displaying the bullet screen information on the display interface of the client.

In a live video streaming scenario, corresponding bullet screen information may be generated for the text content entered by the user, and the bullet screen information may be displayed on the display interface of the client. Taking FIG. 2 as an example, when the user enters "how to wash hands" on the live-streaming interaction interface of the client, the bullet screen information "how to wash hands" corresponding to the text content may be displayed on the display interface.

In some embodiments, the second message includes a response text for the indication content, and the method further includes: displaying the response text on the display interface of the client, and/or determining and playing an audio file corresponding to the response text.

The response text for the indication content may be obtained in the following manner: identifying the language intent expressed by the text content, and searching a preset database for a response text matching the language intent. For the specific method, reference may be made to the foregoing embodiments, and details are not repeated here.
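The intent-then-lookup flow can be sketched as follows, assuming a simple keyword-based intent classifier and an in-memory dictionary standing in for the preset database (a real system would use an NLP model and a proper database):

```python
# Illustrative preset database: intent -> response text.
RESPONSE_DATABASE = {
    "ask_hand_washing": "Wet your hands, apply soap, scrub for 20 seconds, rinse, and dry.",
    "greet": "Hello! How can I help you?",
}

# Illustrative keyword rules standing in for a learned intent classifier.
INTENT_KEYWORDS = {
    "ask_hand_washing": ["wash hands", "hand washing"],
    "greet": ["hello", "hi"],
}

def recognize_intent(text_content):
    """Identify the language intent expressed by the text content."""
    lowered = text_content.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return None

def lookup_response_text(text_content):
    """Search the preset database for a response matching the intent."""
    intent = recognize_intent(text_content)
    if intent is None:
        return None  # no matching response in the preset database
    return RESPONSE_DATABASE.get(intent)
```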

Taking the live video streaming scenario as an example, the response text replying to the user's bullet screen information may likewise be displayed on the display interface in the form of bullet screen information, and the audio file corresponding to the response text may be played on the display interface, that is, the speech corresponding to the response text is output. In this way, the user's bullet screen information can be replied to accurately and intuitively, improving the user's interaction experience.

In some embodiments, the second message includes control parameters of the interactive object matching a phoneme sequence corresponding to the response text, and/or control parameters of a set action of the interactive object matching at least one target text included in the response text. In this case, playing the response animation of the interactive object on the display interface of the client based on the second message with which the server responds to the first message includes: adjusting virtual model parameters of the interactive object based on the driving data; and generating, based on the adjusted virtual model parameters, the response animation of the interactive object with a rendering engine, and displaying the response animation on the display interface of the client. For the specific methods of generating the control parameters of the interactive object matching the phoneme sequence corresponding to the response text, and of generating the control parameters of the set action of the interactive object matching at least one target text included in the response text, reference may be made to the foregoing embodiments, and details are not repeated here.

When the data volume of the virtual model of the interactive object is small and the performance footprint on the client is low, the client obtains the driving data and generates the response animation according to the driving data, so that the picture of the interactive object making the response can be displayed conveniently and flexibly.
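The client-side path described above, applying driving data to the virtual model parameters and then handing the adjusted parameters to a rendering engine, can be sketched like this; the parameter names and the per-frame overlay rule are assumptions for illustration, and a stub callable stands in for the real rendering engine:

```python
def adjust_model_parameters(base_params, driving_data):
    """Overlay one frame of control parameters onto the base virtual model."""
    adjusted = dict(base_params)
    for name, value in driving_data.items():
        adjusted[name] = value
    return adjusted

def render_response_animation(frames, base_params, render_frame):
    """Produce one rendered frame per frame of driving data."""
    return [render_frame(adjust_model_parameters(base_params, frame))
            for frame in frames]

# Usage with illustrative parameters and a stub renderer:
base = {"mouth_open": 0.0, "head_yaw": 0.0}
frames = [{"mouth_open": 0.4}, {"mouth_open": 0.8, "head_yaw": 0.1}]
animation = render_response_animation(frames, base, render_frame=lambda p: dict(p))
```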

In some embodiments, the second message further includes the response animation made by the interactive object to the indication content; and playing the response animation of the interactive object on the display interface of the client based on the second message with which the server responds to the first message includes: displaying the response animation on the display interface of the client.

In some embodiments, the virtual model corresponding to the interactive object is stored on the server side or in the cloud. In this case, the response animation may be generated on the server side or in the cloud. For the specific manner of generating the response animation, reference may be made to the foregoing embodiments, and details are not repeated here.

By sending the response animation to the client to implement the response of the interactive object, stuttering caused by rendering on the client can be avoided, and a high-quality response animation can be displayed on the client, improving the user's interaction experience.

In some embodiments, the user input operation includes the user making a corresponding human pose following a body operation picture displayed on the display interface. In this case, in response to the user input operation from the client, the method further includes: acquiring a user behavior image including the human pose; identifying human pose information in the user behavior image; and driving, based on the human pose information, the interactive object displayed on the display interface to make a response.

In some embodiments, driving the interactive object displayed on the display interface to make a response based on the human pose information includes: determining a matching degree between the human pose information and the human pose in the body operation picture; and driving, based on the matching degree, the interactive object displayed on the display interface to make a response.

In some embodiments, driving the interactive object to make a response based on the matching degree includes: when the matching degree reaches a set condition, instructing the interactive object displayed on the display interface to make a first response, where the first response includes displaying a body action and/or a voice prompt indicating that the pose is qualified, and displaying a next body operation picture; and when the matching degree does not reach the set condition, instructing the interactive object displayed on the display interface to make a second response, where the second response includes displaying a body action and/or a voice prompt indicating that the pose is not qualified, and keeping displaying the current body operation picture.
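The matching-degree dispatch above can be sketched as a small decision function; the threshold value and the response payload fields are illustrative assumptions, since the disclosure only requires some set condition on the matching degree:

```python
QUALIFIED_THRESHOLD = 0.8  # assumed set condition on the matching degree

def respond_to_pose(match_degree, current_picture_index, total_pictures):
    """Choose the first or second response and the body picture to display."""
    if match_degree >= QUALIFIED_THRESHOLD:
        # First response: pose qualified, advance to the next body picture.
        next_index = min(current_picture_index + 1, total_pictures - 1)
        return {"response": "first", "prompt": "pose qualified",
                "picture": next_index}
    # Second response: pose not qualified, keep the current body picture.
    return {"response": "second", "prompt": "pose not qualified",
            "picture": current_picture_index}
```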

Exemplarily, the following are some embodiments in which the embodiments of the present disclosure are applied in a live video streaming platform scenario:

In some embodiments, the first message received from the client is the user bullet screen text transmitted by the live streaming platform.

In some embodiments, after the intent of the bullet screen is analyzed by a natural language processing algorithm, a corresponding answer is obtained, and the content of the answer is then broadcast through the interactive object. In addition, an action corresponding to the content of the answer may also be displayed through the interactive object.

In some embodiments, the natural language processing capability of the client is directly integrated: natural language processing is performed on the indication content included in the first message to obtain a response text matching and conforming to the language intent of the indication content, and the text corresponding to the output response text is provided directly to the interactive object for playback.

In some embodiments, the interactive object can imitate the user's speech. For example, for the voice input by the user through the client, the voice is converted into text, the user's voice features are obtained from the voice, and the speech corresponding to the text is output based on the voice features, so that the interactive object imitates what the user says.

In some embodiments, the interactive object can also perform page display according to the content returned by the natural language processing, displaying UI content according to pre-designed content to be displayed and pre-designed interaction manners, so that the display of the response content is more eye-catching and attracts the user's attention.

The above embodiments support real-time interactive live streaming: during the live stream, the user can interact with the interactive object in real time and obtain feedback. Live streaming can also run uninterrupted, and video content can be produced automatically, constituting a new way of live television broadcasting.

Exemplarily, the interactive object may be presented as a digital human in three-dimensional form. The digital human combines artificial intelligence (AI) simulation animation generation capability with natural language understanding capability, and can communicate with the user, in voice and appearance, like a real person. The digital human can generate the corresponding mouth shapes, expressions, gazes, and full-body actions according to the answer content, and finally output high-quality, audio-video-synchronized speech and multi-dimensional animation content, naturally presenting a complete digital human image to the user.

In some embodiments, content service libraries in different knowledge domains can be quickly connected and efficiently applied to more industries. Meanwhile, digital human images in various styles, such as hyper-realistic and cartoon styles, can be provided according to the needs of different scenarios, and intelligent interaction with users through AI technologies such as face recognition and gesture recognition is supported. For example, hyper-realistic digital humans can serve as intelligent front desks for banks, business halls, and service halls, reaching customers in a real and effective manner and improving service quality and customer satisfaction.

In some embodiments, cartoon-style digital humans can be applied to interest-oriented interaction scenarios, such as intelligent guides in offline supermarkets, intelligent coaches, or virtual teachers, to attract customers, stimulate interest, and strengthen teaching effects.

At least one embodiment of the present disclosure further provides an interaction apparatus, which is applicable to a server. As shown in FIG. 5, the apparatus 50 includes: a receiving unit 501 configured to receive a first message from a client; an obtaining unit 502 configured to obtain, based on indication content included in the first message, driving data matching the indication content; and a driving unit 503 configured to control, with the driving data, the display interface of the client to play a response animation of the interactive object.

In some embodiments, the obtaining unit 502 is configured to: obtain response content for the indication content, the response content including a response text; and obtain, based on at least one target text included in the response text, control parameters of a set action of the interactive object matching the target text.

In some embodiments, the obtaining unit 502 is configured to: obtain response content for the indication content, the response content including a phoneme sequence; and obtain control parameters of the interactive object matching the phoneme sequence.

In some embodiments, the control parameters of the interactive object include a pose control vector of at least one local area. When obtaining the control parameters of the interactive object matching the phoneme sequence, the obtaining unit 502 is configured to: perform feature encoding on the phoneme sequence to obtain a first encoding sequence corresponding to the phoneme sequence; obtain, according to the first encoding sequence, a feature code corresponding to at least one phoneme; and obtain a pose control vector of at least one local area of the interactive object corresponding to the feature code.
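A minimal sketch of this pipeline, encoding a phoneme sequence into feature codes and mapping each code to a pose control vector for a local area (e.g., the mouth), might look as follows; the one-hot encoding, the phoneme inventory, and the lookup table are simple stand-ins for the learned mappings of the disclosure:

```python
# Illustrative phoneme inventory ("sil" = silence).
PHONEMES = ["sil", "a", "o", "m"]

def encode_phoneme_sequence(phonemes):
    """First encoding sequence: one one-hot feature code per phoneme."""
    codes = []
    for phoneme in phonemes:
        code = [0.0] * len(PHONEMES)
        code[PHONEMES.index(phoneme)] = 1.0
        codes.append(code)
    return codes

# Assumed pose control vectors (mouth-area parameters) per phoneme.
POSE_CONTROL = {
    "sil": [0.0, 0.0],
    "a":   [0.9, 0.2],
    "o":   [0.6, 0.7],
    "m":   [0.1, 0.0],
}

def pose_vectors_for(phonemes):
    """Map each feature code in the first encoding sequence to a pose vector."""
    first_encoding = encode_phoneme_sequence(phonemes)
    recovered = [PHONEMES[code.index(1.0)] for code in first_encoding]
    return [POSE_CONTROL[p] for p in recovered]
```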

In some embodiments, the apparatus further includes a sending unit configured to send, to the client, indication information including the response content for the indication content, so that the client displays the response content based on the indication information.

In some embodiments, the driving unit 503 is configured to: send the driving data of the interactive object to the client so that the client generates a response animation according to the driving data, and control the client to play the response animation on the display interface; or adjust two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data, generate, based on the adjusted two-dimensional or three-dimensional virtual model parameters, the response animation of the interactive object with a rendering engine, and send the response animation to the client.

At least one embodiment of the present disclosure further provides another interaction apparatus, which is applicable to a client. As shown in FIG. 6, the apparatus 60 includes: a sending unit 601 configured to send, in response to a user input operation from the client, a first message including indication content to a server; and a playing unit 602 configured to play, based on a second message with which the server responds to the first message, a response animation of the interactive object on the display interface of the client. The interactive object is obtained by rendering a virtual model, such as a two-dimensional or three-dimensional virtual model.

In some embodiments, the indication content includes text content, and the apparatus further includes a first display unit configured to display the text content on the display interface of the client, and/or determine and play an audio file corresponding to the text content.

In some embodiments, when used to display the text content in the client, the first display unit is specifically configured to: generate bullet screen information of the text content; and display the bullet screen information on the display interface of the client.

In some embodiments, the second message includes a response text for the indication content, and the apparatus further includes a second display unit configured to display the response text on the display interface of the client, and/or determine and play an audio file corresponding to the response text.

In some embodiments, the second message includes driving data of the interactive object, and the playing unit 602 is configured to: adjust virtual model parameters of the interactive object based on the driving data; and generate, based on the adjusted virtual model parameters, the response animation of the interactive object with a rendering engine, and display the response animation on the display interface of the client. The driving data includes control parameters of the interactive object matching a phoneme sequence corresponding to the response text for the indication content, and/or control parameters of a set action of the interactive object matching at least one target text included in the response text.

In some embodiments, the second message includes the response animation made by the interactive object to the indication content.

In some embodiments, the user input operation includes the user making a corresponding human pose following the body operation picture displayed on the display interface, and the sending unit 601 is configured to: acquire a user behavior image including the human pose; identify human pose information in the user behavior image; and drive, based on the human pose information, the interactive object displayed on the display interface to make a response.

In some embodiments, the sending unit 601 is specifically configured to: determine a matching degree between the human pose information and the human pose in the body operation picture; and drive, based on the matching degree, the interactive object displayed on the display interface to make a response.

In some embodiments, the sending unit 601 is specifically configured to: when the matching degree reaches a set condition, instruct the interactive object displayed on the display interface to make a first response, where the first response includes displaying a body action and/or a voice prompt indicating that the pose is qualified, and displaying a next body operation picture; and when the matching degree does not reach the set condition, instruct the interactive object displayed on the display interface to make a second response, where the second response includes displaying a body action and/or a voice prompt indicating that the pose is not qualified, and keeping displaying the current body operation picture.

At least one embodiment of the present disclosure further provides an electronic device. As shown in FIG. 7, the electronic device 70 includes a memory 701 and a processor 702. The memory 701 is used for storing computer instructions executable on the processor 702, and the processor 702 is used for implementing, when executing the computer instructions, the interaction method described in the server-related embodiments of the present disclosure.

At least one embodiment of the present specification further provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the interaction method described in the server-related embodiments of the present disclosure is implemented.

At least one embodiment of the present disclosure further provides an electronic device. As shown in FIG. 8, the electronic device 80 includes a memory 801 and a processor 802. The memory 801 is used for storing computer instructions executable on the processor 802, and the processor 802 is used for implementing, when executing the computer instructions, the interaction method described in the client-related embodiments of the present disclosure.

At least one embodiment of the present specification further provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the interaction method described in the client-related embodiments of the present disclosure is implemented.

Those skilled in the art should understand that one or more embodiments of the present specification may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, optical memory, and the like) containing computer-usable program code.

The embodiments in the present specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the data processing device embodiment is described relatively simply since it is basically similar to the method embodiment, and for related parts, reference may be made to the partial description of the method embodiment.

Specific embodiments of the present specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special-purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus can also be implemented as special-purpose logic circuitry.

Computers suitable for the execution of a computer program include, by way of example, general-purpose and/or special-purpose microprocessors, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The essential elements of a computer are a central processing unit for performing or executing instructions, and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, by way of example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features of specific embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

The foregoing descriptions are merely some examples of one or more embodiments of this specification and are not intended to limit the one or more embodiments of this specification. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of one or more embodiments of this specification shall fall within the scope of protection of the one or more embodiments of this specification.

101: step of receiving a first message from a client
102: step of obtaining, based on indicated content included in the first message, driving data matching the indicated content
103: step of controlling, by using the driving data, a display interface of the client to play a response animation of the interactive object
401: step of sending, in response to a user input operation at the client, a first message including indicated content to a server
402: step of playing a response animation of the interactive object on the display interface of the client based on a second message with which the server responds to the first message
501: receiving unit
502: acquisition unit
503: driving unit
50: interaction apparatus
601: sending unit
602: playing unit
60: interaction apparatus
702: processor
701: memory
70: electronic device

FIG. 1 is a flowchart of an interaction method according to at least one embodiment of the present disclosure.
FIG. 2 is a schematic diagram of the interaction method proposed by at least one embodiment of the present disclosure as applied to a live-streaming process.
FIG. 3 is a flowchart of a method for obtaining a pose control vector according to at least one embodiment of the present disclosure.
FIG. 4 is a flowchart of another interaction method according to at least one embodiment of the present disclosure.
FIG. 5 is a schematic structural diagram of an interaction apparatus according to at least one embodiment of the present disclosure.
FIG. 6 is a schematic structural diagram of another interaction apparatus according to at least one embodiment of the present disclosure.
FIG. 7 is a schematic structural diagram of an electronic device according to at least one embodiment of the present disclosure.
FIG. 8 is a schematic structural diagram of another electronic device according to at least one embodiment of the present disclosure.

Claims (12)

1. An interaction method, comprising: receiving a first message from a client; obtaining, based on indicated content included in the first message, driving data matching the indicated content; and using the driving data to control a display interface of the client to play a response animation of an interactive object; wherein obtaining, based on the indicated content included in the first message, the driving data matching the indicated content comprises: obtaining response content for the indicated content, the response content comprising a phoneme sequence, and obtaining control parameters of the interactive object matching the phoneme sequence; wherein the control parameters of the interactive object comprise a pose control vector of at least one local region, and obtaining the control parameters of the interactive object matching the phoneme sequence comprises: performing feature encoding on the phoneme sequence to obtain a first encoding sequence corresponding to the phoneme sequence; obtaining, according to the first encoding sequence, a feature code corresponding to at least one phoneme; and obtaining a pose control vector of at least one local region of the interactive object corresponding to the feature code.
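The phoneme-to-control-parameter pipeline recited in claim 1 can be sketched roughly as follows. This is an illustrative toy only, not the patented implementation: the patent does not disclose code, and every function and class name below (Message, text_to_phonemes, encode_phonemes, and so on) is hypothetical; a real system would use a trained grapheme-to-phoneme model and a learned mapping to pose control vectors.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Message:
    content: str  # the "indicated content" sent by the client

def text_to_phonemes(text: str) -> List[str]:
    # Stand-in grapheme-to-phoneme step: one "phoneme" per character.
    return list(text)

def encode_phonemes(phonemes: List[str]) -> List[List[float]]:
    # "Feature encoding" of the phoneme sequence (the first encoding
    # sequence of claim 1): a toy one-hot encoding over the phonemes
    # seen in this sequence.
    vocab = sorted(set(phonemes))
    return [[1.0 if p == v else 0.0 for v in vocab] for p in phonemes]

def pose_control_vectors(codes: List[List[float]]) -> List[List[float]]:
    # Map each per-phoneme feature code to a pose control vector for one
    # local region (e.g. the mouth); here just a 1-dimensional placeholder.
    return [[sum(code)] for code in codes]

def handle_first_message(msg: Message) -> List[List[float]]:
    # Server-side flow of claim 1: first message -> response content ->
    # phoneme sequence -> feature codes -> pose control vectors
    # (the "driving data" returned to drive the response animation).
    response_text = msg.content  # placeholder "response content"
    phonemes = text_to_phonemes(response_text)
    codes = encode_phonemes(phonemes)
    return pose_control_vectors(codes)

vectors = handle_first_message(Message(content="hi"))
print(len(vectors))  # prints 2: one control vector per phoneme
```

The essential structure mirrored from the claim is that the driving data is derived per phoneme, so the animation can stay synchronized with the response audio.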
2. The interaction method according to claim 1, wherein obtaining, based on the indicated content included in the first message, the driving data matching the indicated content further comprises: obtaining response content for the indicated content, the response content comprising a response text, and obtaining, based on at least one target text contained in the response text, a control parameter of a set action of the interactive object matching the target text.

3. The interaction method according to claim 1, wherein using the driving data to control the client to play the response animation of the interactive object in the display interface comprises: sending the driving data of the interactive object to the client, so that the client generates the response animation according to the driving data, and controlling the client to play the response animation in the display interface; or adjusting virtual model parameters of the interactive object based on the driving data, generating the response animation of the interactive object with a rendering engine based on the adjusted virtual model parameters, and sending the response animation to the client.
4. An interaction method, comprising: in response to a user input operation at a client, sending a first message including indicated content to a server; and playing a response animation of an interactive object on a display interface of the client based on a second message with which the server responds to the first message; wherein the second message includes driving data of the interactive object, the driving data including control parameters for the interactive object matching a phoneme sequence corresponding to a response text, the control parameters of the interactive object comprising a pose control vector of at least one local region, the pose control vector being obtained by: performing feature encoding on the phoneme sequence to obtain a first encoding sequence corresponding to the phoneme sequence; obtaining, according to the first encoding sequence, a feature code corresponding to at least one phoneme; and obtaining a pose control vector of at least one local region of the interactive object corresponding to the feature code.

5. The interaction method according to claim 4, wherein the indicated content includes text content, and the method further comprises: displaying the text content in the client, and/or playing an audio file corresponding to the text content; wherein displaying the text content in the client comprises: generating bullet-screen information of the text content, and displaying the bullet-screen information on the display interface of the client.
6. The interaction method according to claim 4, wherein the second message includes a response text for the indicated content, and the method further comprises: displaying the response text on the display interface of the client, and/or determining and playing an audio file corresponding to the response text.

7. The interaction method according to any one of claims 4 to 6, wherein playing the response animation of the interactive object on the display interface of the client based on the second message with which the server responds to the first message comprises: adjusting virtual model parameters of the interactive object based on the driving data; and generating the response animation of the interactive object with a rendering engine based on the adjusted virtual model parameters, and displaying the response animation on the display interface of the client; wherein the driving data further includes a control parameter for a set action of the interactive object matching at least one target text contained in the response text.

8. The interaction method according to claim 4, wherein the user input operation comprises the user making a corresponding body pose following a body-movement picture displayed on the display interface; and in response to the user input operation at the client, the method further comprises: acquiring a user behavior image including the body pose; recognizing body pose information in the user behavior image; and driving, based on the body pose information, the interactive object displayed on the display interface to respond.
9. The interaction method according to claim 8, wherein driving, based on the body pose information, the interactive object displayed on the display interface to respond comprises: determining a degree of matching between the body pose information and the body pose in the body-movement picture; and driving, based on the degree of matching, the interactive object displayed on the display interface to respond.

10. The interaction method according to claim 9, wherein driving, based on the degree of matching, the interactive object to respond comprises: when the degree of matching reaches a set condition, instructing the interactive object displayed on the display interface to make a first response, the first response including displaying a body movement and/or a voice prompt indicating that the pose is qualified, and displaying a next body-movement picture; and when the degree of matching does not reach the set condition, instructing the interactive object displayed on the display interface to make a second response, the second response including displaying a body movement and/or a voice prompt indicating that the pose is not qualified, and keeping displaying the current body-movement picture.
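The match-and-respond logic of the pose-guidance claims above can be sketched as a toy decision function. This is illustrative only, under stated assumptions: the patent does not specify how the degree of matching is computed, so the keypoint-distance similarity, the 0.8 threshold, and all names below (matching_degree, respond) are hypothetical.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]  # one 2-D body keypoint

def matching_degree(user_pose: List[Point], ref_pose: List[Point]) -> float:
    # Crude similarity: average keypoint distance mapped into (0, 1],
    # where identical poses give 1.0.
    avg = sum(dist(u, r) for u, r in zip(user_pose, ref_pose)) / len(ref_pose)
    return 1.0 / (1.0 + avg)

def respond(user_pose: List[Point], ref_pose: List[Point],
            threshold: float = 0.8) -> str:
    # First response (pose qualified, advance to the next guidance picture)
    # vs. second response (pose not qualified, keep the current picture).
    if matching_degree(user_pose, ref_pose) >= threshold:
        return "qualified: show next picture"
    return "not qualified: keep current picture"

# A user pose identical to the reference pose trivially qualifies.
print(respond([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]))
```

Whatever similarity measure is used, the claimed behavior reduces to this two-branch response driven by a thresholded matching degree.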
11. An electronic device, comprising a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement, when executing the computer instructions, the interaction method according to any one of claims 1 to 3, or the interaction method according to any one of claims 4 to 10.

12. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the interaction method according to any one of claims 1 to 3, or the interaction method according to any one of claims 4 to 10.
TW109145727A 2020-02-27 2020-12-23 Interaction methods, apparatuses thereof, electronic devices and computer readable storage media TWI778477B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010125701.3 2020-02-27
CN202010125701 2020-02-27
CN202010362562.6A CN111541908A (en) 2020-02-27 2020-04-30 Interaction method, device, equipment and storage medium
CN202010362562.6 2020-04-30

Publications (2)

Publication Number Publication Date
TW202132967A TW202132967A (en) 2021-09-01
TWI778477B true TWI778477B (en) 2022-09-21

Family

ID=71980272

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109145727A TWI778477B (en) 2020-02-27 2020-12-23 Interaction methods, apparatuses thereof, electronic devices and computer readable storage media

Country Status (6)

Country Link
JP (1) JP2022524944A (en)
KR (1) KR20210110620A (en)
CN (1) CN111541908A (en)
SG (1) SG11202109192QA (en)
TW (1) TWI778477B (en)
WO (1) WO2021169431A1 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111541908A (en) * 2020-02-27 2020-08-14 北京市商汤科技开发有限公司 Interaction method, device, equipment and storage medium
CN111459450A (en) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 Interactive object driving method, device, equipment and storage medium
CN112954401A (en) * 2020-08-19 2021-06-11 赵蒙 Model determination method based on video interaction service and big data platform
CN112633110B (en) * 2020-12-16 2024-02-13 中国联合网络通信集团有限公司 Data processing method and device
CN113766253A (en) * 2021-01-04 2021-12-07 北京沃东天骏信息技术有限公司 Live broadcast method, device, equipment and storage medium based on virtual anchor
CN113392201A (en) * 2021-06-18 2021-09-14 中国工商银行股份有限公司 Information interaction method, information interaction device, electronic equipment, medium and program product
CN113810729B (en) * 2021-09-16 2024-02-02 中国平安人寿保险股份有限公司 Live atmosphere special effect matching method, device, equipment and medium
CN113867538A (en) * 2021-10-18 2021-12-31 深圳追一科技有限公司 Interaction method, interaction device, computer equipment and computer-readable storage medium
CN113849117A (en) * 2021-10-18 2021-12-28 深圳追一科技有限公司 Interaction method, interaction device, computer equipment and computer-readable storage medium
US20230127495A1 (en) * 2021-10-22 2023-04-27 Lemon Inc. System and method for animated emoji recording and playback
CN114241132B (en) * 2021-12-16 2023-07-21 北京字跳网络技术有限公司 Scene content display control method and device, computer equipment and storage medium
CN114363685A (en) * 2021-12-20 2022-04-15 咪咕文化科技有限公司 Video interaction method and device, computing equipment and computer storage medium
CN114302241A (en) * 2021-12-30 2022-04-08 阿里巴巴(中国)有限公司 Virtual live broadcast service pushing method and device
CN114401438B (en) * 2021-12-31 2022-12-09 魔珐(上海)信息科技有限公司 Video generation method and device for virtual digital person, storage medium and terminal
CN115086693A (en) * 2022-05-07 2022-09-20 北京达佳互联信息技术有限公司 Virtual object interaction method and device, electronic equipment and storage medium
CN117813579A (en) * 2022-07-29 2024-04-02 京东方科技集团股份有限公司 Model control method, device, equipment, system and computer storage medium
CN118113384A (en) * 2022-11-29 2024-05-31 腾讯科技(深圳)有限公司 Animation processing method and related equipment
CN118118719A (en) * 2022-11-30 2024-05-31 北京字跳网络技术有限公司 Dynamic playing method and device, electronic equipment and storage medium
CN116168134B (en) * 2022-12-28 2024-01-02 北京百度网讯科技有限公司 Digital person control method, digital person control device, electronic equipment and storage medium
CN116567283A (en) * 2023-04-28 2023-08-08 南京硅基智能科技有限公司 Live broadcast interaction method and device, electronic equipment and storage medium
CN116668796B (en) * 2023-07-03 2024-01-23 佛山市炫新智能科技有限公司 Interactive artificial live broadcast information management system
CN116527956B (en) * 2023-07-03 2023-08-22 世优(北京)科技有限公司 Virtual object live broadcast method, device and system based on target event triggering
CN116824010B (en) * 2023-07-04 2024-03-26 安徽建筑大学 Feedback type multiterminal animation design online interaction method and system
CN117710538A (en) * 2023-11-16 2024-03-15 北京百悟科技有限公司 Digital person display method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056989A (en) * 2016-06-23 2016-10-26 广东小天才科技有限公司 Language learning method and device and terminal equipment
CN106878820A (en) * 2016-12-09 2017-06-20 北京小米移动软件有限公司 Living broadcast interactive method and device
CN107329990A (en) * 2017-06-06 2017-11-07 北京光年无限科技有限公司 A kind of mood output intent and dialogue interactive system for virtual robot
CN107784355A (en) * 2017-10-26 2018-03-09 北京光年无限科技有限公司 The multi-modal interaction data processing method of visual human and system
TW201911238A (en) * 2017-08-10 2019-03-16 大陸商騰訊科技(深圳)有限公司 Emoticon display method, device, computer readable storage medium and terminal
CN110298906A (en) * 2019-06-28 2019-10-01 北京百度网讯科技有限公司 Method and apparatus for generating information

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006330958A (en) * 2005-05-25 2006-12-07 Oki Electric Ind Co Ltd Image composition device, communication terminal using the same, and image communication system and chat server in the system
JP2016038601A (en) * 2014-08-05 2016-03-22 日本放送協会 Cg character interaction device and cg character interaction program
CN104637482B (en) * 2015-01-19 2015-12-09 孔繁泽 A kind of audio recognition method, device, system and language exchange system
CN104866101B (en) * 2015-05-27 2018-04-27 世优(北京)科技有限公司 The real-time interactive control method and device of virtual objects
CN105094315B (en) * 2015-06-25 2018-03-06 百度在线网络技术(北京)有限公司 The method and apparatus of human-machine intelligence's chat based on artificial intelligence
CN109416701A (en) * 2016-04-26 2019-03-01 泰康机器人公司 The robot of a variety of interactive personalities
EP4033431A1 (en) * 2016-06-02 2022-07-27 Kodak Alaris Inc. Method for producing and distributing one or more customized media centric products
WO2019060889A1 * 2017-09-25 2019-03-28 Ventana 3D, Llc Artificial intelligence (AI) character system capable of natural verbal and visual interactions with a human
US10635665B2 (en) * 2017-12-21 2020-04-28 Disney Enterprises, Inc. Systems and methods to facilitate bi-directional artificial intelligence communications
CN108810561A (en) * 2018-06-21 2018-11-13 珠海金山网络游戏科技有限公司 A kind of three-dimensional idol live broadcasting method and device based on artificial intelligence
CN113286186B (en) * 2018-10-11 2023-07-18 广州虎牙信息科技有限公司 Image display method, device and storage medium in live broadcast
CN109491564A (en) * 2018-10-18 2019-03-19 深圳前海达闼云端智能科技有限公司 Interaction method and device of virtual robot, storage medium and electronic equipment
CN110634483B (en) * 2019-09-03 2021-06-18 北京达佳互联信息技术有限公司 Man-machine interaction method and device, electronic equipment and storage medium
CN111541908A (en) * 2020-02-27 2020-08-14 北京市商汤科技开发有限公司 Interaction method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2021169431A1 (en) 2021-09-02
JP2022524944A (en) 2022-05-11
KR20210110620A (en) 2021-09-08
SG11202109192QA (en) 2021-10-28
TW202132967A (en) 2021-09-01
CN111541908A (en) 2020-08-14

Similar Documents

Publication Publication Date Title
TWI778477B (en) Interaction methods, apparatuses thereof, electronic devices and computer readable storage media
JP6902683B2 (en) Virtual robot interaction methods, devices, storage media and electronic devices
KR102503413B1 (en) Animation interaction method, device, equipment and storage medium
US10210002B2 (en) Method and apparatus of processing expression information in instant communication
TWI766499B (en) Method and apparatus for driving interactive object, device and storage medium
JP7227395B2 (en) Interactive object driving method, apparatus, device, and storage medium
CN111459454B (en) Interactive object driving method, device, equipment and storage medium
CN112204565B (en) Systems and methods for inferring scenes based on visual context-free grammar models
WO2021196644A1 (en) Method, apparatus and device for driving interactive object, and storage medium
CN111538456A (en) Human-computer interaction method, device, terminal and storage medium based on virtual image
WO2019161241A1 (en) System and method for identifying a point of interest based on intersecting visual trajectories
JP7278307B2 (en) Computer program, server device, terminal device and display method
CN112528936B (en) Video sequence arrangement method, device, electronic equipment and storage medium
WO2019161246A1 (en) System and method for visual rendering based on sparse samples with predicted motion
US20220301250A1 (en) Avatar-based interaction service method and apparatus
CN113314104A (en) Interactive object driving and phoneme processing method, device, equipment and storage medium
CN113301352A (en) Automatic chat during video playback
CN117808934A (en) Data processing method and related equipment
TWI759039B (en) Methdos and apparatuses for driving interaction object, devices and storage media
CN112632262A (en) Conversation method, conversation device, computer equipment and storage medium
CN112820265A (en) Speech synthesis model training method and related device
CN118250523A (en) Digital human video generation method and device, storage medium and electronic equipment
CN116841436A (en) Video-based interaction method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent