TWI782211B - Human-computer interaction method and device - Google Patents

Human-computer interaction method and device Download PDF

Info

Publication number
TWI782211B
TWI782211B (Application TW108119296A)
Authority
TW
Taiwan
Prior art keywords
image
action
features
action command
user
Prior art date
Application number
TW108119296A
Other languages
Chinese (zh)
Other versions
TW202008143A (en)
Inventor
榮濤
Original Assignee
開曼群島商創新先進技術有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 開曼群島商創新先進技術有限公司
Publication of TW202008143A
Application granted
Publication of TWI782211B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/012Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Abstract

The embodiments of this specification disclose a human-computer interaction method and device. The method includes: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on the image features of the image; and, in response to the action instruction, executing an operation that matches the action instruction. The embodiments of this specification also disclose a further human-computer interaction method and device.

Description

Human-computer interaction method and device

This specification relates to the field of computer technology, and in particular to a human-computer interaction method and device.

Augmented reality (AR) technology enhances the user's perception of the real world with information provided by a computer system. It applies virtual information to the real world and superimposes computer-generated virtual objects, scenes, or system prompts onto real scenes, thereby augmenting reality and delivering a sensory experience that goes beyond it. Virtual reality (VR) uses simulation to generate a three-dimensional virtual world that is identical or similar to a real scene; the user can play games, carry out activities, or perform specific operations in this virtual world as if acting in the real world, with visual, auditory, and tactile simulation throughout. Mixed reality (MR) technology covers both augmented reality and augmented virtuality, and refers to a new visual environment created by merging the real and virtual worlds, in which physical and virtual (that is, digital) objects coexist and interact in real time. At present, AR, VR, and MR technologies are still under development, and the human-computer interaction techniques associated with them are not yet mature, so it is necessary to provide a human-computer interaction solution.

The embodiments of this specification provide a human-computer interaction method and device for realizing human-computer interaction. The embodiments adopt the following technical solutions.

In a first aspect, a human-computer interaction method is provided, including: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on the image features of the image; and, in response to the action instruction, executing an operation that matches the action instruction.

In a second aspect, a human-computer interaction method applied at a receiver is provided, including: receiving an action instruction from a sender; and, in response to the action instruction, displaying an effect corresponding to the action instruction, where the effect includes at least one of the following: a processing effect on the sender's avatar and/or the receiver's avatar on the terminal device; a processing effect on the border color of the messages exchanged with the sender; screen vibration and inversion; or video or animation playback.

In a third aspect, a human-computer interaction device is provided, including: an image acquisition module that acquires an image used to instruct a terminal device to perform an action; an action instruction determination module that determines a matching action instruction based on the image features of the image; and an execution module that, in response to the action instruction, executes an operation matching the action instruction.
In a fourth aspect, a human-computer interaction device is provided, including: a receiving module that receives an action instruction from a sender; and an effect display module that, in response to the action instruction, displays an effect corresponding to the action instruction, where the effect includes at least one of the following: a processing effect on the sender's avatar and/or the receiver's avatar on the terminal device; a processing effect on the border color of the messages exchanged with the sender; screen vibration and inversion; or video or animation playback.

In a fifth aspect, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the computer program, when executed by the processor, implements the following operations: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on the image features of the image; and, in response to the action instruction, executing an operation matching the action instruction.

In a sixth aspect, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the computer program, when executed by the processor, implements the following operations: receiving an action instruction from a sender; and, in response to the action instruction, displaying an effect corresponding to the action instruction, where the effect includes at least one of the effects listed in the second aspect.

In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the following operations: acquiring an image used to instruct a terminal device to perform an action; determining a matching action instruction based on the image features of the image; and, in response to the action instruction, executing an operation matching the action instruction.

In an eighth aspect, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the following operations: receiving an action instruction from a sender; and, in response to the action instruction, displaying an effect corresponding to the action instruction, where the effect includes at least one of the effects listed in the second aspect.

The at least one technical solution adopted in the embodiments of this specification can achieve the following beneficial effect: a matching action instruction is determined based on the image features of the acquired image, and an operation matching the action instruction is executed in response to that instruction, thereby realizing human-computer interaction based on the acquired image.
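The first and third aspects describe a three-stage pipeline: acquire an image, determine a matching action instruction from its features, and execute the matching operation. The following is a minimal sketch, in Python with hypothetical names, of how such a pipeline might be wired; it is an illustration of the described flow, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class ActionInstruction:
    name: str        # e.g. "punch" (illustrative)
    payload: dict    # data handed to the rendering / messaging layer

class InteractionPipeline:
    """Hypothetical wiring of the image-acquisition, instruction-determination,
    and execution modules described in the first and third aspects."""

    def __init__(self,
                 acquire_image: Callable[[], bytes],
                 extract_feature: Callable[[bytes], str],
                 instruction_table: Dict[str, ActionInstruction],
                 execute: Callable[[ActionInstruction], None]):
        self.acquire_image = acquire_image          # image acquisition module
        self.extract_feature = extract_feature      # image feature extraction
        self.instruction_table = instruction_table  # feature -> action instruction
        self.execute = execute                      # execution module

    def run_once(self) -> Optional[ActionInstruction]:
        image = self.acquire_image()                       # acquire the image
        feature = self.extract_feature(image)              # e.g. "fist_one_hand"
        instruction = self.instruction_table.get(feature)  # determine matching instruction
        if instruction is not None:
            self.execute(instruction)                      # perform the matching operation
        return instruction
```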

To make the objectives, technical solutions, and advantages of this specification clearer, the technical solutions are described below clearly and completely with reference to specific embodiments of this specification and the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this specification without creative effort fall within the protection scope of this specification.

As shown in FIG. 1, an embodiment of this specification provides a human-computer interaction method 100, which includes the following steps:

S102: Acquire an image used to instruct a terminal device to perform an action.

The image acquired in this embodiment may be a gesture image, a face image, a whole-body image of the user, or an image of part of the user's body; this specification does not limit the type. The acquired image may be a single image or multiple frames of an acquired video stream, and it may show a single user or multiple users. The image may be selected from a plurality of pre-stored images, for example an image chosen by the user, or it may be captured in real time, for example by an image sensor of the terminal device.

S104: Determine a matching action instruction based on the image features of the image.

The image features correspond to the acquired image and may be extracted from it: for a gesture image the features may be gesture features, for a face image they may be facial features, and for a human body image they may be posture or motion features of the body. Before this embodiment is executed, a mapping table from image features to action instructions may be established in advance, so that step S104 can determine the matching action instruction directly by table lookup. Optionally, the same image feature may correspond to different action instructions in different application scenarios, so a separate mapping table may be built for each scenario; this embodiment is then executed in a determined scenario, for example a scenario selected by the user, a scenario obtained by AR scanning, a preset VR environment, or a preset MR environment.

S106: In response to the action instruction, perform an operation matching the action instruction.
In this step, in response to the action instruction, perform an operation that matches the action instruction, for example, in a stand-alone human-computer interaction augmented reality scenario, specifically, a rendering instruction may be generated based on the action instruction; and then to render the target object related to the action instruction. In addition, in the chat scene between the sender and the receiver, while rendering the target object related to the action command, the action command can also be sent to the receiver, so that the receiver can generate a rendering command based on the above action command, so as to Render the target object related to the action instruction. At the same time, the target object displayed in the augmented reality is also displayed on the sending side. The target objects mentioned above can specifically be augmented reality scenes, virtual reality scenes, mixed reality scenes, etc.; in addition, the display effects and related display technologies mentioned in various embodiments of this specification can be based on the Open CV vision library accomplish. The aforementioned sending of the action command to the receiver may specifically be sending the action command to the server, and then the server sends the action command to the receiver; or, if there is no server but directly In a client-to-client scenario, the sender can directly send the action command to the receiver. The human-computer interaction method provided by the embodiment of this specification determines the matching action command based on the image features of the acquired image, and executes the operation matching the action command in response to the action command, realizing the human-computer interaction method based on the acquired image. Human-Computer Interaction. Optionally, various embodiments of this specification may also be applied in scenarios such as AR, VR, and MR. In order to describe the human-computer interaction method provided by the embodiment of this specification in detail, as shown in Figure 2 and Figure 3, another embodiment of this specification provides a human-computer interaction method 200, including the following steps: S202: Respond to the user's response to the display The selection operation of the preset image is to obtain the selected gesture image, face image or human body image. As shown in the schematic diagram of the application interface in Figure 3, this embodiment can pre-display multiple gesture images on the display interface, specifically see the box below the text "gesture selection" on the right side of Figure 3, when the user clicks to select one of the gesture images , the above gesture image can be obtained in this step. Optionally, this embodiment can also pre-display a plurality of facial expression images, human body action posture images, etc. When the user selects, this step can obtain the above-mentioned facial expression images or human body action images. Optionally, the gesture image displayed in advance may include a gesture image of the left hand; a gesture image of the right hand; it may also include a gesture image of making a fist with one hand or closing fingers; a gesture image of releasing one hand or stretching fingers; The image of love gestures such as the ring finger closing and other fingers stretching out. The facial expression images shown in advance may be images of laughing, sad, or crying facial expressions. 
The above-mentioned pre-shown human body action posture image may be a human body posture image bent at 90 degrees, a human body action posture image standing in a military posture, and the like. S204: Determine an action instruction based on image features of the selected image in a preset scene. Before the implementation of this embodiment, the corresponding relationship between the above-mentioned images and image features can be stored in advance, so that the image features can be directly determined based on the image selected by the user. Could be a feature representing a fist with one hand. Before the implementation of this embodiment, a mapping relationship table between image features and action commands can be established in advance, so that step S204 can directly determine the matching action command through table lookup. Optionally, in different application scenarios, the same image feature can also correspond to different action instructions. Therefore, before the implementation of this embodiment, the mapping relationship between image features and action instructions can be established in different scenarios. Table, this embodiment can be executed in a determined scene, for example, this embodiment can be executed in a scene selected by the user, and for example, this embodiment can also be executed in a scene obtained based on AR scanning Execute, or execute in a preset VR scene, or execute in a preset MR scene, etc. In this way, the scene image can also be obtained in advance before the execution of this embodiment, and the scene image is executed in the obtained scene Example. When determining the action command based on the image features in this step, the current application scene can be determined first, and then the action command corresponding to the image features acquired in the current application scene can be determined. The gesture feature of making a fist can determine the action instruction of punching. S206: In response to the action instruction, perform an operation matching the action instruction. In this step, in response to the action instruction, performing an operation matching the action instruction may be specifically generating a rendering instruction based on the action instruction, and rendering the target object related to the action instruction, for example, in FIG. 3 The target object of augmented reality, virtual reality or mixed reality is displayed in the box on the left of the pre-displayed gesture image in , and the displayed target object can be an augmented reality, virtual reality or mixed reality scene image. In response to the action instruction mentioned in this step, after performing the operation matching the action instruction, the action instruction can also be sent to the recipient, so that the recipient can generate a rendering instruction based on the above action instruction to The target object related to the above action command is rendered. The aforementioned sending of the action command to the receiver may specifically be sending the action command to the server, and then the server sends the action command to the receiver; or, if there is no server but directly In a client-to-client scenario, the sender can directly send the action command to the receiver. 
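Steps S104/S204 describe determining the action instruction by looking up a pre-built mapping table, possibly one table per application scenario. A minimal, hypothetical Python sketch of that lookup follows; the scene and feature names are invented for illustration, and the fist-to-punch pairing mirrors the fighting-game example above.

```python
from typing import Dict, Optional

# One mapping table per application scenario (illustrative names only).
ACTION_TABLES: Dict[str, Dict[str, str]] = {
    "single_player_fighting": {
        "fist_one_hand": "punch",
        "open_palm": "palm_strike",
    },
    "chat": {
        "fist_one_hand": "send_punch_effect",
        "heart_gesture": "send_love_effect",
    },
}

def determine_action_instruction(scene: str, image_feature: str) -> Optional[str]:
    """Table lookup: return the instruction matched in this scenario, or None."""
    return ACTION_TABLES.get(scene, {}).get(image_feature)

# In the single-player fighting-game scenario, a one-handed fist maps to a punch.
assert determine_action_instruction("single_player_fighting", "fist_one_hand") == "punch"
```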
The interaction method provided by the embodiment of this specification determines the matching action command based on the image features of the acquired image, and executes the operation matching the action command in response to the action command, realizing the human-computer interaction based on the acquired image . In addition, in the embodiment of the present specification, a plurality of gesture images, face images or human body images are stored in advance. Therefore, it is convenient for the user to quickly select and improve the user experience. Optionally, the order of gesture images displayed in advance in the display interface shown in FIG. 3 , or the display order of face images or human body images in other embodiments, can be sorted based on the user's historical usage frequency, for example, If the user selects the gesture image of one-handed fist most frequently, the gesture image of one-handed fist will be displayed in the first place, which further facilitates the selection of the user and improves the user experience. It should be noted that the above embodiments can also be applied simultaneously in a scenario where multiple devices and multiple users interact. Specifically, for example, through step S202, the gesture images selected by users A, B, and C from multiple displayed gesture images are obtained; through steps S204 and S206, in the preset scene where A, B, and C interact , sending the image features to users A, B, C, etc. based on the image features of the gesture images selected respectively. At the same time, each terminal device can obtain the gesture image of each user in real time, and if it matches the pre-selected image characteristics to a certain degree, it will perform subsequent logical operations. For example, the scene selected by terminal devices such as A, B, and C is an ancient There is a stone gate in front of the temple, and when the multi-device recognizes the movement of the hand pushing forward, the stone gate will slowly open, etc. In the embodiment shown in Fig. 2 and Fig. 3, gesture image, human face image or human body image etc. are displayed in advance, considering that the number of images displayed is limited; and the content of images displayed in advance is not rich enough, in order to further improve the image and increase the richness of images, enhance user interaction, and increase user interaction fun, as shown in Figure 4 and Figure 5, another embodiment of this specification provides a human-computer interaction method 400, including the following steps: S402: Image features are acquired, and the image features include at least one of the following: gesture image features, face image features, human body image features, and action features. This embodiment can be applied to a terminal device, and the terminal device includes components that can be used to acquire images. Taking a terminal device running an augmented reality application as an example, the components used to acquire images on the terminal device can include an infrared camera, etc. After the image is acquired, image features are acquired based on the acquired image. The above-mentioned action features include, for example: action features of punching, waving, palming, running, standing still, shaking head, and nodding. Optionally, before this embodiment is executed, the application scenarios can also be pre-identified. 
For example, the above application scenarios can specifically include the scenario where the sender and the receiver chat with each other; the application scenario of online fighting games; the scenario where multiple terminal devices chat and interact with each other. scene etc. In this step, when acquiring image features, for example, acquiring gesture features, gesture features may be acquired using a gesture feature classification model. The input parameter of the gesture feature classification model may be the acquired gesture image (or the preprocessed gesture image, which will be introduced in the next paragraph), and the output parameter may be the gesture feature. The gesture feature classification model can be generated through machine learning based on support vector machine (Support Vector Machine, SVM)), convolutional neural network (Convolutional Neural Network, CNN) or DL and other algorithms. In order to improve the recognition accuracy of gesture features, optionally, this step may also perform preprocessing on the acquired gesture images, so as to remove noise. Specifically, the preprocessing operations on the gesture image may include but not limited to: image enhancement on the acquired gesture image; image binarization; image grayscale and noise removal processing, etc. The methods of acquiring face image features, human body image features, and motion features are similar to those of the above-mentioned gesture features, and will not be repeated here. Gesture images, face images, body images, and action images can be acquired in advance before execution of this embodiment, and then gesture image features, face image features, body image features, and action features can be extracted based on the acquired images. Optionally, this embodiment may also determine whether to perform image preprocessing or determine the adopted image preprocessing method according to image feature accuracy requirements and performance requirements (such as response speed requirements). Specifically, for example, in an application scenario of an online fighting game that requires a relatively high response speed, the gesture image may not be preprocessed; in a scenario that requires a relatively high gesture accuracy, the acquired image may be preprocessed. S404: Determine a matching action instruction based on the image feature and the additional dynamic feature selected by the user in a preset scene. Before this embodiment is executed, scene images may also be acquired in advance, and this embodiment is executed under the acquired scene. In this step, when determining the matching action instruction based on the image features and the additional dynamic features selected by the user, the current application scene can be determined first, and then the image features and the additional dynamic features selected by the user corresponding to the current application scene can be determined. Action commands, for example, in the scenario of a stand-alone fighting game, based on the gesture characteristics of one-handed fist and the dynamic characteristics of the additional fireball selected by the user, the action command of punching + fireball can be determined. 
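A minimal sketch of step S404's combination of a recognized gesture feature with a user-selected additional dynamic effect, written in Python; the action and effect names are illustrative and follow the punch-plus-fireball example above.

```python
from typing import Optional

BASE_ACTIONS = {"fist_one_hand": "punch", "open_palm": "palm_strike"}  # illustrative
DYNAMIC_EFFECTS = {"fireball", "snowball"}  # effects offered for selection in the UI

def determine_composite_instruction(gesture_feature: str,
                                    selected_effect: Optional[str]) -> Optional[str]:
    """Combine the base action with the selected additional dynamic effect, if any."""
    base = BASE_ACTIONS.get(gesture_feature)
    if base is None:
        return None                              # unrecognized gesture
    if selected_effect in DYNAMIC_EFFECTS:
        return f"{base}+{selected_effect}"       # e.g. "punch+fireball"
    return base                                  # no effect selected: plain action

assert determine_composite_instruction("fist_one_hand", "fireball") == "punch+fireball"
assert determine_composite_instruction("fist_one_hand", None) == "punch"
```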
As shown in the schematic diagram of the application interface in Figure 5, this embodiment can display multiple additional dynamic effects on the display interface in advance, specifically see the circle below the text "Additional Dynamic Effects" on the right side of Figure 5, when the user clicks to select one of the If there is an additional dynamic effect, this step can determine the action instruction based on the gesture feature and the additional dynamic effect feature. In this embodiment, the selected additional dynamic feature corresponds to the acquired image. In other embodiments, if the facial features are obtained, a plurality of additional dynamic effects related to human faces can be displayed on the display interface in advance for the user to select, and additional dynamic features will be generated when the user selects, so as to Enhanced display of face display effects, etc. In other embodiments, if the image features or action features of the human body are obtained, multiple additional dynamic effects related to the human body or action can be displayed on the display interface in advance for the user to select, and additional dynamic effects will be generated when the user selects feature. Specifically, for example, what is obtained in step S402 is the gesture feature representing a fist with one hand. If the above-mentioned additional dynamic effect (or feature) is not selected, the action command determined in this step only represents the action command of punching; If there is an additional dynamic effect of "Snowball", the action instruction determined in this step may be an action instruction with a cool effect including punching and launching a snowball. S406: In response to the action instruction, perform an operation matching the action instruction. In this step, in response to the action instruction, perform an operation that matches the action instruction, specifically, generate a rendering instruction based on the action instruction, and render the target object related to the action instruction, for example, in FIG. The target object of augmented reality, virtual reality or mixed reality is displayed in the left box in 5, and the displayed target object can be an image of an augmented reality, virtual reality or mixed reality scene. This embodiment can also send the action command to the receiver, so that the receiver can generate a rendering command based on the above action command to render the target object related to the action command. Of course, the sender can also display the augmented reality target audience. The interaction method provided by the embodiment of this specification acquires image features, determines action commands based on the image features and additional dynamic features selected by the user, and responds to the action commands to realize human-computer interaction based on the acquired image features. In addition, this embodiment acquires gesture image features, face image features, human body image features, and action features based on the images acquired in real time. Compared with the limited number of pre-stored images, the image features that can be acquired are more abundant. , various. At the same time, through the way of real-time acquisition of user images and image features, user interaction is increased, especially in some game scenarios, the user's sense of integration and interactivity is improved, and user experience is improved. 
In addition, the embodiment of this specification pre-stores additional dynamic effects for users to choose, so that users can quickly select them, so as to generate more cool special effects and improve user experience. Optionally, the order of the additional dynamic effects displayed in advance in the display interface shown in FIG. 5 , or the display order of the additional dynamic effects on human face features or the additional dynamic effects on human body features in other embodiments, can be Sorting is based on the user's historical use frequency. For example, the user chooses "Fireball" most frequently, see Figure 5, and the additional dynamic effect of "Fireball" is displayed in the first place, which further facilitates the user's selection and improves the user experience. experience. It should be noted that, the above-mentioned embodiments can not only be applied in the scenario of a single terminal device, but also can be applied in the scenario of interaction between multiple devices at the same time. As shown in FIG. 6 and FIG. 7 , another embodiment of the present specification provides a human-computer interaction method 600 , including the following steps: S602 : Acquire scene features selected by the user. The scene features in this embodiment are specifically shown in the schematic diagram of the application interface in FIG. Multiple scenes are schematically displayed with "***". When the user clicks to select one of the scenes, this step is equivalent to the obtained scene features. In addition, the application interface in FIG. 7 also includes a "more" button, which can display more preset scenes when the user clicks it. S604: Determine an action instruction based on the scene features and the acquired image features, where the image features include at least one of the following: gesture image features, face image features, human body image features, and action features. This embodiment can be applied to a terminal device, and the terminal device includes components that can be used to acquire images. Taking a terminal device running an augmented reality application as an example, the components used to acquire images on the terminal device can include an infrared camera, etc. The image features are acquired based on the acquired images. For the specific acquisition process, refer to the embodiment shown in FIG. When acquiring facial features, facial features can be acquired using a facial feature classification model. The input parameter of the face feature classification model may be the acquired face image (or the preprocessed face image, which will be introduced in the next paragraph), and the output parameter may be the face feature. The face feature classification model can be generated through machine learning based on algorithms such as Support Vector Machine (Support Vector Machine, SVM), convolutional neural network (Convolutional Neural Network, CNN for short), or DL. In order to improve the recognition accuracy of facial features, optionally, this step may also perform preprocessing on the acquired facial images, so as to remove noise. Specifically, the preprocessing operations on the face image may include but not limited to: image enhancement on the acquired face image; image binarization; image grayscale and noise removal processing, etc. 
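As one illustration of the preprocessing and feature-classification steps described above (the text names SVM, CNN, and DL as candidate algorithms, and mentions the OpenCV vision library), the following is a minimal sketch assuming Python with OpenCV and scikit-learn; the function and label names are hypothetical, and the same pipeline applies to the gesture features discussed earlier.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def preprocess_face(image_bgr: np.ndarray) -> np.ndarray:
    """Grayscale, denoise, and binarize an acquired face image (one possible pipeline)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)              # grayscale
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)                    # noise removal
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    return cv2.resize(binary, (64, 64))                             # fixed-size model input

def train_feature_classifier(images, labels) -> SVC:
    """Fit an SVM-based feature classification model on preprocessed images."""
    samples = [preprocess_face(img).flatten() for img in images]
    model = SVC(kernel="rbf")
    model.fit(samples, labels)       # labels such as "smile", "cry" (illustrative)
    return model

def classify(model: SVC, image_bgr: np.ndarray) -> str:
    """Return the predicted feature label for a newly acquired image."""
    return str(model.predict([preprocess_face(image_bgr).flatten()])[0])
```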
When this step determines the matching action instruction based on the image features and the scene features, for example, in the application scenario of an online chat with a sender and a receiver, the image features and scene features can be fused, such as the human face Features and scene features are fused to generate action commands for the fusion of face features and scene features. area, so as to realize the seamless connection between the user's face and the selected scene, and produce the effect that the user is actually in the above scene, specifically, the user is swimming in the picture, and the face of the character in the above scene becomes the face etc. This embodiment is especially suitable for application scenarios such as group photos, artistic photo stickers, artistic modeling, and cosplay. S606: In response to the action instruction, perform an operation matching the action instruction. In this step, in response to the action instruction, perform an operation matching the action instruction, specifically, generate a rendering instruction based on the action instruction to render the target object related to the action instruction; it may also be Sending the action instruction to the recipient, so that the recipient generates a rendering instruction based on the above action instruction, renders the target object related to the action instruction, and finally displays the augmented reality, virtual reality or mixed reality target object. In the application scenario of the above-mentioned group photo, after the operation of step S606, the information carrying the facial features and the scene features can also be sent to the receiving party, and the receiving party is acquiring the facial features of the receiving party, so as to realize the The facial features of the receiving party, the facial features of the receiving party, and the scene selected by the sending party are integrated to facilitate the improvement of user experience. The interaction method provided by the embodiment of this specification acquires image features and scene features, determines action commands based on the image features and scene features and responds to the action commands, realizes the fusion of image features and various preset scenes, and facilitates Improve user experience. It should be noted that, the above-mentioned embodiments can not only be applied in the scenario of a single terminal device, but also can be applied in the scenario of interaction between multiple devices at the same time. In addition, in this embodiment, different preset scenes are pre-stored for the user to choose, so that the acquired images can change into different shapes in different scenes, which increases interest and improves user experience. Optionally, this embodiment may also save the target object of the augmented reality, virtual reality or mixed reality shown above for the convenience of the user for subsequent use. In one embodiment, third-party photography equipment can be requested to shoot and record the augmented reality, virtual reality or mixed reality view displayed on the screen of the current terminal device from the outside, so as to indirectly realize the augmented reality, virtual reality or mixed reality view. Reality view storage can flexibly acquire augmented reality, virtual reality or mixed reality views that users need to store. 
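A minimal sketch of the face-and-scene fusion described above, assuming OpenCV; the reserved face region, blending weights, and function name are illustrative, and a production pipeline would more likely use landmark alignment and seamless cloning for the "seamless connection" the text describes.

```python
import cv2
import numpy as np

def fuse_face_into_scene(scene_bgr: np.ndarray,
                         face_bgr: np.ndarray,
                         reserved_region: tuple) -> np.ndarray:
    """Blend the user's face into the scene's reserved face region (x, y, w, h)."""
    x, y, w, h = reserved_region
    face_resized = cv2.resize(face_bgr, (w, h))          # fit the reserved region
    fused = scene_bgr.copy()
    roi = fused[y:y + h, x:x + w]
    # Simple alpha blend; cv2.seamlessClone would give a smoother boundary.
    fused[y:y + h, x:x + w] = cv2.addWeighted(roi, 0.3, face_resized, 0.7, 0)
    return fused
```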
In another embodiment, the augmented reality, virtual reality or mixed reality view that the user sees on the display screen can also be captured and saved in the form of a screenshot. This implementation method not only captures and stores all augmented reality, virtual reality or mixed reality content displayed on the screen, but also selectively stores augmented reality, virtual reality or mixed reality views according to user needs. For the specific application of the embodiment shown in Figure 1 to Figure 7 in this manual, its initial display interface can refer to Figure 8 to Figure 9, the user clicks the Add button on the far right and the **Card option will appear, and ** The Card function is saved in the chat interface, as shown in Figure 8, where **Card can be AR Card, MR Card or VR Card, etc. When using it later, the user can first click the **Card button as shown in Figure 8, and then execute the operation steps of each embodiment shown in Figure 1 to Figure 7; or, it is detected that the current scene of the user can execute In the method steps of the embodiments shown in FIGS. 1 to 7 above, the **Card option can be popped up on the information interface for the user to choose and use, so as to improve the user experience. It should be noted that Fig. 8 and Fig. 9 only schematically show a way of triggering execution. In fact, the methods introduced in the previous embodiments can also be triggered by other ways, such as shaking the terminal device to automatically execute, The embodiment of the present specification does not specifically limit the implementation by recognizing the specific voice issued by the user. As shown in FIG. 10 and FIG. 11 , another embodiment of this specification provides a human-computer interaction method 1000, which is applied on the receiver, and includes the following steps: S1002: Receive an action instruction from the sender. The action command in this embodiment can be the action command mentioned in the embodiment shown in Figure 1 to Figure 7 above, that is, this embodiment is applied to the receiver, and the operation performed by the sender can be Operation of various embodiments as shown in FIGS. 1 to 7 . Of course, the action command in this embodiment may also be other action commands, that is, it is independent from the various embodiments shown in FIGS. 1 to 7 . S1004: In response to the action instruction, display an effect corresponding to the action instruction; wherein, the effect corresponding to the action instruction includes at least one of the following: a processing effect on the avatar of the sender of the terminal device and/or The processing effect of the receiver's avatar on the terminal device; the processing effect of the color of the information frame communicating with the sender. For the information frame mentioned here, please refer to Figure 11. In the display interface, the screen name is *** A friend of has sent three messages, each of which includes a message border. Screen vibration reversal, that is, the entire terminal device screen vibrates and reverses; or automatically plays videos, animations and voices, etc., and the above animations include gif images. 
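On the receiving side (steps S1002/S1004), the received action instruction has to be dispatched to one or more of the display effects just listed. A minimal, hypothetical dispatch sketch in Python follows; the effect keys and handler bodies are placeholders for the terminal device's actual UI layer.

```python
from typing import Callable, Dict

def process_avatars(instruction: dict) -> None:
    print("apply processing effect to the sender's and/or receiver's avatar")

def recolor_message_border(instruction: dict) -> None:
    print("change the border color of the messages exchanged with the sender")

def vibrate_and_invert_screen(instruction: dict) -> None:
    print("vibrate and invert the terminal device screen")

def play_media(instruction: dict) -> None:
    print(f"play video/animation: {instruction.get('media', 'effect.gif')}")

EFFECT_HANDLERS: Dict[str, Callable[[dict], None]] = {
    "avatar_effect": process_avatars,
    "border_color": recolor_message_border,
    "screen_vibration_inversion": vibrate_and_invert_screen,
    "media_playback": play_media,
}

def on_action_instruction(instruction: dict) -> None:
    """S1002/S1004: receive an instruction and display the matching effect(s)."""
    for effect in instruction.get("effects", []):
        handler = EFFECT_HANDLERS.get(effect)
        if handler is not None:
            handler(instruction)

# Example: a received instruction requesting two effects at once.
on_action_instruction({"effects": ["screen_vibration_inversion", "media_playback"],
                       "media": "fireball.gif"})
```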
The above-mentioned video can specifically be a video file in H264, H265 and other encoding formats, and the receiver can play it automatically after receiving the above-mentioned video file; the above-mentioned animation can specifically be an animation that strengthens the expression of the character’s expression, artistic text for voice-over, and some background animation effects, etc. , the receiver automatically plays the above animation after receiving it. In addition, in this embodiment, the display interface of the sender can also display the change of the state of the 3D model of the receiver. Specifically, it can show the 3D model of the receiver being shot, snowflakes on the receiver, etc., such as augmented reality, virtual reality, or mixed reality. display effect. In addition, this embodiment can also display the processing effect of the avatar on the display interface of the sender, for example, specifically, it can be that the avatar of the recipient becomes a turtle or other three-dimensional display of the avatar of the recipient such as augmented reality, virtual reality or mixed reality Change styles, improve fun, and enhance user experience. In the above display effect, the display interface of the sender can display the actions from generation to extinction of both parties, as well as the final status of the receiver’s status, avatar, etc.; the display interface of the receiver can display the actions from generation to extinction of both parties , usually does not display the last status of the recipient, such as the status and avatar, to improve fun and enhance user experience. In addition, this embodiment can also receive a drag command, move and display an object on the display interface, and the like. The human-computer interaction method provided by the embodiment of this specification receives an action command from the sender, and displays an effect corresponding to the action command in response to the action command, thereby realizing the human-computer interaction based on the action command. In the human-computer interaction method provided in the embodiments of this specification, the effects corresponding to the action instructions can be displayed in a three-dimensional state, specifically three-dimensional augmented reality, virtual reality or mixed reality display. In a specific embodiment, the following effects can also be produced in the display interface of the sender: A (sender) sends a snowball, B (receiver) sends a fireball, and the fireball will weaken after the fireball collides with the snowball. And fly to Party A, and then Party A’s image catches fire, etc.; for another example, Party A and Party B send fireballs or water balloons at the same time, and after the collision, they will scatter into sparks or snowflakes, forming a fantastic artistic effect, improving fun and enhancing use experience. The above specification part introduces the embodiment of the human-computer interaction method in detail, as shown in Figure 12, this specification also provides a human-computer interaction device 1200, as shown in Figure 12, the device 1200 includes: Obtain an image for instructing the terminal device to perform an action; the action command determination module 1204 can be used to determine a matching action command based on the image features of the image; the execution module 1206 can be used to respond to the action command, execute The action that matches the action command. 
The interaction device provided by this embodiment of the specification determines an action command based on the image features of the acquired image, responds to the action command, and executes an operation matching the action command, thereby realizing human-computer interaction based on the acquired image.

Optionally, as an embodiment, the image acquisition module 1202 may be configured to acquire the selected image in response to the user's selection operation on a displayed preset image.

Optionally, as an embodiment, the image acquisition module 1202 may be configured to acquire images of the user through a camera acquisition device.

Optionally, as an embodiment, the image used to instruct the terminal device to perform an action includes a gesture image, a face image, or a human body image.

Optionally, as an embodiment, the action command determination module 1204 may be configured to determine a matching action command based on the gesture features and the acquired additional dynamic features.

Optionally, as an embodiment, the action command determination module 1204 may be configured to determine a matching action command based on the image features of the image and the additional dynamic features in a preset scene.

Optionally, as an embodiment, the action command determination module 1204 may be configured to determine a matching action command based on the image features of the image and the acquired scene features.

Optionally, as an embodiment, the device 1200 further includes a saving module, which can be used to save the image features and the scene features.

Optionally, as an embodiment, the execution module 1206 may be configured to generate a rendering instruction based on the action command, so as to render the target object related to the action command.

Optionally, as an embodiment, the device 1200 further includes a sending module, which can be used to send the action command to the receiver.

The human-computer interaction device 1200 according to this embodiment of the specification may refer to the flow of the human-computer interaction methods shown in Figures 1 to 9 corresponding to the preceding embodiments of this specification; each unit/module in the human-computer interaction device 1200 and the other operations and/or functions described above realize the corresponding processes in those human-computer interaction methods, and for brevity, the details are not repeated here.

As shown in Figure 13, this specification also provides a human-computer interaction device 1300. The device 1300 includes: a receiving module 1302, which can be used to receive action commands from the sender; and an effect display module 1304, which may be configured to display an effect corresponding to the action command in response to the action command, where the effect corresponding to the action command includes at least one of the following: a processing effect on the sender's avatar on the terminal device and/or a processing effect on the receiver's avatar on the terminal device; a processing effect on the color of the border of messages exchanged with the sender; screen vibration and inversion; or video or animation playback.
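As a rough illustration of how the three modules of device 1200 might cooperate on the sender side, the sketch below (in Python) chains image acquisition, action command determination from a preset mapping of image and scene features, rendering, and optional sending. The module and table names (ImageAcquisitionModule, COMMAND_MAP, and so on) and the example feature values are assumptions for illustration only, not part of the patent.

# Minimal, illustrative sketch of the sender-side device 1200 (Figure 12).
# Names such as ImageAcquisitionModule and COMMAND_MAP are hypothetical.
from typing import Optional

# Mapping preset per scene: (image feature, scene feature) -> action command.
COMMAND_MAP = {
    ("fist_gesture", "snow_scene"): "throw_snowball",
    ("open_palm", "fire_scene"): "throw_fireball",
}

class ImageAcquisitionModule:              # module 1202
    def acquire(self, selected_preset: str) -> str:
        # The image could equally come from a camera capture; here we simply
        # return the preset image the user selected.
        return selected_preset

class ActionCommandDeterminationModule:    # module 1204
    def determine(self, image_feature: str, scene_feature: str) -> Optional[str]:
        return COMMAND_MAP.get((image_feature, scene_feature))

class ExecutionModule:                     # module 1206
    def execute(self, action_command: str) -> dict:
        # Generate a rendering instruction for the target object tied to the command.
        return {"render": action_command, "style": "3d_ar"}

class SendingModule:                       # optional sending module described above
    def send(self, action_command: str) -> None:
        print(f"sending action command to receiver: {action_command}")

# Example flow on the sender side.
acquisition = ImageAcquisitionModule()
determiner = ActionCommandDeterminationModule()
executor = ExecutionModule()
sender = SendingModule()

image_feature = acquisition.acquire("fist_gesture")
command = determiner.determine(image_feature, "snow_scene")
if command:
    print(executor.execute(command))
    sender.send(command)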
The above-mentioned video may be a video file in an encoding format such as H264 or H265, or an animation computed in real time from a 3D model; the receiver plays the video automatically after receiving it. The above-mentioned animation may be an animation that accentuates a character's expression, artistic text serving as a voice-over, background animation effects, and the like; the receiver plays the animation automatically after receiving it.

In addition, in this embodiment, the sender's display interface can also show changes in the state of the receiver's 3D model; specifically, it can show augmented reality, virtual reality, or mixed reality 3D display effects such as the receiver's model being hit or snowflakes falling on the receiver. This embodiment can also show the processing effect on the recipient's avatar in the sender's display interface; for example, the recipient's avatar may turn into a turtle or undergo another three-dimensional augmented reality, virtual reality, or mixed reality style change, which improves the fun and enhances the user experience.

In the above display effects, the sender's display interface can show the actions of both parties from generation to extinction, as well as the final state of the receiver, such as its status and avatar; the receiver's display interface can show the actions of both parties from generation to extinction, but usually does not show the receiver's final status or avatar, which improves the fun and enhances the user experience.

The human-computer interaction device provided by this embodiment of the specification receives an action command from the sender and responds to the action command by displaying the effect corresponding to it, thereby realizing human-computer interaction based on the received action command. The human-computer interaction device 1300 according to this embodiment may refer to the flow of the human-computer interaction method shown in Figures 10 and 11 corresponding to the preceding embodiments of this specification; each unit/module in the human-computer interaction device 1300 and the other operations and/or functions described above realize the corresponding processes in that human-computer interaction method, and for brevity, the details are not repeated here.

The effects that can be achieved by the above embodiments of this specification are shown in Figure 14. On the input side, the user can not only enter text, voice, pictures, and short videos, but the system can also perform face recognition, action recognition, scene recognition, and the like, and send different effects according to the recognized faces, actions, and scenes. On the receiving side, besides ordinary text display, voice playback, animated picture playback, and short video playback, effects such as status changes, animation and sound playback, and screen vibration feedback can be realized, for example the receiver's model being hit, the sender's avatar turning into a turtle, or the background being dynamically changed.
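As a final illustration, the two-party collision effects mentioned earlier (a fireball meeting a snowball, or two water balloons scattering into snowflakes) could be resolved with a simple lookup over the pair of commands sent by both parties. The sketch below is purely illustrative; the COLLISION_EFFECTS table and the resolve_collision name are assumptions and not part of the patent.

# Illustrative resolution of simultaneous sender/receiver commands into a shared effect.
COLLISION_EFFECTS = {
    frozenset(["snowball", "fireball"]): "fireball weakens, flies back, and the sender's image catches fire",
    frozenset(["fireball"]): "both fireballs scatter into sparks",
    frozenset(["water_balloon"]): "both water balloons scatter into snowflakes",
}

def resolve_collision(sender_cmd: str, receiver_cmd: str) -> str:
    """Return the display effect produced when the two parties' objects collide."""
    key = frozenset([sender_cmd, receiver_cmd])
    return COLLISION_EFFECTS.get(key, "no collision effect")

print(resolve_collision("snowball", "fireball"))            # mixed collision
print(resolve_collision("water_balloon", "water_balloon"))  # identical objects collide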
The electronic device according to the embodiments of this specification is described in detail below with reference to Figure 15. Referring to Figure 15, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. As shown in Figure 15, the memory may include memory such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include the hardware needed to implement other services.

The processor, network interface, and memory can be interconnected via the internal bus, which can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one double-headed arrow is used in Figure 15, but this does not mean that there is only one bus or one type of bus.

The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory can include both RAM and non-volatile memory and provides instructions and data to the processor. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it, forming, at the logical level, a device for forwarding chat information. The processor executes the program stored in the memory and is specifically configured to perform the operations of the method embodiments described above in this specification.

The methods disclosed in the embodiments shown in Figures 1 to 11 above and the methods executed by the device may be applied to or implemented by a processor. A processor may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above methods can be completed through an integrated logic circuit of hardware in the processor or through instructions in the form of software. The above processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic block diagrams disclosed in the embodiments of this specification can be implemented or executed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the methods disclosed in the embodiments of this specification can be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium that is mature in the field, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, completes the steps of the above methods. The electronic device shown in Figure 15 can also execute the methods of Figures 1 to 11 and realize the functions of the human-computer interaction methods in the embodiments shown in Figures 1 to 11.
Of course, in addition to the software implementation, the electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is to say, the execution subject of the following processing flows is not limited to individual logic units and may also be a hardware or logic device.

The embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the method embodiments shown in Figures 1 to 11 above is realized and the same technical effects can be achieved; to avoid repetition, the details are not described here again. The computer-readable storage medium is, for example, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disc, or the like.

Those skilled in the art should understand that the embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this specification may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk memory, CD-ROM, optical memory, and the like) having computer-usable program code embodied therein.

This specification is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to the embodiments of this specification. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for realizing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, where the instruction means implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory, random-access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprising", "including", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus that includes that element.

The above are only embodiments of this specification and are not intended to limit this specification. For those skilled in the art, various modifications and changes may be made to this specification. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this specification shall be included in the scope of the claims of this specification.

100‧‧‧人機互動方法 S102‧‧‧步驟 S104‧‧‧步驟 S106‧‧‧步驟 200‧‧‧人機互動方法 S202‧‧‧步驟 S204‧‧‧步驟 S206‧‧‧步驟 400‧‧‧人機互動方法 S402‧‧‧步驟 S404‧‧‧步驟 S406‧‧‧步驟 600‧‧‧人機互動方法 S602‧‧‧步驟 S604‧‧‧步驟 S606‧‧‧步驟 1000‧‧‧人機互動方法 S1002‧‧‧步驟 S1004‧‧‧步驟 1200‧‧‧人機互動裝置 1202‧‧‧影像獲取模組 1204‧‧‧動作指令確定模組 1206‧‧‧執行模組 1300‧‧‧人機互動裝置 1302‧‧‧接收模組 1304‧‧‧效果顯示模組100‧‧‧Human-computer interaction method S102‧‧‧step S104‧‧‧step S106‧‧‧step 200‧‧‧Human-computer interaction method S202‧‧‧step S204‧‧‧step S206‧‧‧step 400‧‧‧Human-computer interaction method S402‧‧‧step S404‧‧‧step S406‧‧‧step 600‧‧‧Human-computer interaction method S602‧‧‧step S604‧‧‧step S606‧‧‧step 1000‧‧‧Human-computer interaction method S1002‧‧‧step S1004‧‧‧step 1200‧‧‧Human-computer interaction device 1202‧‧‧Image acquisition module 1204‧‧‧Action command determination module 1206‧‧‧execution module 1300‧‧‧Human-computer interaction device 1302‧‧‧Receiving module 1304‧‧‧Effect display module

此處所說明的附圖用來提供對本說明書的進一步理解,構成本說明書的一部分,本說明書的示意性實施例及其說明用於解釋本說明書,並不構成對本說明書的不當限定。在附圖中: 圖1為本說明書的一個實施例提供的人機互動方法流程示意圖; 圖2為本說明書的另一個實施例提供的人機互動方法流程示意圖; 圖3為圖2所示的實施例中的顯示介面示意圖; 圖4為本說明書的再一個實施例提供的人機互動方法流程示意圖; 圖5為圖4所示的實施例中的顯示介面示意圖; 圖6為本說明書的又一個實施例提供的人機互動方法流程示意圖; 圖7為圖6所示的實施例中的顯示介面示意圖; 圖8為本說明書的一個實施例提供的人機互動方法初始介面示意圖; 圖9為本說明書的一個實施例提供的人機互動方法初始介面另一示意圖; 圖10為本說明書的下一個實施例提供的人機互動方法流程示意圖; 圖11為圖10所示的實施例中的顯示介面示意圖; 圖12為本說明書的一個實施例提供的人機互動裝置結構示意圖; 圖13為本說明書的另一個實施例提供的人機互動裝置結構示意圖; 圖14本說明書各個實施例能夠實現的效果示意圖。 圖15為實現本說明書各個實施例的電子設備硬體結構示意圖。The drawings described here are used to provide a further understanding of this specification and constitute a part of this specification. The schematic embodiments and descriptions of this specification are used to explain this specification and do not constitute an improper limitation of this specification. In the attached picture: FIG. 1 is a schematic flow chart of a human-computer interaction method provided by an embodiment of this specification; FIG. 2 is a schematic flow chart of a human-computer interaction method provided by another embodiment of this specification; FIG. 3 is a schematic diagram of a display interface in the embodiment shown in FIG. 2; FIG. 4 is a schematic flow chart of a human-computer interaction method provided by another embodiment of this specification; FIG. 5 is a schematic diagram of a display interface in the embodiment shown in FIG. 4; FIG. 6 is a schematic flowchart of a human-computer interaction method provided by another embodiment of this specification; FIG. 7 is a schematic diagram of a display interface in the embodiment shown in FIG. 6; FIG. 8 is a schematic diagram of the initial interface of the human-computer interaction method provided by an embodiment of this specification; FIG. 9 is another schematic diagram of the initial interface of the human-computer interaction method provided by an embodiment of this specification; Fig. 10 is a schematic flow chart of the human-computer interaction method provided by the next embodiment of this specification; FIG. 11 is a schematic diagram of a display interface in the embodiment shown in FIG. 10; Fig. 12 is a schematic structural diagram of a human-computer interaction device provided by an embodiment of this specification; Fig. 13 is a schematic structural diagram of a human-computer interaction device provided by another embodiment of this specification; Fig. 14 is a schematic diagram of the effects that can be achieved by various embodiments of this specification. FIG. 15 is a schematic diagram of a hardware structure of an electronic device implementing various embodiments of this specification.

Claims (12)

一種人機互動方法,應用在多個使用者互動的通訊場景,包括:在通訊應用介面中,獲取使用者選取的場景特徵;回應於使用者對在所述通訊應用介面中展示的預設影像的選擇操作,獲取被選擇的影像,所述預設影像的順序,基於使用者歷史使用頻率排序,所述影像用於指示終端設備執行動作;根據為不同場景特徵預設的影像特徵與動作指令的映射關係,基於所述影像的影像特徵、和所述影像所應用的場景特徵確定匹配的、且與互相互動的使用者達到預定契合度的動作指令;基於所述動作指令產生渲染指令,在所述通訊應用介面中,對所述動作指令相關的目標對象進行渲染。 A method of human-computer interaction, which is applied to multiple user-interactive communication scenarios, including: in the communication application interface, acquiring the scene features selected by the user; responding to the user's response to the default image displayed in the communication application interface The selection operation to obtain the selected images, the order of the preset images is sorted based on the user’s historical use frequency, and the images are used to instruct the terminal device to perform actions; according to the image characteristics and action instructions preset for different scene characteristics The mapping relationship, based on the image features of the image and the scene features to which the image is applied, determines the action command that matches and reaches a predetermined degree of fit with the interacting user; generates a rendering command based on the action command, and then In the communication application interface, the target object related to the action instruction is rendered. 根據請求項1所述的方法,所述預設影像包括手勢影像、人臉影像或人體影像。 According to the method described in Claim 1, the preset image includes a gesture image, a human face image or a human body image. 根據請求項2所述的方法,所述根據為不同場景特徵預設的影像特徵與動作指令的映射關係,基於所述影像的影像特徵、和所述影像所應用的場景特徵確定匹配的、且與互相互動的使用者達到預定契合度的動作指令之前,所述方法還包括: 獲取與所述影像相關的附加動態特徵;其中,所述根據為不同場景特徵預設的影像特徵與動作指令的映射關係,基於所述影像的影像特徵、和所述影像所應用的場景特徵確定匹配的、且與互相互動的使用者達到預定契合度的動作指令包括:根據為不同場景特徵預設的影像特徵與動作指令的映射關係,基於所述影像的影像特徵、所述附加動態特徵、和所述影像所應用的場景特徵確定匹配的、且與互相互動的使用者達到預定契合度的動作指令。 According to the method described in claim 2, according to the mapping relationship between image features and action instructions preset for different scene features, the matching is determined based on the image features of the image and the scene features to which the image is applied, and Before the action command to achieve a predetermined fit with the interacting users, the method further includes: Obtain additional dynamic features related to the image; wherein, according to the mapping relationship between image features and action instructions preset for different scene features, determine based on the image features of the image and the scene features to which the image is applied The action command that is matched and reaches a predetermined degree of fit with the interacting users includes: according to the mapping relationship between image features and action commands preset for different scene features, based on the image features of the image, the additional dynamic features, An action instruction that matches the scene feature to which the image is applied and reaches a predetermined degree of fit with the interacting users is determined. 根據請求項1所述的方法,所述方法還包括:保存所述影像特徵和所述場景特徵。 According to the method described in claim 1, the method further includes: saving the image features and the scene features. 根據請求項1所述的方法,所述方法還包括:向接收方發送所述動作指令。 According to the method described in claim 1, the method further includes: sending the action instruction to a receiver. 
一種人機互動方法,應用在多個使用者互動的通訊場景中的接收方,包括:接收來自于發送方的動作指令,所述動作指令包括互相互動的發送方和接收方使用者分別在通訊應用介面中,選擇的影像的影像特徵、和所述影像所應用的、分別選取的場景特徵確定匹配的、且達到預定契合度的動作指令;回應於所述動作指令,顯示與所述動作指令對應的、針對所述通訊應用介面的效果; 其中,所述與所述動作指令對應的、針對所述通訊應用介面的效果包括下述至少一種:對終端設備的發送方頭像的處理效果和/或對終端設備的接收方頭像的處理效果;對與發送方進行通訊的消息邊框顏色的處理效果;螢幕振動反轉;或視訊或動畫播放播放。 A method of human-computer interaction, applied to a receiver in a communication scene where multiple users interact, including: receiving an action command from a sender, the action command includes the mutually interacting sender and receiver users respectively communicating In the application interface, the image features of the selected image and the respectively selected scene features to which the image is applied determine an action command that matches and reaches a predetermined degree of fit; in response to the action command, display the action command Corresponding effects aimed at the communication application interface; Wherein, the effect corresponding to the action instruction for the communication application interface includes at least one of the following: a processing effect on the sender's avatar of the terminal device and/or a processing effect on the receiver's avatar of the terminal device; The processing effect on the border color of the message communicated with the sender; the screen vibration inversion; or the playback of video or animation. 一種人機互動裝置,應用在多個使用者互動的通訊場景,包括:影像獲取模組,在通訊應用介面中,獲取使用者選取的場景特徵;回應於使用者對在所述通訊應用介面中展示的預設影像的選擇操作,獲取被選擇的影像,所述預設影像的順序,基於使用者歷史使用頻率排序;動作指令確定模組,根據為不同場景特徵預設的影像特徵與動作指令的映射關係,基於所述影像的影像特徵、和所述影像所應用的場景特徵確定匹配的、且與互相互動的使用者達到預定契合度的動作指令;執行模組,基於所述動作指令產生渲染指令,在所述通訊應用介面中,對所述動作指令相關的目標對象進行渲染。 A human-computer interaction device, which is applied to multiple user-interactive communication scenarios, including: an image acquisition module, which acquires scene features selected by the user in the communication application interface; The selection operation of the displayed preset image obtains the selected image, and the order of the preset images is sorted based on the user’s historical use frequency; the action command determination module is based on the preset image features and action commands for different scene characteristics The mapping relationship, based on the image features of the image and the scene features to which the image is applied, determines the action command that matches and reaches a predetermined degree of fit with the interacting users; the execution module generates an action command based on the action command The rendering instruction is used to render the target object related to the action instruction in the communication application interface. 
一種人機互動裝置,應用在多個使用者互動場景中的接收方,包括: 接收模組,接收來自于發送方的動作指令,所述動作指令包括互相互動的發送方和接收方使用者分別在通訊應用介面中,選擇的影像的影像特徵、和所述影像所應用的、分別選取的場景特徵確定匹配的、且達到預定契合度的動作指令;效果顯示模組,回應於所述動作指令,顯示與所述動作指令對應的、針對所述通訊應用介面的效果;其中,所述與所述動作指令對應的、針對所述通訊應用介面的效果包括下述至少一種:對終端設備的發送方頭像的處理效果和/或對終端設備的接收方頭像的處理效果;對與發送方進行通訊的消息邊框顏色的處理效果;螢幕振動反轉;或視訊或動畫播放。 A human-computer interaction device, applied to a receiver in multiple user interaction scenarios, including: The receiving module receives an action command from the sender, and the action command includes the image characteristics of the image selected by the interactive sender and the receiver user in the communication application interface, and the image applied to the image, The respectively selected scene features determine matching action commands that reach a predetermined degree of fit; the effect display module, in response to the action commands, displays the effect corresponding to the action commands for the communication application interface; wherein, The effect corresponding to the action instruction and aimed at the communication application interface includes at least one of the following: a processing effect on the sender's avatar of the terminal device and/or a processing effect on the receiver's avatar of the terminal device; The processing effect of the border color of the message communicated by the sender; the inversion of screen vibration; or the playback of video or animation. 一種電子設備,包括:記憶體、處理器及存儲在所述記憶體上並可在所述處理器上運行的電腦程式,所述電腦程式被所述處理器執行時實現如下操作:在通訊應用介面中,獲取使用者選取的場景特徵;回應於使用者對在所述通訊應用介面中展示的預設影像的選擇操作,獲取被選擇的影像,所述預設影像的順序,基於使用者歷史使用頻率排序,所述影像用於指示終端設備執行動作;根據為不同場景特徵預設的影像特徵與動作指令的映 射關係,基於所述影像的影像特徵、和所述影像所應用的場景特徵確定匹配的、且與互相互動的使用者達到預定契合度的動作指令;基於所述動作指令產生渲染指令,在所述通訊應用介面中,對所述動作指令相關的目標對象進行渲染。 An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and operable on the processor. When the computer program is executed by the processor, the following operations are realized: in communication applications In the interface, obtain the scene feature selected by the user; respond to the user's selection operation on the default image displayed in the communication application interface, obtain the selected image, and the sequence of the preset image is based on the user history Using frequency sorting, the image is used to instruct the terminal device to perform an action; according to the mapping between image features and action instructions preset for different scene features Based on the image features of the image and the scene features to which the image is applied, an action command that matches and reaches a predetermined degree of fit with the interacting user is determined; based on the action command, a rendering command is generated, and the In the communication application interface, the target object related to the action command is rendered. 一種電子設備,包括:記憶體、處理器及儲存在所述記憶體上並可在所述處理器上運行的電腦程式,所述電腦程式被所述處理器執行時實現如下操作:接收來自于發送方的動作指令,所述動作指令包括互相互動的發送方和接收方使用者分別在通訊應用介面中,選擇的影像的影像特徵、和所述影像所應用的、分別選取的場景特徵確定匹配的、且達到預定契合度的動作指令;回應於所述動作指令,顯示與所述動作指令對應的、針對所述通訊應用介面的效果;其中,所述與所述動作指令對應的、針對所述通訊應用介面的效果包括下述至少一種:對終端設備的發送方頭像的處理效果和/或對終端設備的接收方頭像的處理效果;對與發送方進行通訊的消息邊框顏色的處理效果;螢幕振動反轉;或視訊或動畫播放。 An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and operable on the processor. 
When the computer program is executed by the processor, the following operations are realized: receiving information from The action command of the sender, the action command includes the image features of the image selected by the interacting sender and the receiver user in the communication application interface, and the respectively selected scene features applied to the image to determine the match an action command that reaches a predetermined degree of fit; in response to the action command, display the effect corresponding to the action command for the communication application interface; wherein, the effect corresponding to the action command for the The effect of the communication application interface includes at least one of the following: the processing effect of the sender's avatar of the terminal device and/or the processing effect of the receiver's avatar of the terminal device; the processing effect of the border color of the message communicated with the sender; The screen vibrates and inverts; or a video or animation plays. 一種電腦可讀儲存媒體,所述電腦可讀儲存媒體上儲 存有電腦程式,所述電腦程式被處理器執行時實現如下操作:在通訊應用介面中,獲取使用者選取的場景特徵;回應於使用者對在所述通訊應用介面中展示的預設影像的選擇操作,獲取被選擇的影像,所述預設影像的順序,基於使用者歷史使用頻率排序,所述影像用於指示終端設備執行動作;根據為不同場景特徵預設的影像特徵與動作指令的映射關係,基於所述影像的影像特徵、和所述影像所應用的場景特徵確定匹配的、且與互相互動的使用者達到預定契合度的動作指令;基於所述動作指令產生渲染指令,在所述通訊應用介面中,對所述動作指令相關的目標對象進行渲染。 A computer-readable storage medium, the computer-readable storage medium stores There is a computer program, and when the computer program is executed by the processor, the following operations are realized: in the communication application interface, the scene feature selected by the user is obtained; in response to the user's response to the preset image displayed in the communication application interface The selection operation is to obtain the selected image. The order of the preset images is sorted based on the user's historical usage frequency. The images are used to instruct the terminal device to perform actions; according to the preset image characteristics and action instructions for different scene characteristics The mapping relationship is based on the image features of the image and the scene features to which the image is applied to determine an action command that matches and achieves a predetermined degree of fit with the interacting user; generates a rendering command based on the action command, and in the In the communication application interface, the target object related to the action command is rendered. 
一種電腦可讀儲存媒體,所述電腦可讀儲存媒體上存儲有電腦程式,所述電腦程式被處理器執行時實現如下操作:接收來自于發送方的動作指令,所述動作指令包括互相互動的發送方和接收方使用者分別在通訊應用介面中,選擇的影像的影像特徵、和所述影像所應用的、分別選取的場景特徵確定匹配的、且達到預定契合度的動作指令;回應於所述動作指令,顯示與所述動作指令對應的、針對所述通訊應用介面的效果;其中,所述與所述動作指令對應的、針對所述通訊應 用介面的效果包括下述至少一種:對終端設備的發送方頭像的處理效果和/或對終端設備的接收方頭像的處理效果;對與發送方進行通訊的消息邊框顏色的處理效果;螢幕振動反轉;或視訊或動畫播放。 A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the following operations are realized: receiving an action command from a sender, and the action command includes mutually interactive In the communication application interface, the user on the sending side and the receiving side determine the matching of the image features of the selected image and the respectively selected scene features applied to the image, and reach a predetermined degree of fit; respond to the The action instruction, displaying the effect corresponding to the action instruction for the communication application interface; wherein, the effect corresponding to the action instruction for the communication application interface The effect of the user interface includes at least one of the following: the processing effect on the sender's avatar of the terminal device and/or the processing effect on the receiver's avatar of the terminal device; the processing effect on the border color of the message communicated with the sender; screen vibration Reverse; or video or animation playback.
TW108119296A 2018-08-02 2019-06-04 Human-computer interaction method and device TWI782211B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810871070.2 2018-08-02
CN201810871070.2A CN109254650B (en) 2018-08-02 2018-08-02 Man-machine interaction method and device

Publications (2)

Publication Number Publication Date
TW202008143A TW202008143A (en) 2020-02-16
TWI782211B true TWI782211B (en) 2022-11-01

Family

ID=65049153

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108119296A TWI782211B (en) 2018-08-02 2019-06-04 Human-computer interaction method and device

Country Status (3)

Country Link
CN (2) CN112925418A (en)
TW (1) TWI782211B (en)
WO (1) WO2020024692A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925418A (en) * 2018-08-02 2021-06-08 创新先进技术有限公司 Man-machine interaction method and device
CN110083238A (en) * 2019-04-18 2019-08-02 深圳市博乐信息技术有限公司 Man-machine interaction method and system based on augmented reality
CN110609921B (en) * 2019-08-30 2022-08-19 联想(北京)有限公司 Information processing method and electronic equipment
CN110807395A (en) * 2019-10-28 2020-02-18 支付宝(杭州)信息技术有限公司 Information interaction method, device and equipment based on user behaviors
CN111338808B (en) * 2020-05-22 2020-08-14 支付宝(杭州)信息技术有限公司 Collaborative computing method and system
CN111627097B (en) * 2020-06-01 2023-12-01 上海商汤智能科技有限公司 Virtual scene display method and device
CN111899192B (en) 2020-07-23 2022-02-01 北京字节跳动网络技术有限公司 Interaction method, interaction device, electronic equipment and computer-readable storage medium
CN114035684A (en) * 2021-11-08 2022-02-11 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105324160A (en) * 2013-06-20 2016-02-10 株式会社得那 Electronic game machine, electronic game processing method, and electronic game program
CN106155311A (en) * 2016-06-28 2016-11-23 努比亚技术有限公司 AR helmet, AR interactive system and the exchange method of AR scene
CN106293461A (en) * 2016-08-04 2017-01-04 腾讯科技(深圳)有限公司 Button processing method in a kind of interactive application and terminal and server
TW201814445A (en) * 2016-09-29 2018-04-16 阿里巴巴集團服務有限公司 Performing operations based on gestures

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7159008B1 (en) * 2000-06-30 2007-01-02 Immersion Corporation Chat interface with haptic feedback functionality
CA2868276A1 (en) * 2011-03-23 2013-09-27 Mgestyk Technologies Inc. Apparatus and system for interfacing with computers and other electronic devices through gestures by using depth sensing and methods of use
CN103916621A (en) * 2013-01-06 2014-07-09 腾讯科技(深圳)有限公司 Method and device for video communication
CN105045398B (en) * 2015-09-07 2018-04-03 哈尔滨市一舍科技有限公司 A kind of virtual reality interactive device based on gesture identification
CN105468142A (en) * 2015-11-16 2016-04-06 上海璟世数字科技有限公司 Interaction method and system based on augmented reality technique, and terminal
CN105988583A (en) * 2015-11-18 2016-10-05 乐视致新电子科技(天津)有限公司 Gesture control method and virtual reality display output device
CN105487673B (en) * 2016-01-04 2018-01-09 京东方科技集团股份有限公司 A kind of man-machine interactive system, method and device
CN106125903B (en) * 2016-04-24 2021-11-16 林云帆 Multi-person interaction system and method
CN106095068A (en) * 2016-04-26 2016-11-09 乐视控股(北京)有限公司 The control method of virtual image and device
US10471353B2 (en) * 2016-06-30 2019-11-12 Sony Interactive Entertainment America Llc Using HMD camera touch button to render images of a user captured during game play
CN107885316A (en) * 2016-09-29 2018-04-06 阿里巴巴集团控股有限公司 A kind of exchange method and device based on gesture
US20180126268A1 (en) * 2016-11-09 2018-05-10 Zynga Inc. Interactions between one or more mobile devices and a vr/ar headset
US10168788B2 (en) * 2016-12-20 2019-01-01 Getgo, Inc. Augmented reality user interface
CN106657060A (en) * 2016-12-21 2017-05-10 惠州Tcl移动通信有限公司 VR communication method and system based on reality scene
CN107705278B (en) * 2017-09-11 2021-03-02 Oppo广东移动通信有限公司 Dynamic effect adding method and terminal equipment
CN112925418A (en) * 2018-08-02 2021-06-08 创新先进技术有限公司 Man-machine interaction method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105324160A (en) * 2013-06-20 2016-02-10 株式会社得那 Electronic game machine, electronic game processing method, and electronic game program
CN106155311A (en) * 2016-06-28 2016-11-23 努比亚技术有限公司 AR helmet, AR interactive system and the exchange method of AR scene
CN106293461A (en) * 2016-08-04 2017-01-04 腾讯科技(深圳)有限公司 Button processing method in a kind of interactive application and terminal and server
TW201814445A (en) * 2016-09-29 2018-04-16 阿里巴巴集團服務有限公司 Performing operations based on gestures

Also Published As

Publication number Publication date
CN109254650A (en) 2019-01-22
TW202008143A (en) 2020-02-16
CN112925418A (en) 2021-06-08
CN109254650B (en) 2021-02-09
WO2020024692A1 (en) 2020-02-06

Similar Documents

Publication Publication Date Title
TWI782211B (en) Human-computer interaction method and device
US11182615B2 (en) Method and apparatus, and storage medium for image data processing on real object and virtual object
US11303850B2 (en) Communication using interactive avatars
US9866795B2 (en) System and method for interactive animations for enhanced and personalized video communications
US10609332B1 (en) Video conferencing supporting a composite video stream
WO2018033137A1 (en) Method, apparatus, and electronic device for displaying service object in video image
JP7268071B2 (en) Virtual avatar generation method and generation device
JP2016511837A (en) Voice change for distributed story reading
WO2022252866A1 (en) Interaction processing method and apparatus, terminal and medium
CN108876878B (en) Head portrait generation method and device
US20230130535A1 (en) User Representations in Artificial Reality
KR20230019491A (en) Game Results Overlay System
CN114697703B (en) Video data generation method and device, electronic equipment and storage medium
CN113411537B (en) Video call method, device, terminal and storage medium
WO2020042442A1 (en) Expression package generating method and device
US20130314405A1 (en) System and method for generating a video
KR20160010810A (en) Realistic character creation method and creating system capable of providing real voice
CN116017082A (en) Information processing method and electronic equipment
WO2022151687A1 (en) Group photo image generation method and apparatus, device, storage medium, computer program, and product
CN114779948A (en) Method, device and equipment for controlling instant interaction of animation characters based on facial recognition
EP4314999A1 (en) User-defined contextual spaces
CN115426505B (en) Preset expression special effect triggering method based on face capture and related equipment
US20240037882A1 (en) Video processing with preview of ar effects
WO2024037582A1 (en) Image processing method and apparatus
WO2021208330A1 (en) Method and apparatus for generating expression for game character