CN113703585A - Interaction method, interaction device, electronic equipment and storage medium - Google Patents

Interaction method, interaction device, electronic equipment and storage medium Download PDF

Info

Publication number
CN113703585A
CN113703585A (application CN202111115768.XA)
Authority
CN
China
Prior art keywords: greeting, voice, interaction, request, user
Prior art date: 2021-09-23
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111115768.XA
Other languages
Chinese (zh)
Inventor
李慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd
Priority to CN202111115768.XA (2021-09-23)
Publication of CN113703585A (2021-11-26)
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3343 Query execution using phonetics
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an interaction method, an interaction apparatus, an electronic device and a storage medium. The interaction method comprises the following steps: acquiring a face image of a user; detecting the face image to perform face recognition to obtain a recognition result; generating a corresponding greeting audio and greeting action according to the recognition result; and controlling a digital person to play the greeting audio and display the greeting action. With this interaction method, corresponding greeting audios and greeting actions can be generated for different users through face image recognition, and the virtual digital person can interact with the user, thereby improving the user experience.

Description

Interaction method, interaction device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an interaction method, an interaction apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of information technology, digital persons are applied more and more widely, and various convenient services can be provided for people through them. However, in most current scenes, the avatar-based human-computer interaction mode is single.
Disclosure of Invention
In view of this, the present application provides an interaction method, an interaction apparatus, a voice interaction system, an electronic device, and a computer-readable storage medium.
The application provides an interaction method, which is characterized by comprising the following steps:
acquiring a face image of a user;
detecting the face image to perform face recognition to obtain a recognition result;
generating a corresponding greeting audio and greeting action according to the recognition result; and
controlling the digital person to play the greeting audio and display the greeting action.
In some embodiments, the interaction method further comprises:
acquiring a voice request of a user;
generating text data according to the voice request, and performing intention understanding query and request query to obtain a response result;
performing voice synthesis on the response result through a voice cloud service to generate response audio;
and controlling the digital person to play the response audio.
In some embodiments, the generating text data from the voice request and performing an intent understanding query and a request query to obtain a response result comprises:
judging the type of the voice request;
under the condition that the voice request is of a single-round type, adopting FAQ and KBQA to carry out intention understanding query and request query on the text data so as to obtain a response result;
and in the case that the voice request is of a multi-round type, performing intention understanding query and request query on the text data by adopting RASA to obtain a response result.
In some embodiments, the generating text data according to the voice request and performing an intention understanding query and a request query to obtain a response result includes:
determining a target interaction scene matched with the voice request according to the voice request;
determining the question type of the voice request according to the voice request and the target interaction scene;
and obtaining a response result according to the question type and the voice request.
In some embodiments, the detecting the face image for face recognition to obtain a recognition result includes:
extracting face feature points;
and matching the face feature points with a preset face feature library to obtain the recognition result.
In some embodiments, the generating corresponding greeting audio and greeting actions according to the recognition result comprises:
determining a greeting scene according to the recognition result;
determining a greeting according to the greeting scene through a digital human cloud service; and
generating the greeting audio corresponding to the greeting through the voice cloud service.
In some embodiments, the interaction method further comprises:
responding to a first input of a user to display a digital human control;
generating a greeting control instruction in response to a second input to the digital human control by the user;
generating a dialogue control instruction in response to a third input to the digital human control by the user;
generating action control instructions in response to a fourth input to the digital human control by the user, the action control instructions including greeting action instructions and dialogue action instructions.
The present application further provides an interaction apparatus, the interaction apparatus comprising:
the acquisition module is used for acquiring a face image of a user;
the recognition module is used for detecting the face image so as to perform face recognition to obtain a recognition result;
the generating module is used for generating the corresponding greeting audio according to the recognition result; and
the control module is used for controlling the digital person to play the greeting audio and display the greeting action.
The application also provides an electronic device, which includes a memory and a processor, wherein the memory stores a computer program, and the computer program is executed by the processor to implement the interaction method of any one of the above items.
The present application also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by one or more processors, implements the interaction method described in any of the above embodiments.
According to the interaction method, the interaction apparatus, the voice interaction system, the electronic device and the computer-readable storage medium, a face image of the user is acquired, the face image is detected to perform face recognition and obtain a recognition result, a corresponding greeting audio and greeting action are generated according to the recognition result, and finally the greeting audio is played and the greeting action is displayed through the digital person. In this way, face recognition, action feedback and other modalities are integrated into a multi-modal interaction system, which improves the intelligence and friendliness of the digital person and thus the user experience.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram of an interaction method of some embodiments of the present application;
FIG. 2 is a block diagram of an interaction device according to some embodiments of the present application;
FIG. 3 is a scene diagram of an interaction method of some embodiments of the present application;
FIG. 4 is a schematic view of a scenario of an electronic device according to some embodiments of the present application;
FIGS. 5-7 are flow diagrams of an interaction method according to some embodiments of the present application;
FIG. 8 is another block diagram of an interaction device according to some embodiments of the present application;
FIGS. 9-11 are flow diagrams of an interaction method according to some embodiments of the present application;
fig. 12-14 are schematic diagrams of scenarios of digital human controls according to some embodiments of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the embodiments of the present application, and are not to be construed as limiting the embodiments of the present application.
At present, driven by new theories and technologies such as the Internet of Things, big data, sensor networks and brain science, together with the strong demands of economic and social development, artificial intelligence is developing at an accelerating pace and is being deeply integrated with industries in various fields, presenting new characteristics such as deep learning, cross-domain integration, human-machine collaboration, collective intelligence and autonomous control. Some intelligent devices or applications are provided with an avatar, so that visual interaction with the user is realized through the avatar and the human-computer interaction experience is improved. Today, digital virtual humans are widely applied in the game, entertainment and film fields. With the development of information technology, digital persons are applied more and more widely and can provide various convenient services for people. However, in most current scenes the avatar-based human-computer interaction mode is single, and the delivery form of the digital virtual human needs anthropomorphic, multi-modal upgrading.
In view of this, referring to fig. 1, the present application provides an interaction method, which includes:
01: acquiring a face image of a user;
02: detecting the face image to perform face recognition to obtain a recognition result;
03: generating a corresponding greeting audio and greeting action according to the recognition result; and
04: controlling the digital person to play the greeting audio and display the greeting action.
Correspondingly, referring to fig. 2, an interaction apparatus 100 is further provided in the embodiments of the present application, and the interaction method in the embodiments of the present application may be implemented by the interaction apparatus 100.
The interaction device 100 includes an acquisition module 110, a recognition module 120, a generation module 130, and a control module 140. Step 01 may be implemented by the obtaining module 110, step 02 by the recognition module 120, step 03 by the generating module 130, and step 04 by the control module 140. In other words, the obtaining module 110 is used to obtain a face image of the user; the recognition module 120 is configured to detect the face image to perform face recognition and obtain a recognition result; the generating module 130 is configured to generate the corresponding greeting audio and greeting action according to the recognition result; and the control module 140 is used to control the digital person to play the greeting audio and display the greeting action.
The embodiment of the application also provides the electronic equipment. The electronic device includes a memory and a processor. The memory has stored therein a computer program. The processor is used for acquiring a face image of a user, detecting the face image to perform face recognition to obtain a recognition result, generating corresponding greeting audio and greeting actions according to the recognition result, and controlling the digital person to play the greeting audio and display the greeting actions.
According to the interaction method, the interaction apparatus and the electronic device, a face image of the user is acquired, the face image is detected to perform face recognition and obtain a recognition result, a corresponding greeting audio and greeting action are generated according to the recognition result, and finally the greeting audio is played and the greeting action is displayed through the digital person. In this way, face recognition, action feedback and other modalities are integrated into a multi-modal interaction system, which improves the intelligence and friendliness of the digital person and thus the user experience.
In particular, the electronic device includes a screen with a graphical user interface display and a voice recognition apparatus capable of voice interaction. Electronic devices may include, but are not limited to, robots, computers, tablets, cell phones and the like. Taking a robot as an example, the robot includes a display area, an electro-acoustic element, a communication element and a processor. The display area of the robot may include a display screen or the like. The system on which the robot runs presents content to the user through a Graphical User Interface (GUI). The display area includes a number of UI elements, and different display areas may present the same or different UI elements. The UI elements may include card objects, application icons or interfaces, folder icons, multimedia file icons, and controls for interactive operations, among others. The electro-acoustic element may be used to collect the user's voice request. The system can send the voice request and the interaction scene information to the server through the communication element, and receive, through the communication element, the greeting audio operation instruction that the server generates according to the voice request. The processor is used to execute the operation corresponding to the operation instruction.
Please refer to fig. 3 and fig. 4. For convenience of description, the following embodiments are described by taking a conversation robot as an example of the electronic device.
A digital person is built into the electronic device. A digital person is a virtual simulation of the shape and functions of the human body at different levels by means of information science. Its development comprises four overlapping stages, namely the visible human, the physical human, the physiological human and the intelligent human, ultimately establishing a multidisciplinary, multi-level digital model and achieving accurate simulation of the human body from the microscopic to the macroscopic level. The digital person enables interaction with the user.
The electronic device comprises an image sensor and a display screen: the image sensor is used to collect face images, and the display screen can display the digital person, through which interaction with the user is realized. The electronic device also communicates with a server, which provides the digital human cloud service and the voice cloud service.
When a user passes through the detection range of the image sensor of the electronic device, the image sensor can capture a face image. The processor performs face recognition on the face image to obtain a recognition result and transmits the result to the digital human cloud service of the server. The digital human cloud service generates greeting text data and a greeting action according to the recognition result, and transmits the greeting text data to the voice cloud service for voice synthesis to obtain the corresponding greeting audio. Finally, the digital human cloud service transmits the greeting action and the greeting audio to the electronic device, so that the processor controls the digital person of the electronic device to display the greeting action while playing the greeting audio. Greeting actions may include, but are not limited to, waving, bowing, nodding and the like.
Therefore, a multi-mode interaction system can be formed by integrating the modes of face recognition, action feedback and the like, the intelligence and the friendliness of the digital people are improved, and the user experience is improved.
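For the purpose of illustration only, the greeting flow described above can be summarized in the following minimal Python sketch. All of the stubs (recognize, digital_human_cloud, voice_cloud_tts) and the Recognition structure are hypothetical placeholders standing in for the image sensor, the digital human cloud service and the voice cloud service; they are not APIs defined by this disclosure.

from dataclasses import dataclass

@dataclass
class Recognition:
    name: str
    gender: str
    role: str  # e.g. "VIP client", "general client", "stranger"

# Hypothetical placeholder stubs; in the described system these calls would
# go to the image sensor pipeline, the digital human cloud service and the
# voice cloud service respectively.
def recognize(face_image: bytes) -> Recognition:          # step 02
    return Recognition(name="XX", gender="male", role="general client")

def digital_human_cloud(result: Recognition) -> tuple[str, str]:  # step 03
    text = f"Mr. {result.name}, welcome to the business hall"
    return text, "bow"  # greeting text and greeting action

def voice_cloud_tts(text: str) -> bytes:  # step 03: greeting text -> greeting audio
    return text.encode("utf-8")  # stands in for synthesized audio

def greet(face_image: bytes) -> None:
    result = recognize(face_image)
    text, action = digital_human_cloud(result)
    audio = voice_cloud_tts(text)
    # step 04: the digital person plays the audio while displaying the action
    print("play:", audio, "| display:", action)

greet(b"raw-camera-frame")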
Preferably, referring to fig. 5, in some embodiments, step 02 includes:
021: extracting face feature points;
022: matching the face feature points with a preset face feature library to obtain a recognition result.
Referring further to FIG. 2, in some embodiments, steps 021 and 022 may be implemented by the recognition module 120. In other words, the recognition module 120 is configured to extract the face feature points and match them with a preset face feature library to obtain the recognition result.
In some embodiments, the processor is configured to extract the face feature points and match the face feature points with a preset face feature library to obtain a recognition result.
The extraction method of the face feature points may include, but is not limited to, neural networks (NNs), the Scale-Invariant Feature Transform (SIFT) algorithm, SURF (Speeded-Up Robust Features) and other feature point extraction algorithms. A neural network is an algorithmic mathematical model that simulates the behavioral characteristics of animal neural networks and performs distributed parallel information processing; relying on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes. The scale-invariant feature transform algorithm is a computer vision algorithm for detecting and describing local features in an image; it is invariant to rotation, scaling and brightness changes, and stable to a certain degree against viewpoint changes, affine transformation and noise.
A plurality of face features and the identity information corresponding to them (such as name, gender and age) can be stored in the face feature library in advance. After the face feature points are obtained, they can be matched against the features in the preset face feature library. If the face feature points coincide with a feature in the library, the identity information (ID) corresponding to that face feature can be sent to the digital human cloud service of the server, so that the digital human cloud service performs text synthesis according to the identity information and a preset greeting to obtain a greeting text, and generates a corresponding greeting action according to the greeting.
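As a minimal sketch of this matching step, assuming face features are compared as fixed-length vectors and that cosine similarity with a threshold decides a match (the disclosure does not fix a particular metric), the library lookup could look like the following; all names and values are illustrative.

import numpy as np

# Preset face feature library: identity information (ID) -> stored feature
# vector. The identities and 3-dimensional vectors are invented for the example.
FEATURE_LIBRARY = {
    "ID-001 (Mr. XX, male, 30, VIP client)": np.array([0.12, 0.85, 0.51]),
    "ID-002 (Ms. YY, female, 24, general client)": np.array([0.90, 0.10, 0.40]),
}

def match(features: np.ndarray, threshold: float = 0.95) -> str | None:
    best_id, best_sim = None, -1.0
    for identity, stored in FEATURE_LIBRARY.items():
        # cosine similarity between the extracted and the stored feature vector
        sim = float(features @ stored /
                    (np.linalg.norm(features) * np.linalg.norm(stored)))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    # only report a match when similarity clears the threshold;
    # otherwise the user is treated as a stranger
    return best_id if best_sim >= threshold else None

print(match(np.array([0.11, 0.86, 0.50])))  # -> ID-001 (close to stored vector)
print(match(np.array([0.50, 0.50, 0.50])))  # -> None (stranger)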
Preferably, referring to fig. 6, in some embodiments, step 03 includes:
031: determining a greeting scene according to the recognition result and the recognition time;
032: determining a greeting according to the greeting scene through the digital human cloud service;
033: generating the corresponding greeting audio from the greeting through the voice cloud service.
In some embodiments, steps 031 to 033 may be implemented by the generating module 130. In other words, the generating module 130 is configured to determine a greeting scene according to the recognition result and the recognition time, determine a greeting according to the greeting scene through the digital human cloud service, and generate the corresponding greeting audio through the voice cloud service.
In some embodiments, the processor is configured to determine a greeting scene from the recognition result and the recognition time, determine a greeting according to the greeting scene through the digital human cloud service, and generate the corresponding greeting audio through the voice cloud service.
The recognition result may include, but is not limited to, gender characteristics (male, female), age characteristics (juvenile, youth, middle-aged, elderly), role characteristics (VIP client, general client, stranger), and the like.
The recognition time may be the time at which recognition takes place or the time at which the digital human cloud service receives the recognition result, and may fall in the morning, at noon, in the afternoon, in the evening, and so on.
A greeting scene may include, but is not limited to, the greeting time, the location, and the gender and age of the person; that is, the greeting can be determined from the time, place, gender, age and the like of the person in the greeting scene.
For example, suppose the processor obtains from face image recognition a gender characteristic of male, an age characteristic of youth and a role characteristic of general client, and the time is morning. The generated greeting audio may then be: "Mr. XX, good morning, welcome to the business hall." For another example, if the gender characteristic obtained from face image recognition is female, the age characteristic is juvenile, the role characteristic is VIP client, and the time is evening, the generated greeting audio may be: "Dear Ms. XX, good evening, welcome to the business hall."
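A minimal sketch of assembling such a greeting from the recognition result and the recognition time is given below; the titles, time slots and template are assumptions modeled on the examples above, not a format fixed by the disclosure.

from datetime import datetime

# Time-of-day slots (start hour -> slot name) and the matching phrase;
# the boundaries are illustrative assumptions.
TIME_OF_DAY = [(5, "morning"), (11, "noon"), (13, "afternoon"), (18, "evening")]
GREETING = {"morning": "good morning", "noon": "good noon",
            "afternoon": "good afternoon", "evening": "good evening"}

def time_slot(now: datetime) -> str:
    slot = "evening"  # before 05:00 falls back to the evening greeting
    for start_hour, name in TIME_OF_DAY:
        if now.hour >= start_hour:
            slot = name
    return slot

def build_greeting(name: str, gender: str, role: str, now: datetime) -> str:
    title = "Mr." if gender == "male" else "Ms."
    prefix = "Dear " if role == "VIP client" else ""  # honorific for VIP clients
    return f"{prefix}{title} {name}, {GREETING[time_slot(now)]}, welcome to the business hall."

print(build_greeting("XX", "male", "general client", datetime(2021, 9, 23, 9)))
# -> "Mr. XX, good morning, welcome to the business hall."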
Referring to fig. 7, in some embodiments, the interaction method further includes:
05: acquiring a voice request of a user;
06: generating text data according to the voice request, and performing intention understanding query and request query to obtain a response result;
07: performing voice synthesis on the response result through a voice cloud service to generate response audio;
08: and controlling the digital person to play the response audio.
Referring to fig. 8, in some embodiments, the interaction apparatus further includes a query module 150 and a synthesis module 160. Step 05 may be implemented by the obtaining module 110, step 06 by the query module 150, step 07 by the synthesis module 160, and step 08 by the control module 140. In other words, the obtaining module 110 is further configured to obtain a voice request of the user, the query module 150 is configured to generate text data according to the voice request and perform the intention understanding query and request query to obtain a response result, the synthesis module 160 may be configured to perform voice synthesis on the response result through the voice cloud service to generate a response audio, and the control module 140 may be configured to control the digital person to play the response audio.
In some embodiments, the processor is configured to obtain a voice request of the user, generate text data according to the voice request, and perform the intention understanding query and request query to obtain a response result; the processor is further configured to perform voice synthesis on the response result through the voice cloud service to generate a response audio and to control the digital person to play the response audio.
A digital person of the electronic device may be pre-configured with a voice software development kit. When the digital person runs, it can be displayed in real time in the display area of the electronic device. A Software Development Kit (SDK) is a set of tools provided to implement a certain function of a software product. The voice SDK is the hub for voice interaction between the electronic device and the voice cloud service of the server. On the one hand, the voice SDK defines the generation specification for voice requests. On the other hand, it synchronizes the digital person information in the electronic device to the voice cloud service of the server, and transmits the operation instructions that the voice cloud service generates for a voice request back to the digital person.
With further reference to fig. 3, the electronic device may further include a sound pickup apparatus. The processor may control the sound pickup apparatus to obtain the user's voice input, perform noise reduction and other processing on it, and transmit the processed voice input to the voice cloud service of the server, where voice recognition is performed through automatic speech recognition (ASR) to generate text data. The text data is then sent to the digital human cloud service, which performs the intention understanding query and request query to obtain a response result and sends the response result back to the voice cloud service. The voice cloud service performs voice synthesis to generate a response audio and sends it back to the electronic device, so that the processor controls the digital person to play the response audio and display the corresponding response action. In this way, face recognition, action feedback, voice interaction and other modalities are integrated into a multi-modal interaction system, improving the intelligence and friendliness of the digital person and the user experience.
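For illustration, the voice-request round trip of steps 05 to 08 can be sketched as follows; the asr, query and tts stubs are hypothetical placeholders for the voice cloud service's speech recognition and synthesis and for the digital human cloud service's query step, not real service APIs.

def asr(audio: bytes) -> str:  # voice cloud service: speech -> text data
    return "what services does the business hall offer"

def query(text: str) -> str:   # digital human cloud: intention understanding + request query
    return "We offer account opening and bill payment."

def tts(text: str) -> bytes:   # voice cloud service: response result -> response audio
    return text.encode("utf-8")

def handle_voice_request(audio: bytes) -> None:
    text = asr(audio)                 # step 06 input: text data from the voice request
    answer = query(text)              # step 06: response result
    response_audio = tts(answer)      # step 07: response audio
    print("digital person plays:", response_audio)  # step 08

handle_voice_request(b"pcm-frames")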
Preferably, referring to fig. 9, in some embodiments, step 06 includes:
061: judging the type of the voice request;
062: under the condition that the voice request is of a single-round type, FAQ and KBQA are adopted to carry out intention understanding query and request query on the text data so as to obtain a response result; or
063: in the case where the voice request is of a multi-round type, the RASA is used to perform an intention understanding query and a request query on the text data to obtain a response result.
Referring to fig. 8, in some embodiments, steps 061 to 063 may be implemented by the query module 150. In other words, the query module 150 is configured to determine the type of the voice request, and to perform the intention understanding query and request query on the text data using FAQ and KBQA when the voice request is of a single-round type, or using RASA when the voice request is of a multi-round type, to obtain a response result.
In some embodiments, the processor is configured to determine a type of the voice request, and perform an intention understanding query and a request query on the text data using the FAQ and the KBQA to obtain a response result in a case where the voice request is of a single-round type, or perform an intention understanding query and a request query on the text data using the RASA to obtain a response result in a case where the voice request is of a multi-round type.
Frequently Asked Questions (FAQ) is a primary means of providing online help on today's networks: frequently asked question-answer pairs are organized in advance and published on a web page to provide a consulting service for users. QA (question answering) refers to directly giving answers to natural-language questions raised by users using various techniques and data, and KBQA refers to natural-language question answering based on a knowledge base.
Rasa is an open-source machine learning framework for building contextual AI assistants and conversational robots, and has two main modules: the Rasa NLU module and the Rasa Core module. The Rasa NLU module is used to understand user messages, including intent recognition and entity recognition, and converts the user's input into structured data. The Rasa Core module is a dialogue management platform that tracks the dialogue and decides what to do next.
Therefore, the electronic equipment can interact with the user more intelligently, and the intelligence and the friendliness of the electronic equipment are further improved.
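The type-based routing of steps 061 to 063 can be sketched as follows. How a request is classified as single-round or multi-round is not specified by the disclosure, so an explicit flag is assumed here, and faq_kbqa_answer and rasa_answer are placeholders rather than the real FAQ/KBQA or Rasa APIs.

def faq_kbqa_answer(text: str) -> str:
    # single-round: a stand-alone question answered from FAQ pairs / the knowledge base
    return f"[FAQ/KBQA] answer for: {text}"

def rasa_answer(text: str, session_id: str) -> str:
    # multi-round: a session is needed so that dialogue state can be tracked
    return f"[RASA session {session_id}] answer for: {text}"

def answer(text: str, multi_round: bool, session_id: str = "s-1") -> str:
    if multi_round:
        return rasa_answer(text, session_id)  # step 063: multi-round -> RASA
    return faq_kbqa_answer(text)              # step 062: single-round -> FAQ + KBQA

print(answer("what are the business hours", multi_round=False))
print(answer("and on weekends", multi_round=True))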
Referring to fig. 10, in some embodiments, step 06 further includes:
064: determining a target interaction scene matched with the voice request according to the voice request;
065: determining the question type of the voice request according to the voice request and the target interaction scene;
066: and obtaining a response result according to the question type and the voice request.
Referring to fig. 8, in some embodiments, steps 064 to 066 may be implemented by the query module 150. In other words, the query module 150 is configured to determine a target interaction scene matching the voice request, determine the question type of the voice request according to the voice request and the target interaction scene, and obtain a response result according to the question type and the voice request.
In some embodiments, the processor is configured to determine a target interaction scenario matching the voice request according to the voice request, determine a question type of the voice request according to the voice request and the target interaction scenario, and obtain a response result according to the question type and the voice request.
Specifically, after text data is generated from the voice request through the voice cloud service, intent recognition can be performed on the text data to determine a matching target interaction scene. The target interaction scene may include, but is not limited to, a question-and-answer scene, a specific business scene, and the like. For example, in some examples, the voice cloud service may identify the target interaction scene corresponding to the voice request using a trained intent recognition neural network: for each target interaction scene, text data corresponding to a plurality of sample voice requests commonly used in that scene may be stored, and the intent recognition neural network may be used to compute the similarity between the text data generated from the voice request and the text data corresponding to each scene, so as to determine the matching target interaction scene.
In this way, different response results are determined according to different question types for the received voice request, which improves the flexibility of conversational interaction. At the same time, because a target interaction scene is determined for the voice request and the response result is determined under that scene, the response result can match the current interaction scene, improving the degree of match between the response result and the voice request and thus further improving the user experience.
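As an illustration of matching a request against the stored sample texts of each scene, the sketch below replaces the trained intent recognition neural network with a trivial word-overlap similarity; the scenes and samples are invented for the example.

# Stored sample texts per target interaction scene (illustrative only).
SCENE_SAMPLES = {
    "question and answer": ["what are your business hours",
                            "where is the service desk"],
    "business handling":   ["i want to open an account",
                            "help me pay my bill"],
}

def overlap(a: str, b: str) -> float:
    # Jaccard word overlap: a crude stand-in for a learned similarity score
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

def match_scene(text: str) -> str:
    # pick the scene whose best-matching sample is most similar to the request
    return max(SCENE_SAMPLES,
               key=lambda scene: max(overlap(text, s) for s in SCENE_SAMPLES[scene]))

print(match_scene("i would like to open an account"))  # -> "business handling"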
Preferably, referring to fig. 11, in some embodiments, before step 01, the interaction method further includes:
001: responding to a first input of a user to display a digital human control;
002: generating a greeting control instruction in response to a second input to the digital human control by the user;
003: generating a dialogue control instruction in response to a third input to the digital human control by the user;
004: generating action control instructions in response to a fourth input to the digital human control by the user, the action control instructions including greeting action instructions and dialogue action instructions.
Referring further to fig. 8, in some embodiments, steps 001 to 004 may be implemented by the control module 140. In other words, the control module 140 is configured to respond to a first input of the user to display the digital human control, generate a greeting control instruction in response to a second input to the digital human control, generate a dialogue control instruction in response to a third input to the digital human control, and generate action control instructions in response to a fourth input to the digital human control, the action control instructions including greeting action instructions and dialogue action instructions.
In some embodiments, the processor is configured to respond to a first input of the user to display the digital human control, generate a greeting control instruction in response to a second input to the digital human control, generate a dialogue control instruction in response to a third input to the digital human control, and generate action control instructions in response to a fourth input to the digital human control, the action control instructions including greeting action instructions and dialogue action instructions.
Specifically, the electronic device may further be provided with a digital human control. A control generally includes, but is not limited to, the following information: control identification, control type, action type of the control, and the like. The control identification is unique for each control, and a control can be found through its identification. Control types may include group, text, image and the like. The action type of a control may include clicking, sliding and the like.
Referring to fig. 12 to 14, the digital human control includes a reception greeting sub-control, an action editing sub-control and a voice dialogue sub-control. When the corresponding sub-control is clicked, the relevant interface of that sub-control is displayed, and the user can make settings on the displayed interface. The reception greeting sub-control is used to set greetings for different scenes of the first wake-up, and can set the role, time, gender, age and so on. For example, in some examples, the greeting for a male VIP customer may be set as "Mr. XX, good morning, welcome to the business hall". Of course, the above is only an example: the greeting can be set according to the user's preference and is not limited to the above example.
The action editing sub-control can be used for settings that match voice (the user's voice request, audio generated by the voice cloud service, etc.) with actions; for example, in a welcome scene, the voice "welcome" is matched with the action "bow".
The voice dialogue sub-control can be set according to the dialogue content of different scenes, so that the robot guides the user to better handle the related services.
The second input may be used to set the reception greeting sub-control, the third input to set the voice dialogue sub-control, and the fourth input to set the action editing sub-control. A user input may be a voice input, a touch input, an input from an external device, or the like. For example, the first input may be a voice input while the second, third and fourth inputs are touch inputs; that is, the user may call up the digital human control by voice and edit the relevant contents of the greeting, dialogue and action sub-controls through inputs on the display screen, thereby generating the greeting control instruction, dialogue control instruction, action control instructions and so on.
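For illustration only, the settings collected through the three sub-controls might be represented as control instructions of the following shape; the field names and values are assumptions, as the disclosure does not fix a data format.

# One possible shape for the instructions produced by the sub-controls.
greeting_instruction = {   # from the reception greeting sub-control (second input)
    "role": "VIP client", "gender": "male", "time": "morning",
    "text": "Mr. XX, good morning, welcome to the business hall",
}
dialog_instruction = {     # from the voice dialogue sub-control (third input)
    "scene": "business handling",
    "script": ["May I help you?", "Please take your ticket."],
}
action_instruction = {     # from the action editing sub-control (fourth input)
    "trigger_text": "welcome", "action": "bow",
}

for name, inst in [("greeting", greeting_instruction),
                   ("dialogue", dialog_instruction),
                   ("action", action_instruction)]:
    print(name, "->", inst)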
In this way, the digital person of the electronic device is configured in a visual manner, which improves the flexibility of use, broadens the application range of the electronic device, and further improves the user experience.
The present application further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by one or more processors, implements the interaction method of any one of the above embodiments. It will be understood by those skilled in the art that all or part of the processes of the method embodiments above may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.
In the description herein, reference to the terms "one embodiment", "some embodiments", "an illustrative embodiment", "an example", "a specific example", "some examples" and the like means that a particular feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example, and the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art can combine the different embodiments or examples and their features described in this specification, provided there is no contradiction. Meanwhile, descriptions using the terms "first", "second" and the like are intended to distinguish identical or similar operations; whether "first" and "second" imply a logical ordering depends on the actual embodiment and should not be determined from the literal meaning alone.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
Although embodiments of the present application have been shown and described above, it is to be understood that the above embodiments are exemplary and not to be construed as limiting the present application, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. An interaction method, characterized in that the interaction method comprises:
acquiring a face image of a user;
detecting the face image to perform face recognition to obtain a recognition result;
generating a corresponding greeting audio and greeting action according to the recognition result; and
controlling the digital person to play the greeting audio and display the greeting action.
2. The interaction method according to claim 1, further comprising:
acquiring a voice request of a user;
generating text data according to the voice request, and performing intention understanding query and request query to obtain a response result;
performing voice synthesis on the response result through a voice cloud service to generate response audio;
and controlling the digital person to play the response audio.
3. The interactive method of claim 2, wherein the generating text data according to the voice request and performing an intent understanding query and a request query to obtain a response result comprises:
judging the type of the voice request;
in the case that the voice request is of a single-round type, performing intention understanding query and request query on the text data by adopting FAQ and KBQA to obtain a response result;
and in the case that the voice request is of a multi-round type, performing intention understanding query and request query on the text data by adopting RASA to obtain a response result.
4. The interactive method of claim 2, wherein the generating text data according to the voice request and performing an intention understanding query and a request query to obtain a response result comprises:
determining a target interaction scene matched with the voice request according to the voice request;
determining the question type of the voice request according to the voice request and the target interaction scene;
and obtaining a response result according to the question type and the voice request.
5. The interaction method according to claim 1, wherein the detecting the face image for face recognition to obtain a recognition result comprises:
extracting face feature points; and
matching the face feature points with a preset face feature library to obtain the recognition result.
6. The interaction method of claim 1, wherein the generating corresponding greeting audio and greeting actions according to the recognition result comprises:
determining a greeting scene according to the recognition result and the recognition time;
determining a greeting according to the greeting scene through a digital human cloud service; and
generating the greeting audio corresponding to the greeting through the voice cloud service.
7. The interaction method according to claim 1, further comprising:
responding to a first input of a user to display a digital human control;
generating a greeting control instruction in response to a second input to the digital human control by the user;
generating a dialogue control instruction in response to a third input to the digital human control by the user;
generating action control instructions in response to a fourth input to the digital human control by the user, the action control instructions including greeting action instructions and dialogue action instructions.
8. An interaction apparatus, characterized in that the interaction apparatus comprises:
the acquisition module is used for acquiring a face image of a user;
the recognition module is used for detecting the face image so as to perform face recognition to obtain a recognition result;
the generating module is used for generating the corresponding greeting audio according to the recognition result; and
the control module is used for controlling the digital person to play the greeting audio and display the greeting action.
9. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, implements the interaction method of any one of claims 1-7.
10. A non-transitory computer-readable storage medium of a computer program, wherein the computer program, when executed by one or more processors, implements the interaction method of any one of claims 1-7.
CN202111115768.XA · Priority/filing date: 2021-09-23 · Interaction method, interaction device, electronic equipment and storage medium · Published as CN113703585A · Withdrawn

Priority Applications (1)

Application number: CN202111115768.XA · Priority/filing date: 2021-09-23 · Title: Interaction method, interaction device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application number: CN202111115768.XA · Priority/filing date: 2021-09-23 · Title: Interaction method, interaction device, electronic equipment and storage medium

Publications (1)

Publication number: CN113703585A · Publication date: 2021-11-26

Family

ID=78661641

Family Applications (1)

Application number: CN202111115768.XA · Status: Withdrawn · Publication: CN113703585A · Title: Interaction method, interaction device, electronic equipment and storage medium

Country Status (1)

Country: CN · CN113703585A


Cited By (5)

* Cited by examiner, † Cited by third party
CN114387643A (en) * 2021-12-28 2022-04-22 达闼机器人有限公司 Robot control method, system, computer device and storage medium
WO2023124026A1 (en) * 2021-12-28 2023-07-06 达闼机器人股份有限公司 Robot control method and system, computer device, storage medium and computer program product
CN116543082A (en) * 2023-05-18 2023-08-04 无锡捷通数智科技有限公司 Digital person generation method and device and digital person generation system
CN116708905A (en) * 2023-08-07 2023-09-05 海马云(天津)信息技术有限公司 Method and device for realizing digital human interaction on television box
CN117672180A (en) * 2023-12-08 2024-03-08 广州凯迪云信息科技有限公司 Voice communication control method and system for digital robot

Similar Documents

US10664060B2 (en) Multimodal input-based interaction method and device
CN107894833B (en) Multi-modal interaction processing method and system based on virtual human
CN109176535B (en) Interaction method and system based on intelligent robot
JP6889281B2 (en) Analyzing electronic conversations for presentations in alternative interfaces
CN113703585A (en) Interaction method, interaction device, electronic equipment and storage medium
CN107632706B (en) Application data processing method and system of multi-modal virtual human
CN107704169B (en) Virtual human state management method and system
CN110598576A (en) Sign language interaction method and device and computer medium
CN109086860B (en) Interaction method and system based on virtual human
KR101887637B1 (en) Robot system
CN107807734B (en) Interactive output method and system for intelligent robot
CN110519636A (en) Voice messaging playback method, device, computer equipment and storage medium
CN107480766B (en) Method and system for content generation for multi-modal virtual robots
EP3635513B1 (en) Selective detection of visual cues for automated assistants
KR20190089451A (en) Electronic device for providing image related with text and operation method thereof
CN112528004B (en) Voice interaction method, voice interaction device, electronic equipment, medium and computer program product
CN108664472A (en) Natural language processing method, apparatus and its equipment
CN107832720B (en) Information processing method and device based on artificial intelligence
CN111291151A (en) Interaction method and device and computer equipment
CN110825164A (en) Interaction method and system based on wearable intelligent equipment special for children
CN110442867A (en) Image processing method, device, terminal and computer storage medium
WO2023034722A1 (en) Conversation guided augmented reality experience
JP2023120130A (en) Conversation-type ai platform using extraction question response
CN108628454B (en) Visual interaction method and system based on virtual human
CN114974253A (en) Natural language interpretation method and device based on character image and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WW01: Invention patent application withdrawn after publication (application publication date: 2021-11-26)