CN113703585A - Interaction method, interaction device, electronic equipment and storage medium - Google Patents

Interaction method, interaction device, electronic equipment and storage medium Download PDF

Info

Publication number
CN113703585A
CN113703585A (application CN202111115768.XA)
Authority
CN
China
Prior art keywords: greeting, voice, interaction, request, user
Prior art date: 2021-09-23
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111115768.XA
Other languages
Chinese (zh)
Inventor
李慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd
Priority to CN202111115768.XA (2021-09-23)
Publication of CN113703585A (2021-11-26)
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3343 Query execution using phonetics
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an interaction method, an interaction apparatus, an electronic device and a storage medium. The interaction method comprises the following steps: acquiring a face image of a user; detecting the face image to perform face recognition to obtain a recognition result; generating a corresponding greeting audio and greeting action according to the recognition result; and controlling a digital person to play the greeting audio and display the greeting action. With this interaction method, corresponding greeting audios and greeting actions can be generated for different users through face image recognition, and the virtual digital person can interact with the user, thereby improving the user experience.

Description

Interaction method, interaction device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an interaction method, an interaction apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of information technology, digital persons are applied more and more widely, and various convenient services can be provided for people through them. However, in most current scenes, the avatar-based human-computer interaction mode is single.
Disclosure of Invention
In view of this, the present application provides an interaction method, an interaction apparatus, a voice interaction system, an electronic device, and a computer-readable storage medium.
The application provides an interaction method, which is characterized by comprising the following steps:
acquiring a face image of a user;
detecting the face image to perform face recognition to obtain a recognition result;
generating a corresponding greeting audio and greeting action according to the recognition result; and
controlling the digital person to play the greeting audio and display the greeting action.
In some embodiments, the interaction method further comprises:
acquiring a voice request of a user;
generating text data according to the voice request, and performing intention understanding query and request query to obtain a response result;
performing voice synthesis on the response result through a voice cloud service to generate response audio;
and controlling the digital person to play the response audio.
In some embodiments, the generating text data from the voice request and performing an intent understanding query and a request query to obtain a response result comprises:
judging the type of the voice request;
under the condition that the voice request is of a single-round type, adopting FAQ and KBQA to carry out intention understanding query and request query on the text data so as to obtain a response result;
and in the case that the voice request is of a multi-round type, performing intention understanding query and request query on the text data by adopting RASA to obtain a response result.
In some embodiments, the generating text data according to the voice request and performing an intention understanding query and a request query to obtain a response result includes:
determining a target interaction scene matched with the voice request according to the voice request;
determining the question type of the voice request according to the voice request and the target interaction scene;
and obtaining a response result according to the question type and the voice request.
In some embodiments, the detecting the face image for face recognition to obtain a recognition result includes:
extracting face feature points;
and matching the face feature points with a preset face feature library to obtain the recognition result.
In some embodiments, the generating corresponding greeting audio and greeting actions according to the recognition result comprises:
determining a greeting scene according to the recognition result;
determining a greeting according to the greeting scene through a digital human cloud service; and
generating the greeting audio corresponding to the greeting through the voice cloud service.
In some embodiments, the interaction method further comprises:
responding to a first input of a user to display a digital human control;
generating a greeting control instruction in response to a second input to the digital human control by the user;
generating a dialogue control instruction in response to a third input to the digital human control by the user;
generating action control instructions in response to a fourth input to the digital human control by the user, the action control instructions including greeting action instructions and dialogue action instructions.
The present application further provides an interaction apparatus, the interaction apparatus comprising:
the acquisition module is used for acquiring a face image of a user;
the recognition module is used for detecting the face image so as to perform face recognition to obtain a recognition result;
the generating module is used for generating the corresponding greeting audio according to the recognition result; and
the control module is used for controlling the digital person to play the greeting audio and display the greeting action.
The application also provides an electronic device, which includes a memory and a processor, wherein the memory stores a computer program, and the computer program is executed by the processor to implement the interaction method of any one of the above items.
The present application also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by one or more processors, implements the interaction method described in any of the above embodiments.
According to the interaction method, the interaction apparatus, the voice interaction system, the electronic device and the computer-readable storage medium, a face image of the user is acquired, the face image is detected to perform face recognition and obtain a recognition result, a corresponding greeting audio and greeting action are generated according to the recognition result, and finally the greeting audio is played and the greeting action is displayed through the digital person. In this way, face recognition, action feedback and other modalities are integrated into a multi-modal interaction system, which improves the intelligence and friendliness of the digital person and thus the user experience.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram of an interaction method of some embodiments of the present application;
FIG. 2 is a block diagram of an interaction device according to some embodiments of the present application;
FIG. 3 is a scene diagram of an interaction method of some embodiments of the present application;
FIG. 4 is a schematic view of a scenario of an electronic device according to some embodiments of the present application;
FIGS. 5-7 are flow diagrams of an interaction method according to some embodiments of the present application;
FIG. 8 is another block diagram of an interaction device according to some embodiments of the present application;
FIGS. 9-11 are flow diagrams of an interaction method according to some embodiments of the present application;
fig. 12-14 are schematic diagrams of scenarios of digital human controls according to some embodiments of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the embodiments of the present application, and are not to be construed as limiting the embodiments of the present application.
At present, driven by new theories and technologies such as the Internet of Things, big data, sensor networks and brain science, together with the strong demands of economic and social development, artificial intelligence is developing at an accelerating pace and is being deeply integrated with industries in various fields, presenting new characteristics such as deep learning, cross-domain integration, human-machine collaboration, collective intelligence and autonomous control. Some intelligent devices or applications are provided with an avatar, so that visual interaction with the user is realized through the avatar and the human-computer interaction experience is improved. Today, digital virtual humans are widely applied in the game, entertainment and film fields. With the development of information technology, digital persons are applied more and more widely and can provide various convenient services for people. However, in most current scenes the avatar-based human-computer interaction mode is single, and the delivery form of the digital virtual human needs anthropomorphic, multi-modal upgrading.
In view of this, referring to fig. 1, the present application provides an interaction method, which includes:
01: acquiring a face image of a user;
02: detecting the face image to perform face recognition to obtain a recognition result;
03: generating a corresponding greeting audio and greeting action according to the recognition result; and
04: controlling the digital person to play the greeting audio and display the greeting action.
Correspondingly, referring to fig. 2, an interaction apparatus 100 is further provided in the embodiments of the present application, and the interaction method in the embodiments of the present application may be implemented by the interaction apparatus 100.
The interaction device 100 includes an acquisition module 110, a recognition module 120, a generation module 130, and a control module 140. Step 01 may be implemented by the obtaining module 110, step 02 by the recognition module 120, step 03 by the generating module 130, and step 04 by the control module 140. In other words, the obtaining module 110 is used to obtain a face image of the user; the recognition module 120 is configured to detect the face image to perform face recognition and obtain a recognition result; the generating module 130 is configured to generate the corresponding greeting audio and greeting action according to the recognition result; and the control module 140 is used to control the digital person to play the greeting audio and display the greeting action.
The embodiment of the application also provides the electronic equipment. The electronic device includes a memory and a processor. The memory has stored therein a computer program. The processor is used for acquiring a face image of a user, detecting the face image to perform face recognition to obtain a recognition result, generating corresponding greeting audio and greeting actions according to the recognition result, and controlling the digital person to play the greeting audio and display the greeting actions.
According to the interaction method, the interaction apparatus and the electronic device, a face image of the user is acquired, the face image is detected to perform face recognition and obtain a recognition result, a corresponding greeting audio and greeting action are generated according to the recognition result, and finally the greeting audio is played and the greeting action is displayed through the digital person. In this way, face recognition, action feedback and other modalities are integrated into a multi-modal interaction system, which improves the intelligence and friendliness of the digital person and thus the user experience.
In particular, the electronic device includes a screen with a graphical user interface display and a voice recognition apparatus capable of voice interaction. Electronic devices may include, but are not limited to, robots, computers, tablets, cell phones and the like. Taking a robot as an example, the robot includes a display area, an electro-acoustic element, a communication element and a processor. The display area of the robot may include a display screen or the like. The system on which the robot runs presents content to the user through a Graphical User Interface (GUI). The display area includes a number of UI elements, and different display areas may present the same or different UI elements. The UI elements may include card objects, application icons or interfaces, folder icons, multimedia file icons, and controls for interactive operations, among others. The electro-acoustic element may be used to collect the user's voice request. The system can send the voice request and the interaction scene information to the server through the communication element, and receive, through the communication element, the greeting audio operation instruction that the server generates according to the voice request. The processor is used to execute the operation corresponding to the operation instruction.
Please refer to fig. 3 and fig. 4. For convenience of description, the following embodiments are described by taking a conversation robot as an example of the electronic device.
A digital person is built into the electronic device. A digital person is a virtual simulation of the shape and functions of the human body at different levels by means of information science. Its development comprises four overlapping stages, namely the visible human, the physical human, the physiological human and the intelligent human, ultimately establishing a multidisciplinary, multi-level digital model and achieving accurate simulation of the human body from the microscopic to the macroscopic level. The digital person enables interaction with the user.
The electronic device comprises an image sensor and a display screen: the image sensor is used to collect face images, and the display screen can display the digital person, through which interaction with the user is realized. The electronic device also communicates with a server, which provides the digital human cloud service and the voice cloud service.
When a user passes through the detection range of the image sensor of the electronic device, the image sensor can capture a face image. The processor performs face recognition on the face image to obtain a recognition result and transmits the result to the digital human cloud service of the server. The digital human cloud service generates greeting text data and a greeting action according to the recognition result, and transmits the greeting text data to the voice cloud service for voice synthesis to obtain the corresponding greeting audio. Finally, the digital human cloud service transmits the greeting action and the greeting audio to the electronic device, so that the processor controls the digital person of the electronic device to display the greeting action while playing the greeting audio. Greeting actions may include, but are not limited to, waving, bowing, nodding and the like.
Therefore, a multi-mode interaction system can be formed by integrating the modes of face recognition, action feedback and the like, the intelligence and the friendliness of the digital people are improved, and the user experience is improved.
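For the purpose of illustration only, the greeting flow described above can be summarized in the following minimal Python sketch. All of the stubs (recognize, digital_human_cloud, voice_cloud_tts) and the Recognition structure are hypothetical placeholders standing in for the image sensor, the digital human cloud service and the voice cloud service; they are not APIs defined by this disclosure.

from dataclasses import dataclass

@dataclass
class Recognition:
    name: str
    gender: str
    role: str  # e.g. "VIP client", "general client", "stranger"

# Hypothetical placeholder stubs; in the described system these calls would
# go to the image sensor pipeline, the digital human cloud service and the
# voice cloud service respectively.
def recognize(face_image: bytes) -> Recognition:          # step 02
    return Recognition(name="XX", gender="male", role="general client")

def digital_human_cloud(result: Recognition) -> tuple[str, str]:  # step 03
    text = f"Mr. {result.name}, welcome to the business hall"
    return text, "bow"  # greeting text and greeting action

def voice_cloud_tts(text: str) -> bytes:  # step 03: greeting text -> greeting audio
    return text.encode("utf-8")  # stands in for synthesized audio

def greet(face_image: bytes) -> None:
    result = recognize(face_image)
    text, action = digital_human_cloud(result)
    audio = voice_cloud_tts(text)
    # step 04: the digital person plays the audio while displaying the action
    print("play:", audio, "| display:", action)

greet(b"raw-camera-frame")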
Preferably, referring to fig. 5, in some embodiments, step 02 includes:
021: extracting face feature points;
022: matching the face feature points with a preset face feature library to obtain a recognition result.
Referring further to FIG. 2, in some embodiments, steps 021 and 022 may be implemented by the recognition module 120. In other words, the recognition module 120 is configured to extract the face feature points and match them with a preset face feature library to obtain the recognition result.
In some embodiments, the processor is configured to extract the face feature points and match the face feature points with a preset face feature library to obtain a recognition result.
The extraction method of the face feature points may include, but is not limited to, neural networks (NNs), the Scale-Invariant Feature Transform (SIFT) algorithm, SURF (Speeded-Up Robust Features) and other feature point extraction algorithms. A neural network is an algorithmic mathematical model that simulates the behavioral characteristics of animal neural networks and performs distributed parallel information processing; relying on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes. The scale-invariant feature transform algorithm is a computer vision algorithm for detecting and describing local features in an image; it is invariant to rotation, scaling and brightness changes, and stable to a certain degree against viewpoint changes, affine transformation and noise.
A plurality of face features and the identity information corresponding to them (such as name, gender and age) can be stored in the face feature library in advance. After the face feature points are obtained, they can be matched against the features in the preset face feature library. If the face feature points coincide with a feature in the library, the identity information (ID) corresponding to that face feature can be sent to the digital human cloud service of the server, so that the digital human cloud service performs text synthesis according to the identity information and a preset greeting to obtain a greeting text, and generates a corresponding greeting action according to the greeting.
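As a minimal sketch of this matching step, assuming face features are compared as fixed-length vectors and that cosine similarity with a threshold decides a match (the disclosure does not fix a particular metric), the library lookup could look like the following; all names and values are illustrative.

import numpy as np

# Preset face feature library: identity information (ID) -> stored feature
# vector. The identities and 3-dimensional vectors are invented for the example.
FEATURE_LIBRARY = {
    "ID-001 (Mr. XX, male, 30, VIP client)": np.array([0.12, 0.85, 0.51]),
    "ID-002 (Ms. YY, female, 24, general client)": np.array([0.90, 0.10, 0.40]),
}

def match(features: np.ndarray, threshold: float = 0.95) -> str | None:
    best_id, best_sim = None, -1.0
    for identity, stored in FEATURE_LIBRARY.items():
        # cosine similarity between the extracted and the stored feature vector
        sim = float(features @ stored /
                    (np.linalg.norm(features) * np.linalg.norm(stored)))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    # only report a match when similarity clears the threshold;
    # otherwise the user is treated as a stranger
    return best_id if best_sim >= threshold else None

print(match(np.array([0.11, 0.86, 0.50])))  # -> ID-001 (close to stored vector)
print(match(np.array([0.50, 0.50, 0.50])))  # -> None (stranger)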
Preferably, referring to fig. 6, in some embodiments, step 03 includes:
031: determining a greeting scene according to the recognition result and the recognition time;
032: determining a greeting according to the greeting scene through the digital human cloud service;
033: generating the corresponding greeting audio from the greeting through the voice cloud service.
In some embodiments, steps 031 to 033 may be implemented by the generating module 130. In other words, the generating module 130 is configured to determine a greeting scene according to the recognition result and the recognition time, determine a greeting according to the greeting scene through the digital human cloud service, and generate the corresponding greeting audio through the voice cloud service.
In some embodiments, the processor is configured to determine a greeting scene from the recognition result and the recognition time, determine a greeting according to the greeting scene through the digital human cloud service, and generate the corresponding greeting audio through the voice cloud service.
The recognition result may include, but is not limited to, gender characteristics (male, female), age characteristics (juvenile, youth, middle-aged, elderly), role characteristics (VIP client, general client, stranger), and the like.
The recognition time may be the time at which recognition takes place or the time at which the digital human cloud service receives the recognition result, and may fall in the morning, at noon, in the afternoon, in the evening, and so on.
A greeting scene may include, but is not limited to, the greeting time, the location, and the gender and age of the person; that is, the greeting can be determined from the time, place, gender, age and the like of the person in the greeting scene.
For example, suppose the processor obtains from face image recognition a gender characteristic of male, an age characteristic of youth and a role characteristic of general client, and the time is morning. The generated greeting audio may then be: "Mr. XX, good morning, welcome to the business hall." For another example, if the gender characteristic obtained from face image recognition is female, the age characteristic is juvenile, the role characteristic is VIP client, and the time is evening, the generated greeting audio may be: "Dear Ms. XX, good evening, welcome to the business hall."
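A minimal sketch of assembling such a greeting from the recognition result and the recognition time is given below; the titles, time slots and template are assumptions modeled on the examples above, not a format fixed by the disclosure.

from datetime import datetime

# Time-of-day slots (start hour -> slot name) and the matching phrase;
# the boundaries are illustrative assumptions.
TIME_OF_DAY = [(5, "morning"), (11, "noon"), (13, "afternoon"), (18, "evening")]
GREETING = {"morning": "good morning", "noon": "good noon",
            "afternoon": "good afternoon", "evening": "good evening"}

def time_slot(now: datetime) -> str:
    slot = "evening"  # before 05:00 falls back to the evening greeting
    for start_hour, name in TIME_OF_DAY:
        if now.hour >= start_hour:
            slot = name
    return slot

def build_greeting(name: str, gender: str, role: str, now: datetime) -> str:
    title = "Mr." if gender == "male" else "Ms."
    prefix = "Dear " if role == "VIP client" else ""  # honorific for VIP clients
    return f"{prefix}{title} {name}, {GREETING[time_slot(now)]}, welcome to the business hall."

print(build_greeting("XX", "male", "general client", datetime(2021, 9, 23, 9)))
# -> "Mr. XX, good morning, welcome to the business hall."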
Referring to fig. 7, in some embodiments, the interaction method further includes:
05: acquiring a voice request of a user;
06: generating text data according to the voice request, and performing intention understanding query and request query to obtain a response result;
07: performing voice synthesis on the response result through a voice cloud service to generate response audio;
08: and controlling the digital person to play the response audio.
Referring to fig. 8, in some embodiments, the interaction apparatus further includes a query module 150 and a synthesis module 160. Step 05 may be implemented by the obtaining module 110, step 06 by the query module 150, step 07 by the synthesis module 160, and step 08 by the control module 140. In other words, the obtaining module 110 is further configured to obtain a voice request of the user, the query module 150 is configured to generate text data according to the voice request and perform the intention understanding query and request query to obtain a response result, the synthesis module 160 may be configured to perform voice synthesis on the response result through the voice cloud service to generate a response audio, and the control module 140 may be configured to control the digital person to play the response audio.
In some embodiments, the processor is configured to obtain a voice request of the user, generate text data according to the voice request, and perform the intention understanding query and request query to obtain a response result; the processor is further configured to perform voice synthesis on the response result through the voice cloud service to generate a response audio and to control the digital person to play the response audio.
A digital person of the electronic device may be pre-configured with a voice software development kit. When the digital person runs, it can be displayed in real time in the display area of the electronic device. A Software Development Kit (SDK) is a set of tools provided to implement a certain function of a software product. The voice SDK is the hub for voice interaction between the electronic device and the voice cloud service of the server. On the one hand, the voice SDK defines the generation specification for voice requests. On the other hand, it synchronizes the digital person information in the electronic device to the voice cloud service of the server, and transmits the operation instructions that the voice cloud service generates for a voice request back to the digital person.
With further reference to fig. 3, the electronic device may further include a sound pickup apparatus. The processor may control the sound pickup apparatus to obtain the user's voice input, perform noise reduction and other processing on it, and transmit the processed voice input to the voice cloud service of the server, where voice recognition is performed through automatic speech recognition (ASR) to generate text data. The text data is then sent to the digital human cloud service, which performs the intention understanding query and request query to obtain a response result and sends the response result back to the voice cloud service. The voice cloud service performs voice synthesis to generate a response audio and sends it back to the electronic device, so that the processor controls the digital person to play the response audio and display the corresponding response action. In this way, face recognition, action feedback, voice interaction and other modalities are integrated into a multi-modal interaction system, improving the intelligence and friendliness of the digital person and the user experience.
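For illustration, the voice-request round trip of steps 05 to 08 can be sketched as follows; the asr, query and tts stubs are hypothetical placeholders for the voice cloud service's speech recognition and synthesis and for the digital human cloud service's query step, not real service APIs.

def asr(audio: bytes) -> str:  # voice cloud service: speech -> text data
    return "what services does the business hall offer"

def query(text: str) -> str:   # digital human cloud: intention understanding + request query
    return "We offer account opening and bill payment."

def tts(text: str) -> bytes:   # voice cloud service: response result -> response audio
    return text.encode("utf-8")

def handle_voice_request(audio: bytes) -> None:
    text = asr(audio)                 # step 06 input: text data from the voice request
    answer = query(text)              # step 06: response result
    response_audio = tts(answer)      # step 07: response audio
    print("digital person plays:", response_audio)  # step 08

handle_voice_request(b"pcm-frames")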
Preferably, referring to fig. 9, in some embodiments, step 06 includes:
061: judging the type of the voice request;
062: under the condition that the voice request is of a single-round type, FAQ and KBQA are adopted to carry out intention understanding query and request query on the text data so as to obtain a response result; or
063: in the case where the voice request is of a multi-round type, the RASA is used to perform an intention understanding query and a request query on the text data to obtain a response result.
Referring to fig. 8, in some embodiments, steps 061 to 063 may be implemented by the query module 150. In other words, the query module 150 is configured to determine the type of the voice request, and to perform the intention understanding query and request query on the text data using FAQ and KBQA when the voice request is of a single-round type, or using RASA when the voice request is of a multi-round type, to obtain a response result.
In some embodiments, the processor is configured to determine a type of the voice request, and perform an intention understanding query and a request query on the text data using the FAQ and the KBQA to obtain a response result in a case where the voice request is of a single-round type, or perform an intention understanding query and a request query on the text data using the RASA to obtain a response result in a case where the voice request is of a multi-round type.
Frequently Asked Questions (FAQ) is a primary means of providing online help on today's networks: frequently asked question-answer pairs are organized in advance and published on a web page to provide a consulting service for users. QA (question answering) refers to directly giving answers to natural-language questions raised by users using various techniques and data, and KBQA refers to natural-language question answering based on a knowledge base.
Rasa is an open-source machine learning framework for building contextual AI assistants and conversational robots, and has two main modules: the Rasa NLU module and the Rasa Core module. The Rasa NLU module is used to understand user messages, including intent recognition and entity recognition, and converts the user's input into structured data. The Rasa Core module is a dialogue management platform that tracks the dialogue and decides what to do next.
Therefore, the electronic equipment can interact with the user more intelligently, and the intelligence and the friendliness of the electronic equipment are further improved.
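The type-based routing of steps 061 to 063 can be sketched as follows. How a request is classified as single-round or multi-round is not specified by the disclosure, so an explicit flag is assumed here, and faq_kbqa_answer and rasa_answer are placeholders rather than the real FAQ/KBQA or Rasa APIs.

def faq_kbqa_answer(text: str) -> str:
    # single-round: a stand-alone question answered from FAQ pairs / the knowledge base
    return f"[FAQ/KBQA] answer for: {text}"

def rasa_answer(text: str, session_id: str) -> str:
    # multi-round: a session is needed so that dialogue state can be tracked
    return f"[RASA session {session_id}] answer for: {text}"

def answer(text: str, multi_round: bool, session_id: str = "s-1") -> str:
    if multi_round:
        return rasa_answer(text, session_id)  # step 063: multi-round -> RASA
    return faq_kbqa_answer(text)              # step 062: single-round -> FAQ + KBQA

print(answer("what are the business hours", multi_round=False))
print(answer("and on weekends", multi_round=True))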
Referring to fig. 10, in some embodiments, step 06 further includes:
064: determining a target interaction scene matched with the voice request according to the voice request;
065: determining the question type of the voice request according to the voice request and the target interaction scene;
066: and obtaining a response result according to the question type and the voice request.
Referring to fig. 8, in some embodiments, steps 064 to 066 may be implemented by the query module 150. In other words, the query module 150 is configured to determine a target interaction scene matching the voice request, determine the question type of the voice request according to the voice request and the target interaction scene, and obtain a response result according to the question type and the voice request.
In some embodiments, the processor is configured to determine a target interaction scenario matching the voice request according to the voice request, determine a question type of the voice request according to the voice request and the target interaction scenario, and obtain a response result according to the question type and the voice request.
Specifically, after text data is generated from the voice request through the voice cloud service, intent recognition can be performed on the text data to determine a matching target interaction scene. The target interaction scene may include, but is not limited to, a question-and-answer scene, a specific business scene, and the like. For example, in some examples, the voice cloud service may identify the target interaction scene corresponding to the voice request using a trained intent recognition neural network: for each target interaction scene, text data corresponding to a plurality of sample voice requests commonly used in that scene may be stored, and the intent recognition neural network may be used to compute the similarity between the text data generated from the voice request and the text data corresponding to each scene, so as to determine the matching target interaction scene.
In this way, different response results are determined according to different question types for the received voice request, which improves the flexibility of conversational interaction. At the same time, because a target interaction scene is determined for the voice request and the response result is determined under that scene, the response result can match the current interaction scene, improving the degree of match between the response result and the voice request and thus further improving the user experience.
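As an illustration of matching a request against the stored sample texts of each scene, the sketch below replaces the trained intent recognition neural network with a trivial word-overlap similarity; the scenes and samples are invented for the example.

# Stored sample texts per target interaction scene (illustrative only).
SCENE_SAMPLES = {
    "question and answer": ["what are your business hours",
                            "where is the service desk"],
    "business handling":   ["i want to open an account",
                            "help me pay my bill"],
}

def overlap(a: str, b: str) -> float:
    # Jaccard word overlap: a crude stand-in for a learned similarity score
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

def match_scene(text: str) -> str:
    # pick the scene whose best-matching sample is most similar to the request
    return max(SCENE_SAMPLES,
               key=lambda scene: max(overlap(text, s) for s in SCENE_SAMPLES[scene]))

print(match_scene("i would like to open an account"))  # -> "business handling"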
Preferably, referring to fig. 11, in some embodiments, before step 01, the interaction method further includes:
001: responding to a first input of a user to display a digital human control;
002: generating a greeting control instruction in response to a second input to the digital human control by the user;
003: generating a dialogue control instruction in response to a third input to the digital human control by the user;
004: generating action control instructions in response to a fourth input to the digital human control by the user, the action control instructions including greeting action instructions and dialogue action instructions.
Referring further to fig. 8, in some embodiments, steps 001 to 004 may be implemented by the control module 140. In other words, the control module 140 is configured to respond to a first input of the user to display the digital human control, generate a greeting control instruction in response to a second input to the digital human control, generate a dialogue control instruction in response to a third input to the digital human control, and generate action control instructions in response to a fourth input to the digital human control, the action control instructions including greeting action instructions and dialogue action instructions.
In some embodiments, the processor is configured to respond to a first input of the user to display the digital human control, generate a greeting control instruction in response to a second input to the digital human control, generate a dialogue control instruction in response to a third input to the digital human control, and generate action control instructions in response to a fourth input to the digital human control, the action control instructions including greeting action instructions and dialogue action instructions.
Specifically, the electronic device may further be provided with a digital human control. A control generally includes, but is not limited to, the following information: control identification, control type, action type of the control, and the like. The control identification is unique for each control, and a control can be found through its identification. Control types may include group, text, image and the like. The action type of a control may include clicking, sliding and the like.
Referring to fig. 12 to 14, the digital human control includes a reception greeting sub-control, an action editing sub-control and a voice dialogue sub-control. When the corresponding sub-control is clicked, the relevant interface of that sub-control is displayed, and the user can make settings on the displayed interface. The reception greeting sub-control is used to set greetings for different scenes of the first wake-up, and can set the role, time, gender, age and so on. For example, in some examples, the greeting for a male VIP customer may be set as "Mr. XX, good morning, welcome to the business hall". Of course, the above is only an example: the greeting can be set according to the user's preference and is not limited to the above example.
The action editing sub-control can be used for settings that match voice (the user's voice request, audio generated by the voice cloud service, etc.) with actions; for example, in a welcome scene, the voice "welcome" is matched with the action "bow".
The voice dialogue sub-control can be set according to the dialogue content of different scenes, so that the robot guides the user to better handle the related services.
The second input may be used to set the reception greeting sub-control, the third input to set the voice dialogue sub-control, and the fourth input to set the action editing sub-control. A user input may be a voice input, a touch input, an input from an external device, or the like. For example, the first input may be a voice input while the second, third and fourth inputs are touch inputs; that is, the user may call up the digital human control by voice and edit the relevant contents of the greeting, dialogue and action sub-controls through inputs on the display screen, thereby generating the greeting control instruction, dialogue control instruction, action control instructions and so on.
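For illustration only, the settings collected through the three sub-controls might be represented as control instructions of the following shape; the field names and values are assumptions, as the disclosure does not fix a data format.

# One possible shape for the instructions produced by the sub-controls.
greeting_instruction = {   # from the reception greeting sub-control (second input)
    "role": "VIP client", "gender": "male", "time": "morning",
    "text": "Mr. XX, good morning, welcome to the business hall",
}
dialog_instruction = {     # from the voice dialogue sub-control (third input)
    "scene": "business handling",
    "script": ["May I help you?", "Please take your ticket."],
}
action_instruction = {     # from the action editing sub-control (fourth input)
    "trigger_text": "welcome", "action": "bow",
}

for name, inst in [("greeting", greeting_instruction),
                   ("dialogue", dialog_instruction),
                   ("action", action_instruction)]:
    print(name, "->", inst)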
In this way, the digital person of the electronic device is configured in a visual manner, which improves the flexibility of use, broadens the application range of the electronic device, and further improves the user experience.
The present application further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by one or more processors, implements the interaction method of any one of the above embodiments. It will be understood by those skilled in the art that all or part of the processes of the method embodiments above may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.
In the description herein, reference to the terms "one embodiment", "some embodiments", "an illustrative embodiment", "an example", "a specific example", "some examples" and the like means that a particular feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example, and the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art can combine the different embodiments or examples and their features described in this specification, provided there is no contradiction. Meanwhile, descriptions using the terms "first", "second" and the like are intended to distinguish identical or similar operations; whether "first" and "second" imply a logical ordering depends on the actual embodiment and should not be determined from the literal meaning alone.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
Although embodiments of the present application have been shown and described above, it is to be understood that the above embodiments are exemplary and not to be construed as limiting the present application, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. An interaction method, characterized in that the interaction method comprises:
acquiring a face image of a user;
detecting the face image to perform face recognition to obtain a recognition result;
generating a corresponding greeting audio and greeting action according to the recognition result; and
controlling the digital person to play the greeting audio and display the greeting action.
2. The interaction method according to claim 1, further comprising:
acquiring a voice request of a user;
generating text data according to the voice request, and performing intention understanding query and request query to obtain a response result;
performing voice synthesis on the response result through a voice cloud service to generate response audio;
and controlling the digital person to play the response audio.
3. The interactive method of claim 2, wherein the generating text data according to the voice request and performing an intent understanding query and a request query to obtain a response result comprises:
judging the type of the voice request;
in the case that the voice request is of a single-round type, performing intention understanding query and request query on the text data by adopting FAQ and KBQA to obtain a response result;
and in the case that the voice request is of a multi-round type, performing intention understanding query and request query on the text data by adopting RASA to obtain a response result.
4. The interactive method of claim 2, wherein the generating text data according to the voice request and performing an intention understanding query and a request query to obtain a response result comprises:
determining a target interaction scene matched with the voice request according to the voice request;
determining the question type of the voice request according to the voice request and the target interaction scene;
and obtaining a response result according to the question type and the voice request.
5. The interaction method according to claim 1, wherein the detecting the face image for face recognition to obtain a recognition result comprises:
extracting face feature points; and
matching the face feature points with a preset face feature library to obtain the recognition result.
6. The interaction method of claim 1, wherein the generating corresponding greeting audio and greeting actions according to the recognition result comprises:
determining a greeting scene according to the recognition result and the recognition time;
determining a greeting according to the greeting scene through a digital human cloud service; and
generating the greeting audio corresponding to the greeting through the voice cloud service.
7. The interaction method according to claim 1, further comprising:
responding to a first input of a user to display a digital human control;
generating a greeting control instruction in response to a second input to the digital human control by the user;
generating a dialogue control instruction in response to a third input to the digital human control by the user;
generating action control instructions in response to a fourth input to the digital human control by the user, the action control instructions including greeting action instructions and dialogue action instructions.
8. An interaction apparatus, characterized in that the interaction apparatus comprises:
the acquisition module is used for acquiring a face image of a user;
the recognition module is used for detecting the face image so as to perform face recognition to obtain a recognition result;
the generating module is used for generating the corresponding greeting audio according to the recognition result; and
the control module is used for controlling the digital person to play the greeting audio and display the greeting action.
9. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, implements the interaction method of any one of claims 1-7.
10. A non-transitory computer-readable storage medium of a computer program, wherein the computer program, when executed by one or more processors, implements the interaction method of any one of claims 1-7.
CN202111115768.XA · Priority/filing date: 2021-09-23 · Interaction method, interaction device, electronic equipment and storage medium · Published as CN113703585A · Withdrawn

Priority Applications (1)

Application number: CN202111115768.XA · Priority/filing date: 2021-09-23 · Title: Interaction method, interaction device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application number: CN202111115768.XA · Priority/filing date: 2021-09-23 · Title: Interaction method, interaction device, electronic equipment and storage medium

Publications (1)

Publication number: CN113703585A · Publication date: 2021-11-26

Family

ID=78661641

Family Applications (1)

Application number: CN202111115768.XA · Status: Withdrawn · Publication: CN113703585A · Title: Interaction method, interaction device, electronic equipment and storage medium

Country Status (1)

Country: CN · CN113703585A


Cited By (5)

* Cited by examiner, † Cited by third party
CN114387643A (en) * 2021-12-28 2022-04-22 达闼机器人有限公司 Robot control method, system, computer device and storage medium
WO2023124026A1 (en) * 2021-12-28 2023-07-06 达闼机器人股份有限公司 Robot control method and system, computer device, storage medium and computer program product
CN116543082A (en) * 2023-05-18 2023-08-04 无锡捷通数智科技有限公司 Digital person generation method and device and digital person generation system
CN116708905A (en) * 2023-08-07 2023-09-05 海马云(天津)信息技术有限公司 Method and device for realizing digital human interaction on television box
CN117672180A (en) * 2023-12-08 2024-03-08 广州凯迪云信息科技有限公司 Voice communication control method and system for digital robot

Similar Documents

US10664060B2 (en) Multimodal input-based interaction method and device
CN107894833B (en) Multi-modal interaction processing method and system based on virtual human
CN109176535B (en) Interaction method and system based on intelligent robot
JP6889281B2 (en) Analyzing electronic conversations for presentations in alternative interfaces
CN113703585A (en) Interaction method, interaction device, electronic equipment and storage medium
CN107632706B (en) Application data processing method and system of multi-modal virtual human
CN107704169B (en) Virtual human state management method and system
CN110598576A (en) Sign language interaction method and device and computer medium
CN109086860B (en) Interaction method and system based on virtual human
KR101887637B1 (en) Robot system
CN107807734B (en) Interactive output method and system for intelligent robot
CN110519636A (en) Voice messaging playback method, device, computer equipment and storage medium
CN107480766B (en) Method and system for content generation for multi-modal virtual robots
EP3635513B1 (en) Selective detection of visual cues for automated assistants
KR20190089451A (en) Electronic device for providing image related with text and operation method thereof
CN112528004B (en) Voice interaction method, voice interaction device, electronic equipment, medium and computer program product
CN108664472A (en) Natural language processing method, apparatus and its equipment
CN107832720B (en) Information processing method and device based on artificial intelligence
CN111291151A (en) Interaction method and device and computer equipment
CN110825164A (en) Interaction method and system based on wearable intelligent equipment special for children
CN110442867A (en) Image processing method, device, terminal and computer storage medium
WO2023034722A1 (en) Conversation guided augmented reality experience
JP2023120130A (en) Conversation-type ai platform using extraction question response
CN108628454B (en) Visual interaction method and system based on virtual human
CN114974253A (en) Natural language interpretation method and device based on character image and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WW01: Invention patent application withdrawn after publication (application publication date: 2021-11-26)