CN113436622A - Processing method and device of intelligent voice assistant - Google Patents


Info

Publication number
CN113436622A
CN113436622A (application CN202010144535.1A)
Authority
CN
China
Prior art keywords
avatar
voice assistant
intelligent voice
target user
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010144535.1A
Other languages
Chinese (zh)
Inventor
陈姿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010144535.1A
Publication of CN113436622A
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/02 — Feature extraction for speech recognition; selection of recognition unit
    • G10L15/30 — Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L17/22 — Speaker identification or verification: interactive procedures; man-machine interfaces
    • G10L17/24 — Interactive procedures in which the user is prompted to utter a password or a predefined phrase
    • G10L25/63 — Speech or voice analysis techniques specially adapted for estimating an emotional state
    • G10L2015/223 — Execution procedure of a spoken command

Abstract

The invention provides a processing method and apparatus for an intelligent voice assistant. The method comprises the following steps: acquiring a voice instruction corresponding to a target user; sending the voice instruction so that feature extraction is performed on it to obtain biometric parameters corresponding to the target user, and emotion recognition is performed on it to obtain the target user's current emotion category; receiving a returned avatar identifier of the intelligent voice assistant corresponding to the biometric parameters, together with a virtual emotion that matches the target user's current emotion category and corresponds to the avatar identifier; presenting the avatar of the intelligent voice assistant indicated by the avatar identifier; and, in response to an interaction instruction triggered through the intelligent voice assistant, controlling the avatar to play speech conforming to the interaction instruction in the manner of the virtual emotion. The method and apparatus satisfy the user's personalized requirements for the avatar of the intelligent voice assistant while enhancing the humanized interaction of the avatar.

Description

Processing method and device of intelligent voice assistant
Technical Field
The invention relates to the technical field of Artificial Intelligence (AI), in particular to a processing method and a processing device of an intelligent voice assistant.
Background
Artificial intelligence is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
With the development of artificial intelligence technology, intelligent voice assistant products have been widely deployed. In the related art, an intelligent voice assistant usually has a fixed avatar, or no avatar at all; it is merely a tool for voice interaction with the user and cannot satisfy the user's personalized requirements for the assistant's avatar. Moreover, during interaction with the user, the assistant's avatar lacks emotional expression, which reduces the user's satisfaction with intelligent voice assistant products and lowers user stickiness.
Disclosure of Invention
In view of this, embodiments of the present invention provide a processing method and apparatus for an intelligent voice assistant that can satisfy the user's personalized requirements for the avatar of the intelligent voice assistant while enhancing the avatar's humanized interaction.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a processing method of an intelligent voice assistant, which comprises the following steps:
acquiring a voice instruction corresponding to a target user;
sending the voice instruction to perform feature extraction on the voice instruction to obtain a biological feature parameter corresponding to the target user, and performing emotion recognition on the voice instruction to obtain a current emotion type corresponding to the target user;
receiving returned virtual image identification of the intelligent voice assistant corresponding to the biological characteristic parameters and virtual emotion matched with the current emotion category of the target user and corresponding to the virtual image identification;
presenting an avatar of the intelligent voice assistant indicated by the avatar identification;
and in response to an interactive instruction triggered by the intelligent voice assistant, controlling an avatar of the intelligent voice assistant to play voice conforming to the interactive instruction in a virtual emotion manner.
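Taken together, the client-side steps above can be sketched in a few lines. This is an illustrative outline only, not the patented implementation; `server` and `ui` are hypothetical stand-ins for the network and rendering layers:

```python
def handle_voice_instruction(audio, server, ui):
    """Client flow: send audio, receive avatar id + virtual emotion, render."""
    response = server.process(audio)       # server extracts voiceprint + emotion
    avatar_id = response["avatar_id"]      # avatar bound to this user's voiceprint
    emotion = response["emotion"]          # virtual emotion matched to the user
    ui.present_avatar(avatar_id)           # present the indicated avatar
    return avatar_id, emotion              # later used to style interaction replies
```

A subsequent interaction instruction would then be answered by playing synthesized speech styled according to `emotion`.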
In the above scheme, the method further comprises:
sending the identifier of the target user and the avatar of the intelligent voice assistant corresponding to that identifier to a blockchain network, so that a node of the blockchain network fills the identifier and the corresponding avatar into a new block and, when consensus on the new block is reached, appends the new block to the tail of the blockchain.
The embodiment of the invention also provides another processing method of the intelligent voice assistant, which comprises the following steps:
receiving a voice instruction of a target user sent by a client;
extracting the characteristics of the voice instruction to obtain biological characteristic parameters corresponding to the target user, and performing emotion recognition on the voice instruction to obtain the current emotion type corresponding to the target user;
determining an avatar identification of the intelligent voice assistant corresponding to the biological characteristic parameter and a virtual emotion matched with the current emotion category of the target user and corresponding to the avatar identification;
and sending the avatar identification and the virtual emotion to the client so as to enable the client to present the avatar of the intelligent voice assistant indicated by the avatar identification, and responding to an interaction instruction triggered by the intelligent voice assistant, and controlling the avatar of the intelligent voice assistant to play voice conforming to the interaction instruction in the mode of the virtual emotion.
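A minimal sketch of this server-side method follows; the feature extractor, emotion recognizer, voiceprint-to-avatar mapping, and emotion-style table are injected as plain callables and dicts, and every name is a hypothetical placeholder rather than the patent's implementation:

```python
def process_voice_instruction(audio, extract_features, recognize_emotion,
                              avatar_db, emotion_styles):
    """Server flow: voiceprint -> avatar id; audio -> emotion -> virtual emotion."""
    voiceprint = extract_features(audio)        # biometric feature parameters
    category = recognize_emotion(audio)         # current emotion category
    avatar_id = avatar_db[voiceprint]           # avatar bound to the voiceprint
    # choose the avatar's virtual emotion that matches the user's emotion
    virtual_emotion = emotion_styles.get(category, "neutral")
    return {"avatar_id": avatar_id, "emotion": virtual_emotion}
```

The returned dictionary corresponds to the avatar identifier and virtual emotion that the method sends back to the client.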
The embodiment of the invention also provides a processing device of the intelligent voice assistant, which comprises:
the acquisition unit is used for acquiring a voice instruction corresponding to a target user;
the first sending unit is used for sending the voice command so as to perform feature extraction on the voice command to obtain biological feature parameters corresponding to the target user and perform emotion recognition on the voice command to obtain the current emotion type corresponding to the target user;
the first receiving unit is used for receiving the returned virtual character identification of the intelligent voice assistant corresponding to the biological characteristic parameters and the virtual emotion matched with the current emotion category of the target user and corresponding to the virtual character identification;
a presenting unit for presenting the avatar of the intelligent voice assistant indicated by the avatar identification;
and the control unit is used for responding to an interactive instruction triggered by the intelligent voice assistant and controlling the virtual image of the intelligent voice assistant to play voice conforming to the interactive instruction in a virtual emotion mode.
In the above scheme, the apparatus further comprises:
a third receiving unit, configured to receive a returned voice recognition instruction before the first receiving unit receives a returned avatar identifier of the intelligent voice assistant corresponding to the biometric parameter, where the voice recognition instruction represents a determination result of whether the biometric parameter is recorded in the client;
a third sending unit, configured to send a first prompt message to the target user to prompt the target user to confirm a biometric account of the target user when the determination result is that the biometric parameter is recorded in the client;
a fourth sending unit, configured to send a second prompt message to the target user to prompt the target user to select the avatar of the intelligent voice assistant when the determination result indicates that the biometric parameter is not recorded in the client;
and the storage unit is used for storing the selected virtual image of the intelligent voice assistant in a database of a server.
In the foregoing scheme, the fourth sending unit is further configured to:
presenting an avatar selection interface for selecting an avatar of the intelligent voice assistant in response to a setting request of the avatar of the intelligent voice assistant triggered by the target user based on the second prompt message;
and responding to an avatar selection instruction triggered based on the avatar selection interface, and acquiring the avatar of the intelligent voice assistant which accords with the preference of the target user.
In the foregoing solution, the first receiving unit is further configured to:
submitting an avatar acquisition request corresponding to the intelligent voice assistant to a server, so that the server queries a database, based on the biometric parameters of the target user, for the avatar identifier of the intelligent voice assistant adapted to those parameters, the database storing correspondences between the biometric parameters of a plurality of target users and avatar identifiers;
and receiving the virtual image identification of the intelligent voice assistant which is sent by the server and is adapted to the biological characteristic parameters.
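Because two utterances never yield byte-identical voiceprints, such a database query is in practice a nearest-neighbour match over feature vectors rather than an exact key lookup. A toy illustration using cosine similarity (the vector form, the flat list database, and the 0.8 threshold are all assumptions made for the example):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def find_avatar(voiceprint, database, threshold=0.8):
    """Return the avatar id whose stored voiceprint best matches, or None."""
    best_id, best_score = None, threshold
    for stored_print, avatar_id in database:
        score = cosine_similarity(voiceprint, stored_print)
        if score >= best_score:
            best_id, best_score = avatar_id, score
    return best_id
```

A `None` result corresponds to the "biometric parameter not recorded" branch, where the client prompts the user to select a new avatar.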
In the foregoing solution, the presenting unit is further configured to:
determining an avatar corresponding to the intelligent voice assistant based on the indication of the avatar identification;
acquiring image resources of the virtual image corresponding to the intelligent voice assistant;
presenting a default avatar of an avatar of the intelligent voice assistant based on the avatar resources, the default avatar of the avatar including at least one of: a default skin of the avatar; a default prop for the avatar.
In the foregoing solution, the control unit is further configured to:
acquiring an interaction instruction triggered by the target user through the intelligent voice assistant and sending the interaction instruction to a server, so that, when the interaction instruction comprises a voice interaction instruction, the server performs speech recognition on the voice interaction instruction to obtain corresponding text information and performs semantic recognition on the text information to obtain the intention corresponding to the voice interaction instruction;
and receiving the returned intention of the voice interaction instruction, and controlling the virtual image of the intelligent voice assistant to play the voice conforming to the interaction instruction in a virtual emotion mode based on the control instruction corresponding to the intention of the voice interaction instruction.
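The text-to-intention step can be illustrated with a trivial keyword matcher standing in for the real semantic-recognition model; the intent labels and keyword rules below are invented purely for the example:

```python
# Hypothetical intent rules; a production system would use a trained
# semantic model rather than keyword matching.
INTENT_RULES = {
    "weather": ("weather", "forecast"),
    "music": ("play", "song", "music"),
}

def recognize_intent(text):
    """Map transcribed text to an intent label driving a control instruction."""
    words = text.lower().split()
    for intent, keywords in INTENT_RULES.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "chitchat"  # fallback intent when no rule matches
```

The returned intent would then select the control instruction under which the avatar plays its emotion-styled reply.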
In the above scheme, the apparatus further comprises:
the fourth receiving unit is used for receiving the returned emotion animation corresponding to the virtual image identifier, wherein the emotion animation is generated by the server based on the current emotion category of the target user;
the presenting unit is further used for presenting the emotion animation corresponding to the virtual image identification when the virtual image of the intelligent voice assistant indicated by the virtual image identification is presented.
In the above scheme, the apparatus further comprises:
a fifth sending unit, configured to send the identifier of the target user and the avatar of the intelligent voice assistant corresponding to that identifier to a blockchain network, so that a node of the blockchain network fills the identifier and the corresponding avatar into a new block and, when consensus on the new block is reached, appends the new block to the tail of the blockchain.
The embodiment of the invention also provides another processing device of the intelligent voice assistant, which comprises:
the second receiving unit is used for receiving the voice instruction of the target user sent by the client;
the extraction unit is used for carrying out feature extraction on the voice command to obtain a biological feature parameter corresponding to the target user;
the recognition unit is used for carrying out emotion recognition on the voice command to obtain the current emotion category corresponding to the target user;
the determining unit is used for determining an avatar identification of the intelligent voice assistant corresponding to the biological characteristic parameter and a virtual emotion matched with the current emotion category of the target user and corresponding to the avatar identification;
and the second sending unit is used for sending the avatar identification and the virtual emotion to the client so as to enable the client to present the avatar of the intelligent voice assistant indicated by the avatar identification, and controlling the avatar of the intelligent voice assistant to play voice conforming to the interactive instruction in a virtual emotion mode in response to the interactive instruction triggered by the intelligent voice assistant.
The embodiment of the invention also provides processing equipment of the intelligent voice assistant, which comprises:
a memory for storing executable instructions;
and the processor is used for realizing the processing method of the intelligent voice assistant provided by the embodiment of the invention when the processor executes the executable instructions stored in the memory.
The embodiment of the invention also provides a computer-readable storage medium, which stores executable instructions, and the executable instructions are used for realizing the processing method of the intelligent voice assistant provided by the embodiment of the invention when being executed by the processor.
The application of the embodiment of the invention has the following beneficial effects:
the client sends the target user's voice instruction to the server, so that the server performs feature extraction on the instruction to obtain biometric parameters corresponding to the target user, and receives the avatar identifier of the intelligent voice assistant corresponding to those parameters returned by the server. Because a target user's biometric parameters are unique, each target user has an exclusive avatar of the intelligent voice assistant, which satisfies the user's personalized requirements for the avatar; the terminal presents the avatar indicated by the avatar identifier, giving the intelligent voice assistant a visualized image;
the client also receives the returned virtual emotion that matches the target user's current emotion category and corresponds to the avatar identifier, and, in response to an interaction instruction triggered through the intelligent voice assistant, controls the avatar to play speech conforming to the interaction instruction in the manner of the virtual emotion. The avatar thereby gains emotional expression during interaction with the target user, enhancing its humanized interaction, improving the user's satisfaction with and stickiness to intelligent voice assistant products, and improving the target user's experience.
Drawings
FIG. 1 is a schematic diagram of an alternative architecture of a processing system 10 of an intelligent voice assistant according to an embodiment of the present invention;
FIG. 2A is a diagram illustrating an alternative hardware configuration of a processing device 40 of the intelligent voice assistant according to an embodiment of the present invention;
FIG. 2B is a diagram illustrating an alternative hardware configuration of a processing device 50 of the intelligent voice assistant according to an embodiment of the present invention;
FIG. 3A is a schematic diagram of an alternative configuration of a processing device 455 of the intelligent voice assistant according to an embodiment of the present invention;
FIG. 3B is a diagram illustrating an alternative configuration of a processing device 555 of the intelligent voice assistant according to an embodiment of the present invention;
FIG. 4 is an alternative flow chart of the processing method of the intelligent voice assistant according to the embodiment of the present invention;
fig. 5 is an alternative interface diagram of a terminal device presenting a voice instruction editing entry according to an embodiment of the present invention;
fig. 6 is an alternative interface diagram for presenting a second prompting message by the terminal device according to the embodiment of the present invention;
FIG. 7 is an alternative diagram of a presentation avatar selection interface provided in accordance with an embodiment of the present invention;
fig. 8 is an alternative schematic diagram of identity confirmation of a target user according to an embodiment of the present invention;
FIG. 9 is an alternative diagram of specific name confirmation for a target user according to an embodiment of the present invention;
fig. 10 is a schematic view of an application architecture of a blockchain network according to an embodiment of the present invention;
fig. 11 is an alternative structural diagram of a blockchain in the blockchain network 81 according to an embodiment of the present invention;
fig. 12 is a functional architecture diagram of a blockchain network 81 according to an embodiment of the present invention;
FIG. 13 is a schematic flow chart illustrating another alternative method for processing an intelligent voice assistant according to an embodiment of the present invention;
FIG. 14 is a schematic flow chart illustrating another alternative method for processing an intelligent voice assistant according to an embodiment of the present invention;
FIG. 15 is an alternative flow diagram of the avatar configuration of the intelligent voice assistant provided by an embodiment of the present invention;
fig. 16 is a schematic view of an alternative flow chart of a process of providing feedback of virtual emotion from an avatar according to an embodiment of the present invention;
fig. 17 is a schematic view of an alternative process of the voiceprint-based personalized avatar account system according to an embodiment of the present invention;
fig. 18 is a schematic flow chart of another alternative process of providing virtual emotion feedback for an avatar according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the present invention will be further described in detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second", etc. are used merely to distinguish similar objects and do not denote a particular order or sequence; where permitted, the specific order or sequence may be interchanged, so that the embodiments of the invention described herein can be practiced in an order other than that shown or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which embodiments of the present invention belong. The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Before further detailed description of the embodiments of the present invention, terms and expressions referred to in the embodiments of the present invention will be described, and the terms and expressions referred to in the embodiments of the present invention will be explained as follows.
1) The intelligent voice assistant is an intelligent terminal application, and can help a user to solve problems through intelligent interaction of intelligent conversation and instant question and answer, namely the user can use natural conversation to carry out intelligent voice interaction with the intelligent voice assistant in the terminal, an interaction mode based on voice input is realized by using a natural language processing technology, and a feedback result can be obtained through intelligent voice interaction.
2) Natural language processing is a technology for studying human language processing by computers, and includes, but is not limited to, syntactic semantic analysis, information extraction, text mining, machine translation, information retrieval, question-answering systems, and dialog systems.
3) Skin, ornamentation representing an avatar for the intelligent voice assistant, which may include clothing, equipment, etc., may be used to adorn the individual avatar.
4) Props, representing weapons, tools, etc. used to accompany the avatar of the intelligent voice assistant.
5) "In response to" indicates the condition or state on which a performed operation depends; when the dependent condition or state is satisfied, the one or more operations performed may occur in real time or with a set delay. Unless otherwise specified, there is no restriction on the order in which the operations are performed.
6) Transactions, equivalent to the computer term "transaction": operations that need to be committed to the blockchain network for execution, not solely transactions in the commercial context. Embodiments of the present invention follow this usage in view of the convention established in blockchain technology.
For example, a deployment (deployment) transaction is used to install a specified smart contract to a node in a blockchain network and is ready to be invoked; the call (Invoke) transaction is used for adding a record of the transaction in the blockchain by calling an intelligent contract, and performing operations on a state database of the blockchain, including updating operations (including adding, deleting and modifying Key-Value pairs in the state database) and query operations (i.e., querying Key-Value pairs in the state database).
7) Blockchain (Block Chain): a storage structure of encrypted, chained transactions formed from blocks.
For example, the header of each block may include hash values of all transactions in the block, and also include hash values of all transactions in the previous block, so as to achieve tamper resistance and forgery resistance of the transactions in the block based on the hash values; newly generated transactions, after being filled into the tiles and passing through the consensus of nodes in the blockchain network, are appended to the end of the blockchain to form a chain growth.
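The hash linking described here is easy to demonstrate: each block header commits to its own transactions and to the previous block, so editing any recorded transaction invalidates the chain. A sketch using SHA-256 (the block layout is illustrative, not the format used by the patent):

```python
import hashlib
import json

def _digest(obj):
    """Deterministic SHA-256 hex digest over a JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def make_block(prev_hash, transactions):
    """Header commits to this block's transactions and to its predecessor."""
    header = {"prev_hash": prev_hash, "tx_hash": _digest(transactions)}
    return {"header": header, "hash": _digest(header),
            "transactions": transactions}

def verify_chain(chain, genesis_prev="0" * 64):
    """Recompute every link; any tampered transaction breaks verification."""
    prev = genesis_prev
    for block in chain:
        intact = (block["header"]["tx_hash"] == _digest(block["transactions"])
                  and block["header"]["prev_hash"] == prev
                  and block["hash"] == _digest(block["header"]))
        if not intact:
            return False
        prev = block["hash"]
    return True
```

In this model, rewriting a stored user-to-avatar binding would require recomputing every subsequent block, which consensus among honest nodes prevents.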
8) Blockchain Network (Block Chain Network): a set of nodes that incorporate new blocks into a blockchain by consensus.
9) Ledger: a general term for the blockchain (also called ledger data) and the state database synchronized with the blockchain.
The blockchain records transactions in the form of files in a file system, while the state database records the transactions in the blockchain as different types of key-value pairs, supporting fast queries of those transactions.
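The relation between the two stores can be shown in a few lines: the state database is simply the chain's transactions replayed into key-value form, so a lookup such as "which avatar is bound to this user?" need not scan the whole chain. The block and transaction field names are illustrative:

```python
def build_state_db(blockchain):
    """Replay every block's transactions into a key-value state database."""
    state = {}
    for block in blockchain:
        for key, value in block["transactions"]:
            state[key] = value  # later transactions overwrite earlier ones
    return state
```

Update operations (add, delete, modify) appear as new transactions on the chain, and the state database is rebuilt or incrementally updated from them.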
10) Smart Contracts, also called chaincode (Chain Code) or application code: programs deployed in the nodes of a blockchain network that are triggered and executed when their conditions are met; the nodes execute the smart contracts invoked in received transactions to update or query the key-value data of the state database.
11) Consensus: a process in a blockchain network used to reach agreement among the participating nodes on the transactions in a block; the agreed block is appended to the tail of the blockchain. Mechanisms for achieving consensus include Proof of Work (PoW), Proof of Stake (PoS), Delegated Proof of Stake (DPoS), Proof of Elapsed Time (PoET), and so on.
The processing method of the intelligent voice assistant provided by the embodiments of the present invention relates to Speech Technology and Natural Language Processing (NLP) in the field of artificial intelligence. The key speech technologies include Automatic Speech Recognition (ASR), Text-To-Speech (TTS) synthesis, and voiceprint recognition; enabling computers to listen, see, speak, and feel is the development direction of future human-computer interaction, and voice is expected to become one of the most promising human-computer interaction modes. Natural language processing is an important direction in the fields of computer science and artificial intelligence that studies theories and methods for effective communication between humans and computers using natural language. As a science integrating linguistics, computer science, and mathematics, research in this field involves natural language, the language people use daily, and is therefore closely related to linguistics. Natural language processing technologies generally include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
According to the embodiments of the present invention, through artificial intelligence technology, each target user has an exclusive avatar of the intelligent voice assistant based on the uniqueness of the target user's biometric parameters, which satisfies the user's personalized requirements for the avatar; the terminal presents the avatar indicated by the avatar identifier, giving the intelligent voice assistant a visualized image; and, during interaction between the intelligent voice assistant and the target user, the avatar's emotional expression is increased to enhance its humanized interaction, thereby improving the target user's satisfaction with and stickiness to intelligent voice assistant products and improving the target user's experience.
The following describes an exemplary application of a processing device of an intelligent voice assistant that implements the processing method of the intelligent voice assistant according to the embodiment of the present invention. The processing device of the intelligent voice assistant according to the embodiment of the present invention may be implemented as various types of terminal devices with display screens, such as a notebook computer, a tablet computer, a desktop computer, a smart television, or a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device); it may also be implemented as a server, or implemented cooperatively by the terminal and the server, where the server may be, but is not limited to, a cloud server.
An exemplary application of the processing system of the intelligent voice assistant according to the embodiment of the present invention is described below with reference to the accompanying drawings by taking a terminal device and a server as an example. Referring to fig. 1, fig. 1 is an alternative architecture diagram of a processing system 10 of an intelligent voice assistant according to an embodiment of the present invention, in order to implement an exemplary application supported by the processing system, a terminal 100 (an exemplary terminal 100-1 and a terminal 100-2 are shown) is connected to a server 300 through a network 200, where the network 200 may be a wide area network or a local area network, or a combination of the two, and data transmission is implemented using a wireless link.
In some embodiments, the terminal 100 (e.g., the terminal 100-1) is configured to obtain a voice instruction corresponding to a target user, and send the voice instruction corresponding to the target user to the server 300; here, in practical applications, the server 300 may be a single server configured to support various services, or may be a server cluster.
The server 300 is configured to perform feature extraction on the received voice instruction to obtain a biological feature parameter of the corresponding target user, and perform emotion recognition on the voice instruction to obtain a current emotion category of the corresponding target user; and determining and sending the virtual image identifier of the intelligent voice assistant corresponding to the biological characteristic parameters and the virtual emotion matched with the current emotion category of the target user and corresponding to the virtual image identifier.
The terminal 100 (such as the terminal 100-1) is further configured to receive the avatar identifier of the intelligent voice assistant returned by the server 300, and the virtual emotion matching with the current emotion category of the target user and corresponding to the avatar identifier; the terminal 100 may also present the avatar of the intelligent voice assistant indicated by the avatar identification in the graphical interface 110 (e.g., graphical interface 110-1 of terminal 100-1 and graphical interface 110-2 of terminal 100-2); and responding to an interactive instruction triggered by the intelligent voice assistant, and controlling the virtual image of the intelligent voice assistant to play voice conforming to the interactive instruction in a virtual emotion mode. By the scheme, each target user can have the exclusive virtual image of the intelligent voice assistant, the emotional expression of the virtual image of the intelligent voice assistant is increased, and the humanized interaction of the virtual image can be enhanced while the individual requirements of the user on the virtual image of the intelligent voice assistant are met.
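As a minimal illustration only (not the embodiment's actual implementation), the terminal-server exchange described above can be sketched as follows; all function names, tables, and return values are hypothetical assumptions introduced for this sketch:

```python
import hashlib

# Hypothetical server-side tables; contents are illustrative only.
AVATAR_TABLE = {}  # voiceprint key -> avatar identifier
EMOTION_TO_VIRTUAL = {"happy": "cheerful", "sad": "comforting", "neutral": "calm"}

def extract_voiceprint(voice_instruction: bytes) -> str:
    # Stand-in for real voiceprint feature extraction: a stable digest
    # plays the role of the "unique" biometric parameter here.
    return hashlib.sha256(voice_instruction).hexdigest()[:16]

def recognize_emotion(voice_instruction: bytes) -> str:
    # Stand-in for the emotion classifier; a real model would infer
    # the category (e.g., positive / neutral / negative) from the audio.
    return "happy"

def handle_voice_instruction(voice_instruction: bytes) -> dict:
    # Server 300: feature extraction plus emotion recognition, then return
    # the avatar identifier and the matching virtual emotion to the terminal.
    voiceprint = extract_voiceprint(voice_instruction)
    avatar_id = AVATAR_TABLE.setdefault(voiceprint, "avatar-" + voiceprint[:6])
    emotion = recognize_emotion(voice_instruction)
    return {"avatar_id": avatar_id,
            "virtual_emotion": EMOTION_TO_VIRTUAL.get(emotion, "calm")}
```

Under these assumptions, the same speaker's voice instruction always maps to the same avatar identifier, mirroring the one-exclusive-avatar-per-user behavior described above.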
The following is a description of the hardware structure of the processing device of the intelligent voice assistant that implements the processing method of the intelligent voice assistant according to the embodiment of the present invention. The processing device of the intelligent voice assistant may be implemented as a terminal device, may also be implemented as a server, and may also be implemented cooperatively with the terminal device and the server shown in fig. 1.
Referring to fig. 2A, fig. 2A is a schematic diagram illustrating an alternative hardware structure of a processing device 40 of an intelligent voice assistant according to an embodiment of the present invention. It is to be understood that fig. 2A only shows an exemplary structure of the processing device of the intelligent voice assistant, and part or all of the structure shown in fig. 2A may be implemented as needed. Taking the processing device 40 of the intelligent voice assistant as an example of a client arranged in a terminal device, the processing device 40 of the intelligent voice assistant provided by the embodiment of the present invention may include: at least one processor 410, memory 450, at least one network interface 420, and a user interface 430. The various components in the processing device 40 of the intelligent voice assistant are coupled together by a bus system 440. It will be appreciated that the bus system 440 is used to enable communications among the components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 440 in fig. 2A.
The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 may be volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 450 described in embodiments of the invention is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating to other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a presentation module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the processing apparatus of the intelligent voice assistant provided by the embodiments of the present invention may be implemented in software, fig. 2A illustrates the processing apparatus 455 of the intelligent voice assistant stored in the memory 450, which may be software in the form of programs and plug-ins, and includes a series of software modules, referring to fig. 3A, fig. 3A is a schematic diagram of an optional component structure of the processing apparatus 455 of the intelligent voice assistant provided by the embodiments of the present invention, for example, the processing apparatus 455 of the intelligent voice assistant provided by the embodiments of the present invention may include: an obtaining unit 4551, a first sending unit 4552, a first receiving unit 4553, a presenting unit 4554, and a control unit 4555, which are logical in function, and thus, may be arbitrarily combined or further split according to the functions implemented by the respective software modules. Here, it should be noted that specific functions of each unit in the processing device 455 of the intelligent voice assistant provided by the embodiment of the present invention shown in fig. 3A will be described below.
Referring to fig. 2B, fig. 2B is a schematic diagram illustrating another alternative hardware structure of a processing device 50 of an intelligent voice assistant according to an embodiment of the present invention. It is to be understood that fig. 2B only shows an exemplary structure of the processing device of the intelligent voice assistant, and part or all of the structure shown in fig. 2B may be implemented as needed. Taking the processing device 50 of the intelligent voice assistant as an example of a server, the processing device 50 of the intelligent voice assistant provided by the embodiment of the present invention may include: at least one processor 510, memory 550, at least one network interface 520, and a user interface 530. The various components in the processing device 50 of the intelligent voice assistant are coupled together by a bus system 540. It will be appreciated that the bus system 540 is used to enable communications among the components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2B.
The user interface 530 includes one or more output devices 531 enabling presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 530 also includes one or more input devices 532, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for communicating to other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a presentation module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
an input processing module 554 to detect one or more user inputs or interactions from one of the one or more input devices 532 and to translate the detected inputs or interactions.
In some embodiments, the processing apparatus of the intelligent voice assistant provided by the embodiments of the present invention may be implemented in software, fig. 2B illustrates the processing apparatus 555 of the intelligent voice assistant stored in the memory 550, which may be software in the form of programs and plug-ins, and includes a series of software modules, referring to fig. 3B, where fig. 3B is a schematic diagram of an optional component structure of the processing apparatus 555 of the intelligent voice assistant provided by the embodiments of the present invention, for example, the processing apparatus 555 of the intelligent voice assistant provided by the embodiments of the present invention may include: the functions of the second receiving unit 5551, the extracting unit 5552, the identifying unit 5553, the determining unit 5554, and the second transmitting unit 5555 are logical, and thus, any combination or further division may be performed according to the functions implemented by the respective software modules. Here, it should be noted that specific functions of each unit in the processing device 555 of the intelligent voice assistant provided by the embodiment of the present invention shown in fig. 3B will be described below.
In other embodiments, the processing device 455 of the intelligent voice assistant (or the processing device 555 of the intelligent voice assistant) provided by the embodiments of the present invention may be implemented in hardware, and by way of example, the processing device 455 of the intelligent voice assistant (or the processing device 555 of the intelligent voice assistant) provided by the embodiments of the present invention may be a processor in the form of a hardware decoding processor, for example, the processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
Based on the above description of exemplary applications and implementations of the processing system of the intelligent voice assistant and the processing device of the intelligent voice assistant provided by the embodiment of the present invention, an implementation of the processing method of the intelligent voice assistant provided by the embodiment of the present invention is described next.
Referring to fig. 4, fig. 4 is an optional flowchart of a processing method of an intelligent voice assistant according to an embodiment of the present invention, in some embodiments, the processing method of the intelligent voice assistant may be implemented by a terminal device, or implemented by a server, or implemented by the terminal device and the server in a cooperative manner, where the terminal device is provided with a client, and the following takes the terminal device as an example, as implemented by the terminal 100 in fig. 1, and is described with reference to the steps shown in fig. 4.
In step 401, the terminal device obtains a voice command corresponding to a target user.
In the embodiment of the present invention, the terminal device may obtain the voice instruction corresponding to the target user by: and responding to voice instruction input operation triggered by a user interface based on the client, and acquiring a voice instruction input by a target user.
Here, in actual implementation, the terminal device presents a voice instruction editing portal through the user interface, so that the target user can edit the voice instruction through the portal. Illustratively, the terminal device may present a voice instruction editing entry in the user interface in the form of an icon, and the target user may enter a voice instruction editing page by clicking the icon presented by the terminal device to trigger an editing instruction of the voice instruction.
Referring to fig. 5, fig. 5 is an optional interface schematic diagram of the terminal device presenting a voice instruction editing entry according to the embodiment of the present invention, an icon of a "voice instruction" is presented in a user page of the terminal device, a target user can trigger entering the voice instruction editing page by clicking the icon, and an input operation of the voice instruction is performed on the voice instruction editing page to obtain a voice instruction input by the target user.
In step 402, a voice command is sent to perform feature extraction on the voice command to obtain a biometric parameter of the corresponding target user, and perform emotion recognition on the voice command to obtain a current emotion category of the corresponding target user.
In the embodiment of the present invention, since target users differ from one another, different target users can be distinguished by the uniqueness and distinctiveness of the target user's biometric parameters; the biometric parameters include, but are not limited to, voiceprint characteristic parameters, face characteristic parameters, iris characteristic parameters, and the like.
In some embodiments, after obtaining the biometric parameters of the corresponding target user, the server may generate the avatar identifier of the corresponding intelligent voice assistant based on the biometric parameters of the target user, and then return the generated avatar identifier of the intelligent voice assistant to the terminal device.
In some embodiments, the server may further perform the following technical solution before generating the avatar identification of the corresponding intelligent voice assistant based on the biometric parameters of the target user: identifying the identity of a target user initiating a voice instruction based on the biometric parameters; when the target user is identified as an authorized user, determining to continue generating the virtual image identifier of the corresponding intelligent voice assistant based on the biological characteristic parameters of the target user; and when the target user is identified to be an unauthorized user, returning prompt information that the target user does not have the operation authority to the terminal equipment.
Specifically, before generating the avatar identifier of the corresponding intelligent voice assistant, the server performs authorization authentication on the initiator of the voice instruction. Different target users can upload voice instructions to the server using the same terminal device; to prevent a malicious user from using the terminal device without the authorized user's knowledge, the server performs voiceprint recognition on the received voice instruction, extracts the biometric parameters of the target user from the voice instruction, such as voiceprint characteristic parameters, and identifies the identity of the target user issuing the voice instruction from the voiceprint characteristic parameters. When the target user is an authorized user, the server continues to generate the avatar identifier of the corresponding intelligent voice assistant based on the voiceprint characteristic parameters of the target user, so as to determine the avatar of the intelligent voice assistant; when the target user is an unauthorized user, the target user is prompted that they have no operation authority and cannot interact with the intelligent voice assistant, and before authorization passes, the corresponding avatar identifier of the intelligent voice assistant will not be generated based on the voiceprint characteristic parameters.
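This authorization step can be sketched under the assumption that voiceprints are compared as feature vectors against enrolled ones with a similarity threshold; the function names and the threshold value below are illustrative, not the embodiment's actual method:

```python
def cosine_similarity(a, b):
    # Similarity between two voiceprint feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def authorize(voiceprint, enrolled_voiceprints, threshold=0.85):
    # Match the extracted voiceprint against enrolled (authorized) users;
    # return the matching user id, or None for an unauthorized speaker,
    # in which case the terminal is told the user has no operation authority.
    for user_id, enrolled in enrolled_voiceprints.items():
        if cosine_similarity(voiceprint, enrolled) >= threshold:
            return user_id
    return None
```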
In actual implementation, the terminal device sends the voice instruction to the server, so that the server performs emotion recognition on the voice instruction to obtain the current emotion category of the corresponding target user, and generates a virtual emotion matching the current emotion category of the target user and corresponding to the avatar identifier. Here, the current emotion category of the target user may be used to represent the emotional tendency the target user expresses in the voice instruction. The emotional tendency may be a three-class feature, for example covering positive, neutral and negative emotions; it may of course also be a two-class feature, that is, the current emotion category of the target user is directly predicted, by performing emotion recognition on the voice instruction input by the target user through a regression task, as a positive emotion (such as a happy category) or a negative emotion (such as a sad category).
It should be noted that, the server stores in advance a correspondence between the emotion category of the target user and the virtual emotion, and based on the correspondence, the server can query the virtual emotion matching with the current emotion category of the target user.
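The stored correspondence and its query can be as simple as a lookup table; the specific category names and virtual emotions below are assumptions introduced for illustration:

```python
# Hypothetical mapping from a recognized emotion category of the target
# user to the virtual emotion played by the avatar.
VIRTUAL_EMOTION_MAP = {
    "happy": "celebrate",    # positive emotion -> celebratory expression
    "neutral": "calm",
    "sad": "comfort",        # negative emotion -> comforting expression
}

def match_virtual_emotion(current_emotion_category: str) -> str:
    # Fall back to a neutral expression for categories not in the table.
    return VIRTUAL_EMOTION_MAP.get(current_emotion_category, "calm")
```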
In step 403, the returned avatar identification of the intelligent voice assistant corresponding to the biometric parameters and the virtual emotion matching with the current emotion category of the target user and corresponding to the avatar identification are received.
In some embodiments, the terminal device may receive the returned avatar identification of the intelligent voice assistant corresponding to the biometric parameters by:
submitting an avatar acquisition request corresponding to the intelligent voice assistant to a server so that the server queries avatar identifications of the intelligent voice assistant adapted to the biological characteristic parameters in a database based on the biological characteristic parameters of the target user; and receiving the virtual image identification of the intelligent voice assistant which is sent by the server and is adapted to the biological characteristic parameters.
In the embodiment of the invention, the database stores the correspondence between the biometric parameters of a plurality of target users and the avatar identifiers. In actual implementation, for intelligent voice assistants in different environments, since the user data of target users differs between environments, each target user using the intelligent voice assistant in the same environment makes a personalized selection of the avatar of the intelligent voice assistant; the selected avatar identifier of the avatar of the intelligent voice assistant is associated with the biometric parameters of the target user, and the correspondence between the two is stored.
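A minimal sketch of such a database, assuming a simple key-value association between a biometric key and the selected avatar identifier (the class and method names are hypothetical):

```python
class AvatarDatabase:
    # Stores the correspondence between the biometric parameters of
    # target users and their selected avatar identifiers.
    def __init__(self):
        self._by_biometric = {}

    def associate(self, biometric_key: str, avatar_id: str) -> None:
        # Persist the avatar the target user selected for this biometric key.
        self._by_biometric[biometric_key] = avatar_id

    def lookup(self, biometric_key: str):
        # Return the adapted avatar identifier, or None if not yet enrolled.
        return self._by_biometric.get(biometric_key)
```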
Here, the user data may be user data of at least one dimension in the environment where the intelligent voice assistant is located, wherein the user data of at least one dimension includes at least one of: user preferences, user tags, user roles, and the like. For example, user tags may be determined based on the age group of the target user, including child tags, youth tags, and elderly tags. If the environment where the intelligent voice assistant is located is a family, the user role may be the attribute that the target user has in the family; for example, if target user A is a child in the family, the avatar identifier of the intelligent voice assistant corresponding to target user A may be an avatar identifier of a cartoon character such as Peppa Pig or Sailor Moon.
In some embodiments, before the terminal device receives the returned avatar identification of the intelligent voice assistant corresponding to the biometric parameters, the processing method of the intelligent voice assistant further includes:
receiving a returned voice recognition instruction, wherein the voice recognition instruction represents a judgment result for whether the biological characteristic parameters are recorded in the client; when the judgment result is that the biological characteristic parameters are recorded in the client, sending a first prompt message to the target user to prompt the target user to confirm the biological characteristic account of the target user; and when the judgment result is that the biological characteristic parameters are not recorded in the client, sending a second prompt message to the target user to prompt the target user to select the virtual image of the intelligent voice assistant, and storing the selected virtual image of the intelligent voice assistant in a database of the server.
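The branch on the recognition result described above can be sketched as follows; the prompt strings and the default avatar value are placeholders, not the embodiment's actual messages:

```python
def on_recognition_result(biometric_key, recorded_keys, database):
    # If the biometric parameter is already recorded in the client,
    # prompt the user to confirm their biometric account (first prompt).
    if biometric_key in recorded_keys:
        return "first_prompt: confirm your biometric account"
    # Otherwise prompt avatar selection (second prompt) and persist the
    # chosen avatar in the server-side database.
    chosen_avatar = "avatar-default"  # would come from the selection interface
    database[biometric_key] = chosen_avatar
    return "second_prompt: select an avatar for the intelligent voice assistant"
```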
Here, in practical implementation, when the recorded biometric parameters are not found in the client, the terminal device will present or send a second prompt message to the target user, where the second prompt message is used to prompt the target user for identity recognition and to select the avatar of the intelligent voice assistant. For example, the second prompt message may be further presented on the basis of fig. 5. Referring to fig. 6, fig. 6 is an optional interface schematic diagram of the terminal device presenting the second prompt message according to the embodiment of the present invention; the content of the second prompt message as shown in fig. 6 may be, for example: "I don't recognize you yet; you can enter the personalized avatar selection interface to choose an avatar of the intelligent voice assistant that suits your taste".
In some embodiments, the terminal device may send a second prompting message to the target user to prompt the target user to select the avatar of the intelligent voice assistant by: presenting an avatar selection interface for selecting the avatar of the intelligent voice assistant in response to a setting request of the avatar of the intelligent voice assistant triggered by the target user based on the second prompt message; and responding to an avatar selection instruction triggered based on the avatar selection interface, and acquiring the avatar of the intelligent voice assistant which accords with the preference of the target user.
Here, in actual implementation, the avatar of the initialized intelligent voice assistant (automatically set by the terminal system) will be presented in the avatar selection interface, and if the target user is satisfied with the avatar of the initialized intelligent voice assistant, the initialized avatar can be determined to be the avatar according with the preference of the target user; if the target user is not satisfied with the initialized virtual image of the intelligent voice assistant, the virtual image of the intelligent voice assistant can be reselected through a virtual image selection instruction triggered based on the virtual image selection interface.
Referring to fig. 7, fig. 7 is an optional schematic diagram of an avatar selection interface according to an embodiment of the present invention. A search bar and a search control for the target user to search for an avatar may be displayed in the avatar selection interface: arrow A indicates the search control, and arrow E indicates the search bar, also referred to as a search box. The target user may input keywords of the avatar through the search bar, such as "rally" or "Peppa Pig", and then search for a custom avatar by clicking the search control. Left and right arrow controls are also displayed in the avatar selection interface, for example the left arrow control indicated by arrow B and the right arrow control indicated by arrow C; the target user can select an avatar according to their preference from the avatar list by clicking these controls, where clicking the left arrow control pages back to the previous avatar, and clicking the right arrow control pages forward to the next avatar. When the target user is satisfied with the searched custom avatar or the avatar selected through paging, the search or selection can be confirmed by clicking the "confirm" button indicated by arrow D.
In the embodiment of the invention, after the terminal device detects that the target user searches or selects the virtual image, the terminal device can also perform matching confirmation on the biological characteristic parameters of the target user, such as voiceprint characteristic parameters, so as to confirm the identity of the target user. Because each target user corresponds to different voiceprint characteristic parameters, the virtual image of the dedicated intelligent voice assistant of each target user has a corresponding relation with the voiceprint characteristic parameters.
Illustratively, the terminal device prompts the target user through a popup box to input a piece of voice data, so that matching confirmation of the target user's voiceprint characteristic parameters can be realized. Referring to fig. 8, fig. 8 is an optional schematic diagram of identity confirmation of a target user according to an embodiment of the present invention: a popup box is presented in the user interface of the terminal device, and following its prompt, the target user reads aloud the text "Before my bed, the bright moonlight; I suppose it is frost on the ground". When the target user's speech is completely consistent with the text content, it may be determined that the voiceprint characteristic parameters of the target user are successfully matched, indicating that the identity of the target user is legitimate.
In practical implementation, after the target user finishes reading the text content, the setting of a specific name for the target user can be completed by voice input. Referring to fig. 9, fig. 9 is an optional schematic diagram of confirming a specific name for the target user according to an embodiment of the present invention; in fig. 9, a message such as "Your exclusive assistant is ready; master, how would you like to be addressed?" is presented, and the target user may input by voice the name they wish to be called, such as "please call me Swordsman", so that during subsequent human-computer interaction, the avatar of the exclusive intelligent voice assistant interacts with the target user and addresses the target user by the specific name "Swordsman".
In step 404, the avatar of the intelligent voice assistant indicated by the avatar identification is presented.
In some embodiments, the terminal device may present the avatar of the intelligent voice assistant indicated by the avatar identification by: determining an avatar corresponding to the intelligent voice assistant based on the indication of the avatar identifier; acquiring image resources of a virtual image corresponding to the intelligent voice assistant; presenting a default avatar of an avatar of the intelligent voice assistant based on the avatar resources, the default avatar of the avatar including at least one of: a default skin of the avatar; the default prop of the avatar.
Here, the image resources of the avatar may be uniformly developed by an art resource provider to ensure the presentation quality of the avatar; that is, the image resources of the avatar of the intelligent voice assistant acquired by the terminal device are the image resources uploaded by the art resource provider, where the image resources include at least one of: scene resources, model resources, skin resources, and action resources. The model resource can be a two-dimensional model or a three-dimensional model, and the default image of the avatar of the intelligent voice assistant is presented on the terminal device in the embodiment of the invention.
In some embodiments, for the avatar of the intelligent voice assistant, when being presented on the display interface of the terminal device for the first time, the avatar will appear with an initialized default avatar, where the initialized default avatar may include a default scene where the avatar is presented for the first time, a skin presented by the avatar for the first time, a prop worn by the avatar for the first time, and a special effect surrounding the avatar.
In some embodiments, the processing method of the intelligent voice assistant further comprises: receiving returned emotion animations corresponding to the virtual image identifications, wherein the emotion animations are generated by the server based on the current emotion types of the target users; and when the virtual image of the intelligent voice assistant indicated by the virtual image identification is presented, presenting the emotional animation corresponding to the virtual image identification.
Here, the server stores an emotion animation package corresponding to the avatar identifier, which may include animations for emotions such as happy, celebratory, sad, comforting, and cute. After obtaining the current emotion category of the corresponding target user based on the voice instruction, the server may query an emotion animation matching the current emotion category and return the queried emotion animation to the terminal device for presentation on its display interface. It should be noted that the number of emotion animations matching the current emotion category may be one or more, which is not limited herein.
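A minimal sketch of this server-side lookup, assuming a hypothetical animation package keyed by avatar identifier and emotion category (all names are illustrative):

```python
# Illustrative sketch: map (avatar identifier, current emotion category)
# to one or more matching emotion animations. An empty list means no
# animation in the package matches the recognized emotion.

EMOTION_ANIMATION_PACKAGE = {
    ("avatar_001", "happy"): ["happy_wave.anim", "happy_jump.anim"],
    ("avatar_001", "sad"): ["comfort_hug.anim"],
}

def query_emotion_animations(avatar_id, emotion):
    # One or more animations may match the current emotion category.
    return EMOTION_ANIMATION_PACKAGE.get((avatar_id, emotion), [])

print(query_emotion_animations("avatar_001", "happy"))
```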
In step 405, in response to the interactive instruction triggered by the intelligent voice assistant, the avatar of the intelligent voice assistant is controlled to play voice conforming to the interactive instruction in a virtual emotion manner.
In some embodiments, the terminal device may control the avatar of the intelligent voice assistant to play the voice conforming to the interactive instruction in a virtual emotion manner as follows:
the interactive instruction triggered by the target user based on the intelligent voice assistant is obtained and sent to the server, so that when the interactive instruction comprises a voice interactive instruction, the server performs voice recognition on the voice interactive instruction to obtain corresponding text information, and performs semantic recognition on the text information to obtain the intention corresponding to the voice interactive instruction; the returned intention of the voice interactive instruction is received, and based on the control instruction corresponding to that intention, the avatar of the intelligent voice assistant is controlled to play voice conforming to the interactive instruction in a virtual emotion manner.
In an embodiment of the present invention, the intent of the corresponding voice interaction instruction may include at least one of: the method comprises the steps of intelligent conversation, equipment control, vehicle-mounted device message leaving, adding of props of the intelligent voice assistant, switching of scenes where the intelligent voice assistant is located, and setting of active time of the intelligent voice assistant.
The terminal device uploads the voice interaction instruction corresponding to the intelligent voice assistant to the server. When the server receives the voice interaction instruction, it performs voice recognition on it to convert the received voice into text information, performs semantic analysis on the converted text information to obtain the intention corresponding to the voice interaction instruction, and returns the obtained intention to the terminal device. The terminal device can then control the avatar of the intelligent voice assistant to play the voice conforming to the interactive instruction in a virtual emotion manner, based on the control instruction corresponding to the intention of the voice interaction instruction.
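The round trip above can be sketched as follows. The keyword-based `recognize_intent` matcher is a deliberately naive stand-in for a real semantic recognition model, and all function and intent names are illustrative assumptions:

```python
# Sketch of the server-side pipeline: speech recognition converts the voice
# interaction instruction to text, then semantic recognition maps the text
# to an intent that the terminal device acts on.

def speech_to_text(voice_instruction):
    # Placeholder ASR: a real system would run a speech recognition model
    # on audio; here the "audio" is already UTF-8 text bytes.
    return voice_instruction.decode("utf-8")

INTENT_KEYWORDS = {
    "weather": "intelligent_dialog",   # weather query is part of intelligent dialog
    "turn on": "device_control",
    "leave a message": "vehicle_message",
}

def recognize_intent(text):
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in text.lower():
            return intent
    return "intelligent_dialog"  # default to chat

intent = recognize_intent(speech_to_text(b"Turn on the living room light"))
print(intent)  # device_control
```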
Here, the intelligent dialog includes a chat intention, a knowledge question-and-answer intention, a weather query intention, and the like; the device control includes controlling smart home devices through the intelligent voice assistant client on a vehicle-mounted system, and controlling in-vehicle smart devices from home through the intelligent voice assistant client in the terminal device; the vehicle-mounted message leaving means sending a message through the vehicle-mounted voice assistant client to a smart product corresponding to the intelligent voice assistant.
Here, in actual implementation, a semantic recognition module in the server may perform semantic recognition on the text information to obtain the intention corresponding to the voice interaction instruction, and a Text-To-Speech (TTS) module may synthesize the voice to be played.
To facilitate secure, tamper-proof storage of the avatar of the target user's exclusive intelligent voice assistant, in some embodiments the processing method of the intelligent voice assistant further comprises: sending the identification of the target user and the avatar of the intelligent voice assistant corresponding to that identification to a blockchain network, so that a node of the blockchain network fills the identification of the target user and the corresponding avatar into a new block and, when consensus on the new block is reached, appends the new block to the tail of the blockchain.
Here, specifically, after the terminal device receives the avatar identifier of the intelligent voice assistant corresponding to the biometric parameter returned by the server and determines the avatar indicated by that identifier, it may further combine with blockchain technology: it generates a transaction for storing the identification of the target user together with the avatar of the intelligent voice assistant corresponding to that identification, and submits the generated transaction to a node of the blockchain network, so that the node stores the identification and the corresponding avatar after the nodes of the blockchain network reach consensus on the transaction. In this way, the identification of the target user and the corresponding avatar are stored on the chain as a recorded backup, ensuring the security of the avatar of the target user's exclusive intelligent voice assistant.
Next, a blockchain network according to an embodiment of the present invention will be described. Referring to fig. 10, fig. 10 is a schematic diagram of an application architecture of a blockchain network according to an embodiment of the present invention, which includes a blockchain network 81 (exemplarily illustrating consensus nodes 810-1 to 810-3), a certificate authority 82, a business entity 83, and a business entity 84, which are described below.
The type of the blockchain network 81 is flexible and may be, for example, any one of a public chain, a private chain, or a consortium chain. Taking a public chain as an example, the electronic device of any business entity, such as a user terminal or a server (e.g., a cloud server), can access the blockchain network 81 without authorization; taking a consortium chain as an example, an electronic device (e.g., a terminal/server) under the jurisdiction of a business entity may, after obtaining authorization, access the blockchain network 81 and thereby become a client node in the blockchain network 81.
In some embodiments, the client node may act as a mere observer of the blockchain network 81, i.e., provide the functionality that supports a business entity in initiating transactions (e.g., for on-chain storage of data or querying of on-chain data), while the functions of the consensus nodes 810 in the blockchain network 81, such as the sorting function, the consensus service, and the ledger function, may be implemented by the client node by default or selectively (e.g., depending on the specific business requirements of the business entity). Therefore, the data and the business processing logic of the business entity can be migrated to the blockchain network 81 to the maximum extent, and the credibility and traceability of the data and business processing process are realized through the blockchain network 81.
The consensus nodes in the blockchain network 81 receive transactions submitted from client nodes (e.g., client node 410 attributed to business entity 83, and client node 510 attributed to business entity 84, shown in fig. 10) of different business entities (e.g., business entity 83 and business entity 84, shown in fig. 10), perform the transactions to update the ledger or query the ledger, and various intermediate or final results of performing the transactions may be returned to the business entity's client nodes for display.
For example, the client node 410/510 may subscribe to events of interest in the blockchain network 81, such as transactions occurring in a particular organization/channel in the blockchain network 81, and the consensus node 810 pushes corresponding transaction notifications to the client node 410/510, thereby triggering corresponding business logic in the client node 410/510.
An exemplary application of the blockchain network is described below, taking as an example a plurality of business entities accessing the blockchain network to implement management of the avatar of the intelligent voice assistant. Referring to fig. 10, a plurality of business entities are involved in the management link; for example, the business entity 83 may be an application client, such as a client implementing the processing method of the intelligent voice assistant according to the embodiment of the present invention. Each business entity registers with the certificate authority 82 to obtain a digital certificate, which includes the public key of the business entity and the digital signature of the certificate authority 82 over that public key and the identity information of the business entity. The digital certificate is attached to a transaction together with the business entity's digital signature for the transaction and sent to the blockchain network, so that the blockchain network can take the digital certificate and signature out of the transaction, verify the authenticity of the message (i.e., whether it has been tampered with) and the identity information of the business entity sending the message, and the blockchain network 81 can verify the identity, for example, whether the sender has the right to initiate the transaction. A client running on an electronic device (e.g., a terminal or a server) hosted by the business entity may request access from the blockchain network 81 to become a client node.
The client node 410 of the service agent 83 is configured to obtain a voice instruction corresponding to the target user, and send the voice instruction to the server, so that the server performs feature extraction on the voice instruction to obtain a biometric parameter corresponding to the target user; receiving the returned avatar identification of the intelligent voice assistant corresponding to the biometric parameters, determining the avatar of the intelligent voice assistant indicated by the avatar identification, and sending the identification of the target user and the avatar of the intelligent voice assistant corresponding to the identification of the target user to the blockchain network 81.
Here, when sending the identification of the target user and the corresponding avatar of the intelligent voice assistant to the blockchain network 81, business logic may be set in the client node 410 in advance: when the avatar of the intelligent voice assistant is found based on the identification of the target user, the client node 410 automatically sends it to the blockchain network 81; alternatively, a service person of the business entity 83 logs in to the client node 410, manually packages the avatars of the intelligent voice assistants corresponding to the identifications of a plurality of target users, and sends them to the blockchain network 81. When sending, the client node 410 generates a transaction corresponding to the update operation according to the avatars of the intelligent voice assistants corresponding to the identifications of the plurality of target users, specifies the smart contract to be invoked for implementing the update operation and the parameters passed to the smart contract, and also carries the digital certificate of the client node 410 and a signed digital signature (for example, a digest of the transaction encrypted using the private key in the digital certificate of the client node 410), and broadcasts the transaction to the consensus nodes 810 in the blockchain network 81.
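The transaction packaging step can be sketched as follows. For simplicity, the sketch signs the transaction digest with HMAC over a shared key; the scheme described above uses an asymmetric private-key signature from the node's digital certificate, so this is only an illustration of the digest-then-sign structure, and the contract and field names are assumptions:

```python
import hashlib
import hmac
import json

# Sketch of how a client node might package an update transaction: the
# payload names the smart contract and its parameters, and a signature over
# the transaction digest is attached. HMAC stands in for the asymmetric
# signature described in the text.

def build_transaction(user_id, avatar, signing_key):
    payload = {
        "contract": "avatar_store",  # hypothetical smart contract name
        "operation": "update",
        "params": {"user_id": user_id, "avatar": avatar},
    }
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    signature = hmac.new(signing_key, digest.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "digest": digest, "signature": signature}

def verify_transaction(tx, signing_key):
    # A consensus node recomputes the signature and rejects mismatches.
    expected = hmac.new(signing_key, tx["digest"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tx["signature"])

tx = build_transaction("user_1", "avatar_001", b"client_key")
print(verify_transaction(tx, b"client_key"))  # True
```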
When a consensus node 810 in the blockchain network 81 receives the transaction, it verifies the digital certificate and digital signature carried by the transaction; after that verification succeeds, it confirms whether the business entity 83 has the transaction right according to the identity of the business entity 83 carried in the transaction. The transaction fails if either the digital signature verification or the permission verification fails. After successful verification, the node signs its own digital signature (e.g., by encrypting the digest of the transaction using the private key of node 810-1) and continues to broadcast the transaction in the blockchain network 81.
After receiving the successfully verified transaction, the consensus nodes 810 in the blockchain network 81 fill the transaction into a new block and broadcast the new block. When broadcasting a new block, the consensus nodes 810 perform a consensus process on it; if the consensus succeeds, each node appends the new block to the tail of the blockchain it stores, updates the state database according to the result of the transaction, and executes the transaction in the new block: for a transaction submitting the avatar of the intelligent voice assistant corresponding to the identification of a target user, a key-value pair comprising that identification and the corresponding avatar is added to the state database.
As an example of a blockchain, referring to fig. 11, fig. 11 is an optional structural schematic diagram of a blockchain in the blockchain network 81 provided in an embodiment of the present invention. The header of each block may include the hash values of all transactions in the block, and also the hash values of all transactions in the previous block. A record of a newly generated transaction is filled into a block and, after consensus by the nodes in the blockchain network, the block is appended to the tail of the blockchain, forming chained growth; the hash-based chain structure between blocks ensures that the transactions in the blocks are tamper-proof and forgery-proof.
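The hash-linked structure of fig. 11 can be illustrated with a minimal sketch: each block header carries the hash of its own transactions and the hash of the previous block header, so tampering with any stored transaction breaks every later link. Field names and the genesis sentinel are illustrative assumptions.

```python
import hashlib
import json

# Minimal sketch of a hash-chained block structure.

def tx_hash(transactions):
    """Hash the transactions carried in a block."""
    return hashlib.sha256(json.dumps(transactions, sort_keys=True).encode()).hexdigest()

def make_block(transactions, prev_block=None):
    header = {
        "tx_hash": tx_hash(transactions),
        # The genesis block has no predecessor; use an all-zero sentinel.
        "prev_hash": prev_block["header_hash"] if prev_block else "0" * 64,
    }
    header_hash = hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()
    return {"header": header, "header_hash": header_hash, "transactions": transactions}

genesis = make_block([{"user": "user_1", "avatar": "avatar_001"}])
block2 = make_block([{"user": "user_2", "avatar": "avatar_002"}], genesis)
# Each block references its predecessor by hash, forming the chain.
print(block2["header"]["prev_hash"] == genesis["header_hash"])  # True
```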
An exemplary functional architecture of the blockchain network provided by the embodiment of the present invention is described below, referring to fig. 12, fig. 12 is a schematic functional architecture diagram of a blockchain network 81 provided by the embodiment of the present invention, which includes an application layer 201, a consensus layer 202, a network layer 203, a data layer 204, and a resource layer 205, which are described below respectively.
The resource layer 205 encapsulates the computing, storage, and communication resources that implement each node 810 in the blockchain network 81.
The data layer 204 encapsulates various data structures that implement the ledger, including blockchains implemented in files in a file system, state databases of the key-value type, and presence certificates (e.g., hash trees of transactions in blocks).
The network layer 203 encapsulates the functions of the Peer-to-Peer (P2P) network protocol, the data propagation and data verification mechanisms, the access authentication mechanism, and business entity identity management.
Wherein the P2P network protocol implements communication between nodes 810 in the blockchain network 81, the data propagation mechanism ensures propagation of transactions in the blockchain network 81, and the data verification mechanism is used for implementing reliability of data transmission between nodes 810 based on cryptography methods (e.g., digital certificates, digital signatures, public/private key pairs); the access authentication mechanism is used for authenticating the identity of the service subject added to the block chain network 81 according to an actual service scene, and endowing the service subject with the authority of accessing the block chain network 81 when the authentication is passed; the service agent identity management is used to store the identity of the service agent that is allowed to access the blockchain network 81, as well as the rights (e.g., the type of transaction that can be initiated).
The consensus layer 202 encapsulates the mechanism by which the nodes 810 in the blockchain network 81 agree on a block (i.e., the consensus mechanism), as well as transaction management and ledger management. The consensus mechanism comprises consensus algorithms such as POS, POW, and DPOS; pluggable consensus algorithms are supported.
The transaction management is used for verifying the digital signature carried in the transaction received by the node 810, verifying the identity information of the service subject, and determining whether the service subject has the right to perform the transaction (reading the relevant information from the identity management of the service subject) according to the identity information; for the service agents authorized to access the blockchain network 81, the service agents all have digital certificates issued by the certificate authority, and the service agents sign submitted transactions by using private keys in the digital certificates of the service agents, so that the legal identities of the service agents are declared.
The ledger management is used to maintain the blockchain and the state database. For a block on which consensus has been reached, the block is appended to the tail of the blockchain; the transactions in the consensus block are executed, key-value pairs in the state database are updated when a transaction comprises an update operation, and key-value pairs in the state database are queried when a transaction comprises a query operation, the query result being returned to the client node of the business entity. Query operations in multiple dimensions of the state database are supported, comprising: querying a block according to a block sequence number (e.g., a hash value of a transaction); querying a block according to a block hash value; querying a block according to a transaction sequence number; querying a transaction according to a transaction sequence number; querying the account data of a business entity according to the account number of the business entity; and querying the blockchain in a channel according to a channel name.
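The update/query behavior of the key-value state database can be sketched as follows; the transaction format is an illustrative assumption:

```python
# Illustrative sketch of ledger management over a key-value state database:
# executing the transactions of a consensus block writes key-value pairs for
# update operations and reads them for query operations.

class StateDatabase:
    def __init__(self):
        self.kv = {}

    def execute_block(self, block_transactions):
        results = []
        for tx in block_transactions:
            if tx["op"] == "update":
                self.kv[tx["key"]] = tx["value"]
                results.append(None)  # updates return no data
            elif tx["op"] == "query":
                results.append(self.kv.get(tx["key"]))  # returned to the client node
        return results

db = StateDatabase()
out = db.execute_block([
    {"op": "update", "key": "user_1", "value": "avatar_001"},
    {"op": "query", "key": "user_1"},
])
print(out)  # [None, 'avatar_001']
```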
The application layer 201 encapsulates various services that the blockchain network can implement, including traceability, evidence storage, and verification of transactions.
By adopting the processing method of the intelligent voice assistant provided by the embodiment of the invention, on the one hand, based on the uniqueness and exclusivity of the biometric parameters of each target user, every target user has his or her own exclusive avatar of the intelligent voice assistant, which can meet the personalized requirements of the target user for the avatar of the intelligent voice assistant; the terminal presents the avatar of the intelligent voice assistant indicated by the avatar identifier, so that the intelligent voice assistant has a visual avatar. On the other hand, in the process of interaction between the intelligent voice assistant and the target user, the emotional expression of the avatar of the intelligent voice assistant is increased to enhance the humanized interaction of the avatar, thereby improving the target user's favorability towards and stickiness to the intelligent voice assistant product and improving the target user's use experience.
Referring to fig. 13, fig. 13 is another optional flowchart of the processing method of the intelligent voice assistant according to an embodiment of the present invention, in some embodiments, the processing method of the intelligent voice assistant may be implemented by a terminal device, or implemented by a server, or implemented by the terminal device and the server in a cooperation manner, the terminal device is provided with a client, and the following takes the server as an example, and the description is made with reference to the steps shown in fig. 13. For details which are not exhaustive in the following description of the steps, reference is made to the above for an understanding.
In step 1201, the server receives a voice instruction of the target user sent by the client.
In some embodiments, the voice instruction of the target user may be collected by the client by using a voice collecting device such as a microphone, and then the client sends the collected voice instruction of the target user to the server.
In step 1202, feature extraction is performed on the voice command to obtain a biometric parameter of the corresponding target user, and emotion recognition is performed on the voice command to obtain a current emotion category of the corresponding target user.
In step 1203, avatar identification of the intelligent voice assistant corresponding to the biometric parameters and virtual emotion matching with the current emotion category of the target user and corresponding to the avatar identification are determined.
In some embodiments, after obtaining the biometric parameters of the corresponding target user, the server may generate the avatar identifier of the corresponding intelligent voice assistant based on the biometric parameters of the target user, and then return the generated avatar identifier of the intelligent voice assistant to the terminal device.
In some embodiments, the server may further perform the following technical solution before generating the avatar identification of the corresponding intelligent voice assistant based on the biometric parameters of the target user: identifying the identity of a target user initiating a voice instruction based on the biometric parameters; when the target user is identified as an authorized user, determining to continue generating the virtual image identifier of the corresponding intelligent voice assistant based on the biological characteristic parameters of the target user; and when the target user is identified to be an unauthorized user, returning prompt information that the target user does not have the operation authority to the terminal equipment.
It should be noted that the server stores in advance a correspondence between the emotion categories of the target user and virtual emotions; based on this correspondence, the server can query the virtual emotion matching the current emotion category of the target user.
In step 1204, the avatar identification and the emotion are sent to the client, so that the client presents the avatar of the intelligent voice assistant indicated by the avatar identification, and controls the avatar of the intelligent voice assistant to play the voice conforming to the interactive instruction in the emotion manner in response to the interactive instruction triggered by the intelligent voice assistant.
Next, continuing to describe the processing method of the intelligent voice assistant provided by the embodiment of the present invention, referring to fig. 14, fig. 14 is another optional flowchart of the processing method of the intelligent voice assistant provided by the embodiment of the present invention, in some embodiments, the processing method of the intelligent voice assistant may be implemented by a terminal device, or implemented by the terminal device and a server in a cooperative manner, for example, implemented by the terminal device 100-1 and the server 300 in fig. 1 in a cooperative manner, and described with reference to the steps shown in fig. 14. For details which are not exhaustive in the following description of the steps, reference is made to the above for an understanding.
In step 1301, the terminal device obtains a voice command input by a target user.
In some embodiments, the terminal device may employ a voice acquisition device such as a microphone to acquire the voice instruction input by the target user. The voice acquisition device such as a microphone can be arranged in the terminal device, and can also be an electronic device with a voice acquisition function and a communication link with the terminal device.
In step 1302, the terminal device uploads the voice command to the server.
In step 1303, the server performs feature extraction on the voice command to obtain a biometric parameter of the corresponding target user, and generates a virtual image identifier of the corresponding intelligent voice assistant based on the biometric parameter.
In some embodiments, the server may further identify the identity of the target user initiating the voice instruction based on the biometric parameters before generating the avatar identification of the corresponding intelligent voice assistant based on the biometric parameters of the target user; when the target user is identified as an authorized user, determining to continue generating the virtual image identifier of the corresponding intelligent voice assistant based on the biological characteristic parameters of the target user; and when the target user is identified to be an unauthorized user, returning prompt information that the target user does not have the operation authority to the terminal equipment.
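The authorization step above can be sketched as a voiceprint match against enrolled users; the cosine-similarity comparison, the threshold, and the enrolled feature vectors are illustrative assumptions rather than the disclosed method:

```python
# Hedged sketch of authorization: the biometric (voiceprint) parameter
# extracted from the voice instruction is matched against enrolled users
# before an avatar identifier is generated. Vectors and threshold are
# hypothetical.

AUTHORIZED_VOICEPRINTS = {"user_1": [0.2, 0.8, 0.5]}  # enrolled feature vectors
THRESHOLD = 0.9

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def authorize(voiceprint):
    """Return the matching user id, or None if the speaker is unauthorized."""
    for user_id, enrolled in AUTHORIZED_VOICEPRINTS.items():
        if cosine_similarity(voiceprint, enrolled) >= THRESHOLD:
            return user_id
    return None  # caller returns a "no operation authority" prompt

print(authorize([0.2, 0.8, 0.5]))  # user_1
```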
In the embodiment of the invention, since target users differ from one another, different target users can be distinguished by utilizing the uniqueness and exclusivity of their biometric parameters; the biometric parameters include, but are not limited to, voiceprint characteristic parameters, face characteristic parameters, iris characteristic parameters, and the like.
In step 1304, the server performs emotion recognition on the voice command to obtain a current emotion category corresponding to the target user, and generates a virtual emotion matching with the current emotion category of the target user and corresponding to the avatar identifier.
Here, the server stores in advance a correspondence between the emotion type of the target user and the virtual emotion, and can search for a virtual emotion matching the current emotion type of the target user based on the correspondence.
In step 1305, the terminal device receives the avatar identification of the intelligent voice assistant returned by the server and the generated virtual emotion.
In some embodiments, the terminal device may receive the avatar identification of the intelligent voice assistant returned by the server by:
submitting an avatar acquisition request corresponding to the intelligent voice assistant to a server so that the server queries avatar identifications of the intelligent voice assistant adapted to the biological characteristic parameters in a database based on the biological characteristic parameters of the target user; and receiving the virtual image identification of the intelligent voice assistant which is sent by the server and is adapted to the biological characteristic parameters.
Here, the database stores the correspondence between the biometric parameters of a plurality of target users and the avatar identifications.
In some embodiments, before receiving the returned virtual image identifier of the intelligent voice assistant corresponding to the biometric parameter, the terminal device may further receive a returned voice recognition instruction, where the voice recognition instruction represents a determination result of whether the biometric parameter is recorded in the client; when the judgment result is that the biological characteristic parameters are recorded in the client, sending a first prompt message to the target user to prompt the target user to confirm the biological characteristic account of the target user; and when the judgment result is that the biological characteristic parameters are not recorded in the client, sending a second prompt message to the target user to prompt the target user to select the virtual image of the intelligent voice assistant, and storing the selected virtual image of the intelligent voice assistant in a database of the server.
In step 1306, the terminal device presents the avatar of the intelligent voice assistant indicated by the avatar identification.
In some embodiments, the terminal device may present the avatar of the intelligent voice assistant indicated by the avatar identification by: determining an avatar corresponding to the intelligent voice assistant based on the indication of the avatar identifier; acquiring image resources of a virtual image corresponding to the intelligent voice assistant; presenting a default avatar of an avatar of the intelligent voice assistant based on the avatar resources, the default avatar of the avatar including at least one of: a default skin of the avatar; the default prop of the avatar.
In some embodiments, the terminal device may further receive an emotion animation corresponding to the avatar identifier, which is returned by the server, where the emotion animation is generated by the server based on the current emotion category of the target user; and when the virtual image of the intelligent voice assistant indicated by the virtual image identification is presented, presenting the emotional animation corresponding to the virtual image identification.
In step 1307, the terminal device obtains a voice interaction instruction triggered by the target user based on the intelligent voice assistant.
In step 1308, the terminal device sends a voice interaction command to the server.
In step 1309, the server performs speech recognition on the speech interaction instruction to obtain corresponding text information, and performs semantic recognition on the text information to obtain an intention corresponding to the speech interaction instruction.
Here, the intent of the voice interaction instruction may include at least one of: the method comprises the steps of intelligent conversation, equipment control, vehicle-mounted device message leaving, adding of props of the intelligent voice assistant, switching of scenes where the intelligent voice assistant is located, and setting of active time of the intelligent voice assistant.
In step 1310, the server returns the intent of the voice interaction instruction to the terminal device.
In step 1311, the terminal device controls the avatar of the intelligent voice assistant to play the voice conforming to the interactive instruction in a virtual emotion manner, based on the control instruction corresponding to the intention of the voice interactive instruction.
In this way, based on the uniqueness and distinctiveness of the target user's biometric parameters, each target user has an exclusive avatar of the intelligent voice assistant, which meets the target user's personalized requirements for the avatar of the intelligent voice assistant; the terminal presents the avatar indicated by the avatar identifier, giving the intelligent voice assistant a visualized form; and during interaction between the intelligent voice assistant and the target user, emotional expression is added to the avatar to make the interaction more humanized, which can improve the target user's favorability toward, and stickiness with, products applying the intelligent voice assistant, and improve the target user's experience.
In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described.
In the related art, an intelligent voice assistant used in a household either has no corresponding avatar or, even when it does, has a fixed one. Because a household is shared, each target user in the household has individual preferences; a fixed avatar cannot meet the personalized requirements of every target user in the household, and such an avatar cannot perform voice interaction with the target user based on emotional expression.
To solve these technical problems, an embodiment of the present invention provides a scheme that adapts to each household target user's personalized requirements for the avatar of the intelligent voice assistant. Each target user is identified through voiceprint recognition, and each target user can select a favorite cartoon character as his or her exclusive avatar of the intelligent voice assistant. After a specific target user is identified, the corresponding avatar can be brought online by voice control. Each voice assistant stores its "owner's" video viewing history and favorite skills (stocks, poetry, and the like), recognizes the target user's current emotion through emotion recognition, and gives feedback with a virtual emotion corresponding to the target user's current emotion.
The technical solution of the present invention mainly includes two parts: the avatar setting process of the intelligent voice assistant, and the virtual emotion feedback process of the avatar. These are described separately below.
Referring to fig. 15, fig. 15 is a schematic view of an optional process for setting an avatar of an intelligent voice assistant according to an embodiment of the present invention. The terminal device collects a voice instruction input by a target user, where the voice instruction carries the voiceprint feature parameters of the target user, and determines whether those voiceprint feature parameters have ever been enrolled. If not, the target user is prompted to select the avatar of the intelligent voice assistant and to confirm his or her identity; if so, the preset avatar of the intelligent voice assistant corresponding to the target user is obtained. Referring to fig. 16, fig. 16 is an optional schematic flow chart of a virtual emotion feedback process of an avatar provided in an embodiment of the present invention. The terminal device acquires a voice instruction input by the target user and sends it to the background server; the background server analyzes the voice instruction through emotion recognition to obtain the current emotion category of the target user, queries a virtual emotion matching that emotion category, and feeds the queried virtual emotion back to the terminal device. In some embodiments, the background server may also feed back an emotion animation corresponding to the virtual emotion to the terminal device.
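The branching in the avatar setting process of fig. 15 can be sketched in a few lines. The in-memory store of enrolled voiceprints and the string return values are assumptions for illustration, not the patented implementation.

```python
# Minimal sketch of the fig. 15 branch: if the speaker's voiceprint has never
# been enrolled, prompt avatar selection; otherwise load the preset avatar.
# The enrollment store and return values are illustrative assumptions.
def handle_voice_instruction(voiceprint: str, enrolled: dict) -> str:
    if voiceprint not in enrolled:
        return "prompt_avatar_selection"  # new user: choose avatar, confirm identity
    return enrolled[voiceprint]           # known user: preset avatar comes online
```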
In the technical solution of the present invention, the avatar setting of the intelligent voice assistant may be implemented by a voiceprint-based personalized avatar account system, whose flow is described below.
Referring to fig. 17, fig. 17 is a schematic view of an alternative process of the voiceprint-based personalized avatar account system according to the embodiment of the present invention, including the following steps:
in step 1601, user A inputs a voice instruction in the client;
in step 1602, the client uploads the acquired voice instruction of user A to the background server;
here, the input voice instruction carries the voiceprint feature parameters of user A.
In step 1603, the background server determines whether the voiceprint feature parameters of user A already exist for the client, and generates a voice recognition instruction based on the determination result;
here, the background server performs feature extraction on the voice instruction to obtain the voiceprint feature parameters corresponding to user A.
In step 1604, the background server returns the speech recognition instruction to the client;
in step 1605, the client sends a prompt message to user A, prompting user A to confirm the voiceprint account;
in step 1606, user A selects the avatar of the intelligent voice assistant that meets his or her preferences, and uploads the voiceprint data, avatar identifier, and avatar name to the client;
in step 1607, the client sends the voiceprint data, avatar identifier, and avatar name uploaded by user A to the background server;
in step 1608, the background server saves the data and establishes an avatar account;
in step 1609, the background server returns the emotion animation package of the avatar corresponding to the avatar account;
here, the background server pre-stores a series of emotion animations of the avatar, such as happy, celebrating, sad, comforting, cute, and the like. In the subsequent virtual emotion feedback process, the background server only needs to return the virtual emotion and the emotion animation, and the client directly displays the emotion animation corresponding to the avatar identifier.
In step 1610, user A sends a voice control instruction for the emotion animation package to the client;
in step 1611, the client uploads the voice control instruction to the background server;
in step 1612, the background server responds to the voice control instruction, and classifies and stores the emotion animations in the emotion animation package for emotion analysis and recommendation.
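Steps 1608–1612 can be sketched as an account object that holds the voiceprint binding and classifies the animation package by emotion category for later lookup. The class name, fields, and asset file names are assumptions made for illustration.

```python
# Hypothetical sketch of steps 1608-1612: the background server creates an
# avatar account, stores the emotion animation package, and classifies each
# animation by emotion category for later analysis and recommendation.
class AvatarAccount:
    def __init__(self, voiceprint: str, avatar_id: str, avatar_name: str):
        self.voiceprint = voiceprint
        self.avatar_id = avatar_id
        self.avatar_name = avatar_name
        self.animations = {}  # emotion category -> animation asset

    def store_animation_package(self, package: list) -> None:
        # Each package entry pairs an emotion category with an animation asset.
        for emotion, asset in package:
            self.animations[emotion] = asset

    def animation_for(self, emotion: str):
        # Returns None when no animation is classified under this emotion.
        return self.animations.get(emotion)
```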
The following describes a process of providing an interactive virtual emotion feedback to an avatar. Referring to fig. 18, fig. 18 is a schematic view of another alternative flow chart of a process of providing feedback of virtual emotion of an avatar according to an embodiment of the present invention, including the following steps:
in step 1701, user A inputs a voice instruction in the client;
in step 1702, the client uploads the acquired voice instruction of user A to the background server;
in step 1703, the background server recognizes the current emotion category of user A through emotion recognition, and generates a virtual emotion ID and an emotion animation ID matching the current emotion category;
here, the background server may recognize the current emotion category of user A through emotion recognition in the related art. For virtual emotions, the background server may feed back a one-to-one matching virtual emotion to the client; for example, if the current emotion category of user A is recognized as happy, the virtual emotion returned to the client is also happy.
In step 1704, the background server returns the virtual emotion ID and the emotion animation ID to the client.
Here, the client may identify, based on the virtual emotion ID, the virtual emotion that matches the current emotion category of user A and corresponds to the avatar, and may identify, based on the emotion animation ID, the emotion animation corresponding to the avatar.
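The client-side resolution in step 1704 can be sketched as two table lookups keyed by the IDs the server returns. The tables, ID values, and asset names are illustrative assumptions.

```python
# Hypothetical sketch of step 1704 on the client: the server returns only a
# virtual emotion ID and an emotion animation ID, and the client maps both
# to locally stored names and assets. Tables and IDs are assumptions.
VIRTUAL_EMOTIONS = {1: "happy", 2: "comfort"}
ANIMATIONS = {10: "happy.webp", 20: "comfort.webp"}

def resolve(emotion_id: int, animation_id: int) -> tuple:
    """Map the two server-returned IDs to the local emotion and animation."""
    return VIRTUAL_EMOTIONS[emotion_id], ANIMATIONS[animation_id]
```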
The technical solution of the present invention provides an avatar scheme for an intelligent voice assistant with emotional expression on screen devices such as smart televisions and mobile devices. It not only gives the avatar of the intelligent voice assistant a visualized form, but also gives each user an exclusive avatar of the intelligent voice assistant, thereby meeting the user's personalized requirements for the avatar; and the added emotional expression makes the avatar more humanized.
The following continues the description of the software implementation of the processing device 455 of the intelligent voice assistant provided by the embodiments of the present invention. Taking as an example the software modules included in the memory 450 of the processing device 40 (implemented as a client) that implements the processing method of the intelligent voice assistant according to the embodiment of the present invention, details not covered in the functional description of the modules below can be understood by referring to the above description of the client-side method embodiments of the present invention. As shown in fig. 3A, the processing device 455 of the intelligent voice assistant according to the embodiment of the present invention may include:
an obtaining unit 4551, configured to obtain a voice instruction of a corresponding target user; a first sending unit 4552, configured to send the voice instruction, perform feature extraction on the voice instruction to obtain a biometric parameter corresponding to the target user, and perform emotion recognition on the voice instruction to obtain a current emotion category corresponding to the target user; a first receiving unit 4553, configured to receive a returned avatar identifier of the intelligent voice assistant corresponding to the biometric parameter, and a virtual emotion matching with the current emotion category of the target user and corresponding to the avatar identifier; a presenting unit 4554 configured to present the avatar of the intelligent voice assistant indicated by the avatar identification; and the control unit 4555 is configured to, in response to an interaction instruction triggered by the intelligent voice assistant, control the avatar of the intelligent voice assistant to play voice conforming to the interaction instruction in the manner of the virtual emotion.
In some embodiments, the processing device of the intelligent voice assistant further comprises:
a third receiving unit, configured to receive a returned voice recognition instruction before the first receiving unit receives a returned avatar identifier of the intelligent voice assistant corresponding to the biometric parameter, where the voice recognition instruction represents a determination result of whether the biometric parameter is recorded in the client;
a third sending unit, configured to send a first prompt message to the target user to prompt the target user to confirm a biometric account of the target user when the determination result is that the biometric parameter is recorded in the client;
a fourth sending unit, configured to send a second prompt message to the target user to prompt the target user to select the avatar of the intelligent voice assistant when the determination result indicates that the biometric parameter is not recorded in the client;
and the storage unit is used for storing the selected virtual image of the intelligent voice assistant in a database of a server.
In some embodiments, the fourth sending unit may send the second prompt message to the target user, prompting the target user to select the avatar of the intelligent voice assistant, in the following manner:
presenting an avatar selection interface for selecting an avatar of the intelligent voice assistant in response to a setting request of the avatar of the intelligent voice assistant triggered by the target user based on the second prompt message; and responding to an avatar selection instruction triggered based on the avatar selection interface, and acquiring the avatar of the intelligent voice assistant which accords with the preference of the target user.
In some embodiments, the first receiving unit may receive the returned avatar identifier of the intelligent voice assistant corresponding to the biometric parameters in the following manner:
submitting an avatar acquisition request corresponding to the intelligent voice assistant to a server so that the server queries avatar identifications of the intelligent voice assistant adapted to the biometric parameters in a database based on the biometric parameters of the target users, wherein the database stores corresponding relations between the biometric parameters of a plurality of target users and the avatar identifications; and receiving the virtual image identification of the intelligent voice assistant which is sent by the server and is adapted to the biological characteristic parameters.
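The server-side query described above stores a correspondence between each user's biometric (voiceprint) parameters and an avatar identifier. A minimal sketch follows, with an in-memory dict standing in for the real database; all names are assumptions.

```python
# Hypothetical sketch of the server-side database lookup: the database stores
# the correspondence between biometric parameters and avatar identifiers.
# An in-memory dict stands in for the real database; names are assumptions.
class AvatarDatabase:
    def __init__(self):
        self._by_voiceprint = {}

    def bind(self, voiceprint: str, avatar_id: str) -> None:
        # Record the correspondence between a voiceprint and an avatar identifier.
        self._by_voiceprint[voiceprint] = avatar_id

    def query_avatar_id(self, voiceprint: str):
        # Returns None when the voiceprint was never enrolled.
        return self._by_voiceprint.get(voiceprint)
```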
In some embodiments, the presentation unit may present the avatar of the intelligent voice assistant indicated by the avatar identifier in the following manner:
determining an avatar corresponding to the intelligent voice assistant based on the indication of the avatar identification; acquiring image resources of the virtual image corresponding to the intelligent voice assistant; presenting a default avatar of an avatar of the intelligent voice assistant based on the avatar resources, the default avatar of the avatar including at least one of: a default skin of the avatar; a default prop for the avatar.
In some embodiments, in response to an interaction instruction triggered based on the intelligent voice assistant, the control unit may control the avatar of the intelligent voice assistant to play voice conforming to the interaction instruction in the manner of the virtual emotion as follows:
acquiring an interactive instruction triggered by the target user based on the intelligent voice assistant, and sending the interactive instruction to a server, so that when the interactive instruction comprises a voice interactive instruction, the server performs voice recognition on the voice interactive instruction to obtain corresponding text information, performs semantic recognition on the text information, and obtains an intention corresponding to the voice interactive instruction;
and receiving the returned intention of the voice interaction instruction, and controlling the virtual image of the intelligent voice assistant to play the voice conforming to the interaction instruction in a virtual emotion mode based on the control instruction corresponding to the intention of the voice interaction instruction.
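The pipeline described above — speech recognition yielding text, semantic recognition yielding an intent, and a control instruction driving the avatar's response — can be sketched end to end with stub recognizers. The stubs and return structure are assumptions; real ASR and NLU models would replace them.

```python
# End-to-end sketch of the interaction loop. The recognizers are stubs that
# stand in for real speech recognition and semantic recognition models.
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")  # stub: real ASR would decode audio here

def text_to_intent(text: str) -> str:
    # stub NLU: a real model would classify the utterance into an intent
    return "device_control" if "light" in text else "smart_dialog"

def respond(audio: bytes, virtual_emotion: str) -> dict:
    """Produce the control result: intent, current virtual emotion, and speech."""
    text = speech_to_text(audio)
    intent = text_to_intent(text)
    return {"intent": intent, "emotion": virtual_emotion, "speech": f"ok: {text}"}
```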
In some embodiments, the processing device of the intelligent voice assistant further comprises:
the fourth receiving unit is used for receiving the returned emotion animation corresponding to the virtual image identifier, wherein the emotion animation is generated by the server based on the current emotion category of the target user;
and the presenting unit is also used for presenting the emotion animation corresponding to the virtual image identification when presenting the virtual image of the intelligent voice assistant indicated by the virtual image identification.
In some embodiments, the processing device of the intelligent voice assistant further comprises:
and the fifth sending unit is used for sending the identifier of the target user and the avatar of the intelligent voice assistant corresponding to the identifier of the target user to a blockchain network, so that a node of the blockchain network fills the identifier of the target user and the corresponding avatar of the intelligent voice assistant into a new block, and when consensus is reached on the new block, appends the new block to the tail of the blockchain.
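The blockchain step above can be sketched as a minimal hash chain: fill the user identifier and avatar into a new block linked to the previous block's hash, and append it to the chain tail once consensus is reached. Consensus is simulated by a boolean flag here; the block structure is an illustrative assumption, not the patented design.

```python
# Minimal hash-chain sketch: fill the user ID and avatar into a new block and
# append it to the chain tail once consensus is reached (simulated by a flag).
import hashlib
import json

def new_block(chain: list, user_id: str, avatar_id: str) -> dict:
    # Link the new block to the hash of the current chain tail.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = {"user": user_id, "avatar": avatar_id, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return {**payload, "hash": digest}

def append_if_consensus(chain: list, block: dict, consensus: bool) -> bool:
    # Only a block that reached consensus is appended to the chain tail.
    if consensus:
        chain.append(block)
    return consensus
```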
Taking as an example the software modules included in the memory 550 of the processing device 50 (implemented as a server) that implements the processing method of the intelligent voice assistant according to the embodiment of the present invention, details not covered in the functional description of the modules below can be understood by referring to the above description of the server-side method embodiments of the present invention. As shown in FIG. 3B, the processing device 555 of the intelligent voice assistant provided by the embodiment of the present invention may include:
a second receiving unit 5551, configured to receive a voice instruction of a target user sent by a client; an extracting unit 5552, configured to perform feature extraction on the voice instruction, so as to obtain a biometric parameter corresponding to the target user; the recognition unit 5553 is configured to perform emotion recognition on the voice instruction to obtain a current emotion category corresponding to the target user; a determining unit 5554, configured to determine an avatar identifier of the intelligent voice assistant corresponding to the biometric parameter, and a virtual emotion matching with the current emotion category of the target user and corresponding to the avatar identifier; a second sending unit 5555, configured to send the avatar identification and the emotion to the client, so that the client presents the avatar of the intelligent voice assistant indicated by the avatar identification, and in response to an interactive instruction triggered by the intelligent voice assistant, controls the avatar of the intelligent voice assistant to play the voice conforming to the interactive instruction in the manner of the emotion.
The embodiment of the invention also provides a computer-readable storage medium, which stores executable instructions, and the executable instructions are used for realizing the processing method of the intelligent voice assistant provided by the embodiment of the invention when being executed by the processor.
In some embodiments, the computer may include various computing devices including smart terminals and servers, and the computer-readable storage medium may be, for example, a memory such as a Ferroelectric Random Access Memory (FRAM), a ROM, a PROM, an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A processing method of an intelligent voice assistant is characterized by comprising the following steps:
acquiring a voice instruction corresponding to a target user;
sending the voice instruction to perform feature extraction on the voice instruction to obtain a biological feature parameter corresponding to the target user, and performing emotion recognition on the voice instruction to obtain a current emotion type corresponding to the target user;
receiving returned virtual image identification of the intelligent voice assistant corresponding to the biological characteristic parameters and virtual emotion matched with the current emotion category of the target user and corresponding to the virtual image identification;
presenting an avatar of the intelligent voice assistant indicated by the avatar identification;
and in response to an interactive instruction triggered by the intelligent voice assistant, controlling an avatar of the intelligent voice assistant to play voice conforming to the interactive instruction in a virtual emotion manner.
2. The method of claim 1, wherein the method further comprises:
before the returned virtual image identification of the intelligent voice assistant corresponding to the biological characteristic parameters is received, a returned voice recognition instruction is received, and the voice recognition instruction represents a judgment result of whether the biological characteristic parameters are recorded in a client side;
when the judgment result is that the biological characteristic parameters are recorded in the client, sending a first prompt message to the target user to prompt the target user to confirm the biological characteristic account of the target user;
and when the judgment result is that the biological characteristic parameters are not recorded in the client, sending a second prompt message to the target user to prompt the target user to select the virtual image of the intelligent voice assistant, and storing the selected virtual image of the intelligent voice assistant in a database of a server.
3. The method of claim 2, wherein said sending a second prompting message to the target user to prompt the target user to select the avatar of the intelligent voice assistant comprises:
presenting an avatar selection interface for selecting an avatar of the intelligent voice assistant in response to a setting request of the avatar of the intelligent voice assistant triggered by the target user based on the second prompt message;
and responding to an avatar selection instruction triggered based on the avatar selection interface, and acquiring the avatar of the intelligent voice assistant which accords with the preference of the target user.
4. The method of claim 1, wherein receiving the returned avatar identification of the intelligent voice assistant corresponding to the biometric parameters comprises:
submitting an avatar acquisition request corresponding to the intelligent voice assistant to a server, so that the server queries, in a database based on the biological characteristic parameters of the target user, the virtual image identification of the intelligent voice assistant adapted to the biological characteristic parameters, the database storing the corresponding relation between the biological characteristic parameters of a plurality of target users and the virtual image identifications;
and receiving the virtual image identification of the intelligent voice assistant which is sent by the server and is adapted to the biological characteristic parameters.
5. The method of claim 1 wherein the presenting the avatar of the intelligent voice assistant indicated by the avatar identification comprises:
determining an avatar corresponding to the intelligent voice assistant based on the indication of the avatar identification;
acquiring image resources of the virtual image corresponding to the intelligent voice assistant;
presenting a default avatar of an avatar of the intelligent voice assistant based on the avatar resources, the default avatar of the avatar including at least one of: a default skin of the avatar; a default prop for the avatar.
6. The method of claim 1 wherein controlling the avatar of the intelligent voice assistant to play speech conforming to the interaction instructions in the virtual mood in response to the interaction instructions triggered based on the intelligent voice assistant comprises:
acquiring an interactive instruction triggered by the target user based on the intelligent voice assistant, and sending the interactive instruction to a server, so that when the interactive instruction comprises a voice interactive instruction, the server performs voice recognition on the voice interactive instruction to obtain corresponding text information, and performs semantic recognition on the text information to obtain an intention corresponding to the voice interactive instruction;
and receiving the returned intention of the voice interaction instruction, and controlling the virtual image of the intelligent voice assistant to play the voice conforming to the interaction instruction in a virtual emotion mode based on the control instruction corresponding to the intention of the voice interaction instruction.
7. The method of claim 1, wherein the method further comprises:
receiving returned emotion animations corresponding to the virtual image identifications, wherein the emotion animations are generated by the server based on the current emotion types of the target users;
and when presenting the virtual image of the intelligent voice assistant indicated by the virtual image identification, presenting an emotional animation corresponding to the virtual image identification.
8. A processing method of an intelligent voice assistant is characterized by comprising the following steps:
receiving a voice instruction of a target user sent by a client;
extracting the characteristics of the voice instruction to obtain biological characteristic parameters corresponding to the target user, and performing emotion recognition on the voice instruction to obtain the current emotion type corresponding to the target user;
determining an avatar identification of the intelligent voice assistant corresponding to the biological characteristic parameter and a virtual emotion matched with the current emotion category of the target user and corresponding to the avatar identification;
and sending the avatar identification and the virtual emotion to the client so as to enable the client to present the avatar of the intelligent voice assistant indicated by the avatar identification, and responding to an interaction instruction triggered by the intelligent voice assistant, and controlling the avatar of the intelligent voice assistant to play voice conforming to the interaction instruction in the mode of the virtual emotion.
9. An intelligent voice assistant processing apparatus, the apparatus comprising:
the acquisition unit is used for acquiring a voice instruction corresponding to a target user;
the first sending unit is used for sending the voice command so as to perform feature extraction on the voice command to obtain biological feature parameters corresponding to the target user and perform emotion recognition on the voice command to obtain the current emotion type corresponding to the target user;
the first receiving unit is used for receiving the returned virtual image identification of the intelligent voice assistant corresponding to the biological characteristic parameters and the virtual emotion matched with the current emotion category of the target user and corresponding to the virtual image identification;
a presenting unit for presenting the avatar of the intelligent voice assistant indicated by the avatar identification;
and the control unit is used for responding to an interactive instruction triggered by the intelligent voice assistant and controlling the virtual image of the intelligent voice assistant to play voice conforming to the interactive instruction in a virtual emotion mode.
10. An intelligent voice assistant processing apparatus, the apparatus comprising:
the second receiving unit is used for receiving the voice instruction of the target user sent by the client;
the extraction unit is used for carrying out feature extraction on the voice command to obtain a biological feature parameter corresponding to the target user;
the recognition unit is used for carrying out emotion recognition on the voice command to obtain the current emotion category corresponding to the target user;
the determining unit is used for determining an avatar identification of the intelligent voice assistant corresponding to the biological characteristic parameter and a virtual emotion matched with the current emotion category of the target user and corresponding to the avatar identification;
and the second sending unit is used for sending the avatar identification and the virtual emotion to the client so as to enable the client to present the avatar of the intelligent voice assistant indicated by the avatar identification, and controlling the avatar of the intelligent voice assistant to play voice conforming to the interactive instruction in a virtual emotion mode in response to the interactive instruction triggered by the intelligent voice assistant.
CN202010144535.1A 2020-03-04 2020-03-04 Processing method and device of intelligent voice assistant Pending CN113436622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010144535.1A CN113436622A (en) 2020-03-04 2020-03-04 Processing method and device of intelligent voice assistant

Publications (1)

Publication Number Publication Date
CN113436622A true CN113436622A (en) 2021-09-24


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113900751A (en) * 2021-09-29 2022-01-07 平安普惠企业管理有限公司 Method, device, server and storage medium for synthesizing virtual image
CN114356083A (en) * 2021-12-22 2022-04-15 阿波罗智联(北京)科技有限公司 Virtual personal assistant control method and device, electronic equipment and readable storage medium
CN114385285A (en) * 2021-11-30 2022-04-22 重庆长安汽车股份有限公司 Image creating method based on automobile AI intelligent assistant
CN114974312A (en) * 2022-07-29 2022-08-30 环球数科集团有限公司 Virtual human emotion generation method and system


Similar Documents

Publication Publication Date Title
CN113436622A (en) Processing method and device of intelligent voice assistant
CN111427534B (en) Virtual assistant system capable of actionable messaging
CN108028798B (en) Method, apparatus and computer device for unified messaging platform
CN110767220A (en) Interaction method, device, equipment and storage medium of intelligent voice assistant
CN107632706B (en) Application data processing method and system of multi-modal virtual human
CN116401646B (en) Multi-user configuration
KR20240007888A (en) Electronic device and method for communicating with chatbot
CN110046227B (en) Configuration method, interaction method, device, equipment and storage medium of dialogue system
US20020052913A1 (en) User support apparatus and system using agents
KR20090086805A (en) Self-evolving artificial intelligent cyber robot system
CN111639503B (en) Conference data processing method and device, storage medium and equipment
CN112130874A (en) Method and system for background control panel configuration selection
WO2022205772A1 (en) Method and apparatus for displaying page element of live-streaming room
US10218770B2 (en) Method and system for sharing speech recognition program profiles for an application
CN110688476A (en) Text recommendation method and device based on artificial intelligence
CN110598441B (en) User privacy protection method and device
KR102079979B1 (en) Method for providing service using plurality wake up word in artificial intelligence device, and system thereof
CN108881649B (en) Method and apparatus for providing voice service
CN110597963A (en) Expression question-answer library construction method, expression search method, device and storage medium
CN113134231B (en) Live broadcast processing method and device, electronic equipment and storage medium
US20230396661A1 (en) Systems and methods for sharing content externally from a group-based communication platform
CN110047484A (en) A kind of speech recognition exchange method, system, equipment and storage medium
US11977714B2 (en) Methods and systems for provisioning a collaborative virtual experience
CN111191200B (en) Three-party linkage authentication page display method and device and electronic equipment
CN112995014B (en) Method and device for mass sending of messages

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40051405

Country of ref document: HK

SE01 Entry into force of request for substantive examination