CN116758908A - Interaction method, device, equipment and storage medium based on artificial intelligence - Google Patents


Info

Publication number
CN116758908A
CN116758908A (application CN202311042338.9A)
Authority
CN
China
Prior art keywords
user
interaction
determining
voice
interactive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311042338.9A
Other languages
Chinese (zh)
Other versions
CN116758908B (en)
Inventor
顾维玺
廉润泽
马戈
叶鸿儒
王青春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Industrial Internet Research Institute
Original Assignee
China Industrial Internet Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Industrial Internet Research Institute filed Critical China Industrial Internet Research Institute
Priority to CN202311042338.9A priority Critical patent/CN116758908B/en
Publication of CN116758908A publication Critical patent/CN116758908A/en
Application granted granted Critical
Publication of CN116758908B publication Critical patent/CN116758908B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
                    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
                    • G06V 40/60 Static or dynamic means for assisting the user to position a body part for biometric acquisition
                        • G06V 40/67 by interactive indications to the user
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L 15/00 Speech recognition
                    • G10L 15/08 Speech classification or search
                        • G10L 15/16 using artificial neural networks
                        • G10L 2015/088 Word spotting
                    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L 2015/223 Execution procedure of a spoken command
                    • G10L 15/24 Speech recognition using non-acoustical features
                • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
                    • G10L 25/03 characterised by the type of extracted parameters
                        • G10L 25/18 the extracted parameters being spectral information of each sub-band
                    • G10L 25/48 specially adapted for particular use
                        • G10L 25/51 for comparison or discrimination
                            • G10L 25/63 for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Social Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence and discloses an interaction method, device, equipment and storage medium based on artificial intelligence. The method determines the user's interaction emotion from the intonation spectrum of the user's interactive voice and a target emotion recognition model, ensuring that the interaction emotion is determined accurately; determines the user's current action from a collected user gesture image, achieving accurate recognition of the current action; and determines the reply text from the interactive voice, the voice interaction intonation from the user interaction emotion, and the interaction reply action from the current user action. The resulting intelligent interaction fits the user's current emotion and responds accurately to the action the user is making and the voice the user is uttering, improving the user's interaction experience.

Description

Interaction method, device, equipment and storage medium based on artificial intelligence
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to an interaction method, apparatus, device, and storage medium based on artificial intelligence.
Background
With the rapid development of technology, artificial intelligence interactive products are becoming increasingly common, and users can interact with them by voice and by action. However, in the prior art, such products use a mechanical voice when interacting with the user and can only carry out simple voice interaction, so the interaction is dull and rigid, the user experience is poor, and the user cannot be immersed in the intelligent interaction process.
Disclosure of Invention
The main purpose of the present invention is to provide an interaction method, device, equipment and storage medium based on artificial intelligence, aiming to solve the technical problem of how to improve the intelligence of interaction in artificial intelligence interaction scenarios and thereby improve the user's interaction experience.
In order to achieve the above object, the present invention provides an artificial intelligence based interaction method, comprising:
when receiving interactive voice of a user, performing intonation spectrum extraction on the interactive voice;
determining the user interaction emotion according to the intonation frequency spectrum and the target emotion recognition model;
performing content recognition according to the interactive voice, determining a reply text, and determining a voice interactive intonation according to the user interactive emotion;
detecting joint points according to the acquired user gesture images, and determining joint point coordinates of each user joint point;
determining the current user action of the user according to the joint point coordinates of each user joint point, and determining the interactive reply action according to the current user action;
and performing intelligent interaction according to the voice interaction intonation, the reply text and the interaction reply action.
Optionally, before determining the user interaction emotion according to the interaction voice and the target emotion recognition model, the method further comprises:
performing frequency spectrum extraction on each sample voice in the sample voice training set to obtain frequency spectrum characteristics of each sample voice;
inputting the spectrum characteristics to a convolutional neural network, and determining space characteristic vectors of the spectrum characteristics;
inputting the spectrum characteristics to a two-way memory network, and determining global characteristic vectors of the spectrum characteristics;
and performing model training according to the global feature vector, the spatial feature vector and the attention mechanism to obtain a target emotion recognition model.
Optionally, the training of the model according to the global feature vector, the spatial feature vector and the attention mechanism to obtain a target emotion recognition model includes:
inputting the global feature vector and the spatial feature vector to an attention mechanism, and determining an attention training weight;
carrying out normalization processing according to the attention training weight to obtain a target training weight;
performing feature calculation according to the target training weight and the global feature vector to determine target training features;
and inputting the target training characteristics into an initial classification network, and performing network training on the initial classification network to obtain a target emotion recognition model.
Optionally, the detecting joint points according to the acquired user gesture images and determining joint point coordinates of each user joint point includes:
performing feature processing on the acquired user gesture image through a convolutional neural network to obtain a gesture feature map;
performing position prediction on the gesture feature map to obtain a position confidence code and a position affinity code corresponding to the gesture feature map;
detecting a location association vector of the user according to the location confidence code and the location affinity code;
and detecting the affinity vector according to the position association vector, and determining each user node and the node coordinates of each user node.
Optionally, the determining the current user action of the user according to the joint point coordinates of each user joint point includes:
positioning a target node in a plurality of user nodes, and calculating the distance between each user node and the target node according to the node coordinates of each user node;
normalizing according to the distance between each user node and the target node to obtain the joint distance characteristic;
calculating the joint point angle of each user joint point according to the joint point coordinates of each user joint point, and determining the joint angle characteristics;
and performing action matching according to the joint distance characteristic and the joint angle characteristic, and determining the current user action of the user.
Optionally, the content recognition according to the interactive voice determines a reply text, including:
performing content recognition on the interactive voice to determine a voice interactive text;
determining a current interaction scene according to the voice interaction text;
calling a corresponding interaction knowledge base according to the current interaction scene, and searching a matching text of the voice interaction text in the interaction knowledge base;
and determining a reply text according to the matched text.
Optionally, the intelligent interaction according to the voice interaction intonation, the reply text and the interaction reply action includes:
determining the interactive gender of the user;
determining virtual interactive voice according to the interactive gender;
generating interactive reply voice according to the virtual interactive voice, the reply text and the voice interactive intonation;
and performing intelligent interaction according to the interaction reply voice and the interaction reply action.
In addition, in order to achieve the above object, the present invention also provides an artificial intelligence based interaction device, which includes:
the extraction module is used for extracting intonation frequency spectrum of the interactive voice when the interactive voice of the user is received;
the determining module is used for determining the user interaction emotion according to the intonation frequency spectrum and the target emotion recognition model;
the recognition module is used for carrying out content recognition according to the interactive voice, determining a reply text and determining a voice interactive intonation according to the user interactive emotion;
the detection module is used for detecting joint points according to the acquired user gesture images and determining joint point coordinates of each user joint point;
the determining module is further used for determining the current user action of the user according to the joint point coordinates of each user joint point and determining the interaction reply action according to the current user action;
and the interaction module is used for performing intelligent interaction according to the voice interaction intonation, the reply text and the interaction reply action.
In addition, in order to achieve the above object, the present invention also proposes an artificial intelligence based interactive apparatus, comprising: a memory, a processor, and an artificial intelligence based interactive program stored on the memory and executable on the processor, the artificial intelligence based interactive program configured to implement an artificial intelligence based interactive method as described above.
In addition, in order to achieve the above object, the present invention also proposes a storage medium having stored thereon an artificial intelligence based interactive program which, when executed by a processor, implements an artificial intelligence based interactive method as described above.
According to the invention, when the interactive voice of the user is received, intonation spectrum extraction is performed on the interactive voice; the user interaction emotion is determined according to the intonation spectrum and the target emotion recognition model; content recognition is performed according to the interactive voice to determine a reply text, and the voice interaction intonation is determined according to the user interaction emotion; joint point detection is performed according to the collected user gesture image to determine the joint point coordinates of each user joint point; the current user action of the user is determined according to the joint point coordinates of each user joint point, and the interaction reply action is determined according to the current user action; and intelligent interaction is performed according to the voice interaction intonation, the reply text and the interaction reply action. In this way, the user's interaction emotion is determined from the intonation spectrum of the interactive voice and the target emotion recognition model, ensuring that the interaction emotion is determined accurately; the current user action is determined from the collected user gesture image, achieving accurate recognition of the current action; and the reply text is determined from the interactive voice, the voice interaction intonation from the user interaction emotion, and the interaction reply action from the current user action, so that the intelligent interaction fits the user's current emotion and responds accurately to the action the user is making and the voice the user is uttering, improving both the intelligence of the interaction and the user's interaction experience.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence based interaction device of a hardware runtime environment in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of a first embodiment of an artificial intelligence based interaction method of the present invention;
FIG. 3 is a flow chart of a second embodiment of an artificial intelligence based interaction method of the present invention;
FIG. 4 is a block diagram of a first embodiment of an artificial intelligence based interaction device of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an interaction device based on artificial intelligence of a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the artificial intelligence based interaction device may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 is used to implement connection communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. Optionally, the memory 1005 may also be a storage device separate from the aforementioned processor 1001.
Those skilled in the art will appreciate that the architecture shown in FIG. 1 is not limiting of an artificial intelligence based interaction device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in FIG. 1, an operating system, a network communication module, a user interface module, and an artificial intelligence based interactive program may be included in the memory 1005 as one type of storage medium.
In the artificial intelligence based interaction device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The artificial intelligence based interaction device of the present invention is provided with the processor 1001 and the memory 1005; through the processor 1001, the device invokes the artificial intelligence based interaction program stored in the memory 1005 and executes the artificial intelligence based interaction method provided by the embodiments of the present invention.
The embodiment of the invention provides an interaction method based on artificial intelligence, and referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the interaction method based on artificial intelligence.
The interaction method based on artificial intelligence comprises the following steps:
step S10: and when the interactive voice of the user is received, performing intonation spectrum extraction on the interactive voice.
It should be noted that the execution subject of this embodiment is the artificial intelligence based interaction device, which may be an electronic device such as a personal computer or a server, or another controller or device capable of implementing the same or similar functions; this embodiment is not limited in this respect. In this embodiment and the following embodiments, the artificial intelligence based interaction method is described by taking the artificial intelligence based interaction device as an example.
It can be understood that the interactive voice is the voice uttered by a user to interact with the artificial intelligence based interaction device. After the device is started, it detects in real time whether interactive voice from the user is received; when the user is within the interaction range of the device and voice is detected, it is determined that the interactive voice of the user has been received.
In a specific implementation, the artificial intelligence based interaction device performs voice preprocessing on the interactive voice, including windowing and framing, to ensure the accuracy of subsequent speech recognition and emotion detection. The device then performs spectrum extraction on the preprocessed interactive voice; the Mel spectrum extracted from the preprocessed interactive voice is the intonation spectrum.
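By way of illustration only, the preprocessing and Mel-spectrum step described above might look like the following sketch using librosa; the sampling rate, frame length, hop size and number of Mel bands are assumptions, not values given in this disclosure.

```python
import librosa
import numpy as np

def extract_intonation_spectrum(wav_path, sr=16000, n_mels=64, frame_ms=25, hop_ms=10):
    """Window and frame the interactive voice, then return its log-Mel spectrum."""
    y, _ = librosa.load(wav_path, sr=sr)
    n_fft = int(sr * frame_ms / 1000)          # 25 ms analysis window (assumed)
    hop_length = int(sr * hop_ms / 1000)       # 10 ms frame shift (assumed)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length,
        n_mels=n_mels, window="hann")          # windowing + framing + Mel filter bank
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, n_frames)
```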
Step S20: and determining the user interaction emotion according to the intonation frequency spectrum and the target emotion recognition model.
It should be noted that the target emotion recognition model is a model obtained by training with a convolutional neural network, an attention mechanism and a large number of sample voices, and it can recognize the user emotion contained in a voice. User emotions include, but are not limited to, happiness, anxiety, fear, anger and calm.
It can be understood that the intonation spectrum of the interactive voice is input to the target emotion recognition model, which outputs the emotion of the user at the time the interactive voice is uttered; this emotion is the user interaction emotion.
In a specific implementation, in order to ensure that the target emotion recognition model can accurately output the emotion of the user, before determining the interactive emotion of the user according to the interactive voice and the target emotion recognition model, the method further includes: performing frequency spectrum extraction on each sample voice in the sample voice training set to obtain frequency spectrum characteristics of each sample voice; inputting the spectrum characteristics to a convolutional neural network, and determining space characteristic vectors of the spectrum characteristics; inputting the spectrum characteristics to a two-way memory network, and determining global characteristic vectors of the spectrum characteristics; and performing model training according to the global feature vector, the spatial feature vector and the attention mechanism to obtain a target emotion recognition model.
The sample voice training set refers to a data set containing a large number of sample voices labelled with user emotions. Each sample voice in the training set is preprocessed and then subjected to spectrum extraction; the resulting Mel spectrum of each sample voice is its spectral feature.
It can be understood that the spectral features of each sample voice are input both into a convolutional neural network and into a bidirectional long short-term memory network, the latter being the two-way memory network. The convolutional neural network performs feature extraction on each spectral feature x and obtains its spatial feature vector of the form S = f(W·x + b), where b is the offset, W is the weight and f is the activation. After each spectral feature is input to the two-way memory network, the forward and reverse long short-term memory outputs of that feature are obtained, and together they form the global feature vector H = [h_forward; h_backward], where h_forward is the forward long short-term memory network output and h_backward is the reverse long short-term memory network output.
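A minimal sketch of the two feature extractors described above, written in PyTorch; the layer sizes, kernel sizes and pooling choices are assumptions and do not come from this disclosure.

```python
import torch
import torch.nn as nn

class SpectrumEncoder(nn.Module):
    """CNN branch -> spatial feature vector S; BiLSTM branch -> global feature vector H."""
    def __init__(self, n_mels=64, cnn_dim=128, lstm_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                    # spatial features, roughly S = f(W*x + b)
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, cnn_dim))
        self.bilstm = nn.LSTM(n_mels, lstm_dim, batch_first=True, bidirectional=True)

    def forward(self, mel):                          # mel: (batch, n_mels, n_frames)
        spatial = self.cnn(mel.unsqueeze(1))         # (batch, cnn_dim)
        global_vec, _ = self.bilstm(mel.transpose(1, 2))  # (batch, n_frames, 2*lstm_dim)
        return spatial, global_vec                   # H concatenates forward/backward outputs
```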
In a specific implementation, in order to obtain an accurate target emotion recognition model from the global feature vector, the spatial feature vector and the attention mechanism, further, the model training according to the global feature vector, the spatial feature vector and the attention mechanism to obtain a target emotion recognition model includes: inputting the global feature vector and the spatial feature vector to an attention mechanism, and determining an attention training weight; carrying out normalization processing according to the attention training weight to obtain a target training weight; performing feature calculation according to the target training weight and the global feature vector to determine target training features; and inputting the target training features into an initial classification network, and performing network training on the initial classification network to obtain the target emotion recognition model.
It should be noted that a feature similarity is first calculated between the global feature vector and the spatial feature vector using a weight matrix W and a model bias term c obtained during training, with a low-rank decomposition used for the parameter matrix. The similarity terms are superposed to obtain the corresponding attention training weights, and the attention training weights are normalized to obtain the target training weights M. The target training features are then calculated from M and the global feature vector, the target training features are spliced to obtain the attention representation, and the model is trained on this representation and the labelled user emotions, yielding the trained target emotion recognition model, which is built from a convolutional neural network, a bidirectional long short-term memory network and a multi-head attention mechanism.
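The following is one plausible reading of that attention-and-classification step, continuing the PyTorch sketch above; the single-head similarity form, softmax normalization and layer sizes are assumptions (the disclosure itself refers to a multi-head attention mechanism).

```python
import torch
import torch.nn as nn

class AttentionEmotionHead(nn.Module):
    """Fuse the spatial and global features with attention and classify the emotion."""
    def __init__(self, cnn_dim=128, lstm_dim=128, n_emotions=5):
        super().__init__()
        self.W = nn.Linear(2 * lstm_dim, cnn_dim, bias=True)   # weight matrix W and bias term c
        self.classifier = nn.Linear(2 * lstm_dim, n_emotions)  # initial classification network

    def forward(self, spatial, global_vec):
        # attention training weights: per-frame similarity between global and spatial features
        scores = torch.einsum("btd,bd->bt", self.W(global_vec), spatial)
        weights = torch.softmax(scores, dim=1)                  # normalization -> target weights M
        target_feat = (weights.unsqueeze(-1) * global_vec).sum(dim=1)  # M applied to H
        return self.classifier(target_feat)                     # emotion logits
```

Training would then minimize a cross-entropy loss between these logits and the labelled user emotions in the sample voice training set.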
Step S30: and carrying out content recognition according to the interactive voice, determining a reply text, and determining a voice interactive intonation according to the user interactive emotion.
It should be noted that the artificial intelligence based interaction device performs content recognition on the interactive voice and converts it into the corresponding text content; the content recognition may be performed by a hidden Markov model method based on a parametric model, or in other ways, which is not limited in this embodiment.
It can be appreciated that the artificial intelligence based interactive device searches the database for a reply text matching the text content corresponding to the interactive voice, or determines the reply text by a natural language processing method.
In a specific implementation, in order to accommodate the user's emotion, for example by interacting with a passionate or cheerful voice when the user is feeling down or hurt, the corresponding voice interaction intonation can be selected from an intonation-emotion mapping table according to the user interaction emotion. The intonation-emotion mapping table stores the mapping between user emotions and interaction intonations, with each user emotion corresponding to one interaction intonation: for example, when the user emotion is sadness, the interaction intonation is an upbeat, encouraging tone, and when the user emotion is anxiety, the interaction intonation is a calm, steady tone. The various interaction intonations are obtained in advance through intonation simulation and practical training on the artificial intelligence based interaction device.
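Such a mapping table can be as simple as a keyed lookup with a default; the emotion labels and intonation descriptions below are illustrative placeholders, not the table defined in this disclosure.

```python
# user interaction emotion -> voice interaction intonation (illustrative entries)
INTONATION_MAP = {
    "sad":     "upbeat, encouraging tone",
    "anxious": "calm, steady tone",
    "angry":   "soft, soothing tone",
    "happy":   "lively, cheerful tone",
}

def select_intonation(user_emotion: str) -> str:
    """Look up the interaction intonation for the recognized emotion, with a neutral fallback."""
    return INTONATION_MAP.get(user_emotion, "neutral, friendly tone")
```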
Step S40: and detecting the joint points according to the acquired user gesture images, and determining the joint point coordinates of each user joint point.
When the interactive voice of the user is received, target positioning is performed on the sound source of the interactive voice to determine the current position of the user, and the artificial intelligence based interaction device adjusts the acquisition direction of its camera towards that position, so that the camera can capture an image containing most of the user's body; the captured image containing most of the user's body parts is the user gesture image.
It can be understood that a plurality of nodes on the user body are identified in the user gesture image, the plurality of nodes on the user body are the user nodes, two-dimensional image coordinates of the user nodes are determined, and the two-dimensional image coordinates of the user nodes are the node coordinates.
Step S50: and determining the current user action of the user according to the joint point coordinates of each user joint point, and determining the interactive reply action according to the current user action.
It should be noted that the action formed by the coordinates of the multiple joint points is looked up in the data store; this action is the current user action presented by the user. The interaction reply action corresponding to the current user action is then looked up in an action mapping table, in which each user action corresponds to one interaction reply action.
Step S60: and performing intelligent interaction according to the voice interaction intonation, the reply text and the interaction reply action.
It should be noted that the artificial intelligence based interaction device presents the interaction reply action in response to the user's current action and delivers the reply text in the voice interaction intonation, ensuring that the interaction process is lively and flexible and fits the user's emotion.
According to this embodiment, when the interactive voice of the user is received, intonation spectrum extraction is performed on the interactive voice; the user interaction emotion is determined according to the intonation spectrum and the target emotion recognition model; content recognition is performed according to the interactive voice to determine a reply text, and the voice interaction intonation is determined according to the user interaction emotion; joint point detection is performed according to the collected user gesture image to determine the joint point coordinates of each user joint point; the current user action of the user is determined according to the joint point coordinates of each user joint point, and the interaction reply action is determined according to the current user action; and intelligent interaction is performed according to the voice interaction intonation, the reply text and the interaction reply action. In this way, the user's interaction emotion is determined from the intonation spectrum of the interactive voice and the target emotion recognition model, ensuring that the interaction emotion is determined accurately; the current user action is determined from the collected user gesture image, achieving accurate recognition of the current action; and the reply text is determined from the interactive voice, the voice interaction intonation from the user interaction emotion, and the interaction reply action from the current user action, so that the intelligent interaction fits the user's current emotion and responds accurately to the action the user is making and the voice the user is uttering, improving both the intelligence of the interaction and the user's interaction experience.
Referring to fig. 3, fig. 3 is a schematic flow chart of a second embodiment of an interaction method based on artificial intelligence according to the present invention.
Based on the first embodiment, the step S40 in the interaction method based on artificial intelligence in this embodiment includes:
step S41: and carrying out feature processing on the acquired user gesture image through a convolutional neural network to obtain a gesture feature map.
It should be noted that before joint point detection is performed on the user gesture image, image preprocessing needs to be carried out on the user gesture image, including but not limited to Gaussian filtering and image denoising.
It can be understood that the preprocessed user gesture image is input to the convolutional neural network, and the convolutional neural network processes the user gesture image, so as to obtain a feature map corresponding to the user gesture image, and the feature map corresponding to the user gesture image is the gesture feature map.
Step S42: and carrying out position prediction on the gesture feature map to obtain a position confidence code and a position affinity code corresponding to the gesture feature map.
The convolutional neural network splits the gesture feature map into two branches and predicts the confidence map code M and the affinity vector field code N of the user's body parts through multiple iterations, where the confidence map code M is the position confidence code and the affinity vector field code N is the position affinity code. In this embodiment the whole prediction process has T stages: the first stage produces the position confidence code M^1 and the position affinity code N^1 from the gesture feature map F; each of the subsequent T-1 stages fuses the position confidence code and position affinity code of the previous stage with the gesture feature map F and performs the next prediction, so that M^t and N^t are predicted from (F, M^(t-1), N^(t-1)); the final stage outputs the part affinity code N and the confidence map code M.
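A schematic of that staged refinement loop is sketched below; the stage networks are placeholders for whatever modules produce the two codes, and the channel-wise concatenation is an assumption about how the fusion is done.

```python
import torch

def predict_codes(feature_map, stage_nets):
    """Iteratively refine the confidence map code M and the affinity code N over T stages.

    stage_nets[0] maps the gesture feature map F to (M, N); each later stage_nets[t]
    maps the fusion of (F, M, N) to a refined (M, N). The nets are placeholders.
    """
    M, N = stage_nets[0](feature_map)                  # first-stage prediction from F
    for net in stage_nets[1:]:                         # remaining T-1 stages
        fused = torch.cat([feature_map, M, N], dim=1)  # fuse previous codes with F
        M, N = net(fused)                              # next prediction
    return M, N                                        # final position confidence / affinity codes
```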
Step S43: and detecting the position association vector of the user according to the position confidence code and the position affinity code.
The association vector field between the body parts of the user is detected through the position confidence code and the position affinity code, and the association vector field between the body parts of the user is the position association vector.
Step S44: and detecting the affinity vector according to the position association vector, and determining each user node and the node coordinates of each user node.
It should be noted that by analysing the affinity vector field of the user nodes together with the confidence information, all user nodes in the user gesture image and their corresponding node coordinates are finally determined. The specific process is as follows: for any two candidate node positions d1 and d2 obtained in the final detection, the degree of association G between d1 and d2 is calculated as the line integral of the node affinity along the segment joining them; the higher the value of G, the stronger the association between d1 and d2. Taking p(u) = (1 - u)·d1 + u·d2 as an arbitrary point on the line between d1 and d2, the integral is approximated by sampling p(u) at evenly spaced values of u and summing the sampled affinity values. The detection of the user nodes can then be completed.
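The sampled line integral described above might be approximated as in the sketch below; treating the affinity code as a 2-channel vector field and projecting it onto the d1-to-d2 direction is an assumption about the exact integrand, and the sample count is arbitrary.

```python
import numpy as np

def association_score(paf, d1, d2, n_samples=10):
    """Approximate the line integral of the part-affinity field between joint candidates d1 and d2."""
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    direction = d2 - d1
    unit = direction / (np.linalg.norm(direction) + 1e-8)    # unit vector from d1 to d2
    total = 0.0
    for u in np.linspace(0.0, 1.0, n_samples):                # sample p(u) = (1-u)*d1 + u*d2
        x, y = (1.0 - u) * d1 + u * d2
        total += np.dot(paf[:, int(y), int(x)], unit)         # affinity projected on the limb direction
    return total / n_samples                                  # higher G means stronger association
```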
It may be appreciated that, to ensure that the current user action is accurately identified according to the node coordinates of each user node, further, the determining the current user action of the user according to the node coordinates of each user node includes: positioning a target node in a plurality of user nodes, and calculating the distance between each user node and the target node according to the node coordinates of each user node; normalizing according to the distance between each user node and the target node to obtain the joint distance characteristic; calculating the joint point angle of each user joint point according to the joint point coordinates of each user joint point, and determining the joint angle characteristics; and performing action matching according to the joint distance characteristic and the joint angle characteristic, and determining the current user action of the user.
In a specific implementation, the target node refers to a joint point used to locate the user's trunk, which in this embodiment may be a cervical vertebra node or a lumbar vertebra node. The X and Y distances from the user's elbow joints, wrist joints, ankle joints and knee joints to the target node are calculated; these are the distances from each user node to the target node. Each of these distances is then divided by the Euclidean distance from the corresponding user node to the target node, and the result obtained is the joint distance characteristic.
It should be noted that the elbow joints, shoulder joints, hip joints and knee joints among the user's joint points are determined, and the angle at each of them is calculated according to the cosine rule from the coordinates x and y of the relevant user joint points; the calculated angles are the joint angle characteristics.
It can be understood that the joint distance characteristics and joint angle characteristics exhibited by the user are taken as a gesture sequence, the similarity between this gesture sequence and the gesture sequences of user actions stored in advance in the data store is calculated, and the user action whose stored gesture sequence is most similar to the user's gesture sequence is selected from the data store as the current user action.
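A compact sketch of the distance features, cosine-rule angles and nearest-match lookup described above; the joint names, the neck as target node and the plain Euclidean similarity between gesture sequences are assumptions used only for illustration.

```python
import numpy as np

def joint_distance_features(joints, target="neck"):
    """X/Y offsets of each joint to the target node, normalized by the Euclidean distance."""
    tx, ty = joints[target]
    feats = {}
    for name, (x, y) in joints.items():
        d = np.hypot(x - tx, y - ty) + 1e-8
        feats[name] = (abs(x - tx) / d, abs(y - ty) / d)
    return feats

def joint_angle(a, b, c):
    """Angle at joint b formed by joints a-b-c, from the cosine rule on the joint coordinates."""
    ba = np.asarray(a, float) - np.asarray(b, float)
    bc = np.asarray(c, float) - np.asarray(b, float)
    cos_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def match_action(gesture_seq, action_library):
    """Return the stored user action whose gesture sequence is most similar to the observed one."""
    return min(action_library.items(),
               key=lambda item: np.linalg.norm(np.asarray(item[1]) - np.asarray(gesture_seq)))[0]
```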
In a specific implementation, in order to ensure the accuracy of determining the reply text, further, the content recognition according to the interactive voice to determine a reply text includes: performing content recognition on the interactive voice to determine a voice interaction text; determining a current interaction scene according to the voice interaction text; calling the corresponding interaction knowledge base according to the current interaction scene, and searching the interaction knowledge base for a matching text of the voice interaction text; and determining the reply text according to the matching text.
It should be noted that the artificial intelligence based interaction device performs content recognition on the interactive voice and converts it into the corresponding text content; this text content is the voice interaction text. The content recognition may be performed by a hidden Markov model method based on a parametric model, or in other ways, which is not limited in this embodiment.
It can be understood that keyword extraction is performed on the voice interaction text to identify the key fields, and the content scene the user wants to interact about is determined from those key fields; this content scene is the current interaction scene. The artificial intelligence based interaction device sets a large number of key fields in advance and assigns a corresponding scene to each key field. However, one key field may appear in more than one scene; in that case a combination field is assigned to the key field to constrain the scene. For example, "toothache" may occur in a daily interaction scene or in a medical interaction scene, so the combination field "symptom" is assigned to "toothache"; when the two key fields "toothache" and "symptom" occur together, the current interaction scene is the medical interaction scene.
In a specific implementation, in order to ensure the accuracy of the interaction, different interaction scenes correspond to different interaction knowledge bases, and the corresponding interaction knowledge base is called according to the current interaction scene. For example, when the current interaction scene is the medical interaction scene, a large amount of medical knowledge is needed to guarantee smooth and accurate interaction with the user, so the medical interaction knowledge base corresponding to the medical interaction scene is called to ensure a sufficient store of medical knowledge.
It should be noted that text content matching the key fields of the voice interaction text is searched for in the interaction knowledge base; this matching text content is the matching text. Field smoothing is then performed based on the matching text, finally generating a reply text that reads fluently and contains the matching text.
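A toy sketch of the keyword-to-scene resolution and knowledge-base lookup described in the last few paragraphs; the keyword table, scene names and knowledge-base structure are invented for illustration, and the field-smoothing step is reduced to simple concatenation.

```python
# (key field, optional combination field) -> interaction scene; illustrative entries only
SCENE_KEYWORDS = {
    ("toothache", "symptom"): "medical",
    ("toothache", None):      "daily",
    ("meeting",   None):      "office",
}

def resolve_scene(keywords):
    """Pick the current interaction scene, preferring entries whose combination field also appears."""
    scene = "daily"
    for (key, combo), candidate in SCENE_KEYWORDS.items():
        if key in keywords and (combo is None or combo in keywords):
            scene = candidate
            if combo is not None:          # a combined match is more specific, so stop here
                break
    return scene

def build_reply(keywords, knowledge_bases):
    """Call the knowledge base for the resolved scene and join the matching texts into a reply."""
    kb = knowledge_bases[resolve_scene(keywords)]
    matched = [kb[k] for k in keywords if k in kb]
    return " ".join(matched) if matched else "Sorry, could you say that again?"
```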
It can be understood that, besides the above method of determining the current interaction scene, calling the corresponding interaction knowledge base and determining the reply text from it, a Markov model method may also be used, in which the reply text is obtained by inputting the voice interaction text into the Markov model.
In a specific implementation, in order to ensure the liveliness of the interaction, further, the performing intelligent interaction according to the voice interaction intonation, the reply text and the interaction reply action includes: determining the interactive gender of the user; determining the virtual interactive voice according to the interactive gender; generating the interaction reply voice according to the virtual interactive voice, the reply text and the voice interaction intonation; and performing intelligent interaction according to the interaction reply voice and the interaction reply action.
It should be noted that the gender of the user who uttered the interactive voice is determined from the interactive voice; this gender is the interactive gender. Age detection is also performed on the user through the user gesture image to determine the user's age, and a virtual voice of the same gender and a similar age is created according to the interactive gender and the user's age; this virtual voice is the virtual interactive voice.
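Selecting the virtual interactive voice could reduce to a lookup over gender and age band, as in the hypothetical sketch below; the age bands and voice identifiers are placeholders, not part of this disclosure.

```python
def select_virtual_voice(gender: str, age: int) -> dict:
    """Choose a virtual interactive voice of the same gender and a similar age band (illustrative)."""
    if age < 13:
        band = "child"
    elif age < 35:
        band = "young"
    elif age < 60:
        band = "adult"
    else:
        band = "senior"
    return {"gender": gender, "age_band": band, "voice_id": f"{gender}_{band}_01"}
```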
It can be appreciated that the artificial intelligence based interaction device presents the interaction reply action in response to the user's current action and delivers the reply text in the virtual interactive voice with the voice interaction intonation, ensuring that the interaction is lively and flexible and fits the user's emotion throughout the interaction process.
In the embodiment, the acquired user gesture image is subjected to feature processing through a convolutional neural network to obtain a gesture feature map; performing position prediction on the gesture feature map to obtain a position confidence code and a position affinity code corresponding to the gesture feature map; detecting a location association vector of the user according to the location confidence code and the location affinity code; and detecting the affinity vector according to the position association vector, and determining each user node and the node coordinates of each user node. By the method, the gesture feature map of the user gesture image is determined through the convolutional neural network, the gesture feature map is subjected to position prediction, the position association vector of the user is detected by using the determined position confidence code and the determined position affinity code, and finally accurate detection and positioning of the user joint point can be realized.
In addition, referring to fig. 4, an embodiment of the present invention further provides an interaction device based on artificial intelligence, where the interaction device based on artificial intelligence includes:
The extraction module 10 is used for extracting the intonation spectrum of the interactive voice when the interactive voice of the user is received.
A determining module 20, configured to determine the emotion of the user interaction according to the intonation spectrum and the target emotion recognition model.
The recognition module 30 is configured to perform content recognition according to the interactive voice, determine a reply text, and determine a voice interactive intonation according to the user interaction emotion.
The detection module 40 is configured to perform joint point detection according to the collected user gesture image, and determine joint point coordinates of each user joint point.
The determining module 20 is further configured to determine a current user action of the user according to the coordinates of the joints of each user joint, and determine an interaction reply action according to the current user action.
The interaction module 50 is used for performing intelligent interaction according to the voice interaction intonation, the reply text and the interaction reply action.
According to this embodiment, when the interactive voice of the user is received, intonation spectrum extraction is performed on the interactive voice; the user interaction emotion is determined according to the intonation spectrum and the target emotion recognition model; content recognition is performed according to the interactive voice to determine a reply text, and the voice interaction intonation is determined according to the user interaction emotion; joint point detection is performed according to the collected user gesture image to determine the joint point coordinates of each user joint point; the current user action of the user is determined according to the joint point coordinates of each user joint point, and the interaction reply action is determined according to the current user action; and intelligent interaction is performed according to the voice interaction intonation, the reply text and the interaction reply action. In this way, the user's interaction emotion is determined from the intonation spectrum of the interactive voice and the target emotion recognition model, ensuring that the interaction emotion is determined accurately; the current user action is determined from the collected user gesture image, achieving accurate recognition of the current action; and the reply text is determined from the interactive voice, the voice interaction intonation from the user interaction emotion, and the interaction reply action from the current user action, so that the intelligent interaction fits the user's current emotion and responds accurately to the action the user is making and the voice the user is uttering, improving both the intelligence of the interaction and the user's interaction experience.
In an embodiment, the determining module 20 is further configured to perform spectrum extraction on each sample voice in the sample voice training set to obtain a spectrum feature of each sample voice;
inputting the spectrum characteristics to a convolutional neural network, and determining space characteristic vectors of the spectrum characteristics;
inputting the spectrum characteristics to a two-way memory network, and determining global characteristic vectors of the spectrum characteristics;
and performing model training according to the global feature vector, the spatial feature vector and the attention mechanism to obtain a target emotion recognition model.
In an embodiment, the determining module 20 is further configured to input the global feature vector and the spatial feature vector to an attention mechanism, and determine an attention training weight;
carrying out normalization processing according to the attention training weight to obtain a target training weight;
performing feature calculation according to the target training weight and the global feature vector to determine target training features;
and inputting the target training characteristics into an initial classification network, and performing network training on the initial classification network to obtain a target emotion recognition model.
In an embodiment, the detection module 40 is further configured to perform feature processing on the collected gesture image of the user through a convolutional neural network to obtain a gesture feature map;
performing position prediction on the gesture feature map to obtain a position confidence code and a position affinity code corresponding to the gesture feature map;
detecting a location association vector of the user according to the location confidence code and the location affinity code;
and detecting the affinity vector according to the position association vector, and determining each user node and the node coordinates of each user node.
In an embodiment, the determining module 20 is further configured to perform target node positioning in a plurality of user nodes, and calculate a distance between each user node and the target node according to the node coordinates of each user node;
normalizing according to the distance between each user node and the target node to obtain the joint distance characteristic;
calculating the joint point angle of each user joint point according to the joint point coordinates of each user joint point, and determining the joint angle characteristics;
and performing action matching according to the joint distance characteristic and the joint angle characteristic, and determining the current user action of the user.
In an embodiment, the recognition module 30 is further configured to perform content recognition on the interactive voice to determine a voice interactive text;
determining a current interaction scene according to the voice interaction text;
calling a corresponding interaction knowledge base according to the current interaction scene, and searching a matching text of the voice interaction text in the interaction knowledge base;
and determining a reply text according to the matched text.
In an embodiment, the interaction module 50 is further configured to determine the interactive gender of the user;
determining virtual interactive voice according to the interactive gender;
generating interactive reply voice according to the virtual interactive voice, the reply text and the voice interactive intonation;
and performing intelligent interaction according to the interaction reply voice and the interaction reply action.
Because the device adopts all the technical schemes of all the embodiments, the device at least has all the beneficial effects brought by the technical schemes of the embodiments, and the description is omitted here.
In addition, the embodiment of the invention also provides a storage medium on which an artificial intelligence based interaction program is stored, and the artificial intelligence based interaction program, when executed by a processor, implements the steps of the artificial intelligence based interaction method described above.
Because the storage medium adopts all the technical schemes of all the embodiments, the storage medium has at least all the beneficial effects brought by the technical schemes of the embodiments, and the description is omitted here.
It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present invention, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.
In addition, technical details not described in detail in this embodiment may refer to the interaction method based on artificial intelligence provided in any embodiment of the present invention, which is not described herein.
Furthermore, it should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of embodiments, it will be clear to a person skilled in the art that the above embodiment method may be implemented by means of software plus a necessary general hardware platform, but may of course also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. Read Only Memory (ROM)/RAM, magnetic disk, optical disk) and comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. An artificial intelligence based interaction method, comprising:
When receiving interactive voice of a user, performing intonation spectrum extraction on the interactive voice;
determining a user interaction emotion according to the intonation spectrum and a target emotion recognition model;
performing content recognition according to the interactive voice to determine a reply text, and determining a voice interaction intonation according to the user interaction emotion;
detecting joint points according to the acquired user gesture images, and determining joint point coordinates of each user joint point;
determining the current user action of the user according to the joint point coordinates of each user joint point, and determining the interactive reply action according to the current user action;
and performing intelligent interaction according to the voice interaction intonation, the reply text and the interaction reply action.
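By way of non-limiting illustration, a minimal Python sketch of the intonation spectrum extraction step of claim 1 is given below. The use of librosa, the combination of a log-mel spectrogram with a fundamental-frequency (F0) contour, and all parameter values are assumptions of this sketch rather than features recited in the claim.

```python
import librosa
import numpy as np

def extract_intonation_spectrum(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Build a pitch-aware spectral representation of the interactive voice.

    A log-mel spectrogram stacked with the fundamental-frequency (F0) contour
    stands in for the "intonation spectrum" of claim 1.
    """
    y, sr = librosa.load(wav_path, sr=sr)
    # Log-mel spectrogram: spectral envelope over time.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    log_mel = librosa.power_to_db(mel)
    # F0 contour via pYIN: the intonation (pitch) trajectory.
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)
    f0 = np.nan_to_num(f0)  # unvoiced frames -> 0
    # Align frame counts and stack the pitch row on top of the spectrogram.
    n = min(log_mel.shape[1], f0.shape[0])
    return np.vstack([log_mel[:, :n], f0[None, :n]])
```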
2. The artificial intelligence based interaction method of claim 1, wherein before the determining the user interaction emotion according to the intonation spectrum and the target emotion recognition model, the method further comprises:
performing spectrum extraction on each sample voice in a sample voice training set to obtain spectrum features of each sample voice;
inputting the spectrum features to a convolutional neural network, and determining spatial feature vectors of the spectrum features;
inputting the spectrum features to a bidirectional memory network, and determining global feature vectors of the spectrum features;
and performing model training according to the global feature vectors, the spatial feature vectors and an attention mechanism to obtain the target emotion recognition model.
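By way of non-limiting illustration, a minimal PyTorch sketch of the two-branch feature extraction of claim 2 follows. Interpreting the bidirectional memory network as a bidirectional LSTM, the input layout (batch, 1, mels, frames), and all layer sizes are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class TwoBranchFeatureExtractor(nn.Module):
    """CNN branch for spatial features, BiLSTM branch for global (temporal) features."""

    def __init__(self, n_mels: int = 64, hidden: int = 128):
        super().__init__()
        # Spatial branch: a small 2-D CNN over the spectrogram treated as an image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.spatial_proj = nn.Linear(32 * 4 * 4, hidden)
        # Global branch: a bidirectional LSTM over the frame sequence.
        self.bilstm = nn.LSTM(input_size=n_mels, hidden_size=hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, spec: torch.Tensor):
        # spec: (batch, 1, n_mels, frames)
        spatial = self.spatial_proj(self.cnn(spec).flatten(1))   # (batch, hidden)
        seq = spec.squeeze(1).transpose(1, 2)                    # (batch, frames, n_mels)
        out, _ = self.bilstm(seq)                                # (batch, frames, 2*hidden)
        global_feat = out.mean(dim=1)                            # (batch, 2*hidden)
        return spatial, global_feat
```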
3. The artificial intelligence based interaction method of claim 2, wherein the performing model training according to the global feature vector, the spatial feature vector and the attention mechanism to obtain the target emotion recognition model comprises:
inputting the global feature vector and the spatial feature vector to the attention mechanism, and determining an attention training weight;
carrying out normalization processing according to the attention training weight to obtain a target training weight;
performing feature calculation according to the target training weight and the global feature vector to determine target training features;
and inputting the target training features into an initial classification network, and performing network training on the initial classification network to obtain the target emotion recognition model.
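A hedged sketch of the attention fusion and classifier of claim 3 is shown below. The claim does not fix the form of the attention score or its normalization; a multiplicative score between the spatial and global vectors with a sigmoid normalization is assumed here, and the training step in the trailing comments is illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionEmotionClassifier(nn.Module):
    """Fuse spatial and global feature vectors with attention, then classify emotion."""

    def __init__(self, hidden: int = 128, n_emotions: int = 6):
        super().__init__()
        self.query = nn.Linear(hidden, hidden)       # projects the spatial feature vector
        self.key = nn.Linear(2 * hidden, hidden)     # projects the BiLSTM global feature vector
        self.classifier = nn.Linear(2 * hidden, n_emotions)

    def forward(self, spatial: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # Attention training weight: similarity between the two projected views.
        score = (self.query(spatial) * self.key(global_feat)).sum(dim=-1, keepdim=True)
        # Target training weight: normalised attention weight (a sigmoid is a
        # stand-in here; the claim does not fix the normalisation function).
        weight = torch.sigmoid(score)
        # Target training features: the re-weighted global feature vector.
        target_feat = weight * global_feat
        # Initial classification network: a single linear layer in this sketch.
        return self.classifier(target_feat)

# Illustrative training step (clf, optimizer and emotion_labels are assumed to exist):
#   logits = clf(spatial, global_feat)
#   loss = F.cross_entropy(logits, emotion_labels)
#   loss.backward(); optimizer.step()
```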
4. The artificial intelligence based interaction method of claim 1, wherein the detecting joint points according to the acquired user gesture images and determining the joint point coordinates of each user joint point comprises:
performing feature processing on the acquired user gesture image through a convolutional neural network to obtain a gesture feature map;
performing position prediction on the gesture feature map to obtain a position confidence code and a position affinity code corresponding to the gesture feature map;
detecting a position association vector of the user according to the position confidence code and the position affinity code;
and performing affinity vector detection according to the position association vector, and determining each user joint point and the joint point coordinates of each user joint point.
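The position confidence code and position affinity code of claim 4 read on part confidence maps and part affinity fields as used in OpenPose-style pose estimators; under that assumed reading, the sketch below shows only the final peak-picking step that turns per-joint confidence maps into joint point coordinates, with the backbone network and the affinity-based association across multiple people omitted.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def joints_from_confidence_maps(conf_maps: np.ndarray, threshold: float = 0.1):
    """Pick one (x, y) coordinate per joint from per-joint confidence maps.

    conf_maps: (n_joints, H, W) output of the pose network (the "position
    confidence code"); the affinity fields, which associate joints across
    multiple people, are not used in this single-user sketch.
    """
    joints = []
    for j in range(conf_maps.shape[0]):
        cmap = conf_maps[j]
        # Non-maximum suppression: keep local peaks above the threshold.
        peaks = (cmap == maximum_filter(cmap, size=5)) & (cmap > threshold)
        ys, xs = np.nonzero(peaks)
        if len(xs) == 0:
            joints.append(None)                  # joint not detected
        else:
            best = int(np.argmax(cmap[ys, xs]))  # strongest peak
            joints.append((int(xs[best]), int(ys[best])))
    return joints
```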
5. The artificial intelligence based interaction method of claim 1, wherein the determining the current user action of the user according to the joint point coordinates of each user joint point comprises:
positioning a target joint point among a plurality of user joint points, and calculating the distance between each user joint point and the target joint point according to the joint point coordinates of each user joint point;
performing normalization according to the distance between each user joint point and the target joint point to obtain a joint distance feature;
calculating the joint point angle of each user joint point according to the joint point coordinates of each user joint point, and determining a joint angle feature;
and performing action matching according to the joint distance feature and the joint angle feature to determine the current user action of the user.
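A minimal sketch of the geometric features of claim 5, assuming the target joint point is the neck, that a single user is tracked, and that action matching is a nearest-template lookup over a hypothetical template dictionary built with the same features.

```python
import numpy as np

def action_from_joints(joints: np.ndarray, templates: dict, target_idx: int = 1) -> str:
    """Match the current user action from joint point coordinates.

    joints:     (n_joints, 2) array of (x, y) coordinates
    templates:  {action_name: feature_vector} built with the same features
                (a hypothetical, pre-computed action library)
    target_idx: index of the target joint point (the neck is assumed here)
    """
    target = joints[target_idx]
    # Joint distance feature: distances to the target joint point, normalised
    # so the feature does not depend on the user's distance from the camera.
    dists = np.linalg.norm(joints - target, axis=1)
    dist_feat = dists / (dists.max() + 1e-8)
    # Joint angle feature: angle of each joint point relative to the target.
    angles = np.arctan2(joints[:, 1] - target[1], joints[:, 0] - target[0])
    feat = np.concatenate([dist_feat, angles])
    # Action matching: nearest template by Euclidean distance.
    return min(templates, key=lambda name: np.linalg.norm(feat - templates[name]))
```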
6. The artificial intelligence based interaction method of claim 1, wherein the performing content recognition according to the interactive voice to determine a reply text comprises:
performing content recognition on the interactive voice to determine a voice interaction text;
determining a current interaction scene according to the voice interaction text;
calling a corresponding interaction knowledge base according to the current interaction scene, and searching a matching text of the voice interaction text in the interaction knowledge base;
and determining a reply text according to the matched text.
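A hedged sketch of the scene detection and knowledge-base matching of claim 6. The keyword-based scene rule, the per-scene knowledge bases, and the TF-IDF cosine-similarity retrieval are stand-ins chosen for this sketch; the claim does not specify any particular matching technique.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical per-scene knowledge bases: {scene: [(question, answer), ...]}
KNOWLEDGE_BASES = {
    "after_sales": [("how do I return the device", "You can return it within 7 days.")],
    "general":     [("what can you do", "I can chat, answer questions and demonstrate actions.")],
}

def detect_scene(text: str) -> str:
    """Very simple keyword rule standing in for the current-interaction-scene classifier."""
    return "after_sales" if "return" in text or "repair" in text else "general"

def reply_text_for(voice_text: str) -> str:
    scene = detect_scene(voice_text)                     # current interaction scene
    kb = KNOWLEDGE_BASES[scene]                          # corresponding knowledge base
    questions = [q for q, _ in kb]
    vec = TfidfVectorizer().fit(questions + [voice_text])
    sims = cosine_similarity(vec.transform([voice_text]), vec.transform(questions))[0]
    best = int(sims.argmax())                            # matching text in the knowledge base
    return kb[best][1]                                   # reply text derived from the match
```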
7. The artificial intelligence based interaction method of claim 1, wherein the performing intelligent interaction according to the voice interaction intonation, the reply text and the interaction reply action comprises:
determining an interaction gender of the user;
determining a virtual interaction voice according to the interaction gender;
generating an interaction reply voice according to the virtual interaction voice, the reply text and the voice interaction intonation;
and performing intelligent interaction according to the interaction reply voice and the interaction reply action.
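A sketch of how the pieces of claim 7 might be assembled before synthesis. The voice profile names, the gender-to-voice mapping policy, and the emotion-to-intonation table are hypothetical; the sketch stops short of calling a concrete text-to-speech engine.

```python
from dataclasses import dataclass

@dataclass
class InteractionPlan:
    voice_profile: str   # virtual interaction voice to synthesise with
    text: str            # reply text
    intonation: dict     # pitch / rate adjustments derived from the emotion
    action: str          # interaction reply action for the avatar or robot

def build_interaction(gender: str, reply_text: str, emotion: str, reply_action: str) -> InteractionPlan:
    # Virtual interaction voice selected from the user's interaction gender
    # (the profile names and the mapping policy are hypothetical).
    voice = "male_voice_01" if gender == "male" else "female_voice_01"
    # Voice interaction intonation derived from the recognised emotion
    # (the table below is illustrative only).
    intonation = {
        "sad":   {"pitch": -2, "rate": 0.9},
        "happy": {"pitch": +2, "rate": 1.1},
    }.get(emotion, {"pitch": 0, "rate": 1.0})
    return InteractionPlan(voice, reply_text, intonation, reply_action)

# The resulting plan would be handed to a text-to-speech engine (to produce the
# interaction reply voice) and to an avatar animation module (to perform the
# interaction reply action).
```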
8. An artificial intelligence based interaction device, the artificial intelligence based interaction device comprising:
the extraction module is used for performing intonation spectrum extraction on the interactive voice when the interactive voice of the user is received;
the determining module is used for determining the user interaction emotion according to the intonation spectrum and the target emotion recognition model;
the recognition module is used for performing content recognition according to the interactive voice, determining a reply text, and determining a voice interaction intonation according to the user interaction emotion;
the detection module is used for detecting joint points according to the acquired user gesture images and determining joint point coordinates of each user joint point;
the determining module is further used for determining the current user action of the user according to the joint point coordinates of each user joint point and determining the interaction reply action according to the current user action;
and the interaction module is used for performing intelligent interaction according to the voice interaction intonation, the reply text and the interaction reply action.
9. An artificial intelligence based interactive device, the device comprising: a memory, a processor, and an artificial intelligence based interactive program stored on the memory and executable on the processor, the artificial intelligence based interactive program configured to implement the artificial intelligence based interactive method of any one of claims 1 to 7.
10. A storage medium having stored thereon an artificial intelligence based interaction program which when executed by a processor implements the artificial intelligence based interaction method of any of claims 1 to 7.
CN202311042338.9A 2023-08-18 2023-08-18 Interaction method, device, equipment and storage medium based on artificial intelligence Active CN116758908B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311042338.9A CN116758908B (en) 2023-08-18 2023-08-18 Interaction method, device, equipment and storage medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311042338.9A CN116758908B (en) 2023-08-18 2023-08-18 Interaction method, device, equipment and storage medium based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN116758908A true CN116758908A (en) 2023-09-15
CN116758908B CN116758908B (en) 2023-11-07

Family

ID=87950089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311042338.9A Active CN116758908B (en) 2023-08-18 2023-08-18 Interaction method, device, equipment and storage medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN116758908B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108803874A (en) * 2018-05-30 2018-11-13 广东省智能制造研究所 A kind of human-computer behavior exchange method based on machine vision
WO2019057019A1 (en) * 2017-09-20 2019-03-28 阿里巴巴集团控股有限公司 Robot interaction method and device
CN110931002A (en) * 2019-10-12 2020-03-27 平安科技(深圳)有限公司 Human-computer interaction method and device, computer equipment and storage medium
CN112185389A (en) * 2020-09-22 2021-01-05 北京小米松果电子有限公司 Voice generation method and device, storage medium and electronic equipment
CN115328303A (en) * 2022-07-28 2022-11-11 竹间智能科技(上海)有限公司 User interaction method and device, electronic equipment and computer-readable storage medium
CN115543089A (en) * 2022-10-20 2022-12-30 昆明奥智科技有限公司 Virtual human emotion interaction system and method based on five-dimensional emotion model

Also Published As

Publication number Publication date
CN116758908B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
WO2017112813A1 (en) Multi-lingual virtual personal assistant
WO2019019935A1 (en) Interaction method, interaction terminal, storage medium, and computer device
KR102167760B1 (en) Sign language analysis Algorithm System using Recognition of Sign Language Motion process and motion tracking pre-trained model
CN111967334B (en) Human body intention identification method, system and storage medium
CN113421547B (en) Voice processing method and related equipment
CN111414506B (en) Emotion processing method and device based on artificial intelligence, electronic equipment and storage medium
CN107832720B (en) Information processing method and device based on artificial intelligence
CN110125932B (en) Dialogue interaction method for robot, robot and readable storage medium
CN111401318A (en) Action recognition method and device
CN116740691A (en) Image-based emotion recognition method, device, equipment and storage medium
CN111383138B (en) Restaurant data processing method, device, computer equipment and storage medium
WO2021179703A1 (en) Sign language interpretation method and apparatus, computer device, and storage medium
CN113643789A (en) Method, device and system for generating fitness scheme information
Ryumin et al. Towards automatic recognition of sign language gestures using kinect 2.0
CN111160049B (en) Text translation method, apparatus, machine translation system, and storage medium
Brock et al. Learning three-dimensional skeleton data from sign language video
CN115188074A (en) Interactive physical training evaluation method, device and system and computer equipment
CN111444321B (en) Question answering method, device, electronic equipment and storage medium
CN111310590A (en) Action recognition method and electronic equipment
CN113873297A (en) Method and related device for generating digital character video
CN113658690A (en) Intelligent medical guide method and device, storage medium and electronic equipment
Akinpelu et al. Lightweight deep learning framework for speech emotion recognition
CN116758908B (en) Interaction method, device, equipment and storage medium based on artificial intelligence
CN116257762B (en) Training method of deep learning model and method for controlling mouth shape change of virtual image
CN111914822A (en) Text image labeling method and device, computer readable storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant