CN115641837A - Intelligent robot conversation intention recognition method and system - Google Patents

Intelligent robot conversation intention recognition method and system

Info

Publication number
CN115641837A
Authority
CN
China
Prior art keywords
voice
emotion
result
content
user
Prior art date
Legal status
Pending
Application number
CN202211652800.2A
Other languages
Chinese (zh)
Inventor
吴伟 (Wu Wei)
Current Assignee
Beijing Ifudata Information Technology Co., Ltd.
Original Assignee
Beijing Ifudata Information Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Beijing Ifudata Information Technology Co., Ltd.
Priority to CN202211652800.2A
Publication of CN115641837A
Legal status: Pending (current)

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to the field of voice recognition, and discloses a method and a system for recognizing the dialogue intention of an intelligent robot. A semantic prediction module, an object detection module, a voice emotion module and a semantic judgment module are provided. On the basis of the semantic analysis and recognition of the prior art, physiological states such as the user's facial micro-expressions are monitored to obtain the user's current emotional changes, and the user's voice emotion is analyzed to further confirm those changes; the user's emotional state is then combined with the emotional changes to judge the semantics of the voice content. This improves the accuracy of semantic recognition and effectively solves the prior-art problem that semantic meanings which are ambiguous across different scenes are difficult to recognize reliably.

Description

Intelligent robot dialogue intention recognition method and system
Technical Field
The invention relates to the field of voice recognition, and in particular to a method and a system for recognizing the dialogue intention of an intelligent robot.
Background
With the rapid development of computer technology and robotics, speech recognition systems and the intelligent robots equipped with them are becoming increasingly sophisticated. Robots equipped with speech recognition systems in the prior art can recognize the speech content spoken by a user and perform corresponding control or responses according to the recognition result.
In the prior art, most voice recognition methods collect and recognize the voice, convert the voice information into text information, extract and combine the key content of the text information to judge the semantic meaning, and then execute corresponding response steps or generate and output a feedback response according to the semantic judgment result.
In the prior art, when a semantic meaning is identified, the key content may carry several possible meanings. Which meaning applies is often reflected in the user's emotional context: similar wording expresses different content when the user is in different emotional states. The semantic recognition methods of the prior art cannot effectively distinguish these meanings, so semantic recognition errors occur.
Disclosure of Invention
The invention aims to provide a method and a system for recognizing dialog intentions of an intelligent robot, which aim to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme:
an intelligent robotic dialog intent recognition system, comprising:
the semantic prediction module is used for acquiring voice content, recognizing and converting the voice content into text content, analyzing the text content through a language neural network algorithm and generating semantic prediction results, wherein the number of the semantic prediction results is multiple, and the semantic prediction results are used for representing the information content expressed by the voice content;
the object detection module is used for acquiring physiological information of an object, performing physiological feature analysis on the object through an emotion neural network algorithm and generating an emotion analysis result, wherein the emotion analysis result represents emotion changes of the object expressed through facial micro-expressions and physiological responses, and the emotion analysis result is used for assisting judgment of the semantic prediction result;
the voice emotion module is used for analyzing emotion characteristics of the voice content according to the voice emotion database to generate object voice emotion, the object voice emotion represents emotion change expressed by voice characteristics of an object, and the object voice emotion is used for assisting judgment of the semantic prediction result;
and the semantic judgment module is used for carrying out auxiliary content judgment on the semantic prediction result according to the object speech emotion and the emotion analysis result to generate a speech recognition result, and the speech recognition result is used for representing the information content expressed by the speech content.
As a further scheme of the invention: the voice emotion module comprises:
the resampling unit is used for resampling the voice content according to a preset sampling frequency to generate a voice sampling result, wherein the sampling frequency is determined by the processing efficiency requirement on the voice content and the accuracy requirement on emotional feature analysis, the preset sampling frequency is not greater than the inherent sampling frequency of the voice content, and the signal type of the voice sampling result is a discrete time signal;
the quantization coding unit is used for carrying out hierarchical quantization on the voice sampling result and carrying out binary coding to generate a voice digital signal, and the hierarchical quantization is used for converting a discrete time signal into a digital signal;
the feature analysis unit is used for establishing a spectrogram according to the voice digital signal, and performing feature selection on the voice digital signal according to the spectrogram to generate voice frequency features;
and the emotion judging unit is used for comparing and analyzing the voice frequency characteristics through a preset voice emotion database to generate an emotion analysis result.
As a further scheme of the invention: the voice emotion database comprises a standard emotion database and a personalized emotion database;
the standard emotion database is used for storing preset voice emotion data, and the preset voice emotion data comprises voice frequency features and emotion analysis results corresponding to the voice frequency features;
the personalized emotion database is used for storing user voice emotion data, and the user voice emotion data are used for representing the correspondence between the voice frequency characteristics of the user and the emotion analysis result.
As a further scheme of the invention: still include and judge the feedback module, it includes to judge the feedback module:
the judging request unit is used for outputting a feedback request and receiving a feedback signal, and the feedback signal is used for representing whether the voice recognition result is accurate or not;
and the feedback execution unit is used for responding to the feedback signal, and updating and expanding the personalized voice emotion database according to the voice characteristic frequency and the emotion analysis result if the feedback signal is judged to be accurate.
As a still further scheme of the invention: the object physiological information comprises user facial expression information and user physiological information, the user facial information is used for recording the facial micro-expression changes of a user, the user physiological information is used for representing the physiological state changes of the user, and the physiological state comprises heartbeat blood pressure and the like.
The embodiment of the invention aims to provide an intelligent robot conversation intention identification method, which is characterized by comprising the following steps:
acquiring voice content, identifying and converting the voice content into text content, analyzing the text content through a linguistic neural network algorithm, and generating semantic prediction results, wherein the number of the semantic prediction results is multiple, and the semantic prediction results are used for representing information content expressed by the voice content;
acquiring physiological information of a subject, and performing physiological feature analysis on the subject through an emotional neural network algorithm to generate an emotion analysis result, wherein the emotion analysis result represents emotion changes of the subject expressed through facial micro-expressions and physiological responses, and is used for assisting judgment of the semantic prediction result;
performing emotion feature analysis on the voice content according to a voice emotion database to generate a target voice emotion, wherein the target voice emotion represents emotion change expressed by voice features of a target, and the target voice emotion is used for assisting judgment of the semantic prediction result;
and judging the auxiliary content of the semantic prediction result according to the voice emotion of the object and the emotion analysis result to generate a voice recognition result, wherein the voice recognition result is used for representing the information content expressed by the voice content.
As a further scheme of the invention: the steps of acquiring the physiological information of the object, analyzing the physiological characteristics of the object through an emotional neural network algorithm and generating an emotional analysis result comprise:
resampling the voice content according to a preset sampling frequency to generate a voice sampling result, wherein the sampling frequency is determined by the processing efficiency requirement on the voice content and the accuracy requirement on emotional feature analysis, the preset sampling frequency is not greater than the inherent sampling frequency of the voice content, and the signal type of the voice sampling result is a discrete time signal;
carrying out hierarchical quantization on the voice sampling result, carrying out binary coding, and generating a voice digital signal, wherein the hierarchical quantization is used for converting a discrete time signal into a digital signal;
establishing a spectrogram according to the voice digital signal, and performing feature selection on the voice digital signal according to the spectrogram to generate voice frequency features;
and comparing and analyzing the voice frequency characteristics through a preset voice emotion database to generate an emotion analysis result.
As a further scheme of the invention: the voice emotion database comprises a standard emotion database and a personalized emotion database;
the standard emotion database is used for storing preset voice emotion data, and the preset voice emotion data comprises voice frequency features and emotion analysis results corresponding to the voice frequency features;
the personalized emotion database is used for storing user voice emotion data, and the user voice emotion data are used for representing the correspondence between the voice frequency characteristics of the user and the emotion analysis result.
Compared with the prior art, the invention has the following beneficial effects: by providing the semantic prediction module, the object detection module, the voice emotion module and the semantic judgment module, physiological states such as the user's facial micro-expressions are monitored, on the basis of the semantic analysis and recognition of the prior art, to obtain the user's current emotional changes, and the user's voice emotion is analyzed to further confirm those changes; the user's emotional state is then combined with the emotional changes to judge the semantics of the voice content. This improves the accuracy of semantic recognition and effectively solves the prior-art problem that semantic meanings which are ambiguous across different scenes are difficult to recognize reliably.
Drawings
Fig. 1 is a block diagram showing the structure of an intelligent robot dialogue intention recognition system.
Fig. 2 is a block diagram of a speech emotion module in an intelligent robot dialogue intention recognition system.
Fig. 3 is a flow chart of a method for recognizing dialog intentions of an intelligent robot.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Specific embodiments of the present invention are described in detail below with reference to specific examples.
As shown in fig. 1, an intelligent robot dialog intention recognition system according to an embodiment of the present invention includes:
the semantic prediction module 100 is configured to acquire voice content, recognize and convert the voice content into text content, analyze the text content through a linguistic neural network algorithm, and generate a semantic prediction result, where the number of the semantic prediction results is multiple, and the semantic prediction result is used to represent information content expressed by the voice content.
In this embodiment, the semantic prediction module 100 is basically the same as the semantic analysis method adopted by intelligent robots in the prior art, except in one respect. In the prior art, after the speech is acquired and the semantics are recognized through a speech-to-text method, a single semantic prediction result is usually determined and output through analysis to represent the information expressed by the voice content (that information may be used to generate a dialogue feedback response, that is, for the robot to understand the voice content). The present application instead generates a plurality of higher-probability results and then further determines the semantics through analysis from other angles. The reason is that understanding based only on the text content ignores the changes in human emotion that occur in a dialogue, and such changes often change the semantic content, so the prior art carries an unavoidable source of recognition errors.
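The candidate-generation idea described above can be illustrated with a minimal sketch. The patent does not specify the language neural network algorithm, so a generic intent classifier, the label set and the logits below are illustrative assumptions only:

```python
# Minimal sketch, assuming a generic text-intent classifier: instead of keeping
# only the single best intent, the top-k candidates are retained for the later
# emotion-assisted disambiguation described in this application.
import numpy as np

def top_k_semantic_predictions(intent_logits, intent_labels, k=3):
    """Return the k most probable intents with their softmax probabilities."""
    probs = np.exp(intent_logits - intent_logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1][:k]
    return [(intent_labels[i], float(probs[i])) for i in order]

# Example with invented logits for one utterance:
labels = ["complaint", "inquiry", "small_talk"]
print(top_k_semantic_predictions(np.array([2.1, 1.9, -0.5]), labels, k=2))
```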
The object detection module 300 is configured to obtain physiological information of an object, perform physiological feature analysis on the object through an emotional neural network algorithm, and generate an emotion analysis result, where the emotion analysis result represents emotion changes expressed by the object through facial microexpression and physiological responses, and the emotion analysis result is used to assist in determining the semantic prediction result.
In this embodiment, the object detection module 300 is used to monitor the user and may be implemented by sensors, cameras or other devices. In use, the user's emotional state is inferred from feature changes such as the user's facial micro-expressions and from some of the user's physiological features (for example, the heartbeat).
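A hedged sketch of this idea follows. The patent does not define the emotional neural network algorithm, so a simple late fusion over assumed per-class scores from the facial channel and the physiological channel stands in for it; the emotion label set and weighting are illustrative assumptions:

```python
# Illustrative late-fusion stand-in for the unspecified emotional neural
# network: per-class scores from the facial micro-expression channel and the
# physiological channel (e.g. heart rate) are combined with an assumed weight.
import numpy as np

EMOTION_CLASSES = ["neutral", "angry", "happy", "anxious"]  # assumed label set

def fuse_emotion_scores(face_scores, physio_scores, face_weight=0.6):
    """Weighted fusion of per-class scores from the two monitoring channels."""
    fused = face_weight * np.asarray(face_scores) + (1.0 - face_weight) * np.asarray(physio_scores)
    fused /= fused.sum()
    return dict(zip(EMOTION_CLASSES, fused.round(3).tolist()))

# Example with made-up per-class scores from each channel:
print(fuse_emotion_scores([0.2, 0.5, 0.1, 0.2], [0.3, 0.3, 0.1, 0.3]))
```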
And the voice emotion module 500 is used for performing emotion feature analysis on the voice content according to the voice emotion database to generate object voice emotion, wherein the object voice emotion represents emotion change expressed by voice features of an object, and the object voice emotion is used for assisting judgment of the semantic prediction result.
In this embodiment, the speech emotion module 500 performs further emotion analysis on the voice content. The speech a person utters differs in frequency, pitch and other characteristics under different emotional states, so analyzing these characteristics allows the user's emotional state to be further inferred to some extent, assisting the object detection module 300 in the further judgment of the voice content.
And the semantic judgment module 700 is configured to perform auxiliary content judgment on the semantic prediction result according to the object speech emotion and the emotion analysis result, and generate a speech recognition result, where the speech recognition result is used to represent the information content expressed by the speech content.
In this embodiment, the semantic judgment module 700 determines the specific semantic result carried by the voice content by analyzing the user's physiological responses (mainly the changes in facial features that reflect emotion) and the features of the voice (tone, frequency and so on) on the basis of the foregoing emotional state judgment. Specifically, the several semantic prediction results express their meanings in different semantic environments, and the user's emotion is one of the variables that characterizes this environment.
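The following sketch illustrates this disambiguation idea in simplified form; the compatibility table relating candidate meanings to emotions is invented for illustration, since the patent does not specify how that association is stored or learned:

```python
# Sketch of emotion-assisted disambiguation: each candidate meaning is
# re-weighted by how compatible it is with the detected emotion, then the
# weights are renormalized. The compatibility table below is illustrative.
def rescore_candidates(candidates, emotion, compatibility):
    """candidates: list of (meaning, probability); emotion: detected label."""
    rescored = [(m, p * compatibility.get((m, emotion), 1.0)) for m, p in candidates]
    total = sum(p for _, p in rescored) or 1.0
    return sorted(((m, p / total) for m, p in rescored), key=lambda x: x[1], reverse=True)

compat = {("complaint", "angry"): 2.0, ("small_talk", "angry"): 0.3}
candidates = [("complaint", 0.45), ("small_talk", 0.55)]
print(rescore_candidates(candidates, "angry", compat))
# An "angry" user tips the balance toward the complaint reading.
```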
As shown in fig. 2, as another preferred embodiment of the present invention, the speech emotion module 500 includes:
a resampling unit 501, configured to resample the voice content according to a preset sampling frequency, and generate a voice sampling result, where the sampling frequency is determined by a requirement for processing efficiency of the voice content and a requirement for accuracy of emotion feature analysis, the preset sampling frequency is not greater than a natural sampling frequency of the voice content, and a signal type of the voice sampling result is a discrete time signal.
A quantization coding unit 502, configured to perform hierarchical quantization on the voice sampling result and perform binary coding to generate a voice digital signal, where the hierarchical quantization is used to convert a discrete-time signal into a digital signal.
The feature analysis unit 503 is configured to establish a spectrogram according to the voice digital signal, and perform feature selection on the voice digital signal according to the spectrogram to generate a voice frequency feature.
And an emotion judging unit 504, configured to compare and analyze the voice frequency features through a preset voice emotion database, and generate an emotion analysis result.
In this embodiment, the speech emotion module 500 is divided simply by function and mainly covers the steps of sampling, quantization, encoding, feature extraction and emotion judgment. Sampling refers to sampling the sound at a certain sampling frequency according to the requirements: the sound is continuous, while only a finite amount of data can be processed, so the sound must be sampled at a certain frequency (the voice content is produced by collection and therefore already has a certain code rate; since this step is a resampling, the voice content is required to have a code rate high enough to meet the sampling frequency used for resampling). Hierarchical quantization divides the amplitude range of the signal samples into several intervals, assigns the sample values falling into each interval to one class and gives the corresponding quantized value; the quantized values are then binary coded, and features are selected from them for judging and analyzing emotion.
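The sample, quantize, encode and feature-extraction chain described above can be sketched with plain NumPy as follows; the target rate, bit depth and frame length are assumed values, not requirements taken from the patent:

```python
# Sketch of the sample -> quantize -> encode -> spectrogram chain, using plain
# NumPy; the target rate, bit depth and frame length are assumed values.
import numpy as np

def resample(signal, orig_rate, target_rate):
    """Naive resampling by linear interpolation (target_rate <= orig_rate)."""
    n_out = int(len(signal) * target_rate / orig_rate)
    t_out = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(t_out, np.arange(len(signal)), signal)

def quantize_and_encode(samples, bits=8):
    """Uniform (hierarchical) quantization followed by binary integer codes."""
    levels = 2 ** bits
    scaled = np.clip((np.asarray(samples) + 1.0) / 2.0, 0.0, 1.0)  # assumes samples in [-1, 1]
    return np.round(scaled * (levels - 1)).astype(np.uint8)

def spectrogram(codes, frame_len=256):
    """Magnitude spectrogram from framed FFTs of the digital signal."""
    n_frames = len(codes) // frame_len
    frames = codes[: n_frames * frame_len].reshape(n_frames, frame_len).astype(float)
    return np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

# Example: a 1-second, 16 kHz test tone resampled to an assumed 8 kHz.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
features = spectrogram(quantize_and_encode(resample(tone, 16000, 8000)))
print(features.shape)  # (number of frames, frame_len // 2 + 1)
```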
As another preferred embodiment of the invention, the voice emotion database comprises a standard emotion database and a personalized emotion database;
the standard emotion database is used for storing preset voice emotion data, and the preset voice emotion data comprises voice frequency features and emotion analysis results corresponding to the voice frequency features.
The personalized emotion database is used for storing user voice emotion data, and the user voice emotion data are used for representing the correspondence between the voice frequency characteristics of the user and the emotion analysis result.
In this embodiment, the standard emotion database is a standard comparison database obtained after learning from a large amount of data; it is built on a relatively balanced standard and can adapt to most users. The personalized emotion database learns and is updated from the user's own voice characteristics and related data as the user continues to use the system, so that the system gains a more personalized emotion database that better adapts to that user.
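A minimal sketch of the two-tier lookup implied here is shown below; both databases are modeled as lists of (feature vector, emotion) pairs and the distance threshold is an assumption, since the patent does not prescribe a storage format or matching rule:

```python
# Two-tier lookup sketch: the per-user (personalized) database is consulted
# first and the standard database is the fallback; both are plain lists of
# (feature_vector, emotion) pairs and the distance threshold is an assumption.
import numpy as np

def match_emotion(features, personalized, standard, max_distance=1.0):
    """Return the emotion of the closest stored feature vector within range."""
    for database in (personalized, standard):        # personalized takes priority
        if not database:
            continue
        dists = [np.linalg.norm(np.asarray(features) - np.asarray(vec)) for vec, _ in database]
        best = int(np.argmin(dists))
        if dists[best] <= max_distance:
            return database[best][1]
    return "unknown"

# Example: an empty personalized database falls back to the standard one.
standard_db = [(np.array([0.1, 0.9]), "angry"), (np.array([0.8, 0.2]), "neutral")]
print(match_emotion([0.15, 0.85], personalized=[], standard=standard_db))
```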
As another preferred embodiment of the present invention, the system further comprises a judgment feedback module, wherein the judgment feedback module comprises:
and the judgment request unit is used for outputting a feedback request and receiving a feedback signal, and the feedback signal is used for representing whether the voice recognition result is accurate or not.
And the feedback execution unit is used for responding to the feedback signal, and updating and expanding the personalized voice emotion database according to the voice characteristic frequency and the emotion analysis result if the feedback signal is judged to be accurate.
In this embodiment, the judgment feedback module can be understood as a mechanism by which the user evaluates the robot's recognition result, with the related units and steps matched inside the system. The judgment request unit outputs the voice recognition result (after a feedback request connection is established with the user-side interaction device) and then receives the user's feedback signal for that result; the feedback signal is controlled and produced by the user. For example, if the user establishes the feedback request connection with a tablet device, the tablet can bring up a feedback program through which the user indicates whether the voice recognition result is accurate. If it is accurate, the feedback execution unit uses the voice characteristic frequency and the emotion analysis result associated with that recognition result to further expand the personalized voice emotion database. The personalized emotion database is established per recognition object (user), so when the system later monitors the same type of voice from the same recognition object, the database can be used for more accurate auxiliary judgment.
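The confirm-then-store loop can be sketched as follows; the interaction with the user-side device is abstracted into a boolean flag, and all names are illustrative rather than taken from the patent:

```python
# Sketch of the confirm-then-store loop: the per-user database grows only when
# the user confirms the recognition result; all names are illustrative.
def apply_feedback(personalized_db, frequency_features, emotion_result, user_confirmed_accurate):
    """Expand the per-user voice emotion database on a confirmed result."""
    if user_confirmed_accurate:
        personalized_db.append((frequency_features, emotion_result))
    return personalized_db

# Usage after the result has been shown on the user's tablet:
db = []
apply_feedback(db, [0.1, 0.9], "angry", user_confirmed_accurate=True)
print(len(db))  # 1
```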
As another preferred embodiment of the present invention, the physiological information of the subject includes user facial expression information and user physiological information; the user facial expression information is used for recording the user's facial micro-expression changes, the user physiological information is used for representing changes in the user's physiological state, and the physiological state includes heartbeat, blood pressure and the like.
In this embodiment, the physiological information of the subject, that is, the physiological characteristics used for emotion judgment, is briefly explained. Its main content is the change of facial micro-expressions, which can reflect changes in human emotion, together with changes in physiological states such as heartbeat and blood pressure. In the prior art there is already considerable research on human emotion, on micro-expressions and physiological changes, and on intelligent neural networks for interpreting them.
As shown in fig. 3, the present invention also provides an intelligent robot dialog intention recognition method, which comprises the following steps:
s200, acquiring voice contents, recognizing and converting the voice contents into character contents, analyzing the character contents through a language neural network algorithm, and generating a plurality of semantic prediction results which are used for representing information contents expressed by the voice contents.
S400, acquiring physiological information of the object, analyzing the physiological characteristics of the object through an emotional neural network algorithm, and generating an emotion analysis result, wherein the emotion analysis result represents emotion changes of the object expressed through facial micro-expressions and physiological reactions, and is used for assisting judgment of the semantic prediction result.
S600, performing emotion feature analysis on the voice content according to the voice emotion database to generate object voice emotion, wherein the object voice emotion represents emotion change expressed by the voice feature of an object, and the object voice emotion is used for assisting judgment of the semantic prediction result.
And S800, judging auxiliary content of the semantic prediction result according to the voice emotion of the object and the emotion analysis result, and generating a voice recognition result, wherein the voice recognition result is used for representing the information content expressed by the voice content.
As another preferred embodiment of the present invention, the step of performing emotion feature analysis on the voice content according to the voice emotion database comprises:
the method comprises the steps of resampling the voice content according to a preset sampling frequency to generate a voice sampling result, wherein the sampling frequency is determined by the processing efficiency requirement on the voice content and the accuracy requirement on emotional feature analysis, the preset sampling frequency is not greater than the inherent sampling frequency of the voice content, and the signal type of the voice sampling result is a discrete time signal.
And carrying out hierarchical quantization on the voice sampling result, carrying out binary coding, and generating a voice digital signal, wherein the hierarchical quantization is used for converting the discrete time signal into the digital signal.
And establishing a spectrogram according to the voice digital signal, and performing feature selection on the voice digital signal according to the spectrogram to generate voice frequency features.
And comparing and analyzing the voice frequency characteristics through a preset voice emotion database to generate an emotion analysis result.
As another preferred embodiment of the invention, the speech emotion database comprises a standard emotion database and a personalized emotion database;
the standard emotion database is used for storing preset voice emotion data, and the preset voice emotion data comprises voice frequency features and emotion analysis results corresponding to the voice frequency features.
The personalized emotion database is used for storing user voice emotion data, and the user voice emotion data are used for representing the correspondence between the voice frequency characteristics of the user and the emotion analysis result.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. An intelligent robotic dialog intent recognition system, comprising:
the semantic prediction module is used for acquiring voice contents, identifying and converting the voice contents into character contents, analyzing the character contents through a language neural network algorithm and generating semantic prediction results, wherein the number of the semantic prediction results is multiple, and the semantic prediction results are used for representing information contents expressed by the voice contents;
the object detection module is used for acquiring physiological information of an object, analyzing physiological characteristics of the object through an emotional neural network algorithm and generating an emotion analysis result, wherein the emotion analysis result represents emotion changes of the object expressed through facial micro-expressions and physiological reactions, and is used for assisting judgment of the semantic prediction result;
the voice emotion module is used for carrying out emotion feature analysis on the voice content according to a voice emotion database to generate object voice emotion, the object voice emotion represents emotion change expressed by voice features of an object, and the object voice emotion is used for assisting judgment of the semantic prediction result;
and the semantic judgment module is used for carrying out auxiliary content judgment on the semantic prediction result according to the object speech emotion and the emotion analysis result to generate a speech recognition result, and the speech recognition result is used for representing the information content expressed by the speech content.
2. The intelligent robotic dialog intent recognition system of claim 1 wherein the speech emotion module comprises:
the resampling unit is used for resampling the voice content according to a preset sampling frequency to generate a voice sampling result, wherein the sampling frequency is determined by the processing efficiency requirement on the voice content and the accuracy requirement on emotional feature analysis, the preset sampling frequency is not greater than the inherent sampling frequency of the voice content, and the signal type of the voice sampling result is a discrete time signal;
the quantization coding unit is used for carrying out hierarchical quantization on the voice sampling result and carrying out binary coding to generate a voice digital signal, and the hierarchical quantization is used for converting a discrete time signal into a digital signal;
the characteristic analysis unit is used for establishing a spectrogram according to the voice digital signal, and performing characteristic selection on the voice digital signal according to the spectrogram to generate voice frequency characteristics;
and the emotion judging unit is used for comparing and analyzing the voice frequency characteristics through a preset voice emotion database to generate an emotion analysis result.
3. The intelligent robot dialogue intention recognition system of claim 2, wherein the voice emotion database comprises a standard emotion database and a personalized emotion database;
the standard emotion database is used for storing preset voice emotion data, and the preset voice emotion data comprises voice frequency features and emotion analysis results corresponding to the voice frequency features;
the personalized emotion database is used for storing user voice emotion data, and the user voice emotion data are used for representing the correspondence between the voice frequency characteristics of the user and the emotion analysis result.
4. The intelligent robotic dialog intent recognition system of claim 3, further comprising a decision feedback module comprising:
the judging request unit is used for outputting a feedback request and receiving a feedback signal, and the feedback signal is used for representing whether the voice recognition result is accurate or not;
and the feedback execution unit is used for responding to the feedback signal, and updating and expanding the personalized voice emotion database according to the voice characteristic frequency and the emotion analysis result if the feedback signal is judged to be accurate.
5. The intelligent robot dialogue intention recognition system of claim 1, wherein the object physiological information comprises user facial expression information and user physiological information, the user facial expression information is used for recording facial micro-expression changes of a user, the user physiological information is used for representing changes of physiological states of the user, and the physiological states comprise heartbeat, blood pressure and the like.
6. An intelligent robot dialogue intention recognition method is characterized by comprising the following steps:
acquiring voice content, identifying and converting the voice content into text content, analyzing the text content through a linguistic neural network algorithm, and generating semantic prediction results, wherein the number of the semantic prediction results is multiple, and the semantic prediction results are used for representing information content expressed by the voice content;
acquiring physiological information of a subject, and performing physiological feature analysis on the subject through an emotional neural network algorithm to generate an emotion analysis result, wherein the emotion analysis result represents emotion changes of the subject expressed through facial micro-expressions and physiological reactions, and is used for assisting judgment of the semantic prediction result;
performing emotion feature analysis on the voice content according to a voice emotion database to generate object voice emotion, wherein the object voice emotion represents emotion change expressed by voice features of an object, and the object voice emotion is used for assisting judgment of the semantic prediction result;
and judging the auxiliary content of the semantic prediction result according to the voice emotion of the object and the emotion analysis result to generate a voice recognition result, wherein the voice recognition result is used for representing the information content expressed by the voice content.
7. The intelligent robot dialogue intention recognition method according to claim 6, wherein the step of performing emotion feature analysis on the voice content according to the voice emotion database comprises:
resampling the voice content according to a preset sampling frequency to generate a voice sampling result, wherein the sampling frequency is determined by the processing efficiency requirement on the voice content and the accuracy requirement on emotional feature analysis, the preset sampling frequency is not greater than the inherent sampling frequency of the voice content, and the signal type of the voice sampling result is a discrete time signal;
carrying out hierarchical quantization on the voice sampling result, carrying out binary coding, and generating a voice digital signal, wherein the hierarchical quantization is used for converting a discrete time signal into a digital signal;
establishing a spectrogram according to the voice digital signal, and performing feature selection on the voice digital signal according to the spectrogram to generate voice frequency features;
and comparing and analyzing the voice frequency characteristics through a preset voice emotion database to generate an emotion analysis result.
8. The intelligent robot dialogue intention recognition method of claim 7, wherein the voice emotion database comprises a standard emotion database and a personalized emotion database;
the standard emotion database is used for storing preset voice emotion data, and the preset voice emotion data comprises voice frequency features and emotion analysis results corresponding to the voice frequency features;
the personalized emotion database is used for storing user voice emotion data, and the user voice emotion data are used for representing the correspondence between the voice frequency characteristics of the user and the emotion analysis result.
CN202211652800.2A (filed 2022-12-22, priority date 2022-12-22) - Intelligent robot conversation intention recognition method and system - Pending - published as CN115641837A

Priority Applications (1)

Application Number: CN202211652800.2A - Priority date: 2022-12-22 - Filing date: 2022-12-22 - Title: Intelligent robot conversation intention recognition method and system

Publications (1)

Publication number: CN115641837A - Publication date: 2023-01-24

Family

ID=84949386

Family Applications (1)

Application: CN202211652800.2A (Pending; published as CN115641837A) - Priority date: 2022-12-22 - Filing date: 2022-12-22 - Title: Intelligent robot conversation intention recognition method and system

Country Status (1)

Country Link
CN (1) CN115641837A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036776A (en) * 2014-05-22 2014-09-10 毛峡 Speech emotion identification method applied to mobile terminal
US20160322065A1 (en) * 2015-05-01 2016-11-03 Smartmedical Corp. Personalized instant mood identification method and system
CN107295201A (en) * 2017-08-09 2017-10-24 袁建峰 A kind of intelligent lexical analysis system
EP3291224A1 (en) * 2016-08-30 2018-03-07 Beijing Baidu Netcom Science and Technology Co., Ltd Method and apparatus for inputting information
CN111627444A (en) * 2020-05-22 2020-09-04 江洪华 Chat system based on artificial intelligence
WO2021073646A1 (en) * 2019-10-18 2021-04-22 四川大学华西医院 Method for evaluating emotional characteristics based on language guidance and heart rate response
CN114242045A (en) * 2021-12-20 2022-03-25 山东科技大学 Deep learning method for natural language dialogue system intention
CN114490947A (en) * 2022-02-16 2022-05-13 平安国际智慧城市科技股份有限公司 Dialog service method, device, server and medium based on artificial intelligence
CN114724222A (en) * 2022-04-14 2022-07-08 浙江康旭科技有限公司 AI digital human emotion analysis method based on multiple modes
CN115440196A (en) * 2022-09-23 2022-12-06 深圳通联金融网络科技服务有限公司 Voice recognition method, device, medium and equipment based on user facial expression

Similar Documents

Publication Publication Date Title
CN111028827B (en) Interaction processing method, device, equipment and storage medium based on emotion recognition
US10008209B1 (en) Computer-implemented systems and methods for speaker recognition using a neural network
JP6731326B2 (en) Voice interaction device and voice interaction method
CN112289323B (en) Voice data processing method and device, computer equipment and storage medium
CN112037799B (en) Voice interrupt processing method and device, computer equipment and storage medium
CN110930989B (en) Speech intention recognition method and device, computer equipment and storage medium
CN111145782B (en) Overlapped speech recognition method, device, computer equipment and storage medium
KR20080086791A (en) Feeling recognition system based on voice
CN113192516A (en) Voice role segmentation method and device, computer equipment and storage medium
US6230129B1 (en) Segment-based similarity method for low complexity speech recognizer
CN114120978A (en) Emotion recognition model training and voice interaction method, device, equipment and medium
US20220108698A1 (en) System and Method for Producing Metadata of an Audio Signal
CN110689881A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN110265008A (en) Intelligence pays a return visit method, apparatus, computer equipment and storage medium
CN111209380A (en) Control method and device for conversation robot, computer device and storage medium
CN111223476A (en) Method and device for extracting voice feature vector, computer equipment and storage medium
CN114678014A (en) Intention recognition method, device, computer equipment and computer readable storage medium
CN113571096B (en) Speech emotion classification model training method and device, computer equipment and medium
CN112951215B (en) Voice intelligent customer service answering method and device and computer equipment
US5704004A (en) Apparatus and method for normalizing and categorizing linear prediction code vectors using Bayesian categorization technique
CN110931002B (en) Man-machine interaction method, device, computer equipment and storage medium
CN115641837A (en) Intelligent robot conversation intention recognition method and system
CN115240684A (en) Role recognition method and system for double-person conversation voice information
Dogaru et al. Compact isolated speech recognition on raspberry-pi based on reaction diffusion transform
WO2006003542A1 (en) Interactive dialogue system

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2023-01-24)