CN112002329B - Physical and mental health monitoring method, equipment and computer readable storage medium - Google Patents

Physical and mental health monitoring method, equipment and computer readable storage medium

Info

Publication number
CN112002329B
CN112002329B (application number CN202010925877.7A)
Authority
CN
China
Prior art keywords
text
audio
emotion
physical
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010925877.7A
Other languages
Chinese (zh)
Other versions
CN112002329A (en)
Inventor
温馨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN202010925877.7A
Publication of CN112002329A
Application granted
Publication of CN112002329B
Status: Active
Anticipated expiration


Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/02 Detecting, measuring or recording pulse, heart rate, blood pressure or blood flow; Combined pulse/heart-rate/blood pressure determination; Evaluating a cardiovascular condition not otherwise provided for, e.g. using combinations of techniques provided for in this group with electrocardiography or electroauscultation; Heart catheters for measuring blood pressure
    • A61B5/021 Measuring pressure in heart or blood vessels
    • A61B5/024 Detecting, measuring or recording pulse rate or heart rate
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/165 Evaluating the state of mind, e.g. depression, anxiety
    • A61B5/48 Other medical applications
    • A61B5/4803 Speech analysis specially adapted for diagnostic purposes
    • A61B5/68 Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient
    • A61B5/6887 Arrangements of detecting, measuring or recording means mounted on external non-worn devices, e.g. non-medical devices
    • A61B5/6891 Furniture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques for estimating an emotional state
    • G10L25/66 Speech or voice analysis techniques for extracting parameters related to health condition

Abstract

The invention discloses a physical and mental health monitoring method, which comprises the following steps: receiving voice information, converting the voice information into text information, and generating a text emotion state according to the text information; extracting audio features of the voice information, and generating an audio emotion state according to the audio features; fusing the text emotion state and the audio emotion state to obtain a voice emotion state; and acquiring biological indexes, and combining the voice emotion state with the biological indexes to generate physical and mental health monitoring information. The invention also discloses a physical and mental health monitoring device, equipment and a computer readable storage medium. The invention achieves comprehensive monitoring of the user's physical and mental health without placing any additional burden on the user.

Description

Physical and mental health monitoring method, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of health monitoring, and in particular, to a physical and mental health monitoring method, apparatus, device, and computer readable storage medium.
Background
With the improvement of quality of life and the acceleration of its pace, people pay increasing attention to their health. Traditional health monitoring methods acquire a user's sign information through sensors worn on or implanted in the body. However, whether worn or implanted, such sensors place an additional burden on the user, and traditional methods can only acquire certain biological indexes, so they cannot monitor health comprehensively (for example, mental health).
Disclosure of Invention
The invention mainly aims to provide a physical and mental health monitoring method, equipment and a computer readable storage medium, so as to solve the technical problems that the traditional health monitoring method places an extra burden on the user and cannot monitor health comprehensively.
To achieve the above object, the present invention provides a physical and mental health monitoring method, which includes the following steps:
receiving voice information, converting the voice information into text information, and generating a text emotion state according to the text information;
extracting audio features of the voice information, and generating an audio emotion state according to the audio features;
fusing the text emotion state and the audio emotion state to obtain a voice emotion state;
and acquiring biological indexes, and combining the voice emotion states with the biological indexes to generate physical and mental health monitoring information.
Optionally, the step of generating a text emotional state according to the text information includes:
word segmentation processing is carried out on the text information to obtain a target vocabulary;
obtaining emotion classification results corresponding to a preset voice classification model and a text database associated with each emotion classification result;
calculating the proportion in which each target word is present in each text database, and the sum of the presence proportions of all target words for each text database;
and taking the emotion classification result associated with the text database with the largest sum of presence proportions as the text emotion state.
Optionally, the step of generating a text emotional state according to the text information includes:
vectorizing the text information to obtain a text vector;
and inputting the text vector to a preset text emotion sensor to obtain a text emotion state corresponding to the text information.
Optionally, the preset text emotion sensor includes: a closed recurrence model and a logistic regression model;
the step of inputting the text vector to a preset text emotion sensor to obtain a text emotion state corresponding to the text information comprises the following steps:
inputting the text vector to a coding module of the closed recurrence model to obtain a text coding vector;
decoding the text encoding vector through a decoding module of the closed recurrence model to obtain emotion characteristics;
and inputting the emotion characteristics into the logistic regression model to carry out emotion classification processing, and generating a text emotion state.
Optionally, the step of extracting audio features of the speech information and generating an audio emotional state according to the audio features comprises:
extracting the audio features of the voice information, and carrying out vectorization processing on the audio features to obtain audio vectors;
inputting the audio vector to an encoding module of a preset sequence encoding and decoding model to obtain an audio encoding vector;
and decoding the audio coding vector by a decoding module of the preset sequence coding and decoding model to generate an audio emotion state.
Optionally, the step of fusing the text emotional state and the audio emotional state to obtain a speech emotional state includes:
inputting the text emotion state and the audio emotion state into a preset classification model, and sequentially passing through a full connection layer and a logistic regression layer in the preset classification model;
inquiring classification results corresponding to the preset classification model, and acquiring a text probability value and an audio probability value associated with each classification result;
and calculating, for each classification result, the sum of the associated text probability value and audio probability value, and taking the classification result with the largest sum as the voice emotion state.
Optionally, after the step of fusing the text emotional state and the audio emotional state to obtain the speech emotional state, the method includes:
determining a target dialogue state according to the text information and the voice emotion state, and searching a target utterance with highest matching degree with the target dialogue state in a preset dialogue database;
scoring the target utterance to obtain a matching score;
and if the matching score is greater than a preset threshold, outputting the target utterance.
Optionally, after the step of scoring the target utterance to obtain a matching score, the method includes:
if the matching score is smaller than or equal to a preset threshold value, inputting the current dialogue state into a preset voice response model to generate response voice;
and outputting the response voice.
In addition, to achieve the above object, the present invention also provides a physical and mental health monitoring apparatus, which includes: a memory, a processor, and a physical and mental health monitoring program stored in the memory and executable on the processor, wherein the physical and mental health monitoring program, when executed by the processor, implements the steps of the physical and mental health monitoring method described above.
In addition, in order to achieve the above object, the present invention further provides a computer-readable storage medium, on which a physical and mental health monitoring program is stored, which when executed by a processor, implements the steps of the physical and mental health monitoring method as described above.
The embodiment of the invention provides a physical and mental health monitoring method, physical and mental health monitoring equipment and a computer readable storage medium. After receiving voice information generated by a user, the physical and mental health monitoring program converts the voice information into text information and generates a corresponding text emotion state from the text information. The program also extracts audio features of the voice information and generates an audio emotion state from the extracted audio features. The text emotion state and the audio emotion state are then fused to obtain the voice emotion state, which can represent the psychological health condition of the user. Finally, by acquiring the user's biological indexes, the voice emotion state and the biological indexes are combined to generate and output physical and mental health monitoring information. Based on the voice information generated by the user and the user's biological indexes, the invention achieves comprehensive monitoring of the user's physical and mental health without increasing the burden on the user.
Drawings
Fig. 1 is a schematic hardware structure of an implementation manner of a physical and mental health monitoring device according to an embodiment of the present invention;
FIG. 2 is a flow chart of a first embodiment of the physical and mental health monitoring method of the present invention;
FIG. 3 is a schematic diagram of a physical and mental health monitoring process according to a first embodiment of the physical and mental health monitoring method of the present invention;
fig. 4 is a flowchart of a second embodiment of the physical and mental health monitoring method of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the following description, suffixes such as "module", "component", or "unit" for representing elements are used only for facilitating the description of the present invention, and have no specific meaning per se. Thus, "module," "component," or "unit" may be used in combination.
The physical and mental health monitoring terminal (also called terminal, equipment or terminal equipment) of the embodiment of the invention can be a terminal with information processing function such as a PC, a smart phone, a tablet personal computer, a portable computer and the like, and can also be various patch type sensors capable of acquiring bioelectric signals and equipment with recording function.
As shown in fig. 1, the terminal may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display, an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
Optionally, the terminal may also include a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and so on. The sensors may include, for example, light sensors and motion sensors. Specifically, the light sensors may include an ambient light sensor that adjusts the brightness of the display screen according to the ambient light, and a proximity sensor that turns off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes) and, when the terminal is stationary, the magnitude and direction of gravity; it can be used for recognizing the posture of the mobile terminal (such as switching between horizontal and vertical screens, related games, magnetometer posture calibration) and for vibration-recognition functions (such as a pedometer or tap detection). Of course, the mobile terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, which are not described here.
It will be appreciated by those skilled in the art that the terminal structure shown in fig. 1 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a network communication module, a user interface module, and a physical and mental health monitoring program may be included in a memory 1005, which is a computer storage medium.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke a physical and mental health monitoring program stored in the memory 1005, which when executed by the processor, implements the operations in the physical and mental health monitoring method provided in the embodiments described below.
Based on the hardware structure of the physical and mental health monitoring equipment, the embodiment of the physical and mental health monitoring method is provided.
Referring to fig. 2, in a first embodiment of the physical and mental health monitoring method of the present invention, the physical and mental health monitoring method includes:
step S10, receiving voice information, converting the voice information into text information, and generating a text emotion state according to the text information.
The physical and mental health monitoring method in this embodiment is applied to a physical and mental health monitoring terminal. The terminal includes information processing devices such as a personal computer and a smart phone (a smart phone is used below as the example), various patch sensors capable of acquiring bioelectric signals (such as heart rate and blood pressure), and devices provided with a recorder capable of collecting audio information (such as a smart sound box or a smart television with a voice interaction function). The voice information in this embodiment is collected by a preset audio acquisition unit and sent to the device on which the physical and mental health monitoring program is installed, where the preset audio acquisition unit is a component with a recording function that can collect audio information.
This embodiment provides a specific application scenario. An existing smart television with a voice interaction function has a built-in audio acquisition unit. When the television is powered on, the audio acquisition unit collects audio information in real time. The collected audio information includes near-field audio information and far-field audio information, where the near-field audio information refers to audio generated by the television itself and the far-field audio information refers to audio far away from the audio acquisition unit (generally 2 to 8 meters). The near-field audio information can be filtered out of the collected audio through existing active noise reduction technology to obtain the far-field audio information, and the voice information in the far-field audio information (i.e., the voice information in this embodiment) is then obtained. The voice information is converted into text information through existing speech-to-text technology, the semantic information corresponding to the text information (including field, intention, topic and the like) is acquired, and the emotional state represented by the text information (i.e., the text emotion state in this embodiment) can be determined from the semantic information. The emotional states include happy, sad, neutral, surprised, disgust, fear, non-neutral and the like. For example, a piece of voice information is converted into the text information "I am happy today"; by acquiring the semantic information corresponding to this text information, it can be determined that the text emotion state it represents is happy.
Step S20, extracting the audio characteristics of the voice information, and generating an audio emotion state according to the audio characteristics.
It can be understood that, while converting the voice information into text information, the physical and mental health monitoring program also extracts audio features of the voice information. The extracted audio features include speaking rate, pitch and timbre, and by analysing these features the emotional state they represent (i.e., the audio emotion state in this embodiment) can be determined; for example, if the extracted features are a slow speaking rate, a high pitch and a hoarse timbre, a corresponding emotional state can be inferred. As to how a slow speaking rate or a high pitch is determined, thresholds can be set: for instance, when the time distance between two adjacent phonemes in the obtained voice information is greater than a threshold, the speaking rate is judged to be slow.
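For illustration only, the following sketch shows how such a threshold rule on the spacing of adjacent phonemes might label the speaking rate. The patent does not give concrete values, so the timestamps, the threshold and the labels are assumptions:

```python
# Illustrative sketch only: phoneme timestamps, threshold and labels are assumed.

def classify_speaking_rate(phoneme_times, gap_threshold=0.25):
    """Label speech 'slow' when the mean gap between adjacent phonemes
    exceeds gap_threshold (seconds), otherwise 'fast'."""
    if len(phoneme_times) < 2:
        return "unknown"
    gaps = [b - a for a, b in zip(phoneme_times, phoneme_times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return "slow" if mean_gap > gap_threshold else "fast"

# Example: onset times (in seconds) of phonemes in a short utterance.
print(classify_speaking_rate([0.0, 0.1, 0.22, 0.31, 0.45]))  # -> 'fast'
```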
And step S30, fusing the text emotion state and the audio emotion state to obtain a voice emotion state.
It can be understood that the generated text emotion state and audio emotion state also need to be fused to obtain the final voice emotion state, i.e., the emotional state represented by the voice information. In general, the generated text emotion state and audio emotion state are the same; when they differ, the physical and mental health monitoring program may select either the text emotion state or the audio emotion state as the final voice emotion state according to their priorities.
Whether for the text emotion state or the audio emotion state, the representable emotional states include happy, sad, neutral, surprised, disgust, fear, non-neutral and the like. In most cases the generated text emotion state and audio emotion state are the same. When they are not, the physical and mental health monitoring program acquires the priorities of the text emotion state and the audio emotion state and selects one of them as the final voice emotion state according to those priorities. If the priority of the text emotion state is higher than that of the audio emotion state, the emotion represented by the text information is the stronger one, and the program selects the text emotion state as the final voice emotion state; if the priority of the audio emotion state is higher than that of the text emotion state, the emotion represented by the audio features is the stronger one, and the program selects the audio emotion state as the final voice emotion state. Alternatively, the text emotion state and the audio emotion state may each be scored, and the one with the higher score selected as the final voice emotion state.
Step S40, biological indexes are obtained, and the voice emotion states are combined with the biological indexes to generate physical and mental health monitoring information.
As can be appreciated, the physical and mental health monitoring program in this embodiment may further obtain various biological indexes of the user, such as blood pressure and heart rate, through a patch sensor. The patch sensor is a special type of sensor that can be attached to a chair or a bed; compared with a worn sensor (such as a smart bracelet or a smart watch), it places no burden on the user. Finally, the program periodically generates physical and mental health monitoring information formed by combining the voice emotion state with the biological indexes, and this information may also be sent to the user's smartphone for viewing.
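As a minimal sketch of this combining step, the record structure, field names and example values below are assumptions and not part of the patent; it only illustrates merging the fused emotion state with sensor readings into one report:

```python
from datetime import datetime

def build_monitoring_report(voice_emotion, biological_indexes):
    """Combine the fused voice emotion state with biological indexes
    (e.g. blood pressure and heart rate read from a patch sensor)."""
    return {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "mental": {"voice_emotion": voice_emotion},
        "physical": dict(biological_indexes),
    }

# Such a report could then be generated periodically and pushed to the user's smartphone.
report = build_monitoring_report("sad", {"heart_rate_bpm": 92, "blood_pressure_mmHg": "135/88"})
print(report)
```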
Specifically, the step of refining in step S10 includes:
and a1, performing word segmentation processing on the text information to obtain a target vocabulary.
And a step a2 of obtaining emotion classification results corresponding to the preset voice classification model and a text database associated with each emotion classification result.
Step a3, calculating the existence proportion of each target word in each text database and the sum of the existence proportion of all target words in each text database.
And a4, taking the emotion classification result associated with the text database with the largest sum of the existing proportions as the text emotion state.
The text information in this embodiment is text information converted from speech information and is mostly a complete sentence. It can be understood that, by decomposing the complete sentence, the phrases that form it (i.e., the target vocabulary in this embodiment) are easily obtained. In terms of grammatical structure, these phrases include a subject, a predicate, an object and so on. Obviously, the subject has little reference value for determining the emotion represented by the text, while the predicate and the object have different reference values in this respect, that of the object being relatively high. The preset speech classification model (i.e., the model used in the semantic understanding module in fig. 3) determines multiple emotion classification results during modelling, and each emotion classification result is associated with a text database (i.e., the text database associated with each emotion classification result in this embodiment). Most of the words stored in these text databases are objects and predicates, obtained by splitting a large number of predetermined complete sentences. For example, suppose the text information is "Today is really fun"; it is split into the target vocabulary "really" and the target vocabulary "fun". The target vocabulary "really" is an adverb that has a relatively large presence proportion in the databases of the two polarized emotions (for example, happy and sad) and a relatively small proportion in the neutral emotion, while the target vocabulary "fun" has a strong association with the happy emotional state. As shown in Table 1, the figures in the table are the presence proportions of the target vocabulary in the text database associated with each emotional state; the emotional state for which the sum of the presence proportions of "really" and "fun" is the largest is happy.
           Angry    Happy    Sad      Neutral  Surprised  Disgust  Fear     Non-neutral
"really"   0.05     0.05     0.05     0.001    0.05       0.05     0.05     0.005
"fun"      0.003    0.1      0.001    0.01     0.008      0.002    0.001    0.005
TABLE 1
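A minimal sketch of the proportion-summing rule illustrated by Table 1 follows. The proportion table and vocabulary are the toy values above; the real text databases of the preset speech classification model are assumed and not reproduced here:

```python
# Presence proportion of each target word in the text database of each emotion (toy values).
PRESENCE = {
    "really": {"angry": 0.05, "happy": 0.05, "sad": 0.05, "neutral": 0.001,
               "surprised": 0.05, "disgust": 0.05, "fear": 0.05, "non-neutral": 0.005},
    "fun":    {"angry": 0.003, "happy": 0.1, "sad": 0.001, "neutral": 0.01,
               "surprised": 0.008, "disgust": 0.002, "fear": 0.001, "non-neutral": 0.005},
}

def text_emotion(target_words):
    """Return the emotion whose database has the largest summed presence proportion."""
    emotions = next(iter(PRESENCE.values())).keys()
    totals = {e: sum(PRESENCE.get(w, {}).get(e, 0.0) for w in target_words) for e in emotions}
    return max(totals, key=totals.get)

print(text_emotion(["really", "fun"]))  # -> 'happy'
```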
Specifically, the step of refining in step S10 further includes:
and b1, carrying out vectorization processing on the text information to obtain a text vector.
And b2, inputting the text vector to a preset text emotion sensor to obtain a text emotion state corresponding to the text information.
Specifically, the step of refining in the step b2 includes:
and c1, inputting the text vector into a coding module of the closed recurrence model to obtain a text coding vector.
And c2, decoding the text coding vector through a decoding module of the closed recurrence model to obtain emotion characteristics.
And c3, inputting the emotion characteristics into the logistic regression model to perform emotion classification processing, and generating a text emotion state.
In this embodiment, when the text information converted from the speech information is obtained, the semantic information corresponding to the text information is also obtained. It can be understood that the process of obtaining the semantic information is a process of splitting the text information, and the classification labels produced by the splitting include field, intention, topic and the like. The preset text emotion sensor in this embodiment is a tool capable of generating a text emotion state from text information. It uses a multi-layer GRU (gated recurrent unit, referred to in this document as the closed recurrence model) network, a language model used to judge whether sentences are reasonable. The GRU model comprises a coding module and a decoding module: the coding module uses a bidirectional GRU, takes text vectors as input and outputs coding vectors; the decoding module uses a bidirectional GRU combined with dot-product attention to obtain the emotion features. Dot-product attention means that, when a word is predicted or inferred, the words in the text that are strongly related to it are identified and the weighted text vectors are summed to obtain the predicted or inferred word. It can be seen that, after the text vector is input into the coding module of the closed recurrence model, a text coding vector is obtained; the text coding vector is then decoded by the decoding module of the closed recurrence model, and multiple emotion features may be obtained; the emotion features are input into the logistic regression model for emotion classification processing, and the text emotion state is finally obtained.
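For illustration only, the following PyTorch sketch shows one way such a sensor could be assembled: a bidirectional GRU encoder, a GRU decoder with dot-product attention, and a logistic-regression classification layer. The patent does not give dimensions, layer counts, pooling or training details, so all of those are assumptions rather than the actual model:

```python
import torch
import torch.nn as nn

class TextEmotionSensor(nn.Module):
    """Sketch of a GRU-based text emotion sensor with dot-product attention.
    Dimensions, pooling and the single-layer setup are assumptions."""
    def __init__(self, embed_dim=128, hidden=64, num_emotions=8):
        super().__init__()
        self.encoder = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.decoder = nn.GRU(2 * hidden, 2 * hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_emotions)  # logistic-regression layer

    def forward(self, text_vectors):                      # (batch, seq_len, embed_dim)
        enc_out, _ = self.encoder(text_vectors)           # text coding vectors
        dec_out, _ = self.decoder(enc_out)
        attn = torch.softmax(dec_out @ enc_out.transpose(1, 2), dim=-1)  # dot-product attention
        emotion_features = (attn @ enc_out).mean(dim=1)   # weighted sum, pooled over time
        return torch.softmax(self.classifier(emotion_features), dim=-1)

# One sentence of 12 token vectors; the argmax index stands for the predicted emotion class.
probs = TextEmotionSensor()(torch.randn(1, 12, 128))
print(probs.argmax(dim=-1))
```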
Specifically, the step of refining in step S20 includes:
and d1, extracting the audio features of the voice information, and carrying out vectorization processing on the audio features to obtain an audio vector.
And d2, inputting the audio vector to a coding module of a preset sequence coding and decoding model to obtain an audio coding vector.
And d3, decoding the audio coding vector through a decoding module of the preset sequence coding and decoding model to generate an audio emotion state.
The preset sequence codec model in this embodiment is the model used in the speech emotion sensor (i.e., the emotion sensing module in fig. 3), which is a tool capable of generating an audio emotion state from audio features. The preset sequence codec model likewise comprises an encoding module and a decoding module. After the physical and mental health monitoring program extracts the audio features of the speech information, the audio features are vectorized to obtain audio vectors; the audio vectors are input into the encoding module of the preset sequence codec model to obtain audio coding vectors; and the audio coding vectors are then decoded by the decoding module of the preset sequence codec model to obtain the audio emotion state.
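Likewise, a minimal PyTorch sketch of a sequence codec over vectorized audio features is shown below; the feature dimension, layer sizes and random stand-in input are assumptions, and real inputs would be vectorized speaking-rate, pitch and timbre (or similar frame-level) features of the speech:

```python
import torch
import torch.nn as nn

class AudioEmotionSensor(nn.Module):
    """Sketch of a sequence codec: a GRU encoder producing audio coding vectors
    and a GRU decoder followed by a softmax over emotion classes (assumed sizes)."""
    def __init__(self, feat_dim=40, hidden=64, num_emotions=8):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_emotions)

    def forward(self, audio_vectors):             # (batch, frames, feat_dim)
        coded, _ = self.encoder(audio_vectors)    # audio coding vectors
        decoded, h = self.decoder(coded)          # decoded sequence and final state
        return torch.softmax(self.out(h[-1]), dim=-1)

print(AudioEmotionSensor()(torch.randn(1, 50, 40)).argmax(dim=-1))
```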
Specifically, the refinement of step S30 includes:
and e1, inputting the text emotion state and the audio emotion state into a preset classification model, and sequentially passing through a full-connection layer and a logistic regression layer in the preset classification model.
And e2, inquiring the classification results corresponding to the preset classification model, and acquiring a text probability value and an audio probability value associated with each classification result.
And e3, calculating, for each classification result, the sum of the associated text probability value and audio probability value, and taking the classification result with the largest sum as the voice emotion state.
The preset classification model in this embodiment is a model for handling multi-classification problems. The text emotion state and the audio emotion state obtained in the above embodiment are input into the preset classification model in turn. The preset classification model corresponds to multiple classification results, for example angry, happy, sad, neutral, surprised, disgust, fear and non-neutral. The full connection layer and the logistic regression layer in this embodiment belong to the preset classification model: the full connection layer establishes the association between the input parameters (i.e., the text emotion state and the audio emotion state) and the classification results, and the logistic regression layer calculates the probability that an input parameter corresponds to each classification result. When the text emotion state and the audio emotion state are input into the preset classification model in turn, the model outputs two groups of probability values, corresponding respectively to the probabilities of the text emotion state under the classification results and the probabilities of the audio emotion state under the classification results, and the probability values in each group sum to 1.
This embodiment gives a specific application scenario. Assume the text emotion state is a and the audio emotion state is b. After a and b are input into the preset classification model, the outputs obtained are shown in Table 2 and Table 3, which represent the classification results for the text emotion state and the audio emotion state respectively. The probability values under the same classification result are added to obtain a sum, and the classification result with the largest sum is taken as the output of the preset classification model (i.e., the voice emotion state in this embodiment).
Result       Angry    Happy    Sad      Neutral  Surprised  Disgust  Fear     Non-neutral
Probability  0.02     0.01     0.8      0.01     0.01       0.01     0.13     0.01
TABLE 2

Result       Angry    Happy    Sad      Neutral  Surprised  Disgust  Fear     Non-neutral
Probability  0.02     0.01     0.82     0.02     0.01       0.01     0.11     0.01
TABLE 3
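For illustration, a minimal sketch of this fusion rule using the probability values of Table 2 and Table 3; the emotion labels follow the translation used above, and the code is an assumption rather than the patent's implementation:

```python
# Add the text and audio probability for each classification result and pick the largest sum.
EMOTIONS = ["angry", "happy", "sad", "neutral", "surprised", "disgust", "fear", "non-neutral"]
text_probs  = [0.02, 0.01, 0.80, 0.01, 0.01, 0.01, 0.13, 0.01]   # Table 2
audio_probs = [0.02, 0.01, 0.82, 0.02, 0.01, 0.01, 0.11, 0.01]   # Table 3

sums = [t + a for t, a in zip(text_probs, audio_probs)]
voice_emotion = EMOTIONS[sums.index(max(sums))]
print(voice_emotion)  # -> 'sad' (0.80 + 0.82 is the largest sum)
```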
In this embodiment, after receiving voice information generated by a user, the physical and mental health monitoring program converts the voice information into text information and generates a corresponding text emotion state from the text information. The program also extracts audio features of the voice information and generates an audio emotion state from the extracted audio features. The text emotion state and the audio emotion state are then fused to obtain the voice emotion state, which can represent the psychological health condition of the user. Finally, by acquiring the user's biological indexes, the voice emotion state and the biological indexes are combined to generate and output physical and mental health monitoring information. Based on the voice information generated by the user and the user's biological indexes, the invention achieves comprehensive monitoring of the user's physical and mental health without increasing the burden on the user.
Further, referring to fig. 4, a second embodiment of the physical and mental health monitoring method of the present invention is presented on the basis of the above-described embodiment of the present invention.
This embodiment is a step subsequent to step S30 in the first embodiment, and differs from the above-described embodiment of the present invention in that:
and S50, determining a target dialogue state according to the text information and the voice emotion state, and searching a target utterance with the highest matching degree with the target dialogue state in a preset dialogue database.
It can be understood that the target dialogue state in this embodiment means that the physical and mental health monitoring program responds differently to different voice emotion states. If the voice emotion state is a negative emotion (such as sad or disgust), the program gives a soothing type of reply; if the voice emotion state is a positive emotion (such as happy), the program responds in kind. The difference between target dialogue states is essentially the difference in the voice emotion of the user being replied to. Different target dialogue states are matched with different dialogue databases (i.e., the preset dialogue database in this embodiment), in which a number of utterances for replying are stored, and the reply utterance with the highest matching degree with the target dialogue state (i.e., the target utterance in this embodiment) can be determined in advance; for example, during the product test stage, researchers determine the most suitable reply sentences through a large number of trial voice replies, so that when a target dialogue state is determined at run time, the utterance in the preset dialogue database that best matches it can be found.
And step S60, scoring the target utterance to obtain a matching score.
It can be understood that the determined target utterance is not necessarily the best reply, since different people accept the same sentence to different degrees. The target utterance therefore needs to be scored after it is determined. The scoring basis may be user feedback; if, according to the user feedback, the score of the target utterance (i.e., the matching score) falls below a certain value, prompt information is output so that the program developer can update the utterances in the preset dialogue database.
And step S70, outputting the target utterance if the matching score is greater than a preset threshold.
After the voice emotion state is generated, the voice uttered by the user can also be replied to through a device with a voice interaction function, so that the user's physical and mental health is protected more comprehensively. For example, if it is determined from the voice emotion state that the user's current emotional state is angry, the physical and mental health monitoring program queries the preset dialogue database and obtains from it the utterance with the highest matching degree with the current dialogue state (i.e., the target utterance in this embodiment). As another example, a piece of voice information is converted into the text information "Today was such a failure!"; by obtaining the semantic information corresponding to the text information and the audio features of the voice information, the voice emotion state corresponding to the voice information can be determined to be sad, and the target utterance queried by the program from the preset dialogue database is "It takes ten years of fate to cross a river in the same boat, and a hundred years to share the same pillow". The physical and mental health monitoring program scores the target utterance to obtain a matching score, and outputs the target utterance when the matching score is greater than the preset threshold. User feedback can also be combined, that is, the score of the target utterance can be adjusted according to the user's feedback.
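For illustration, a toy sketch of steps S50 to S70 follows: look up the candidate replies for the target dialogue state, take the one with the highest matching score, and output it only if the score exceeds the preset threshold. The database contents, scores and threshold are invented for the example and are not taken from the patent:

```python
# Assumed structure: target dialogue state -> candidate replies with precomputed matching scores.
PRESET_DIALOGUE_DB = {
    "soothe":  [("Take a deep breath; tomorrow is a new day.", 0.72),
                ("I'm sorry today went badly. Would some music help?", 0.65)],
    "respond": [("Great to hear! Keep it up.", 0.80)],
}

def reply_for(dialogue_state, threshold=0.6):
    """Return the highest-scoring target utterance, or None if even the best
    candidate scores at or below the threshold (the caller would then fall
    back to the preset voice response model)."""
    utterance, score = max(PRESET_DIALOGUE_DB[dialogue_state], key=lambda pair: pair[1])
    return utterance if score > threshold else None

print(reply_for("soothe"))
```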
Specifically, the steps following step S60 include:
and c1, if the matching score is smaller than or equal to a preset threshold value, inputting the current dialogue state into a preset voice response model to generate response voice.
And c2, outputting the response voice.
It can be understood that if the score of the target utterance is not higher than the preset threshold, there may be no matching reply in the preset dialogue database. In this case, the physical and mental health monitoring program can also input the current dialogue state into the preset voice response model to generate a response voice. The preset voice response model is a machine learning model, and a response voice can easily be obtained with existing artificial intelligence technology; however, such a response has the drawback of being relatively neutral and may not help to adjust the user's physical and mental state. The target utterance or the response voice may be output, for example, by being played through the smart television.
In this embodiment, after the voice emotion state is generated, the physical and mental health monitoring program can also reply to the user according to the physical and mental state reflected by the voice emotion state, so that the user's physical and mental health is protected comprehensively.
In addition, the embodiment of the invention also provides a physical and mental health monitoring device, which comprises:
the text emotion state generation module is used for receiving the voice information, converting the voice information into text information and generating a text emotion state according to the text information;
the extraction module is used for extracting the audio characteristics of the voice information and generating an audio emotion state according to the audio characteristics;
the fusion module is used for fusing the text emotion state and the audio emotion state to obtain a voice emotion state;
the generation module is used for acquiring biological indexes, combining the voice emotion states with the biological indexes and generating physical and mental health monitoring information.
Optionally, the text emotional state generation module includes:
the word segmentation unit is used for carrying out word segmentation processing on the text information to obtain a target word;
the emotion classification result acquisition unit is used for acquiring emotion classification results corresponding to the preset voice classification model and a text database associated with each emotion classification result;
a presence proportion calculation unit, configured to calculate the proportion in which each target word is present in each text database, and the sum of the presence proportions of all target words for each text database;
and a selection unit, configured to take the emotion classification result associated with the text database with the largest sum of presence proportions as the text emotion state.
Optionally, the text emotional state generation module includes:
the vectorization processing unit is used for vectorizing the text information to obtain a text vector;
the first input unit is used for inputting the text vector to a preset text emotion sensor to obtain a text emotion state corresponding to the text information.
Optionally, the first input unit includes:
the second input unit is used for inputting the text vector to the coding module of the closed recurrence model to obtain a text coding vector;
the first decoding unit is used for decoding the text coding vector through a decoding module of the closed recurrence model to obtain emotion characteristics;
and the third input unit is used for inputting the emotion characteristics into the logistic regression model to perform emotion classification processing and generate a text emotion state.
Optionally, the extraction module includes:
the extraction unit is used for extracting the audio characteristics of the voice information and carrying out vectorization processing on the audio characteristics to obtain audio vectors;
the fourth input unit is used for inputting the audio vector to a coding module of a preset sequence coding and decoding model to obtain an audio coding vector;
and the second decoding unit is used for decoding the audio coding vector through a decoding module of the preset sequence coding and decoding model to generate an audio emotion state.
Optionally, the fusion module includes:
the input unit is used for inputting the text emotion state and the audio emotion state into a preset classification model and sequentially passing through a full-connection layer and a logistic regression layer in the preset classification model;
the query unit is used for querying the classification result corresponding to the preset classification model and acquiring a text probability value and an audio probability value associated with each classification result;
and a computing unit, configured to calculate, for each classification result, the sum of the associated text probability value and audio probability value, and to take the classification result with the largest sum as the voice emotion state.
Optionally, the physical and mental health monitoring device further includes:
the searching unit is used for determining a target dialogue state according to the text information and the voice emotion state and searching a target utterance with highest matching degree with the target dialogue state in a preset dialogue database;
the scoring unit is used for scoring the target utterance to obtain a matching score;
and the first output unit is used for outputting the target utterance if the matching score is larger than a preset threshold value.
Optionally, the physical and mental health monitoring device further includes:
a fifth input unit, configured to input the current dialogue state to a preset voice response model to generate response voice if the matching score is less than or equal to a preset threshold;
and the second output unit is used for outputting the response voice.
The methods performed by the program modules may refer to various embodiments of the methods according to the present invention, and are not described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above, including several instructions for causing a terminal device (which may be a mobile phone, a computer, a tablet computer, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (9)

1. A method for monitoring physical and mental health, characterized in that the method for monitoring physical and mental health comprises the following steps:
acquiring audio information in real time through an audio acquisition unit, filtering near-field audio information in the audio information through an active noise reduction technology, and acquiring far-field audio information, wherein the audio acquisition unit is arranged in equipment outside a human body;
acquiring voice information in the far-field audio information, converting the voice information into text information, and generating a text emotion state according to the text information;
extracting audio features of the voice information, and generating an audio emotion state according to the audio features;
fusing the text emotion state and the audio emotion state to obtain a voice emotion state;
acquiring biological indexes, and combining the voice emotion states with the biological indexes to generate physical and mental health monitoring information;
wherein the step of generating a text emotional state according to the text information comprises:
word segmentation is carried out on the text information to obtain a target word, and the grammar structure of the target word comprises a subject, a predicate and an object;
acquiring emotion classification results corresponding to a preset voice classification model and text databases associated with each emotion classification result, wherein words in the text databases comprise objects and predicates;
calculating the existence proportion of each target word in each text database and the sum of the existence proportion of all target words in each text database;
and taking the emotion classification result associated with the text database with the largest sum of the existing proportions as the text emotion state.
2. The physical and mental health monitoring method according to claim 1, wherein the step of generating a text emotional state from the text information comprises:
vectorizing the text information to obtain a text vector;
and inputting the text vector to a preset text emotion sensor to obtain a text emotion state corresponding to the text information.
3. The physical and mental health monitoring method according to claim 2, wherein the preset text emotion sensor comprises:
a closed recurrence model and a logistic regression model;
the step of inputting the text vector to a preset text emotion sensor to obtain a text emotion state corresponding to the text information comprises the following steps:
inputting the text vector to a coding module of the closed recurrence model to obtain a text coding vector;
decoding the text encoding vector through a decoding module of the closed recurrence model to obtain emotion characteristics;
and inputting the emotion characteristics into the logistic regression model to carry out emotion classification processing, and generating a text emotion state.
4. The physical and mental health monitoring method according to claim 1, wherein the step of extracting audio features of the voice information and generating an audio emotional state according to the audio features comprises:
extracting the audio features of the voice information, and carrying out vectorization processing on the audio features to obtain audio vectors;
inputting the audio vector to an encoding module of a preset sequence encoding and decoding model to obtain an audio encoding vector;
and decoding the audio coding vector by a decoding module of the preset sequence coding and decoding model to generate an audio emotion state.
5. The physical and mental health monitoring method according to claim 1, wherein the step of fusing the text emotional state and the audio emotional state to obtain a speech emotional state comprises:
inputting the text emotion state and the audio emotion state into a preset classification model, and sequentially passing through a full connection layer and a logistic regression layer in the preset classification model;
inquiring classification results corresponding to the preset classification model, and acquiring a text probability value and an audio probability value associated with each classification result;
and calculating the sum of the numerical value of the text probability value and the audio probability value associated with each classification result, and taking the classification result with the largest sum of the numerical values as the voice emotion state.
6. The physical and mental health monitoring method according to claim 1, wherein after the step of fusing the text emotional state and the audio emotional state to obtain a speech emotional state, the method comprises:
determining a target dialogue state according to the text information and the voice emotion state, and searching a target utterance with highest matching degree with the target dialogue state in a preset dialogue database;
scoring the target utterance to obtain a matching score;
and if the matching score is greater than a preset threshold, outputting the target utterance.
7. The physical and mental health monitoring method according to claim 6, wherein after the step of scoring the target utterance to obtain a matching score, the method comprises:
if the matching score is less than or equal to the preset threshold, inputting the target dialogue state into a preset voice response model to generate a response voice;
and outputting the response voice.
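Claims 6 and 7 together describe a retrieve-then-generate policy: look up the utterance that best matches the target dialogue state, output it if its matching score clears the preset threshold, and otherwise fall back to a generative voice response model. The scoring function, threshold value, and generator in the sketch below are placeholders, not the patent's specific components.

    # Illustrative sketch of the response policy in claims 6-7.
    SCORE_THRESHOLD = 0.7   # hypothetical preset threshold

    def respond(dialogue_state, dialogue_database, score, generate_response):
        # Target utterance with the highest degree of matching to the dialogue state.
        target_utterance = max(dialogue_database, key=lambda u: score(dialogue_state, u))
        if score(dialogue_state, target_utterance) > SCORE_THRESHOLD:
            return target_utterance                  # claim 6: output the retrieved utterance
        return generate_response(dialogue_state)     # claim 7: fall back to a generated response

    # Hypothetical usage with a toy word-overlap score and an echo-style generator
    database = ["Remember to rest and drink some water.", "Would you like to talk about it?"]
    overlap = lambda s, u: len(set(s.lower().split()) & set(u.lower().split())) / max(len(u.split()), 1)
    echo = lambda s: "It sounds like " + s
    print(respond("I feel tired and want to rest", database, overlap, echo))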
8. A physical and mental health monitoring device, characterized in that the physical and mental health monitoring device comprises:
a memory, a processor, and a physical and mental health monitoring program stored on the memory and executable on the processor, which, when executed by the processor, implements the steps of the physical and mental health monitoring method according to any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that a physical and mental health monitoring program is stored on the computer-readable storage medium, which when executed by a processor implements the steps of the physical and mental health monitoring method according to any one of claims 1 to 7.
CN202010925877.7A 2020-09-03 2020-09-03 Physical and mental health monitoring method, equipment and computer readable storage medium Active CN112002329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010925877.7A CN112002329B (en) 2020-09-03 2020-09-03 Physical and mental health monitoring method, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010925877.7A CN112002329B (en) 2020-09-03 2020-09-03 Physical and mental health monitoring method, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112002329A CN112002329A (en) 2020-11-27
CN112002329B true CN112002329B (en) 2024-04-02

Family

ID=73468790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010925877.7A Active CN112002329B (en) 2020-09-03 2020-09-03 Physical and mental health monitoring method, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112002329B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749895A (en) * 2021-01-12 2021-05-04 深圳前海微众银行股份有限公司 Guest group index management method, device, equipment and storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385858B (en) * 2010-08-31 2013-06-05 国际商业机器公司 Emotional voice synthesis method and system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120117041A (en) * 2011-04-14 2012-10-24 한국과학기술원 Method and system of synthesizing emotional speech based on personal prosody model and recording medium
CN107714056A (en) * 2017-09-06 2018-02-23 上海斐讯数据通信技术有限公司 A kind of wearable device of intellectual analysis mood and the method for intellectual analysis mood
CN110085221A (en) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Speech emotional exchange method, computer equipment and computer readable storage medium
CN110555204A (en) * 2018-05-31 2019-12-10 北京京东尚科信息技术有限公司 emotion judgment method and device
WO2020135194A1 (en) * 2018-12-26 2020-07-02 深圳Tcl新技术有限公司 Emotion engine technology-based voice interaction method, smart terminal, and storage medium
CN111368609A (en) * 2018-12-26 2020-07-03 深圳Tcl新技术有限公司 Voice interaction method based on emotion engine technology, intelligent terminal and storage medium
CN111223498A (en) * 2020-01-10 2020-06-02 平安科技(深圳)有限公司 Intelligent emotion recognition method and device and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Kim, Tae-Ho; Cho, Sungjae; Choi, Shinkook; Park, Sejik; Lee, Soo-Young. Emotional Voice Conversion Using Multitask Learning with Text-To-Speech. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020-08-09, pp. 7774-7778. *
杜漫; 徐学可; 杜慧; 伍大勇; 刘悦; 程学旗. Emotion word vector learning for emotion classification (面向情绪分类的情绪词向量学习). Journal of Shandong University (Natural Science) (山东大学学报(理学版)), No. 7, 2017-06-14, pp. 56-62, 69. *

Also Published As

Publication number Publication date
CN112002329A (en) 2020-11-27

Similar Documents

Publication Publication Date Title
US10977452B2 (en) Multi-lingual virtual personal assistant
US20210081056A1 (en) Vpa with integrated object recognition and facial expression recognition
US20210233521A1 (en) Method for speech recognition based on language adaptivity and related apparatus
US20180018985A1 (en) System and method for detecting repetitive speech
Chandrasekar et al. Automatic speech emotion recognition: A survey
KR102216768B1 (en) System and Method for Analyzing Emotion in Text using Psychological Counseling data
CN113241096B (en) Emotion monitoring device and method
US10783329B2 (en) Method, device and computer readable storage medium for presenting emotion
US11568853B2 (en) Voice recognition method using artificial intelligence and apparatus thereof
CN111696559A (en) Providing emotion management assistance
CN112632242A (en) Intelligent conversation method and device and electronic equipment
US10529333B2 (en) Command processing program, image command processing apparatus, and image command processing method
CN112002329B (en) Physical and mental health monitoring method, equipment and computer readable storage medium
CN114708869A (en) Voice interaction method and device and electric appliance
KR102297480B1 (en) System and method for structured-paraphrasing the unstructured query or request sentence
CN107943299B (en) Emotion presenting method and device, computer equipment and computer readable storage medium
KR20110087742A (en) System and apparatus into talking with the hands for handicapped person, and method therefor
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
Ghorpade et al. ITTS model: speech generation for image captioning using feature extraction for end-to-end synthesis
Formolo et al. Extracting interpersonal stance from vocal signals
Li Application of an Improved LSTM Model to Emotion Recognition
Song et al. Towards realizing sign language to emotional speech conversion by deep learning
Song et al. A deep learning based framework for converting sign language to emotional speech
Abbas Improving Arabic Sign Language to support communication between vehicle drivers and passengers from deaf people
JP7135358B2 (en) Pronunciation learning support system, pronunciation learning support device, pronunciation learning support method, and pronunciation learning support program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant