CN106548788B - Intelligent emotion determining method and system - Google Patents

Intelligent emotion determining method and system

Info

Publication number
CN106548788B
Authority
CN
China
Prior art keywords
audio information
segment
audio
emotion
abnormal emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510613689.XA
Other languages
Chinese (zh)
Other versions
CN106548788A (en)
Inventor
刘振虎
许玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Shandong Co Ltd
Original Assignee
China Mobile Group Shandong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Shandong Co Ltd filed Critical China Mobile Group Shandong Co Ltd
Priority to CN201510613689.XA priority Critical patent/CN106548788B/en
Publication of CN106548788A publication Critical patent/CN106548788A/en
Application granted granted Critical
Publication of CN106548788B publication Critical patent/CN106548788B/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An embodiment of the invention provides an intelligent emotion determining method and system. The method acquires audio information of a conversation between a person to be detected and a user; determines abnormal emotion audio information segments from among the audio information segments constituting the audio information, where an abnormal emotion audio information segment is an audio information segment that contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition; and, when such a segment can be determined, determines that the person to be detected has an abnormal emotion corresponding to that segment. The method is more objective, and analyzes more accurately whether the person to be detected has an abnormal emotion. The invention relates to the field of computer technology.

Description

Intelligent emotion determining method and system
Technical Field
The invention relates to the technical field of computers, in particular to an intelligent emotion determining method and system.
Background
Social development and technological progress have brought material prosperity, but they have also intensified social competition. In fierce competition people inevitably develop negative emotions, which affect not only their work but also their personal health. Customer service personnel who answer calls are taken as an example below.
Customer service staff solve users' problems by answering calls. Over long working hours they deal with users of all kinds, the psychological and physical stress is hard to relieve, and abnormal emotions arise easily, which may reduce working efficiency, make communication with users less smooth, and so on. In the prior art, methods and channels for relieving the pressure on customer service staff remain underdeveloped: their emotion management is evaluated mainly through the traditional communication mode of face-to-face interviews and through paper questionnaires on psychological state, which cannot truly reflect the staff's real psychological dynamics.
To address this problem, the prior art proposes an emotion recording, analysis, and guidance method, which mainly includes the following steps: reading the mood data recorded and submitted by customer service personnel through an emotion management platform; acquiring physiological data, such as heart rate and body temperature, through a smart watch; and analyzing these data to determine whether the customer service staff currently have an abnormal emotion.
This emotion recording, analysis, and guidance method does analyze and guide abnormal emotions, but it acquires its abnormal emotion data from the records kept by customer service personnel and from physiological data collected by professional equipment. Mood data recorded by the staff themselves are unavoidably subjective, and physiological data differ with each person's physical condition, so the abnormal emotions obtained by the analysis are not accurate enough.
Disclosure of Invention
An embodiment of the invention provides an intelligent emotion determining method and system to solve the prior-art problem that abnormal emotion is determined inaccurately.
Based on the above problem, an intelligent emotion determining method provided by the embodiment of the present invention includes:
acquiring audio information of a conversation between a person to be detected and a user;
determining abnormal emotion audio information segments from among the audio information segments constituting the audio information,
where an abnormal emotion audio information segment is an audio information segment that contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition;
and, when such an abnormal emotion audio information segment can be determined, determining that the person to be detected has an abnormal emotion corresponding to that segment.
An embodiment of the invention further provides an intelligent emotion determining system, comprising:
an audio collection module for acquiring audio information of the conversation between the person to be detected and the user;
a voice waveform analysis module for determining abnormal emotion audio information segments from among the audio information segments constituting the audio information, where an abnormal emotion audio information segment is an audio information segment that contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition, and for determining, when such a segment can be determined, that the person to be detected has an abnormal emotion corresponding to that segment.
The embodiments of the invention have the following beneficial effects:
The embodiments of the invention provide an intelligent emotion determining method and system. The method acquires audio information of a conversation between a person to be detected and a user; determines abnormal emotion audio information segments from among the audio information segments constituting the audio information, where an abnormal emotion audio information segment contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition; and, when such a segment can be determined, determines that the person to be detected has an abnormal emotion corresponding to that segment. In this method, whether the person to be detected has an abnormal emotion is determined by acquiring and analyzing the audio of that person's calls. Because a person's speech always reflects the current emotion, audio information reflects that emotion objectively. Compared with the prior art, which acquires abnormal emotion data from entries made by customer service personnel and from physiological data collected by professional equipment, the method is therefore more objective and analyzes more accurately whether the person to be detected has an abnormal emotion. In addition, the prior art needs professional external equipment to assist in collecting abnormal emotion data, which increases cost; the present method needs no such equipment.
Drawings
Fig. 1 is a flowchart of an intelligent emotion determining method according to an embodiment of the present invention;
fig. 2 is a flowchart of an intelligent emotion determining method provided in embodiment 1 of the present invention;
fig. 3 is a flowchart of an intelligent emotion guidance method provided in embodiment 2 of the present invention;
fig. 4 is a flowchart illustrating the working of the intelligent communication module, the communication injection module, and the sensitive language collection module according to the embodiment of the present invention;
fig. 5 is a schematic structural diagram of one of the intelligent emotion determining systems provided by the embodiments of the present invention;
fig. 6 is a schematic structural diagram of a second intelligent emotion determining system according to an embodiment of the present invention.
Detailed Description
The invention provides an intelligent emotion determining method and system; preferred embodiments of the invention are illustrated and explained below in conjunction with the accompanying drawings of the specification. The embodiments of the present application, and the features within them, may be combined with one another provided there is no conflict.
An embodiment of the present invention provides an intelligent emotion determining method, as shown in fig. 1, including:
s101, audio information of the conversation between the person to be detected and the user is obtained.
S102, determining abnormal emotion audio information segments from among the audio information segments constituting the audio information obtained in S101,
where an abnormal emotion audio information segment is an audio information segment that contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition.
S103, when such an abnormal emotion audio information segment can be determined, determining that the person to be detected has an abnormal emotion corresponding to that segment.
The method and related equipment provided by the invention are described in detail below through specific embodiments, in conjunction with the accompanying drawings.
Embodiment 1:
Embodiment 1 of the present invention provides an intelligent emotion determining method, as shown in fig. 2, which specifically includes the following steps:
s201, obtaining audio information of the conversation between the person to be detected and the user.
Taking customer service personnel who answer calls as an example, with a customer service person as the person to be detected, the audio input of the microphone at that person's terminal seat can be collected in real time as the audio information of the call between the person to be detected and the user, yielding a voice file containing the audio information.
Further, the embodiment of the invention implements an intelligent emotion determining system. The system can be composed of a number of functional modules, and the execution subject of this step can be the audio collection module of the system.
S202, converting the audio information acquired in S201 into a structured audio text file.
In this step, a text index can be established through existing voice transcription, converting the unstructured voice file into structured text information, i.e., an audio text file, which lays the foundation for the subsequent analysis and processing of the audio information.
Further, in the intelligent emotion determining system provided by the embodiment of the present invention, the execution subject of this step may be the audio collection module. The subsequent analysis of the audio information can be executed by a voice waveform analysis module in the system, so the audio collection module sends the audio text file generated in this step to the voice waveform analysis module; it may also send information corresponding to the person to be detected, such as the employee job number, login time, cumulative working time, and historical emotion level, for the voice waveform analysis module to use in analyzing the audio information.
Furthermore, because it is the audio of the person to be detected that is analyzed and different regions have different dialects, the embodiment of the invention also considers the problem of local accents. When the intelligent emotion determining method is actually used, the acoustic model needs to be adapted and optimized for the local accents of the various regions, so that it covers them broadly and the audio information can be converted into a structured audio text file more accurately; the voice model also needs to be optimized with business knowledge and the hotline's service range to improve the accuracy of voice transcription.
S203, parsing the audio text file obtained in S202 to determine the preset audio information contained in each audio text segment corresponding to each audio information segment in the audio text file,
where the preset audio information comprises at least one of the following: keyword information, emotion detection information, and mute duration information.
Further, in the intelligent emotion determining system provided by the embodiment of the present invention, the execution subject of this step may be the voice waveform analysis module. The module can detect the variation amplitude of the fundamental frequency, pitch, and the like in a given stretch of audio output, predict possible emotional fluctuation, and locate the fluctuating audio within the whole recording; it also detects and analyzes changes in speech rate, silence durations, and so on.
In a specific implementation, this information may be written into an index file in XML format (a sketch of one such entry follows the list below); the index file may include one or more of the following:
the speech of both parties to the call, the person to be detected and the user, as contained in the audio text file;
short-time speech rate information, i.e., stretches in a given audio text segment where either party's speech rate exceeds the average;
speech endpoint information and average speech rate information for the call, i.e., the start and end time and speech rate (in words/second) of each sentence spoken by either party in any audio text segment, together with each party's average speech rate (in words/second) over that segment;
the highest amplitude in the audio text segment and/or the frequency with which that highest amplitude occurs;
pitch information for the audio text segment, which can be the highest frequency and the maximum volume;
fundamental frequency information for the audio text segment.
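As a minimal sketch of one such entry, assuming a schema of our own devising (the patent names the kinds of information but prescribes no tags or attributes), the index entry might be built as follows:

```python
# Hypothetical sketch of one segment entry in the XML index; every tag and
# attribute name here is an assumption for illustration.
import xml.etree.ElementTree as ET

seg = ET.Element("audio_text_segment", id="seg-001")
ET.SubElement(seg, "sentence", speaker="agent",
              start="00:01:02.1", end="00:01:05.8",
              speech_rate="4.2")                       # words/second
ET.SubElement(seg, "short_time_speech_rate", speaker="user",
              value="6.1", above_average="true")
ET.SubElement(seg, "average_speech_rate", agent="3.8", user="4.5")
ET.SubElement(seg, "max_amplitude", value="0.92", occurrences="3")
ET.SubElement(seg, "pitch", max_frequency_hz="310", max_volume_db="78")
ET.SubElement(seg, "fundamental_frequency", mean_hz="165")

print(ET.tostring(seg, encoding="unicode"))
```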
Furthermore, the audio information segments constituting the audio information, i.e., the audio text segments, may be divided according to actual needs: an audio stretch of a set duration may serve as one audio information segment, or one whole call may serve as one audio information segment.
After the index file is generated, the preset audio information contained in each audio text segment can be determined from it. That is, the voice waveform analysis module can search the index file for the preset audio information that needs to be analyzed and retrieved (one or more of keyword information, emotion detection information, and mute duration information) and return the audio information of interest that may carry abnormal emotion.
The specific implementation may perform the following steps:
Step one: when the preset audio information contains keyword information, for each of the audio information segments, comparing the audio text segment corresponding to that segment against the preset keywords, and determining the preset keywords contained in the audio text segment together with the start time and end time at which each appears.
In step one, the preset keywords can be words, phrases, and the like, and words representing abnormal emotions can be designated as preset keywords according to actual needs. When several keywords are set, a keyword list can be generated and each audio text segment compared against it, yielding the list of audio text segments that contain any one or more keywords from the list, together with the start and end time of each keyword's occurrence in the segment.
Step two: when the preset audio information contains emotion detection information, for each of the audio information segments, determining the index value of one or more of the following emotion detection indexes in the corresponding audio text segment: the speech rate information, amplitude information, frequency information, volume information, and fundamental frequency information of both parties to the call; and determining the indexes whose index values reach the corresponding index thresholds, together with the start and end time within the audio text segment at which each does so.
In step two, the emotion detection index can be characterized by one or more of the following: the speech rate, amplitude, frequency, volume, fundamental frequency, and so on of both parties to the call. Each index has a corresponding index threshold, and each threshold can be set from the historical audio data of the person to be detected, i.e., the historical audio data serve as the empirical basis for setting an alarm threshold. The thresholds thereby delimit the emotional safety range of that person's audio under non-abnormal emotion (the fundamental frequency can be understood as the basic pitch of the voice; fundamental frequency comparison compares the change between two stretches of basic pitch; and the change duration is the total time the fundamental frequency holds between changes, for example holding for 10 minutes, changing once 2 hours later, and then holding for 15 minutes). The index values of every audio text segment in the index file are then compared against the corresponding thresholds to determine whether each segment lies within the emotional safety range; a value beyond the range is taken to indicate abnormal emotion.
Step three: when the preset audio information contains mute duration information, for each of the audio information segments, determining the mute durations contained in the audio of the person to be detected from the start time, end time, and duration of the audio corresponding to each sentence of both parties in the corresponding audio text segment; and determining the mute durations in the audio text segment that meet the preset duration, together with the start and end time at which each occurs.
Further, in this step the preset audio information includes one or more of keyword information, emotion detection information, and mute duration information, and the emotion detection index may be characterized by one or more of the speech rate, amplitude, frequency, volume, and fundamental frequency information of both parties to the call. When retrieving audio information with abnormal emotion, these items can therefore be combined logically into different retrieval conditions according to actual needs, for example: abnormal emotion audio information that contains keyword A and does not contain keyword B. A consolidated sketch of the three detection steps and such a combination follows.
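The sketch below assumes an illustrative segment layout, keyword list, thresholds, and silence limit; none of these concrete values comes from the patent.

```python
# Illustrative sketch of the three detection steps and a logical combination.
PRESET_KEYWORDS = {"complaint", "refund"}                 # hypothetical list
THRESHOLDS = {"speech_rate": 5.0, "amplitude": 0.9,       # hypothetical alarm
              "frequency": 300.0, "volume": 75.0, "f0": 200.0}  # thresholds
SILENCE_LIMIT = 8.0                                       # seconds, assumed

def detect_keywords(utterances):
    """Step 1: (keyword, start, end) for every preset keyword found.
    Each utterance is a (text, start_s, end_s) tuple."""
    return [(kw, start, end)
            for text, start, end in utterances
            for kw in PRESET_KEYWORDS if kw in text]

def detect_indicators(values):
    """Step 2: the indexes whose values reach their alarm thresholds."""
    return {k: v for k, v in values.items()
            if v >= THRESHOLDS.get(k, float("inf"))}

def detect_silences(agent_utterances):
    """Step 3: gaps between the agent's consecutive utterances that meet
    the preset silence duration, with their start and end times."""
    return [(nxt_start - prev_end, prev_end, nxt_start)
            for (_, _, prev_end), (_, nxt_start, _)
            in zip(agent_utterances, agent_utterances[1:])
            if nxt_start - prev_end >= SILENCE_LIMIT]

def is_abnormal(segment):
    """The detections can be combined with and/or logic as needed, e.g.
    'contains keyword A and does not contain keyword B'."""
    kws = {kw for kw, _, _ in detect_keywords(segment["utterances"])}
    return (("complaint" in kws and "refund" not in kws)
            or bool(detect_indicators(segment["indicators"]))
            or bool(detect_silences(segment["agent_utterances"])))
```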
S204, determining abnormal emotion audio information segments from among the audio information segments constituting the audio information,
where an abnormal emotion audio information segment is an audio information segment that contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition.
Further, building on the result of S203, the abnormal emotion audio information segments may be determined in this step by the voice waveform analysis module. The preset audio information used for representing the abnormal emotion of the person to be detected is the preset audio information described above.
This step can be embodied as:
determining, from among the audio information segments, the segments containing preset keywords; and/or
determining, from among the audio information segments, the segments in which the index values of preset emotion detection indexes reach the corresponding index thresholds; and/or
determining, from among the audio information segments, the segments containing mute durations that meet the preset duration.
S205, when such an abnormal emotion audio information segment can be determined, determining that the person to be detected has an abnormal emotion corresponding to that segment.
Embodiment 2:
Corresponding to Embodiment 1, Embodiment 2 of the present invention provides an intelligent emotion guidance method. Embodiment 1 determines whether a person to be detected has an abnormal emotion; in the prior art, that abnormal emotion is then addressed by sending the person a smiley picture or a joke, which cannot get at the source of the abnormal emotion and cannot give the person deep interactive guidance. Embodiment 2 of the present invention therefore provides an intelligent emotion guidance method that, once it is determined that the person to be detected has an abnormal emotion, guides the person in depth in an intelligent interactive manner, so as to find the source of the abnormal emotion in time and relieve it effectively.
The intelligent emotion guidance method provided by embodiment 2 of the present invention, as shown in fig. 3, includes the following steps:
s301, audio information of the conversation between the person to be detected and the user is obtained.
S302, converting the audio information acquired in S301 into a structured audio text file.
S303, parsing the audio text file obtained in S302 to determine the preset audio information contained in each audio text segment corresponding to each audio information segment in the audio text file,
where the preset audio information comprises at least one of the following: keyword information, emotion detection information, and mute duration information;
S304, determining abnormal emotion audio information segments from among the audio information segments constituting the audio information,
where an abnormal emotion audio information segment is an audio information segment that contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition.
Further, the detailed implementation of steps S301 to S304 can be found in steps S201 to S204 of Embodiment 1.
S305, determining the credibility corresponding to each abnormal emotion audio information segment,
where, within an abnormal emotion audio information segment, the more indexes whose corresponding index values meet the preset conditions, the higher the corresponding credibility, the preset conditions including at least two of the following: a preset keyword is contained; the mute duration meets the preset duration; the speech rate information of both parties to the call reaches the corresponding index threshold; the amplitude information reaches the corresponding index threshold; the frequency information reaches the corresponding index threshold; the volume information reaches the corresponding index threshold; and the fundamental frequency information reaches the corresponding index threshold.
In this step, the credibility corresponding to an abnormal emotion audio information segment can be understood as the credibility with which the segment represents abnormal emotion: the more of the conditions set for determining abnormal emotion audio information segments a segment satisfies, the higher its credibility as an abnormal emotion audio information segment.
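A minimal sketch of this credibility measure, assuming the simplest reading (a plain count of the preset conditions met; the patent does not fix a weighting):

```python
# Credibility as the count of preset conditions a segment meets; the
# condition names are shorthand for the seven conditions listed above.
CONDITIONS = ("keyword", "silence", "speech_rate", "amplitude",
              "frequency", "volume", "f0")

def credibility(met):
    """met: dict mapping a condition name to True when the segment meets it."""
    return sum(1 for c in CONDITIONS if met.get(c))

# A segment meeting a keyword hit plus two threshold conditions scores 3.
print(credibility({"keyword": True, "speech_rate": True, "volume": True}))
```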
Further, in the intelligent emotion determining system provided by the embodiment of the present invention, the execution subject of this step may be an early warning pushing module. After the voice waveform analysis module determines the abnormal emotion audio information segments, it can send the corresponding audio text segments to the early warning pushing module, which determines the credibility of each received audio text segment.
S306, determining the optimal abnormal emotion audio information segment from among the abnormal emotion audio information segments on the basis of each segment's credibility.
Further, in the intelligent emotion determining system provided by the embodiment of the present invention, the execution subject of this step may also be the early warning pushing module. The module can determine the optimal abnormal emotion audio information segment from the audio text segments corresponding to the received abnormal emotion audio information segments, so that the content of the intelligent communication with the person to be detected can then be determined from the optimal segment.
This step can be embodied as the following steps (a code sketch follows the worked example below):
judging whether there is one abnormal emotion audio information segment whose credibility is the single highest among the abnormal emotion audio information segments;
if so, determining that segment as the optimal abnormal emotion audio information segment;
otherwise, if several abnormal emotion audio information segments tie for the highest credibility, judging whether among them there is one segment whose occurrence time corresponds to the longest cumulative working time;
if so, determining that segment as the optimal abnormal emotion audio information segment;
otherwise, if several abnormal emotion audio information segments also tie for the longest cumulative working time, identifying among them the segment whose occurrence time falls in the time range whose historical emotion level represents the worst emotion;
and determining that segment as the optimal abnormal emotion audio information segment.
Further, when determining the optimal abnormal emotion audio information segment, credibility is considered first. If several audio information segments tie for the highest credibility, their cumulative working times can be considered, where the cumulative working time is the continuous working time from when the person to be detected last started work to the occurrence time of the corresponding segment. Suppose a first and a second audio information segment tie for the highest credibility; the first occurs from 9:00 to 9:30 in the morning (work starts at 9:00, so the corresponding cumulative working time is 0 to half an hour), while the second occurs from 4:00 to 4:20 in the afternoon (work resumes at 1:00, so the corresponding cumulative working time is 3 hours to 3 hours 20 minutes). Judged by cumulative working time, the second segment is the optimal one. If the cumulative working times of several segments are also equal, the emotion represented by the historical emotion level in the time ranges covering the segments' occurrence times can be considered. Suppose the first and second segments tie for the highest credibility and have the same cumulative working time, and the historical emotion level in the time range corresponding to the first segment (9:00 to 10:00) is lower than that in the time range corresponding to the second segment (4:00 to 6:00); then the second segment, representing the worst historical emotion, is the optimal abnormal emotion audio information segment.
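The three-level selection just described (credibility, then cumulative working time, then worst historical emotion level) can be sketched as a single sort key; the field names, and the assumption that a higher level number denotes worse emotion, follow the worked example above:

```python
from dataclasses import dataclass

@dataclass
class AbnormalSegment:
    name: str
    credibility: int     # number of preset conditions met
    work_hours: float    # cumulative working time at occurrence, in hours
    history_level: int   # historical emotion level; assumed higher = worse

def pick_optimal(segments):
    # Highest credibility first; ties broken by the longest cumulative
    # working time, then by the worst historical emotion level.
    return max(segments,
               key=lambda s: (s.credibility, s.work_hours, s.history_level))

first = AbnormalSegment("9:00-9:30", credibility=3, work_hours=0.5,
                        history_level=1)
second = AbnormalSegment("16:00-16:20", credibility=3, work_hours=3.3,
                         history_level=2)
print(pick_optimal([first, second]).name)  # "16:00-16:20": longer shift wins
```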
S307, determining the content of the intelligent communication according to the category of abnormal emotion represented by the indexes whose corresponding index values meet the preset conditions in the optimal abnormal emotion audio information segment determined in S306.
Further, different indexes represent different categories of abnormal emotion; for example, the keyword index may indicate that the abnormal emotion of the person to be detected stems from the user, while the mute duration index may indicate that it stems from fatigue. The content of the intelligent communication can therefore be determined from the category of abnormal emotion represented by the indexes whose corresponding index values meet the preset conditions in the optimal abnormal emotion audio information segment, as sketched below.
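As a hedged illustration, the mapping from the triggering index to an abnormal emotion category and an opening line of communication could be a simple lookup; every entry below is invented for the example:

```python
# Hypothetical mapping from the triggering index to an emotion source and
# an opening line for the intelligent-communication dialog.
CONTENT_BY_INDEX = {
    "keyword": ("emotion caused by the user",
                "That call sounded difficult. Want to talk it over?"),
    "silence": ("emotion caused by fatigue",
                "You have been quiet for a while. Time for a short break?"),
}

def pick_content(triggering_indexes):
    """Return (category, opening line) for the first recognized index."""
    for name in triggering_indexes:
        if name in CONTENT_BY_INDEX:
            return CONTENT_BY_INDEX[name]
    return ("unclassified", "How are you feeling right now?")

print(pick_content(["silence"])[1])
```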
Further, for the intelligent emotion determining system provided in the embodiment of the present invention, an execution subject of this step may be an early warning pushing module.
S308, sending an intelligent communication dialog box carrying the determined communication content to the platform of the person to be detected who has the abnormal emotion, and receiving the person's response messages.
Further, in the intelligent emotion determining system provided by the embodiment of the present invention, the execution subject of this step may be an intelligent communication module: the early warning pushing module determines the content of the intelligent communication, and the intelligent communication module communicates with the person to be detected.
The intelligent communication conversation can be realized through a rich language database, with intelligent responses to the input of the employee to be detected achieved through keyword retrieval. The same keyword in the language database corresponds to multiple candidate replies, and during an intelligent response one of them can be selected at random for output, as in the sketch below.
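A minimal sketch of this reply mechanism, assuming an invented reply bank (keyword lookup followed by a random choice among the candidate replies):

```python
import random

# Hypothetical reply bank: each keyword maps to several candidate replies,
# one of which is chosen at random, as described above.
REPLY_BANK = {
    "tired": ["A short walk might help.", "Remember to rest your eyes."],
    "angry": ["Take a deep breath.", "Want to tell me what happened?"],
}

def respond(message):
    for keyword, replies in REPLY_BANK.items():
        if keyword in message:
            return random.choice(replies)
    return "I see. Tell me more."          # fallback when nothing matches

print(respond("I'm so tired today"))
```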
The intelligent communication module can communicate in various forms, such as text and voice, with the form confirmed according to the customer service person's choice in a specific implementation. Before this step is executed, it is determined that the person to be detected is currently idle rather than working; a do-not-disturb button can be provided so that the person can indicate whether they are working or idle, which avoids disturbing ongoing customer service work when voice output is used.
Furthermore, the intelligent communication conversation in the method provided by the embodiment of the invention can be started passively, pushed automatically by the emotion early warning module, or started on demand, with the person to be detected choosing to enter it, thereby realizing self-service emotion management.
S309, for each received response message, determining whether the response message includes a specified keyword.
Further, the specified keywords in this step may be the same as or different from the preset keywords in the preceding steps. Each received response message can be compared against the specified keywords.
Further, for the intelligent emotion determining system provided by the embodiment of the present invention, the execution subject of this step may be a sensitive language collection module.
S310, after the intelligent communication is finished, determining the emotion level of the person to be detected for the current time according to the severity and the number of occurrences of the specified keywords included in the received response messages.
During the intelligent communication, the system may receive multiple response messages from the person to be detected. The emotion level for the current time can be determined from the severity and the number of occurrences of the specified keywords across those messages, and then serves as the historical emotion level in the next round of abnormal emotion determination. One possible scoring is sketched below.
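One possible realization of this scoring, assuming invented severity weights and level cut-offs (the patent fixes neither):

```python
# Hypothetical scoring: each specified keyword carries a severity weight,
# and the summed weighted occurrence count maps to an emotion level.
SEVERITY = {"quit": 3, "exhausted": 2, "annoyed": 1}   # assumed weights

def emotion_level(keyword_counts):
    """keyword_counts: specified keyword -> number of occurrences."""
    score = sum(SEVERITY.get(kw, 0) * n for kw, n in keyword_counts.items())
    if score >= 6:
        return 3       # assumed scale: higher level = worse emotion
    if score >= 3:
        return 2
    return 1 if score > 0 else 0

print(emotion_level({"exhausted": 2, "annoyed": 1}))   # score 5 -> level 2
```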
Further, for the intelligent emotion determining system provided by the embodiment of the present invention, the execution subject of this step may be a sensitive language collection module.
Furthermore, the intelligent emotion determining system provided by the embodiment of the present invention may further include a communication injection module. This module provides a communication injection mode: a background user with the appropriate authority may choose whether to inject into the intelligent conversation between the person to be detected and the system. After injecting, the background user takes the place of the intelligent communication system and communicates with the person to be detected covertly and in real time; choosing to quit the injection switches the conversation automatically back to the intelligent communication system. The person to be detected perceives nothing throughout.
Further, during the intelligent communication with the person to be detected, when the person's response is a question that needs program processing (for example, asking the date or time), a corresponding program can be called to handle it and the result returned to the person to be detected.
Further, the cooperative workflow of the intelligent communication module, the communication injection module, and the sensitive language collection module may be as shown in fig. 4, which depicts the process of intelligent communication with a user. Step S401 corresponds to receiving the response messages of the person to be detected in step S308, and step S402 corresponds to step S309. Steps S403, S404, S405, and S408 may be implemented by the sensitive language collection module, steps S406 and S407 by the communication injection module, and steps S409 to S414 by the intelligent communication module, where a question requiring program processing in step S409 is, for example, asking the date or time.
Based on the same inventive concept, the embodiment of the invention also provides an intelligent emotion determining system. Since the principle by which the system solves the problem is similar to that of the intelligent emotion determining method, the implementation of the system can refer to the implementation of the method, and repeated details are not restated.
One of the intelligent emotion determining systems provided in the embodiments of the present invention, as shown in fig. 5, includes:
an audio collection module 501 for acquiring audio information of a conversation between a person to be detected and a user;
a voice waveform analysis module 502 for determining abnormal emotion audio information segments from among the audio information segments constituting the audio information, where an abnormal emotion audio information segment is an audio information segment that contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition, and for determining, when such a segment can be determined, that the person to be detected has an abnormal emotion corresponding to that segment.
Further, the audio collection module 501 is also configured to convert the audio information into a structured audio text file before the voice waveform analysis module 502 determines the abnormal emotion audio information segments, and to parse the audio text file to determine the preset audio information contained in each audio text segment corresponding to each audio information segment, the preset audio information comprising at least one of the following: keyword information, emotion detection information, and mute duration information;
the voice waveform analysis module 502 is specifically configured to determine, from among the audio information segments, the segments containing preset keywords; and/or the segments in which the index values of preset emotion detection indexes reach the corresponding index thresholds; and/or the segments containing mute durations that meet the preset duration.
Further, the audio collection module 501 is specifically configured to: when the preset audio information contains keyword information, compare, for each of the audio information segments, the corresponding audio text segment against the preset keywords, and determine the preset keywords contained in the audio text segment together with the start time and end time of each keyword's appearance;
when the preset audio information contains emotion detection information, determine, for each of the audio information segments, the index value of one or more of the following emotion detection indexes in the corresponding audio text segment: the speech rate information, amplitude information, frequency information, volume information, and fundamental frequency information of both parties to the call; and determine the indexes whose index values reach the corresponding index thresholds, together with the start and end time within the audio text segment at which each does so;
when the preset audio information contains mute duration information, determine, for each of the audio information segments, the mute durations contained in the audio of the person to be detected from the start time, end time, and duration of the audio corresponding to each sentence of both parties in the corresponding audio text segment; and determine the mute durations that meet the preset duration, together with the start and end time at which each occurs.
Further, the system further comprises: an early warning pushing module 503;
the early warning pushing module 503 is configured to determine, at the voice waveform analyzing module 502, a reliability corresponding to an abnormal emotion audio information segment after determining the abnormal emotion audio information segment from among the audio information segments constituting the audio information, and for each abnormal emotion audio information segment, determine a reliability corresponding to the abnormal emotion audio information segment, where, in the abnormal emotion audio information segment, the more indexes whose corresponding index values meet a preset condition, the higher the corresponding reliability is, and the preset condition includes at least two conditions as follows: the method comprises the steps that preset keywords are included, the mute duration accords with the preset duration, the speech speed information of both parties of a call reaches a corresponding index threshold, the amplitude information reaches a corresponding index threshold, the frequency information reaches a corresponding index threshold, the volume information reaches a corresponding index threshold, and the fundamental frequency information reaches a corresponding index threshold; and determining the optimal abnormal emotion audio information segment from the abnormal emotion audio information segments on the basis of the credibility corresponding to each abnormal emotion audio information segment.
Further, the early warning pushing module 503 is specifically configured to determine whether an abnormal emotion audio information segment exists in the abnormal emotion audio information segments, so that the reliability of the abnormal emotion audio information segment is the highest; if the abnormal emotion audio information segment exists, determining the abnormal emotion audio information segment as an optimal abnormal emotion audio information segment; otherwise, if a plurality of abnormal emotion audio information segments exist to enable the credibility of the abnormal emotion audio information segments to be equal and the highest, judging whether one abnormal emotion audio information segment exists in the abnormal emotion audio information segments to enable the accumulated working time corresponding to the occurrence time of the abnormal emotion audio information segment to be the longest; if the abnormal emotion audio information segment exists, determining the abnormal emotion audio information segment as an optimal abnormal emotion audio information segment; otherwise, if a plurality of abnormal emotion audio information segments exist, so that the corresponding accumulated working time of the abnormal emotion audio information segments is equal and longest, judging the abnormal emotion audio information segment with the worst emotion represented by the historical emotion level in the corresponding time segment range in the occurrence time of the abnormal emotion audio information segments; and determining the abnormal emotion audio information segment as an optimal abnormal emotion audio information segment.
Further, the system further comprises: an intelligent communication module 504 and a sensitive language collection module 505;
the early warning pushing module 503 is further configured to determine, after determining the optimal abnormal emotion audio information segment, content of intelligent communication according to a category of an abnormal emotion represented by an index, in the optimal abnormal emotion audio information segment, of which a corresponding index value meets the preset condition;
the intelligent communication module 504 is configured to send an intelligent communication dialog box with the determined content of intelligent communication to the platform where the to-be-detected person with the abnormal emotion is located, and receive a response message of the to-be-detected person;
the sensitive language collecting module 505 is configured to, for each received response message, determine whether the response message includes a specified keyword; and after the intelligent communication is finished, determining the emotion level of the current time for the person to be detected according to the severity and the times of the specified keywords included in the received response message.
As shown in fig. 6, the second intelligent emotion determining system provided by the embodiment of the present invention, compared with the first system shown in fig. 5, further includes a communication injection module 601 for receiving the voice of a background user with the appropriate authority and interacting with the person to be detected in real time when injection is started during the intelligent communication, the interaction instead running through the intelligent communication system when injection is not selected or has been quit.
The functions of the above units may correspond to the corresponding processing steps in the flows shown in fig. 1 to 4, and are not described herein again.
The embodiments of the invention provide an intelligent emotion determining method and system. The method acquires audio information of a conversation between a person to be detected and a user; determines abnormal emotion audio information segments from among the audio information segments constituting the audio information, where an abnormal emotion audio information segment contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition; and, when such a segment can be determined, determines that the person to be detected has an abnormal emotion corresponding to that segment. In this method, whether the person to be detected has an abnormal emotion is determined by acquiring and analyzing the audio of that person's calls. Because a person's speech always reflects the current emotion, audio information reflects that emotion objectively. Compared with the prior art, which acquires abnormal emotion data from entries made by customer service personnel and from physiological data collected by professional equipment, the method is therefore more objective and analyzes more accurately whether the person to be detected has an abnormal emotion. In addition, the prior art needs professional external equipment to assist in collecting abnormal emotion data, which increases cost; the present method needs no such equipment.
Through the above description of the embodiments, it will be clear to those skilled in the art that the embodiments of the present invention may be implemented by hardware, or by software plus a necessary general hardware platform. Based on this understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB disk, or a removable hard disk) and includes several instructions for enabling a computer device (such as a personal computer, a server, or a network device) to execute the methods of the embodiments of the present invention.
Those skilled in the art will appreciate that the drawings are merely schematic representations of one preferred embodiment and that the blocks or flow diagrams in the drawings are not necessarily required to practice the present invention.
Those skilled in the art will appreciate that the modules in the devices of the embodiments may be distributed in those devices as described, or may be located, with corresponding changes, in one or more devices different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It will be apparent to those skilled in the art that various changes and modifications may be made to the present invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their equivalent technology, the present invention is intended to encompass them as well.

Claims (10)

1. An intelligent emotion determination method, comprising:
acquiring audio information of a conversation between a person to be detected and a user;
determining abnormal emotion audio information segments from among the audio information segments constituting the audio information,
wherein an abnormal emotion audio information segment is an audio information segment that contains preset audio information representing an abnormal emotion of the person to be detected and meets a corresponding preset condition;
when such an abnormal emotion audio information segment can be determined, determining that the person to be detected has an abnormal emotion corresponding to that segment;
wherein, after determining the abnormal emotion audio information segments from among the audio information segments constituting the audio information, the method further comprises:
determining the credibility corresponding to each abnormal emotion audio information segment,
wherein, within an abnormal emotion audio information segment, the more indexes whose corresponding index values meet preset conditions, the higher the corresponding credibility, the preset conditions comprising at least two of the following: a preset keyword is contained; the mute duration meets the preset duration; the speech rate information of both parties to the call reaches the corresponding index threshold; the amplitude information reaches the corresponding index threshold; the frequency information reaches the corresponding index threshold; the volume information reaches the corresponding index threshold; and the fundamental frequency information reaches the corresponding index threshold;
and determining the optimal abnormal emotion audio information segment from the abnormal emotion audio information segments on the basis of the credibility corresponding to each abnormal emotion audio information segment.
2. The method according to claim 1, further comprising, before determining the abnormal emotion audio information segments from among the audio information segments constituting the audio information:
converting the audio information into a structured audio text file;
analyzing the audio text file, and determining preset audio information contained in each audio text segment corresponding to each audio information segment in the audio text file, wherein the preset audio information contains at least one of the following information: keyword information, emotion detection information and mute duration information;
wherein determining the abnormal emotion audio information segments from among the audio information segments constituting the audio information specifically comprises:
determining, from among the audio information segments, the segments containing preset keywords; and/or
determining, from among the audio information segments, the segments in which the index values of preset emotion detection indexes reach the corresponding index thresholds; and/or
determining, from among the audio information segments, the segments containing mute durations that meet the preset duration.
3. The method according to claim 2, wherein parsing the audio text file to determine preset audio information included in each audio text segment corresponding to each audio information segment in the audio text file includes:
when the preset audio information contains keyword information, for each of the audio information segments, comparing the audio text segment corresponding to the audio information segment with the preset keywords, and determining the preset keywords contained in the audio text segment together with the start time and end time at which each preset keyword appears;
when the preset audio information contains emotion detection information, for each of the audio information segments, determining, in the audio text segment corresponding to the audio information segment, the index value of one or more of the following indexes characterizing emotion detection: speech rate information of both parties of the call, amplitude information, frequency information, volume information and fundamental frequency information; and determining the indexes whose index values in the audio text segment reach the corresponding index thresholds, together with the start time and end time at which each such index value reaches its threshold;
when the preset audio information contains mute duration information, for each of the audio information segments, determining the mute duration contained in the audio of the person to be detected in the audio text segment according to the start time, end time and duration of the audio corresponding to each word spoken by both parties in the audio text segment corresponding to the audio information segment; and determining the mute durations contained in the audio text segment that conform to the preset duration, together with the start time and end time at which each such mute duration occurs.
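The silence computation in the last step can be pictured as scanning per-word timestamps for gaps. A minimal sketch, assuming the structured audio text file yields (start, end) pairs for every word (this representation is an assumption):

    def silences(word_times, preset_duration):
        # word_times: (start, end) pairs for each word spoken by either party.
        word_times = sorted(word_times)
        found = []
        for (_, prev_end), (next_start, _) in zip(word_times, word_times[1:]):
            gap = next_start - prev_end
            if gap >= preset_duration:
                # record start, end and length of the qualifying silence
                found.append((prev_end, next_start, gap))
        return found

For example, silences([(0.0, 1.2), (4.0, 4.8)], 2.0) reports one silence of 2.8 seconds running from 1.2 to 4.0.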
4. The method according to claim 1, wherein determining the optimal abnormal emotion audio information segment from the abnormal emotion audio information segments based on the credibility corresponding to each abnormal emotion audio information segment specifically comprises:
judging whether there is a single abnormal emotion audio information segment whose credibility is the highest among the abnormal emotion audio information segments;
if so, determining that abnormal emotion audio information segment as the optimal abnormal emotion audio information segment;
otherwise, if a plurality of abnormal emotion audio information segments are tied for the highest credibility, judging whether, among them, there is a single abnormal emotion audio information segment whose corresponding accumulated working duration at the time of its occurrence is the longest;
if so, determining that abnormal emotion audio information segment as the optimal abnormal emotion audio information segment;
otherwise, if a plurality of abnormal emotion audio information segments are also tied for the longest accumulated working duration, selecting, from among them, the abnormal emotion audio information segment whose historical emotion level within the time range of its occurrence represents the worst emotion;
and determining that abnormal emotion audio information segment as the optimal abnormal emotion audio information segment.
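Since the three criteria of claim 4 are applied strictly in order, the selection is a lexicographic maximum. A hedged sketch, assuming each candidate carries precomputed credibility, working_hours and history_badness fields (all invented names, with history_badness assumed to grow as the historical emotion worsens):

    def optimal_segment(candidates):
        # Tuple comparison applies the tie-breaks in the claimed order:
        # credibility first, then accumulated working duration, then the
        # badness of the historical emotion level around the occurrence.
        return max(candidates,
                   key=lambda s: (s["credibility"],
                                  s["working_hours"],
                                  s["history_badness"]))

When several candidates remain tied on all three keys, max simply returns the first one encountered; the claim does not specify a further tie-break.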
5. The method according to claim 1, further comprising, after determining the optimal abnormal emotion audio information segment:
determining the content of intelligent communication according to the type of abnormal emotion represented by the indexes whose corresponding index values meet the preset conditions in the optimal abnormal emotion audio information segment;
sending an intelligent communication dialog box carrying the determined intelligent communication content to the platform where the person to be detected with abnormal emotion is located, and receiving response messages from the person to be detected;
judging whether each received response message includes a specified keyword;
and after the intelligent communication is finished, determining the current emotion level of the person to be detected according to the severity and the number of occurrences of the specified keywords included in the received response messages.
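One plausible reading of the final scoring step, with invented keywords, severity weights and level buckets (the claim fixes none of these values):

    # Illustrative mapping from specified keywords to severity weights.
    KEYWORD_SEVERITY = {"exhausted": 1, "hopeless": 3, "quit": 5}

    def emotion_level(responses):
        # Weight each specified keyword by its severity and by how often
        # it appears across the response messages, then bucket the total.
        score = sum(weight * message.count(keyword)
                    for message in responses
                    for keyword, weight in KEYWORD_SEVERITY.items())
        if score == 0:
            return "normal"
        return "watch" if score < 5 else "alert"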
6. An intelligent emotion determination system, comprising:
the audio collection module is used for acquiring audio information of the conversation between the person to be detected and the user;
the voice waveform analysis module is used for determining abnormal emotion audio information segments from the audio information segments forming the audio information, wherein the abnormal emotion audio information segments are audio information segments which contain preset audio information used for representing the abnormal emotion of the person to be detected and meet corresponding preset conditions; when the abnormal emotion audio information segment can be determined, determining that abnormal emotion exists in the person to be detected corresponding to the audio information segment;
wherein the system further comprises: an early warning pushing module;
the early warning pushing module is configured to, after the voice waveform analysis module determines the abnormal emotion audio information segments from the audio information segments constituting the audio information, determine a corresponding credibility for each abnormal emotion audio information segment, wherein the more indexes in an abnormal emotion audio information segment whose corresponding index values meet the preset conditions, the higher the corresponding credibility, the preset conditions comprising at least two of the following: a preset keyword is included; the mute duration conforms to the preset duration; the speech rate information of both parties of the call reaches the corresponding index threshold; the amplitude information reaches the corresponding index threshold; the frequency information reaches the corresponding index threshold; the volume information reaches the corresponding index threshold; and the fundamental frequency information reaches the corresponding index threshold; and to determine the optimal abnormal emotion audio information segment from the abnormal emotion audio information segments on the basis of the credibility corresponding to each abnormal emotion audio information segment.
7. The system of claim 6, wherein the audio collection module is further configured to convert the audio information into a structured audio text file before the voice waveform analysis module determines abnormal emotion audio information segments from the audio information segments constituting the audio information; and to analyze the audio text file and determine the preset audio information contained in each audio text segment corresponding to each audio information segment in the audio text file, wherein the preset audio information contains at least one of the following information: keyword information, emotion detection information and mute duration information;
the voice waveform analysis module is specifically configured to determine, from the audio information segments, audio information segments that contain a preset keyword; and/or determine, from the audio information segments, audio information segments in which the index value of a preset index characterizing emotion detection reaches the corresponding index threshold; and/or determine, from the audio information segments, audio information segments in which the mute duration conforms to the preset duration.
8. The system of claim 7, wherein the audio collection module is specifically configured to: when the preset audio information contains keyword information, for each of the audio information segments, compare the audio text segment corresponding to the audio information segment with the preset keywords, and determine the preset keywords contained in the audio text segment together with the start time and end time at which each preset keyword appears;
when the preset audio information contains emotion detection information, for each of the audio information segments, determine, in the audio text segment corresponding to the audio information segment, the index value of one or more of the following indexes characterizing emotion detection: speech rate information of both parties of the call, amplitude information, frequency information, volume information and fundamental frequency information; and determine the indexes whose index values in the audio text segment reach the corresponding index thresholds, together with the start time and end time at which each such index value reaches its threshold;
when the preset audio information contains mute duration information, for each of the audio information segments, determine the mute duration contained in the audio of the person to be detected in the audio text segment according to the start time, end time and duration of the audio corresponding to each word spoken by both parties in the audio text segment corresponding to the audio information segment; and determine the mute durations contained in the audio text segment that conform to the preset duration, together with the start time and end time at which each such mute duration occurs.
9. The system of claim 6, wherein the early warning pushing module is specifically configured to: judge whether there is a single abnormal emotion audio information segment whose credibility is the highest among the abnormal emotion audio information segments; if so, determine that abnormal emotion audio information segment as the optimal abnormal emotion audio information segment; otherwise, if a plurality of abnormal emotion audio information segments are tied for the highest credibility, judge whether, among them, there is a single abnormal emotion audio information segment whose corresponding accumulated working duration at the time of its occurrence is the longest; if so, determine that abnormal emotion audio information segment as the optimal abnormal emotion audio information segment; otherwise, if a plurality of abnormal emotion audio information segments are also tied for the longest accumulated working duration, select, from among them, the abnormal emotion audio information segment whose historical emotion level within the time range of its occurrence represents the worst emotion, and determine that abnormal emotion audio information segment as the optimal abnormal emotion audio information segment.
10. The system according to claim 6, further comprising: an intelligent communication module and a sensitive language collection module;
the early warning pushing module is further configured to, after the optimal abnormal emotion audio information segment is determined, determine the content of intelligent communication according to the type of abnormal emotion represented by the indexes whose corresponding index values meet the preset conditions in the optimal abnormal emotion audio information segment;
the intelligent communication module is configured to send an intelligent communication dialog box carrying the determined intelligent communication content to the platform where the person to be detected with abnormal emotion is located, and to receive response messages from the person to be detected;
the sensitive language collection module is configured to judge whether each received response message includes a specified keyword, and, after the intelligent communication is finished, to determine the current emotion level of the person to be detected according to the severity and the number of occurrences of the specified keywords included in the received response messages.
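To show how the five claimed modules hand data to one another, a skeleton of a single pass, with duck-typed module objects and invented method names (the claims name the modules but not their interfaces):

    def run_pipeline(call_audio, audio_collection, waveform_analysis,
                     early_warning, intelligent_comm, sensitive_language):
        # audio collection -> waveform analysis -> early warning pushing
        # -> intelligent communication -> sensitive language collection
        text_file = audio_collection.to_structured_text(call_audio)
        segments = waveform_analysis.abnormal_segments(text_file)
        if not segments:
            return None                      # no abnormal emotion detected
        best = early_warning.optimal_segment(segments)
        content = early_warning.communication_content(best)
        responses = intelligent_comm.dialog(content)
        return sensitive_language.emotion_level(responses)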
CN201510613689.XA 2015-09-23 2015-09-23 Intelligent emotion determining method and system Active CN106548788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510613689.XA CN106548788B (en) 2015-09-23 2015-09-23 Intelligent emotion determining method and system

Publications (2)

Publication Number Publication Date
CN106548788A CN106548788A (en) 2017-03-29
CN106548788B (en) 2020-01-07

Family

ID=58365640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510613689.XA Active CN106548788B (en) 2015-09-23 2015-09-23 Intelligent emotion determining method and system

Country Status (1)

Country Link
CN (1) CN106548788B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108735233A (en) * 2017-04-24 2018-11-02 北京理工大学 A kind of personality recognition methods and device
CN107293309B (en) * 2017-05-19 2021-04-30 四川新网银行股份有限公司 Method for improving public opinion monitoring efficiency based on client emotion analysis
CN107609736A (en) * 2017-08-09 2018-01-19 广州思涵信息科技有限公司 A kind of teaching diagnostic analysis system and method for integrated application artificial intelligence technology
CN108648768A (en) * 2018-04-16 2018-10-12 广州市菲玛尔咨询服务有限公司 A kind of consulting recommendation method and its management system
CN109087670B (en) * 2018-08-30 2021-04-20 西安闻泰电子科技有限公司 Emotion analysis method, system, server and storage medium
CN109145101B (en) * 2018-09-06 2021-05-25 北京京东尚科信息技术有限公司 Man-machine conversation method, device and computer readable storage medium
CN109587360B (en) * 2018-11-12 2021-07-13 平安科技(深圳)有限公司 Electronic device, method for coping with tactical recommendation, and computer-readable storage medium
CN109785123A (en) * 2019-01-21 2019-05-21 中国平安财产保险股份有限公司 A kind of business handling assisted method, device and terminal device
CN110393539B (en) * 2019-06-21 2021-11-23 合肥工业大学 Psychological anomaly detection method and device, storage medium and electronic equipment
CN111599379B (en) * 2020-05-09 2023-09-29 北京南师信息技术有限公司 Conflict early warning method, device, equipment, readable storage medium and triage system
CN112235468A (en) * 2020-10-16 2021-01-15 绍兴市寅川软件开发有限公司 Audio processing method and system for voice customer service evaluation
CN113053385A (en) * 2021-03-30 2021-06-29 中国工商银行股份有限公司 Abnormal emotion detection method and device
CN113222458A (en) * 2021-05-31 2021-08-06 上海工程技术大学 Urban rail transit driver safety risk assessment model and system
CN113515636A (en) * 2021-09-13 2021-10-19 阿里健康科技(中国)有限公司 Text data processing method and electronic equipment
CN115547501B (en) * 2022-11-24 2023-04-07 国能大渡河大数据服务有限公司 Employee emotion perception method and system combining working characteristics

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102625005A (en) * 2012-03-05 2012-08-01 广东天波信息技术股份有限公司 Call center system with function of real-timely monitoring service quality and implement method of call center system
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN102831891A (en) * 2011-06-13 2012-12-19 富士通株式会社 Processing method and system for voice data
CN103491251A (en) * 2013-09-24 2014-01-01 深圳市金立通信设备有限公司 Method and terminal for monitoring user calls
CN103634472A (en) * 2013-12-06 2014-03-12 惠州Tcl移动通信有限公司 Method, system and mobile phone for judging mood and character of user according to call voice
CN104036776A (en) * 2014-05-22 2014-09-10 毛峡 Speech emotion identification method applied to mobile terminal

Also Published As

Publication number Publication date
CN106548788A (en) 2017-03-29

Similar Documents

Publication Publication Date Title
CN106548788B (en) Intelligent emotion determining method and system
EP3893477A1 (en) Human-in-the-loop voice communication system and method
US11594221B2 (en) Transcription generation from multiple speech recognition systems
US8050923B2 (en) Automated utterance search
US8756065B2 (en) Correlated call analysis for identified patterns in call transcriptions
CN102623011B (en) Information processing apparatus, information processing method and information processing system
CN107818798A (en) Customer service quality evaluating method, device, equipment and storage medium
US20130253932A1 (en) Conversation supporting device, conversation supporting method and conversation supporting program
JP2009175336A (en) Database system of call center, and its information management method and information management program
TW201923736A (en) Speech recognition method, device and system
US20230140273A1 (en) Systems and methods for generating synthesized speech responses to voice inputs
CN106713111B (en) Processing method for adding friends, terminal and server
US10546064B2 (en) System and method for contextualising a stream of unstructured text representative of spoken word
JP2009175943A (en) Database system for call center, information management method for database and information management program for database
US10674952B1 (en) Detection and management of memory impairment
KR20130086971A (en) Question answering system using speech recognition and its application method thereof
JP2020160425A (en) Evaluation system, evaluation method, and computer program
US9047872B1 (en) Automatic speech recognition tuning management
JP2014123813A (en) Automatic scoring device for dialog between operator and customer, and operation method for the same
KR100803900B1 (en) Speech recognition ars service method, and speech recognition ars service system
EP4181124A1 (en) Communication system and related methods
CN114328867A (en) Intelligent interruption method and device in man-machine conversation
CN113645364A (en) Intelligent voice outbound method facing power dispatching
US11943392B2 (en) System and method for providing personalized customer experience in interactive communications
JP2020160336A (en) Evaluation system, evaluation method, and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant