CN105976820B - Voice emotion analysis system - Google Patents
- Publication number
- CN105976820B (application CN201610415352.2A)
- Authority
- CN
- China
- Prior art keywords
- voice
- emotion
- voice signal
- speech
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The invention provides a speech emotion analysis system comprising: a voice transcription device connected to a PC, the PC running a voice signal processing unit for removing background noise from the voice signal acquired by the PC, and a voice emotion analysis unit for analyzing the emotion expressed by the processed voice signal and displaying the emotion information on a display unit. The system can covertly record a suspect's voice at any time and place, and later analysis of the recording yields a conclusion as to whether the suspect is lying.
Description
Technical Field
The invention relates to a system for analyzing emotion types corresponding to voice signals.
Background
The lie detector currently in use is an instrument that records multiple physiological responses. It assists criminal investigation by revealing the psychological state of a suspect under interrogation, helping to judge whether the suspect is connected to a case. Modern science has established that lying produces real physiological changes. Some are visible to the naked eye as unnatural body movements such as ear-scratching or leg- and foot-shaking. Others are harder to perceive: abnormal respiratory rate and volume, with episodes of suppressed breathing or breath-holding; a faster pulse, raised blood pressure, increased cardiac output and altered blood composition, making the skin of the face and neck noticeably pale or flushed; increased subcutaneous sweat-gland secretion, with sweating appearing first between the eyes or on the upper lip and most pronounced on the fingers and palms; dilated pupils; stomach contraction and abnormal secretion of digestive fluids, leaving the mouth, tongue and lips dry; and muscle tension and trembling that can cause loss of speech. These physiological parameters are governed by the autonomic nervous system: they are generally not under conscious control but respond autonomously, producing a series of conditioned-reflex phenomena under external stimulation.
To capture these phenomena, the lie detector was invented. A modern lie detector consists of sensors, a host unit and a microcomputer. The sensors are attached to the subject's body surface to collect changes in physiological parameters; the host unit, an electronic assembly, converts the analog signals from the sensors into digital signals; the microcomputer stores and analyzes the digital signals to produce the lie-detection result. The existing lie detector is therefore bulky: a test can only be carried out by bringing the suspect to a specific location, and sensors must be attached to the suspect, so covert lie detection is impossible.
Disclosure of Invention
The invention aims to provide a system capable of covert lie detection of a suspect at any time and place.
In order to achieve the above object, a technical solution of the present invention is to provide a speech emotion analysis system, including:
a voice transcription device having a voice data recording terminal and two voice output terminals; the recording terminal is connected to the voice output of the telephone receiver, one output terminal is connected to the voice input of the telephone base unit, and the other is connected to a PC, so that the voice signal picked up by the receiver is routed both to the PC through one output terminal and to the telephone base unit through the other;
running on the PC are:
a voice signal processing unit for removing background noise from the voice signal acquired by the PC;
and a voice emotion analysis unit for analyzing the emotion expressed by the voice signal processed by the voice signal processing unit and displaying the emotion information on a display unit.
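The patent does not specify how the voice signal processing unit removes background noise. As an illustrative sketch only, a minimal spectral-subtraction denoiser (a common classical choice, not necessarily the patent's method) that assumes a separately captured noise-only sample might look like this:

```python
import numpy as np

def spectral_subtraction(noisy, noise_sample, frame_len=256):
    """Suppress stationary background noise by subtracting an estimated
    noise magnitude spectrum from each frame, keeping the noisy phase.
    `noise_sample` is a stretch of signal known to contain noise only."""
    noise_mag = np.abs(np.fft.rfft(noise_sample[:frame_len]))
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, frame_len):
        frame = noisy[start:start + frame_len]
        spec = np.fft.rfft(frame)
        # Subtract the noise magnitude estimate, flooring at zero.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        out[start:start + frame_len] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), n=frame_len)
    return out
```

In practice a real front end would use overlapping windows and a smoothed noise estimate; the frame-by-frame version above only shows the core idea.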
Preferably, the system further comprises a microphone connected to the PC; the PC collects voice data from either the voice transcription device or the microphone.
Preferably, the system further comprises a voice recording device, which acquires and stores the voice signal over a period of time and transfers it to the voice signal processing unit running on the PC.
Preferably, the speech emotion analyzing unit includes:
a voice database, built by first defining different emotions, then collecting reference voice signals produced by people of different age groups and genders under each emotion, and recording the waveform and frequency of each reference signal in the database;
and a voice signal comparison unit, which obtains the waveform and frequency of the voice signal processed by the voice signal processing unit, compares them with the waveform and frequency of each stored reference signal matching the speaker's age group and gender, finds the reference signal that best matches the current voice signal, and reports the emotion of that reference signal as the emotion of the current voice signal.
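The comparison step described above amounts to a nearest-neighbour lookup keyed by age group and gender. The sketch below is a hypothetical reduction of it: the database layout, the two-number feature vector (mean pitch, energy) standing in for "waveform and frequency", and the Euclidean distance are all illustrative assumptions, since the patent does not define the matching criterion:

```python
import numpy as np

# Hypothetical reference database: for each (age group, gender) key,
# a list of (emotion label, feature vector) pairs. The feature vector
# here is (mean pitch in Hz, mean energy) -- an illustrative stand-in.
REFERENCE_DB = {
    ("adult", "female"): [
        ("calm",   np.array([210.0, 0.02])),
        ("angry",  np.array([280.0, 0.09])),
        ("afraid", np.array([250.0, 0.05])),
    ],
}

def match_emotion(age_group, gender, features):
    """Return the emotion of the closest reference signal for the
    speaker's age group and gender (nearest-neighbour sketch)."""
    candidates = REFERENCE_DB[(age_group, gender)]
    best = min(candidates, key=lambda c: np.linalg.norm(c[1] - features))
    return best[0]
```

For example, a query vector of (275 Hz, 0.08) for an adult female speaker would match the "angry" reference above.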
Preferably, the speech emotion analyzing unit includes:
a validity analysis unit for performing emotion-parameter validity analysis on the voice signal supplied by the voice signal processing unit, so as to extract speech emotion feature parameters;
a feature parameter classifier for dividing the speech emotion feature parameters obtained by the validity analysis unit into short-time and long-time features: pitch frequency, short-time energy, two Mel-frequency cepstral coefficients and five Mel-frequency sub-band energies serve as short-time features, and the Mel energy-spectrum dynamic coefficient serves as the long-time feature;
a short-time feature processing unit for training and recognition on the short-time features using a hidden Markov model, yielding the emotion of the current voice;
and a long-time feature processing unit for training and recognition on the long-time features using a support vector machine, likewise yielding the emotion of the current voice.
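Two of the short-time features named above, short-time energy and pitch frequency, can be computed per frame as follows. This is an illustrative sketch only: the MFCC and Mel sub-band terms, and the HMM/SVM training stages themselves, are omitted, and the autocorrelation pitch tracker and 60-400 Hz search range are assumptions rather than anything the patent specifies:

```python
import numpy as np

def short_time_features(frame, sr=8000):
    """Compute short-time energy and pitch frequency for one frame.
    Pitch is estimated as the autocorrelation peak within a plausible
    speaking range (60-400 Hz)."""
    energy = float(np.sum(frame ** 2))
    # One-sided autocorrelation: ac[l] correlates the frame with itself
    # shifted by l samples.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // 400, sr // 60          # lag bounds for 60-400 Hz
    lag = lo + int(np.argmax(ac[lo:hi]))  # strongest periodicity
    pitch = sr / lag
    return pitch, energy
```

Frame-wise sequences of such vectors would then feed the HMM (short-time path), while utterance-level dynamic coefficients would feed the SVM (long-time path).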
Preferably, a survey analysis unit also runs on the PC, for judging the user's emotion by analyzing a questionnaire answered with definite yes/no responses.
The voice emotion analysis system provided by the invention can covertly record a suspect's voice at any time and place; later analysis of the recording yields a conclusion as to whether the suspect is lying.
Drawings
FIG. 1 is a system block diagram of an embodiment of a speech emotion analysis system provided by the present invention.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Example 1
The speech emotion analysis system provided by the invention can operate in three modes:
Mode one: survey mode.
In this mode, the user completes the questionnaire displayed on the PC with definite answers, using the survey analysis unit running on the PC; the survey analysis unit then analyzes the answers and judges the user's current emotion.
Mode two: offline mode
In this mode, the system is configured as shown in fig. 1 and comprises a recording device and a PC. The recording device acquires and stores the voice signal over a period of time and transfers it to the PC. The recording device is a portable voice recorder or a recording device connected to a telephone line.
A voice signal processing unit and a voice emotion analysis unit run on the PC. The voice signal processing unit removes background noise from the voice signal collected by the PC.
in this embodiment, referring to fig. 1, the speech emotion analysis unit includes a speech database and a speech signal comparison unit.
For the voice database, different emotions are defined firstly, reference voice signals given by people of different ages and different sexes under different emotions are obtained, and the waveform and the frequency of each reference voice signal are recorded in the voice database.
For the voice signal comparison unit, the waveform and the frequency of the voice signal processed by the voice signal processing unit are obtained, the obtained waveform and the frequency are compared with the waveform and the frequency of each reference voice signal stored in the voice database and corresponding to the age group and the gender of the sender of the voice signal, a reference voice signal which is most matched with the current voice signal is obtained, and the emotion contained in the current voice signal is given according to the emotion to which the obtained reference voice signal belongs.
Mode three: online mode
In this mode, the recording device is replaced by a voice transcription device or a microphone. The voice transcription device has a voice data recording terminal and two voice output terminals: the recording terminal is connected to the voice output of the telephone receiver, one output terminal is connected to the voice input of the telephone base unit, and the other is connected to the PC, so that the voice signal picked up by the receiver is routed both to the PC and to the telephone base unit.
The microphone is connected to the PC; the PC collects voice data from either the voice transcription device or the microphone.
After the voice data is collected by the voice transcription device or the microphone, subsequent processing is the same as in mode two.
Example 2
This embodiment differs from embodiment 1 only in the configuration of the voice emotion analysis unit, which here comprises:
a validity analysis unit for performing emotion-parameter validity analysis on the voice signal supplied by the voice signal processing unit, so as to extract speech emotion feature parameters;
a feature parameter classifier for dividing the speech emotion feature parameters obtained by the validity analysis unit into short-time and long-time features: pitch frequency, short-time energy, two Mel-frequency cepstral coefficients and five Mel-frequency sub-band energies serve as short-time features, and the Mel energy-spectrum dynamic coefficient serves as the long-time feature;
a short-time feature processing unit for training and recognition on the short-time features using a hidden Markov model, yielding the emotion of the current voice;
and a long-time feature processing unit for training and recognition on the long-time features using a support vector machine, likewise yielding the emotion of the current voice.
Other principles and operation of this embodiment are the same as those of embodiment 1.
Claims (4)
1. A speech emotion analysis system, comprising:
a voice transcription device having a voice data recording terminal and two voice output terminals; the recording terminal is connected to the voice output of the telephone receiver, one output terminal is connected to the voice input of the telephone base unit, and the other is connected to a PC, so that the voice signal picked up by the receiver is routed both to the PC through one output terminal and to the telephone base unit through the other;
running on the PC are:
a voice signal processing unit for removing background noise from the voice signal acquired by the PC;
and a speech emotion analysis unit, which analyzes the emotion expressed by the voice signal processed by the voice signal processing unit and displays the emotion information through a display unit, the speech emotion analysis unit comprising:
a voice database, built by first defining different emotions, then collecting reference voice signals produced by people of different age groups and genders under each emotion, and recording the waveform and frequency of each reference signal in the database;
and a voice signal comparison unit, which obtains the waveform and frequency of the voice signal processed by the voice signal processing unit, compares them with the waveform and frequency of each stored reference signal matching the speaker's age group and gender, finds the reference signal that best matches the current voice signal, and reports the emotion of that reference signal as the emotion of the current voice signal.
2. The system of claim 1, further comprising a microphone connected to the PC, wherein the PC collects voice data from either the voice transcription device or the microphone.
3. The system of claim 1, further comprising a recording device, wherein the recording device acquires and stores the voice signal over a period of time and transmits it to the voice signal processing unit running on the PC.
4. The system of claim 1, wherein a survey analysis unit also runs on the PC, for judging the user's emotion by analyzing a questionnaire answered with definite yes/no responses.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201610415352.2A | 2016-06-14 | 2016-06-14 | Voice emotion analysis system
Publications (2)
Publication Number | Publication Date
---|---
CN105976820A | 2016-09-28
CN105976820B | 2019-12-31
Family
ID=57011457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201610415352.2A (CN105976820B, Active) | Voice emotion analysis system | 2016-06-14 | 2016-06-14
Country Status (1)
Country | Link
---|---
CN | CN105976820B
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN108510981B * | 2018-04-12 | 2020-07-24 | Samsung Electronics (China) R&D Center | Method and system for acquiring voice data
US10984795B2 | 2018-04-12 | 2021-04-20 | Samsung Electronics Co., Ltd. | Electronic apparatus and operation method thereof
CN109063551A * | 2018-06-20 | 2018-12-21 | Xinhuanet Co., Ltd. | Conversation validity test method and system
CN108899046A * | 2018-07-12 | 2018-11-27 | Northeastern University | Speech emotion recognition method and system based on multi-stage support vector machine classification
CN111400539B * | 2019-01-02 | 2023-05-30 | Alibaba Group Holding Limited | Voice questionnaire processing method, device and system
TWI719429B * | 2019-03-19 | 2021-02-21 | Realtek Semiconductor Corp. | Audio processing method and audio processing system
CN110033778B * | 2019-05-07 | 2021-07-23 | Suzhou Vocational University | Real-time identification and correction system for lie state
CN110488973B * | 2019-07-23 | 2020-11-10 | Tsinghua University | Virtual interactive message leaving system and method
CN110432916A * | 2019-08-13 | 2019-11-12 | Shanghai Mojina Intelligent Information Technology Co., Ltd. | Lie detection system and lie detection method based on millimeter-wave radar
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN101201980A * | 2007-12-19 | 2008-06-18 | Beijing Jiaotong University | Remote Chinese language teaching system based on speech emotion recognition
CN101873378A * | 2010-06-11 | 2010-10-27 | Hubei Haishan Technology Co., Ltd. | Remote monitoring mobile phone based on 3G wireless network
CN103117061A * | 2013-02-05 | 2013-05-22 | Guangdong OPPO Mobile Telecommunications Co., Ltd. | Method and device for identifying animals based on voice
CN103886869A * | 2014-04-09 | 2014-06-25 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Information feedback method and system based on speech emotion recognition
CN104185868A * | 2012-01-24 | 2014-12-03 | Auraya Pty Ltd | Voice authentication and speech recognition system and method
CN105869657A * | 2016-06-03 | 2016-08-17 | Zhujian Intelligent Technology (Shanghai) Co., Ltd. | System and method for identifying voice emotion
Legal Events
Code | Title
---|---
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant