CN110033778B - Real-time identification and correction system for lie state

Real-time identification and correction system for lie state

Info

Publication number
CN110033778B
CN110033778B (application CN201910376615.7A)
Authority
CN
China
Prior art keywords
module
voice
recognition module
communication connection
correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910376615.7A
Other languages
Chinese (zh)
Other versions
CN110033778A (en)
Inventor
周燕 (Zhou Yan)
陈杰 (Chen Jie)
史斌斌 (Shi Binbin)
张旭 (Zhang Xu)
胡志峰 (Hu Zhifeng)
曾子涵 (Zeng Zihan)
Current Assignee
Suzhou Vocational University
Original Assignee
Suzhou Vocational University
Priority date
Filing date
Publication date
Application filed by Suzhou Vocational University filed Critical Suzhou Vocational University
Priority to CN201910376615.7A
Publication of CN110033778A
Application granted
Publication of CN110033778B
Status: Active


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification
    • G10L 17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

Abstract

The invention relates to a real-time lie state recognition and correction system comprising a voice input module, a sample classification module, a semantic recognition module, a lie state recognition module, a voice dictionary, an identification module, a correction module and an output module. The voice input module is used for acquiring voice signals; the sample classification module is in communication connection with the voice input module; the semantic recognition module is in communication connection with the sample classification module; the lie state recognition module is in communication connection with the sample classification module; the voice dictionary is in communication connection between the semantic recognition module and the lie state recognition module; the identification module is in communication connection with both the semantic recognition module and the lie state recognition module; the correction module is in communication connection with the identification module and the voice dictionary; and the output module is in communication connection with the correction module. The system can recognize and mark the credibility level of speech, correct low-credibility speech and generate new questions; it is highly effective and the information it acquires is accurate.

Description

Real-time identification and correction system for lie state
Technical Field
The invention relates to voice and lie state detection, in particular to a lie state real-time recognition and correction system.
Background
Lies often appear in daily life. A well-intentioned lie may help a person avoid greater trouble or danger, but a malicious lie can bring great harm to others or to society. Lying is a typical psychological state that affects a person's behavior, facial expressions, physiological responses and language expression. For people with strong psychological composure this influence is extremely subtle and difficult for ordinary observers to perceive, so their lies cannot be quickly and effectively uncovered. With the development of science and technology, part of this information can now be collected accurately by equipment, laying a foundation for machine recognition of the lie state.
Since the 1990s, lie detection equipment in China has developed sufficiently to assist in solving more than 2,000 difficult cases. The Ministry of Public Security therefore listed it as a key project of the Eighth Five-Year Plan, and during the Ninth Five-Year Plan period it was listed as a national science and technology project. However, existing lie detectors still collect information on site and rely on manual, off-line judgment, so their timeliness is poor, the judgment cycle is long, and opportunities are easily missed; when low-credibility information appears, it is inconvenient to question the subject again about it; and the judgment process demands high professional skill of the personnel, which prevents wide adoption.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a real-time lie state recognition and correction system that can recognize and mark the credibility level of speech, correct speech with a low credibility level, generate targeted new questions, and acquire information effectively and accurately.
In order to solve the technical problem, the invention provides a real-time lie state recognition and correction system comprising a voice input module, a sample classification module, a semantic recognition module, a lie state recognition module, a voice dictionary, an identification module, a correction module and an output module. The voice input module is used for acquiring voice signals. The sample classification module is in communication connection with the voice input module and is used for distinguishing the voice signals of the questioner and the questioned person and, after noise reduction, transmitting the voice signals of the questioned person to the semantic recognition module and the lie state recognition module. The semantic recognition module is in communication connection with the sample classification module and is used for recognizing the meaning of the voice signal and splitting whole sentences into characters and words. The lie state recognition module is in communication connection with the sample classification module and is used for recognizing the credibility of the voice signal and grading the credibility of each character, word, sentence and situation. The voice dictionary is in communication connection between the semantic recognition module and the lie state recognition module and is used for storing an acoustic model and a language model. The identification module is in communication connection with both the semantic recognition module and the lie state recognition module, and couples the words output by the semantic recognition module with the credibility levels output by the lie state recognition module in time sequence to obtain the credibility level of each character and word. The correction module is in communication connection with the identification module and the voice dictionary and is used for performing antonym replacement on verbs and adjectives with a low credibility level output by the identification module and for modifying whole sentences and situations with a low credibility level. The output module is in communication connection with the correction module and is used for outputting the voice signal of the correction module in real time.
Preferably, the semantic recognition module comprises a voice processing unit, an acoustic matching unit and a language processing unit which are sequentially in communication connection; the voice processing unit is used for extracting the acoustic characteristics of the characters and words of the voice signal and giving each character and word a unique label; the acoustic matching unit is used for matching the acoustic features extracted by the voice processing unit with the acoustic features in the voice dictionary to obtain the component and meaning of each character and word; the language processing unit is used for correcting the components and meanings of the characters and words according to the grammatical structure, the semantic structure and the language model in the voice dictionary.
Preferably, the components are sentence components including subject, predicate, object, attributive, adverbial, complement and head word; the meaning is the sense that the character or word is intended to express.
Preferably, the lie state recognition module comprises a preprocessing unit, a sparse representation unit, a deep learning unit and a dynamic sparse Bayesian time-sequence modeling unit which are sequentially in communication connection; the preprocessing unit is used for sequentially performing framing, windowing, pre-emphasis, endpoint detection and PCA (principal component analysis) dimensionality reduction on the voice signals; the sparse representation unit is used for decomposing the preprocessed voice signal into an overcomplete dictionary and a corresponding sparse coefficient matrix; the deep learning unit is a machine learning model constructed from multiple hidden layers and is used for extracting the deep structure information of the voice signal; the dynamic sparse Bayesian time-sequence modeling unit is used for extracting the overall and detailed characteristic information of the voice signal.
Preferably, the overcomplete dictionary is a feature subspace of the feature set of the voice signal, and the sparse coefficient matrix is a characteristic parameter describing the projection of the intrinsic structure of the voice signal onto that feature subspace.
Preferably, the sparse representation method of the sparse representation unit may be based on a fast K-SVD sparse representation algorithm.
Preferably, the depth structure information includes a language feature, a physiological feature, an acoustic feature, and a psycho-perceptual feature.
A real-time lie state recognition and correction system comprises a voice input module, a sample classification module, a semantic recognition module, a lie state recognition module, a voice dictionary, an identification module, a correction module, an output module, a manual correction module, a large database and a question generation module. The voice input module is used for acquiring voice signals. The sample classification module is in communication connection with the voice input module and is used for distinguishing the voice signals of the questioner and the questioned person, transmitting the voice signals of the questioned person, after noise reduction, to the semantic recognition module and the lie state recognition module, and transmitting all input voice signals to the large database. The semantic recognition module is in communication connection with the sample classification module and is used for recognizing the meaning of the voice signal and splitting whole sentences into characters and words. The lie state recognition module is in communication connection with the sample classification module and is used for recognizing the credibility of the voice signal and grading the credibility of each character, word, sentence and situation. The voice dictionary is in communication connection between the semantic recognition module and the lie state recognition module and is used for storing an acoustic model and a language model. The identification module is in communication connection with both the semantic recognition module and the lie state recognition module, and couples the words output by the semantic recognition module with the credibility levels output by the lie state recognition module in time sequence to obtain the credibility level of each character and word. The correction module is in communication connection with the identification module and the voice dictionary and is used for performing antonym replacement on verbs and adjectives with a low credibility level output by the identification module and for modifying whole sentences and situations with a low credibility level. The output module is in communication connection with the correction module and is used for outputting the voice signals of the correction module and the question generation module in real time. The large database is in communication connection with the sample classification module and the lie state recognition module and is used for storing the voice signals generated during questioning and the information about the questioned person acquired from outside. The manual correction module is in communication connection between the correction module and the large database and is used for manually modifying the credibility level of the voice signals of the questioned person. The question generation module is in communication connection between the large database and the output module and is used for extracting characters, words, sentences and situations with a low credibility level in combination with the large database, generating corresponding questions, and outputting them to the questioner through the output module.
Preferably, the questioned person information includes, but is not limited to, personal basic information, activity time and scope information, credit information and social relationship information.
Preferably, the large database comprises a primary database and a correction database, the primary database is used for storing the voice information generated in the conversation process, and the correction database is used for storing the corrected voice information of the questioned person and keeping the credibility level of all the voice information.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention can detect the lie state of every word spoken by the questioned person, mark the credibility level of each character, word, sentence and situation, and modify those with low credibility so that they become more truthful; after modification, information with a higher credibility level can be output in real time, so the questioner can more easily know which information is correct and which is wrong, improving questioning efficiency and the authenticity of the acquired information;
2. The invention can combine all relevant information about the questioned person, extract the characters, words, sentences and situations with a low credibility level, generate corresponding new questions, and output them to the questioner through the output module. This creates a loop inside and outside the system that gradually raises the credibility of all information; after several cycles, all required true information can be obtained. Information is acquired quickly and accurately, a great deal of time is saved, and interference from false information is avoided.
Drawings
FIG. 1 is a schematic structural view of example 1;
FIG. 2 is a schematic structural view of example 2.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
Referring to FIG. 1, this example discloses a real-time lie state recognition and correction system, which includes a voice input module, a sample classification module, a semantic recognition module, a lie state recognition module, a voice dictionary, an identification module, a correction module, and an output module.
A voice input module:
the voice input module is a microphone. The voice input module is used for collecting voice signals.
A sample classification module:
The sample classification module is in communication connection with the voice input module. It is used for distinguishing the voice signals of the questioner and the questioned person while reducing noise. The voice signals of the questioned person extracted by the sample classification module are transmitted simultaneously to the semantic recognition module and the lie state recognition module.
Specifically, the sample classification module may distinguish the voice signal samples by way of manual keying, timing and/or timbre.
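As an illustration of timbre-based classification, the sketch below assigns a speech segment to the questioner or the questioned person by comparing its spectral centroid against a stored profile of the questioner's voice. This is a minimal sketch under assumed names (`classify_speaker`, `questioner_profile`); it is not the patented method, which may also use manual keying or timing.

```python
import numpy as np

def classify_speaker(segment, questioner_profile, threshold=0.5):
    """Assign a speech segment to the questioner or the questioned
    person by comparing a crude timbre feature (spectral centroid)
    against a stored profile of the questioner's voice.
    All names and the threshold are illustrative assumptions."""
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment))  # normalized frequencies
    centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
    # Relative distance from the questioner's known centroid
    distance = abs(centroid - questioner_profile) / (questioner_profile + 1e-12)
    return "questioner" if distance < threshold else "questioned"
```

In practice the profile would be estimated from enrollment speech of the questioner; here it is a single scalar for clarity.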
A semantic recognition module:
The semantic recognition module is in communication connection with the sample classification module. It is used for recognizing the meaning of the voice signal and splitting whole sentences into characters and words.
Specifically, the semantic recognition module includes a voice processing unit, an acoustic matching unit and a language processing unit, which are sequentially connected in a communication manner.
The voice processing unit is used for extracting acoustic characteristics of characters and words of a whole sentence of the voice signal and endowing each character and word with a unique label.
The acoustic matching unit is used for matching the acoustic features extracted by the voice processing unit with the acoustic features in the voice dictionary to obtain the component and meaning of each character and word.
The language processing unit is used for correcting the components and the meanings of the words and the phrases according to the grammar structure, the semantic structure and the language model in the voice dictionary.
The above components are sentence components, including subject, predicate, object, attributive, adverbial, complement and head word; the meaning is the sense that the character or word is intended to express.
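A minimal sketch of the acoustic matching unit's lookup step, assuming a toy `VOICE_DICTIONARY` that maps acoustic feature vectors to words and their sentence components; the real voice dictionary stores full acoustic and language models rather than a hard-coded table.

```python
import math

# Hypothetical miniature "voice dictionary": each entry maps an
# acoustic feature vector to a word and its sentence component.
VOICE_DICTIONARY = {
    (0.1, 0.8): ("I", "subject"),
    (0.7, 0.2): ("went", "predicate"),
    (0.4, 0.5): ("home", "object"),
}

def match_acoustic(feature):
    """Return the dictionary word whose stored feature vector is
    nearest (Euclidean distance) to the extracted feature, a
    stand-in for the acoustic matching unit described above."""
    best = min(VOICE_DICTIONARY, key=lambda ref: math.dist(feature, ref))
    return VOICE_DICTIONARY[best]
```

A real matcher would score sequences of frames against acoustic models (e.g. with dynamic programming), not single vectors.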
Lie state recognition module:
The lie state recognition module is in communication connection with the sample classification module. It is used for recognizing the credibility of the voice signals and grading them. When the lie state recognition module recognizes and grades credibility, external sensors can be used to collect the facial expression, emotional change and physiological change information of the questioned person, further improving grading accuracy.
Specifically, the lie state recognition module comprises a preprocessing unit, a sparse representation unit, a deep learning unit and a dynamic sparse Bayesian time-sequence modeling unit which are sequentially in communication connection.
The preprocessing unit is used for performing framing, windowing, pre-emphasis, endpoint detection and PCA dimension reduction processing on the voice signals in sequence.
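The first steps of this preprocessing chain can be sketched as follows. Endpoint detection and PCA reduction are omitted for brevity, and the frame length, hop size and pre-emphasis coefficient are illustrative values, not ones specified by the patent.

```python
import numpy as np

def preprocess(signal, frame_len=256, hop=128, alpha=0.97):
    """Sketch of the preprocessing chain: pre-emphasis, framing
    and Hamming windowing.  Parameter values are illustrative."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames of frame_len samples
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Apply a Hamming window to each frame to reduce spectral leakage
    return frames * np.hamming(frame_len)
```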
The sparse representation unit is used for decomposing the preprocessed voice signal into an overcomplete dictionary and a corresponding sparse coefficient matrix. The overcomplete dictionary is a feature subspace of the feature set of the voice signal; the sparse coefficient matrix is a characteristic parameter describing the projection of the intrinsic structure of the voice signal onto that subspace. Sparse representation of the raw voice data has a clear structure, high efficiency and low processing cost, which facilitates subsequent processing. The sparse representation method of the unit may be based on a fast K-SVD sparse representation algorithm.
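The sparse-coding half of such a scheme can be illustrated with Orthogonal Matching Pursuit (OMP), which K-SVD commonly uses to compute the sparse coefficients of each signal over the current dictionary; the dictionary-update half of K-SVD is not shown, so this is a sketch of one building block rather than the full algorithm.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal Matching Pursuit: approximate x as a k-sparse
    combination of the columns (atoms) of dictionary D."""
    residual = x.astype(float)
    support = []
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit of x on the selected atoms
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef
```

With an identity dictionary OMP simply recovers the largest entries of the signal, which makes its behavior easy to check.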
The deep learning unit is a machine learning model constructed from multiple hidden layers and is used for extracting the deep structure information of the voice signal. The deep structure information includes linguistic features, physiological features, acoustic features and psycho-perceptual features. The extraction method of the deep learning unit is as follows: the good multi-dimensional linear decomposition capability of sparse representation and the deep-structure advantage of multilayer nonlinear mapping are used to approximate a complex function; the input data of the deep learning network are mapped step by step through an unsupervised sparse auto-encoding algorithm; and the deep structure information of the voice signal is extracted. The deep learning unit improves the accuracy and timeliness of voice data analysis. The machine learning model may be a sparse stacked auto-encoder (SSAE) network, whose design and network structure are simple and easy to operate.
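As a hedged illustration of the stacked-autoencoder idea, the sketch below runs only the forward (encoding) pass of a small SSAE-style network with random weights; the layer sizes and names are assumptions, and the unsupervised training with a sparsity penalty is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ssae_encode(x, weights, biases):
    """Forward (encoding) pass of a stacked autoencoder: each hidden
    layer maps the previous layer's activations through a sigmoid,
    yielding progressively deeper feature representations."""
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

rng = np.random.default_rng(0)
# Illustrative layer sizes: 64-d input -> 32 -> 16 deep features
sizes = [64, 32, 16]
weights = [rng.standard_normal((m, n)) * 0.1
           for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
features = ssae_encode(rng.standard_normal(64), weights, biases)
```

In a trained SSAE each layer's weights come from greedy layer-wise pretraining; here random weights serve only to show the data flow.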
The dynamic sparse Bayesian time-sequence modeling unit is used for extracting the overall and detailed characteristic information of the voice signal. Its extraction method is as follows: the change of the lying psychological state over time is reflected dynamically through a sparse Bayesian network topology, and from this the association and time-sequence relations among the features corresponding to each variable in the network are calculated. By extracting features on different time scales, the unit can express the lie state in depth, and its detection accuracy is high.
A voice dictionary:
The voice dictionary is in communication connection between the semantic recognition module and the lie state recognition module. It is used to store the acoustic model and the language model, providing a data foundation.
An identification module:
The identification module is in communication connection with both the semantic recognition module and the lie state recognition module. It couples the words output by the semantic recognition module with the credibility levels output by the lie state recognition module in time sequence, thereby obtaining the credibility level of each character and word.
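The time-sequence coupling can be sketched as interval alignment: each recognized word is labelled with the credibility interval that overlaps it most. The tuple format `(start, end, value)` and the function name are illustrative assumptions, not the patent's data format.

```python
def couple(words, levels):
    """Couple the word stream from the semantic recognition module
    with the credibility-level stream from the lie state recognition
    module by overlapping time stamps.  Both inputs are lists of
    (start, end, value) tuples; the format is illustrative."""
    labelled = []
    for w_start, w_end, word in words:
        # Pick the credibility interval that overlaps the word most
        best = max(levels,
                   key=lambda lv: min(w_end, lv[1]) - max(w_start, lv[0]))
        labelled.append((word, best[2]))
    return labelled
```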
A correction module:
The correction module is in communication connection with both the identification module and the voice dictionary. It performs antonym replacement on the verbs and adjectives with a low credibility level output by the identification module, modifies whole sentences and situations with a low credibility level, and obtains the corresponding voice signals from the voice dictionary.
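A minimal sketch of the antonym-replacement step, assuming a hypothetical hard-coded antonym table in place of the voice dictionary and a numeric credibility scale where lower means less credible.

```python
# Hypothetical antonym table; a real system would draw antonyms
# from the voice dictionary rather than a hard-coded map.
ANTONYMS = {"took": "returned", "large": "small", "never": "always"}

def correct(tagged_words, threshold=2):
    """Replace words whose credibility level falls below a threshold
    with their antonyms, as the correction module does for verbs and
    adjectives.  tagged_words: list of (word, level) pairs."""
    corrected = []
    for word, level in tagged_words:
        if level < threshold and word in ANTONYMS:
            corrected.append(ANTONYMS[word])
        else:
            corrected.append(word)
    return " ".join(corrected)
```

For example, a low-credibility denial such as "never took" would come out as its assertion "always returned".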
An output module:
the output module is in communication connection with the correction module. The output module is used for outputting the voice signal of the correction module in real time. The output signal of the output module may be a sound signal or a text signal.
With the above structure, the system can accurately recognize and mark the lie state of the questioned person's voice signal and output the corrected result to the questioner, so that the questioner learns in real time how truthful the questioned person's words are and can continue questioning in a more targeted manner, thereby acquiring true information with high accuracy and timeliness.
Example 2:
Referring to FIG. 2, this example discloses a real-time lie state recognition and correction system, which includes a voice input module, a sample classification module, a semantic recognition module, a lie state recognition module, a voice dictionary, an identification module, a correction module, an output module, a manual correction module, a large database and a question generation module.
A voice input module:
the voice input module is a microphone. The voice input module is used for collecting voice signals.
A sample classification module:
the sample classification module is in communication connection with the voice input module. The sample classification module is used for distinguishing the voice signals of the questioner and the questioned person and simultaneously reducing noise. The sample classification module can convey the extracted voice signals of the questioned person to the semantic recognition module and the lie state recognition module, and convey all the input voice signals to the large database.
Specifically, the sample classification module may distinguish the voice signal samples by way of manual keying, timing and/or timbre.
A semantic recognition module:
The semantic recognition module is in communication connection with the sample classification module. It is used for recognizing the meaning of the voice signal and splitting whole sentences into characters and words.
Specifically, the semantic recognition module includes a voice processing unit, an acoustic matching unit and a language processing unit, which are sequentially connected in a communication manner.
The voice processing unit is used for extracting acoustic characteristics of characters and words of a whole sentence of the voice signal and endowing each character and word with a unique label.
The acoustic matching unit is used for matching the acoustic features extracted by the voice processing unit with the acoustic features in the voice dictionary to obtain the component and meaning of each character and word.
The language processing unit is used for correcting the components and the meanings of the words and the phrases according to the grammar structure, the semantic structure and the language model in the voice dictionary.
The above components are sentence components, including subject, predicate, object, attributive, adverbial, complement and head word; the meaning is the sense that the character or word is intended to express.
Lie state recognition module:
The lie state recognition module is in communication connection with the sample classification module. It is used for recognizing the credibility of the voice signals and grading them. When the lie state recognition module recognizes and grades credibility, external sensors can be used to collect the facial expression, emotional change and physiological change information of the questioned person, further improving grading accuracy.
Specifically, the lie state recognition module comprises a preprocessing unit, a sparse representation unit, a deep learning unit and a dynamic sparse Bayesian time-sequence modeling unit which are sequentially in communication connection.
The preprocessing unit is used for performing framing, windowing, pre-emphasis, endpoint detection and PCA dimension reduction processing on the voice signals in sequence.
The sparse representation unit is used for decomposing the preprocessed voice signal into an overcomplete dictionary and a corresponding sparse coefficient matrix. The overcomplete dictionary is a feature subspace of the feature set of the voice signal; the sparse coefficient matrix is a characteristic parameter describing the projection of the intrinsic structure of the voice signal onto that subspace. Sparse representation of the raw voice data has a clear structure, high efficiency and low processing cost, which facilitates subsequent processing. The sparse representation method of the unit may be based on a fast K-SVD sparse representation algorithm.
The deep learning unit is a machine learning model constructed from multiple hidden layers and is used for extracting the deep structure information of the voice signal. The deep structure information includes linguistic features, physiological features, acoustic features and psycho-perceptual features. The extraction method of the deep learning unit is as follows: the good multi-dimensional linear decomposition capability of sparse representation and the deep-structure advantage of multilayer nonlinear mapping are used to approximate a complex function; the input data of the deep learning network are mapped step by step through an unsupervised sparse auto-encoding algorithm; and the deep structure information of the voice signal is extracted. The deep learning unit improves the accuracy and timeliness of voice data analysis. The machine learning model may be a sparse stacked auto-encoder (SSAE) network, whose design and network structure are simple and easy to operate.
The dynamic sparse Bayesian time-sequence modeling unit is used for extracting the overall and detailed characteristic information of the voice signal. Its extraction method is as follows: the change of the lying psychological state over time is reflected dynamically through a sparse Bayesian network topology, and from this the association and time-sequence relations among the features corresponding to each variable in the network are calculated. By extracting features on different time scales, the unit can express the lie state in depth, and its detection accuracy is high.
A voice dictionary:
The voice dictionary is in communication connection between the semantic recognition module and the lie state recognition module. It is used to store the acoustic model and the language model, providing a data foundation.
An identification module:
The identification module is in communication connection with both the semantic recognition module and the lie state recognition module. It couples the words output by the semantic recognition module with the credibility levels output by the lie state recognition module in time sequence, thereby obtaining the credibility level of each character and word.
A correction module:
The correction module is in communication connection with both the identification module and the voice dictionary. It performs antonym replacement on the verbs and adjectives with a low credibility level output by the identification module, modifies whole sentences and situations with a low credibility level, and obtains the corresponding voice signals from the voice dictionary.
An output module:
The output module is in communication connection with the correction module. It is used for outputting the voice signals of the correction module and the question generation module in real time. The output signal may be a sound signal or a text signal.
A large database:
the big databases are respectively connected to the sample classification module and the lie state identification module in a communication mode. The large database is used for storing voice signals generated in the questioning process and questioned person information acquired from the outside. Questioned person information includes, but is not limited to, personal basic information, activity time and scope information, credit information, and social relationship information. The big database can provide a data base for semantic recognition and lie states.
The large database comprises a raw database and a correction database. The raw database is used for storing the voice information generated during the conversation. The correction database is used for storing the corrected voice information of the questioned person and preserving the credibility levels of all the voice information.
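The two-part database can be sketched with a raw table for utterances as spoken and a correction table keeping corrected text plus each utterance's credibility level. All table and column names here are assumptions for illustration:

```python
import sqlite3

# In-memory stand-in for the raw database + correction database pairing.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE raw_speech (id INTEGER PRIMARY KEY, speaker TEXT, text TEXT);
    CREATE TABLE corrected_speech (
        raw_id INTEGER REFERENCES raw_speech(id),
        text TEXT,
        confidence REAL);  -- credibility level preserved per utterance
""")
db.execute("INSERT INTO raw_speech VALUES (1, 'subject', 'I went out')")
db.execute("INSERT INTO corrected_speech VALUES (1, 'I stayed home', 0.2)")

# Join raw and corrected records to compare the spoken vs. corrected text.
row = db.execute(
    "SELECT r.text, c.text, c.confidence FROM raw_speech r "
    "JOIN corrected_speech c ON c.raw_id = r.id").fetchone()
```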
A manual correction module:
the manual correction module is in communication connection between the correction module and the large database. The manual correction module is used for manually modifying the credibility level of the voice signal of the questioned person. When the questioner is sure that the questioned person is answering truthfully, the credibility level can be raised manually to speed up the question-and-answer process.
A question generation module:
the question generation module is in communication connection between the large database and the output module. The question generation module is used for extracting characters, words, sentences and situations with a low credibility level in combination with the large database, generating corresponding new questions, and outputting them to the questioner through the output module. The question generation module facilitates obtaining information with a higher credibility level.
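A minimal sketch of follow-up question generation from low-credibility items; the question template and threshold are assumptions (a real system would also consult the large database for context):

```python
def generate_followups(scored_items, threshold=0.5):
    """Turn each (item, credibility) pair below the threshold into a
    new question for the questioner to ask."""
    return [f"Can you tell me more about '{item}'?"
            for item, conf in scored_items if conf < threshold]

items = [("home", 0.3), ("work", 0.9), ("last night", 0.4)]
questions = generate_followups(items)  # only low-credibility items recur
```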
This optimized structure forms a loop through the system: with each cycle the credibility of the information gradually increases, and after several cycles all of the required truthful information is obtained. Information acquisition is fast and accurate, a large amount of time is saved, and interference from erroneous information is avoided.
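The cyclic question-until-credible flow can be sketched as a simple loop; the callback, threshold, and round limit are illustrative assumptions:

```python
def question_until_credible(items, ask, threshold=0.8, max_rounds=5):
    """Keep re-questioning low-credibility items, updating each item's
    credibility with the answer from `ask`, until everything clears the
    threshold or the round limit is reached."""
    items = dict(items)
    for _ in range(max_rounds):
        doubtful = [k for k, v in items.items() if v < threshold]
        if not doubtful:
            break  # all information is now credible
        for k in doubtful:
            items[k] = max(items[k], ask(k))
    return items

# Toy answer model: each re-ask yields a progressively more credible answer.
asked = {"count": 0}
def mock_answer(item):
    asked["count"] += 1
    return 0.3 * asked["count"]

result = question_until_credible([("alibi", 0.2)], mock_answer)
```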
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and all equivalent modifications and substitutions based on the above-mentioned technical solutions are within the scope of the present invention as defined in the claims.

Claims (8)

1. A real-time lie state recognition and correction system is characterized by comprising a voice input module, a sample classification module, a semantic recognition module, a lie state recognition module, a voice dictionary, an identification module, a correction module and an output module;
the voice input module is used for acquiring voice signals;
the sample classification module is in communication connection with the voice input module and is used for distinguishing voice signals of a questioner and a questioned person and transmitting the voice signals of the questioned person to the semantic recognition module and the lie state recognition module after noise reduction;
the semantic recognition module is in communication connection with the sample classification module and is used for recognizing the semantic meaning of the voice signal and splitting the whole sentence into characters and words;
the lie state recognition module is in communication connection with the sample classification module and is used for recognizing the credibility of the voice signal and grading the credibility of each character, word, sentence and situation;
the voice dictionary is in communication connection between the semantic recognition module and the lie state recognition module and is used for storing an acoustic model and a language model;
the identification module is respectively in communication connection with the semantic recognition module and the lie state recognition module, and couples, in time sequence, the characters and words output by the semantic recognition module with the credibility levels output by the lie state recognition module to obtain the credibility level of each character and word;
the correction module is respectively in communication connection with the identification module and the voice dictionary, and is used for performing antonym replacement on verbs and adjectives with a low credibility level output by the identification module and modifying whole sentences and situations with a low credibility level;
the output module is in communication connection with the correction module and is used for outputting the voice signal of the correction module in real time; the semantic recognition module comprises a voice processing unit, an acoustic matching unit and a language processing unit which are sequentially in communication connection;
the voice processing unit is used for extracting acoustic characteristics of characters and words of the voice signal and endowing each character and word with a unique label;
the acoustic matching unit is used for matching the acoustic features extracted by the voice processing unit with the acoustic features in the voice dictionary to obtain the components and the meanings of each word and each word;
the language processing unit is used for correcting the components and the meanings of the words and the phrases according to the grammar structure, the semantic structure and the language model in the voice dictionary;
the components are sentence components, comprising subjects, predicates, objects, attributives, adverbials, complements and head words; the meanings are the intended senses of the characters and words.
2. The lie state real-time identification and correction system according to claim 1, wherein the lie state identification module comprises a preprocessing unit, a sparse representation unit, a deep learning unit and a dynamic sparse bayesian timing modeling unit which are sequentially connected in communication;
the preprocessing unit is used for sequentially performing framing, windowing, pre-emphasis, endpoint detection and PCA (principal component analysis) dimensionality reduction on the voice signals;
the sparse representation unit is used for decomposing the preprocessed voice signal into an over-complete dictionary and a corresponding sparse coefficient matrix;
the deep learning unit is a machine learning model constructed by multiple hidden layers and used for extracting deep structure information of the voice signal;
the dynamic sparse Bayesian time sequence modeling unit is used for extracting overall characteristic information and detail characteristic information of the voice signal.
3. The lie state real-time recognition correction system according to claim 2, wherein the overcomplete sparse dictionary is a feature subspace of a feature set of a speech signal; the sparse coefficient matrix is a characteristic parameter of a projection intrinsic structure of the voice signal on the characteristic subspace.
4. The lie state real-time identification correction system according to claim 3, wherein the sparse representation method of the sparse representation unit is based on a fast K-SVD sparse representation algorithm.
5. The lie state real-time recognition correction system according to claim 4, wherein the depth structure information includes a language feature, a physiological feature, an acoustic feature and a psycho-perceptual feature.
6. A real-time lie state recognition and correction system is characterized by comprising a voice input module, a sample classification module, a semantic recognition module, a lie state recognition module, a voice dictionary, an identification module, a correction module, an output module, a manual correction module, a large database and a question generation module;
the voice input module is used for acquiring voice signals;
the sample classification module is in communication connection with the voice input module and is used for distinguishing voice signals of a questioner and a questioned person, transmitting the voice signals of the questioned person to the semantic recognition module and the lie state recognition module after noise reduction, and transmitting all input voice signals to the large database;
the semantic recognition module is in communication connection with the sample classification module and is used for recognizing the semantic meaning of the voice signal and splitting the whole sentence into characters and words;
the lie state recognition module is in communication connection with the sample classification module and is used for recognizing the credibility of the voice signal and grading the credibility of each character, word, sentence and situation;
the voice dictionary is in communication connection between the semantic recognition module and the lie state recognition module and is used for storing an acoustic model and a language model;
the identification module is respectively in communication connection with the semantic recognition module and the lie state recognition module, and couples, in time sequence, the characters and words output by the semantic recognition module with the credibility levels output by the lie state recognition module to obtain the credibility level of each character and word;
the correction module is respectively in communication connection with the identification module and the voice dictionary, and is used for performing antonym replacement on verbs and adjectives with a low credibility level output by the identification module and modifying whole sentences and situations with a low credibility level;
the output module is in communication connection with the correction module and is used for outputting the voice signals of the correction module and the question generation module in real time;
the large database is respectively in communication connection with the sample classification module and the lie state recognition module, and is used for storing the voice signals generated in the questioning process and the questioned person information acquired from outside;
the manual correction module is in communication connection between the correction module and the large database, and is used for manually modifying the credibility level of the voice signal of the questioned person;
the question generation module is in communication connection between the large database and the output module, and is used for extracting characters, words, sentences and situations with a low credibility level in combination with the large database, generating corresponding new questions, and outputting them to the questioner through the output module.
7. The lie state real-time identification correction system according to claim 6, wherein the questioned person information includes personal basic information, activity time and scope information, credit information and social relationship information.
8. The lie state real-time recognition correction system according to claim 7, wherein the large database comprises a raw database and a correction database, the raw database is used for storing the voice information generated during the conversation, the correction database is used for storing the corrected voice information of the questioned person, and the credibility level of all the voice information is preserved.
CN201910376615.7A 2019-05-07 2019-05-07 Real-time identification and correction system for lie state Active CN110033778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910376615.7A CN110033778B (en) 2019-05-07 2019-05-07 Real-time identification and correction system for lie state

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910376615.7A CN110033778B (en) 2019-05-07 2019-05-07 Real-time identification and correction system for lie state

Publications (2)

Publication Number Publication Date
CN110033778A CN110033778A (en) 2019-07-19
CN110033778B true CN110033778B (en) 2021-07-23

Family

ID=67241449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910376615.7A Active CN110033778B (en) 2019-05-07 2019-05-07 Real-time identification and correction system for lie state

Country Status (1)

Country Link
CN (1) CN110033778B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2596351B (en) * 2020-06-26 2022-10-19 Lexiqal Ltd Forensic interaction understanding: A system and method for determining the nature of spoken interactions using acoustic and linguistic markers

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101999902A (en) * 2009-09-03 2011-04-06 上海天岸电子科技有限公司 Voiceprint lie detector and voiceprint lie detecting method
CN205493847U (en) * 2016-03-04 2016-08-24 北京同方神火联合科技发展有限公司 Pronunciation analytic system
CN105976820A (en) * 2016-06-14 2016-09-28 上海质良智能化设备有限公司 Voice emotion analysis system
CN107293302A (en) * 2017-06-27 2017-10-24 苏州大学 A kind of sparse spectrum signature extracting method being used in voice lie detection system
CN107680602A (en) * 2017-08-24 2018-02-09 平安科技(深圳)有限公司 Voice fraud recognition methods, device, terminal device and storage medium
WO2018135303A1 (en) * 2017-01-18 2018-07-26 ソニー株式会社 Information processing device, information processing method, and program
CN108520753A (en) * 2018-02-26 2018-09-11 南京工程学院 Voice lie detection method based on the two-way length of convolution memory network in short-term

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8972266B2 (en) * 2002-11-12 2015-03-03 David Bezar User intent analysis extent of speaker intent analysis system
US7571101B2 (en) * 2006-05-25 2009-08-04 Charles Humble Quantifying psychological stress levels using voice patterns
CN102104676A (en) * 2009-12-21 2011-06-22 深圳富泰宏精密工业有限公司 Wireless communication device with lie detection function and lie detection method thereof
KR101233628B1 (en) * 2010-12-14 2013-02-14 유비벨록스(주) Voice conversion method and terminal device having the same
US20180160959A1 (en) * 2016-12-12 2018-06-14 Timothy James Wilde Modular electronic lie and emotion detection systems, methods, and devices
CN107452405B (en) * 2017-08-16 2021-04-09 北京易真学思教育科技有限公司 Method and device for evaluating data according to voice content

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Lie detection method based on sparse representation of speech; Zhou Yan; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2018-04-30; I136-1 *
Research status and prospects of speech lie detection technology; Zhao Li; Journal of Data Acquisition and Processing; 2017-02-28; 246-257 *
Research on detection and analysis of deceptive speech; Pan Xinyu; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2017-03-31; I136-18 *

Also Published As

Publication number Publication date
CN110033778A (en) 2019-07-19

Similar Documents

Publication Publication Date Title
CN110164447B (en) Spoken language scoring method and device
CN112750465A (en) Cloud language ability evaluation system and wearable recording terminal
CN103971675A (en) Automatic voice recognizing method and system
CN110175229B (en) Method and system for on-line training based on natural language
CN111445898B (en) Language identification method and device, electronic equipment and storage medium
CN112466316A (en) Zero-sample voice conversion system based on generation countermeasure network
CN111554293A (en) Method, device and medium for filtering noise in voice recognition and conversation robot
CN112015874A (en) Student mental health accompany conversation system
CN114328817A (en) Text processing method and device
Chakraborty et al. Knowledge-based framework for intelligent emotion recognition in spontaneous speech
CN116127015A (en) NLP large model analysis system based on artificial intelligence self-adaption
JP6605105B1 (en) Sentence symbol insertion apparatus and method
CN110033778B (en) Real-time identification and correction system for lie state
Abhishek et al. Aiding the visually impaired using artificial intelligence and speech recognition technology
Ramadani et al. A new technology on translating Indonesian spoken language into Indonesian sign language system.
WO2022134025A1 (en) Offline speech recognition method and apparatus, electronic device and readable storage medium
Boakye et al. Any questions? Automatic question detection in meetings
CN110853669B (en) Audio identification method, device and equipment
CN113724693B (en) Voice judging method and device, electronic equipment and storage medium
CN117198338B (en) Interphone voiceprint recognition method and system based on artificial intelligence
CN112562856B (en) Method and system for searching health knowledge through voice
CN112735475B (en) Method and system for searching disease knowledge through voice
CN116915894A (en) Incoming call identity recognition method and device, electronic equipment and readable storage medium
CN111916106B (en) Method for improving pronunciation quality in English teaching
CN116168824A (en) Multi-modal mental disorder assessment method, computer device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant