CN115658933A - Psychological state knowledge base construction method and device, computer equipment and storage medium


Info

Publication number
CN115658933A
Authority
CN
China
Prior art keywords
vocabulary
time period
modal
dimension
sample data
Prior art date
Legal status
Granted
Application number
CN202211688048.7A
Other languages
Chinese (zh)
Other versions
CN115658933B (en)
Inventor
张伟
姚佳
张思迈
何行知
李宏伟
文凤
刘斌
Current Assignee
Sichuan Provincial Prison Administration
West China Hospital of Sichuan University
Original Assignee
Sichuan Provincial Prison Administration
West China Hospital of Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan Provincial Prison Administration and West China Hospital of Sichuan University
Priority to CN202211688048.7A
Publication of CN115658933A
Application granted
Publication of CN115658933B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a psychological state knowledge base construction method and device, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring an initial multi-modal sample data set of prisoners; performing data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set with vocabulary as the basic granularity; extracting features from the target multi-modal sample data set along the multi-modal time sequence dimensions and the global dimension, and recognizing the features with an attention weight recognition model to obtain a psychological state assessment result of the prisoner; and mining high-frequency and low-frequency frequent items in the psychological state assessment result according to a preset frequent item mining rule, and constructing a psychological state knowledge base based on the high-frequency and low-frequency frequent items. The invention first obtains sample data aligned at vocabulary granularity and then mines psychological state knowledge with an attention mechanism, so that multi-modal data can be accurately expressed as understandable psychological state knowledge.

Description

Psychological state knowledge base construction method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a psychological state knowledge base construction method and device, computer equipment and a storage medium.
Background
At present, schemes for acquiring psychological state knowledge of prisoners in prisons mainly rely on relatively mature scales such as the Chinese Offender Psychological Assessment - Personality Inventory (COPA-PI). However, such scale-based assessment is inherently delayed, which makes it difficult to continuously track the psychological state of prisoners and easily reduces the accuracy of the acquired psychological state knowledge.
At present, most advanced research is moving toward constructing multi-modal emotion knowledge bases, and current methods for constructing a psychological state knowledge base suffer from the following two problems:
the psychological state recognition model has poor transferability and adapts badly to new tasks, so a large amount of labeled data usually has to be collected to retrain the model for each new task; and the psychological states recognized by the model are not interpretable.
Therefore, a method for constructing a multi-modal psychological state knowledge base that is adaptable and outputs interpretable psychological state knowledge is needed.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present application provide a psychological state knowledge base construction method and device, computer equipment and a storage medium. The specific scheme is as follows:
In a first aspect, an embodiment of the present application provides a psychological state knowledge base construction method, including:
acquiring an initial multi-modal sample data set and psychological assessment personality evaluation data of prisoners;
performing data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set, wherein the target multi-modal sample data set is a multi-modal aligned data set with vocabulary as the basic granularity;
extracting interpretable features from the target multi-modal sample data set along a text time sequence dimension, a speech time sequence dimension, an image time sequence dimension and a global dimension respectively, to obtain the multi-modal features of each vocabulary time period and the multi-modal features of the global time period, wherein the multi-modal features comprise text features, speech features and image features;
inputting the psychological assessment personality evaluation data, the multi-modal features of the global time period and the multi-modal features of each vocabulary time period into an attention weight recognition model in time sequence order to obtain a psychological state assessment result of the prisoner;
and mining high-frequency and low-frequency frequent items in the psychological state assessment result according to a preset frequent item mining rule, and constructing a psychological state knowledge base based on the high-frequency and low-frequency frequent items.
According to a specific implementation manner of the embodiment of the present application, the acquiring an initial multi-modal sample data set and psychological assessment personality assessment data of a prisoner includes:
sampling all prisoners hierarchically by identity characteristic information to obtain a target prisoner queue, wherein the identity characteristic information comprises offense type, age and sentence duration;
acquiring an initial multi-modal sample data set and psychological assessment personality evaluation data of each prisoner in the target prisoner queue, wherein the psychological assessment personality evaluation data comprises evaluation scores for a lying dimension, an honesty dimension, an extraversion dimension, an intelligence dimension, an empathy dimension, a subordination dimension, an emotional fluctuation dimension, an impulsivity dimension, a guardedness dimension, a self-abasement dimension, an anxiety dimension, a violence tendency dimension, an abnormal psychology dimension and a criminal thinking dimension.
According to a specific implementation manner of the embodiment of the present application, the initial multi-modal sample data includes text sample data, audio sample data, and video sample data, and the performing data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set includes:
performing text cutting on the text sample data to obtain all vocabularies in the text sample data;
acquiring vocabulary time periods corresponding to all vocabularies based on the starting time and the ending time of each vocabulary;
and performing data alignment on the text sample data, the audio sample data and the video sample data based on the vocabulary time period to obtain the target multi-modal sample data set.
According to a specific implementation manner of the embodiment of the present application, the extracting interpretable features in the target multimodal sample data set from a text time sequence dimension, a speech time sequence dimension, an image time sequence dimension, and a global dimension respectively to obtain multimodal features in each vocabulary time period and multimodal features in a global time period includes:
respectively acquiring text interpretable features, voice interpretable features and image interpretable features in the target multi-modal sample data set in each vocabulary time period;
acquiring text interpretable feature change conditions of all vocabulary time periods based on the text interpretable features of each current vocabulary time period and the next vocabulary time period;
acquiring voice interpretable feature change conditions of all vocabulary time periods based on the voice interpretable features of each current vocabulary time period and the next vocabulary time period;
acquiring image interpretable feature change conditions of all vocabulary time periods based on the image interpretable features of each current vocabulary time period and the next vocabulary time period;
respectively acquiring global text features, global voice features and global image features of the target multi-modal sample data set based on a global time period;
and obtaining the multi-modal characteristics of each vocabulary time period according to the text interpretable characteristics and the change conditions thereof, the voice interpretable characteristics and the change conditions thereof, and the image interpretable characteristics and the change conditions thereof, and obtaining the multi-modal characteristics of the global time period according to the global text characteristics, the global voice characteristics and the global image characteristics.
According to a specific implementation manner of the embodiment of the present application, the obtaining of the voice interpretable feature change of each vocabulary time segment based on the voice interpretable feature of each current vocabulary time segment and the next vocabulary time segment thereof includes:
carrying out normalization and grade classification processing on the voice interpretable features of each vocabulary time period to obtain a voice grade of each vocabulary time period;
and acquiring the voice interpretable feature change condition of each vocabulary time period based on the voice grade corresponding to the voice interpretable feature of each current vocabulary time period and the next vocabulary time period.
According to a specific implementation of an embodiment of the present application, the obtaining of image interpretable feature changes for each vocabulary time period based on the image interpretable feature for each current vocabulary time period and its next vocabulary time period comprises:
normalizing and grade classifying the image interpretable features of each vocabulary time period to obtain an image grade of each vocabulary time period;
and acquiring the image interpretable feature change condition of each vocabulary time period based on the image grade corresponding to the image interpretable feature of each current vocabulary time period and the next vocabulary time period.
According to a specific implementation manner of the embodiment of the present application, the psychological state assessment result includes a psychological assessment personality score for each modal dimension and a vocabulary time period weight corresponding to each psychological assessment personality score;
the mining of the high-frequency and low-frequency frequent items in the psychological state assessment result according to the preset frequent item mining rule comprises the following steps:
acquiring the vocabulary time period weights corresponding to psychological assessment personality scores that are smaller than a first score threshold or larger than a second score threshold;
dividing the vocabulary time periods whose weights are larger than a preset weight threshold into target vocabulary time periods;
and mining multi-modal feature frequent items in the target vocabulary time periods based on a preset Apriori algorithm, dividing the multi-modal feature frequent items whose psychological assessment personality scores are smaller than the first score threshold into low-frequency frequent items, and dividing the multi-modal feature frequent items whose psychological assessment personality scores are larger than the second score threshold into high-frequency frequent items.
In a second aspect, an embodiment of the present application provides a psychological state knowledge base construction device, including:
an acquisition module, used for acquiring an initial multi-modal sample data set and psychological assessment personality evaluation data of prisoners;
a preprocessing module, used for performing data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set, wherein the target multi-modal sample data set is a multi-modal aligned data set with vocabulary as the basic granularity;
a feature extraction module, used for extracting interpretable features from the target multi-modal sample data set along a text time sequence dimension, a speech time sequence dimension, an image time sequence dimension and a global dimension respectively, to obtain the multi-modal features of each vocabulary time period and the multi-modal features of the global time period, wherein the multi-modal features comprise text features, speech features and image features;
an attention recognition module, used for inputting the psychological assessment personality evaluation data, the multi-modal features of the global time period and the multi-modal features of each vocabulary time period into an attention weight recognition model in time sequence order to obtain a psychological state assessment result of the prisoner;
and a knowledge base construction module, used for mining high-frequency and low-frequency frequent items in the psychological state assessment result according to a preset frequent item mining rule, and constructing a psychological state knowledge base based on the high-frequency and low-frequency frequent items.
In a third aspect, an embodiment of the present application provides a computer device, the computer device including a processor and a memory, wherein the memory stores a computer program which, when executed on the processor, performs the psychological state knowledge base construction method according to the first aspect or any implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when run on a processor, performs the psychological state knowledge base construction method according to the first aspect or any implementation manner of the first aspect.
The embodiments of the present application provide a psychological state knowledge base construction method and device, computer equipment and a readable storage medium. The method comprises the following steps: acquiring an initial multi-modal sample data set and psychological assessment personality evaluation data of prisoners; performing data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set with vocabulary as the basic granularity; extracting features from the target multi-modal sample data set along the multi-modal time sequence dimensions and the global dimension, and recognizing the features with an attention weight recognition model to obtain a psychological state assessment result of the prisoner; and mining high-frequency and low-frequency frequent items in the psychological state assessment result according to a preset frequent item mining rule, and constructing a psychological state knowledge base based on these frequent items. The method preprocesses the multi-modal sample data into aligned sample data of vocabulary granularity and then mines psychological state knowledge from it based on an attention mechanism, so that multi-modal data can be accurately expressed as understandable psychological state knowledge.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.
FIG. 1 is a schematic flow chart of a psychological state knowledge base construction method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the data preprocessing step applied to each initial multi-modal sample data set in a psychological state knowledge base construction method according to an embodiment of the present application;
FIG. 3 is a first schematic diagram of the attention weight recognition model applied in a psychological state knowledge base construction method according to an embodiment of the present application;
FIG. 4 is a second schematic diagram of the attention weight recognition model applied in a psychological state knowledge base construction method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the frequent item mining step of a psychological state knowledge base construction method according to an embodiment of the present application;
FIG. 6 is a schematic module diagram of a psychological state knowledge base construction device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having" and their derivatives, as used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components or combinations of the foregoing, and should not be construed as excluding the existence, or possible addition, of one or more other features, numbers, steps, operations, elements, components or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another, and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as terms defined in a commonly used dictionary) will be construed to have the same meaning as the contextual meaning in the related art and will not be construed to have an idealized or overly formal meaning unless expressly so defined in various embodiments of the present invention.
Referring to fig. 1, a flow chart of a psychological state knowledge base construction method provided in an embodiment of the present application is shown. As shown in fig. 1, the method includes:
step S101, obtaining an initial multi-mode sample data set and psychological assessment and individual evaluation data of prisoners;
In a specific embodiment, the initial multi-modal sample data set comprises audio samples, video samples and text samples.
The initial sample data set can be collected by constructing a preset number of open questions in advance and recording the prisoners' open answers to these questions with a camera, an audio recorder and similar equipment.
In collecting the initial sample data set, the open questions are used to let a prisoner express his or her recent mood state. They may be questions that guide the prisoner toward open expression, such as "how have you been today", "what happened today" and "how has your mood been recently", and can be adapted to the actual application scenario.
While a prisoner answers the open questions, his or her audio and video data are recorded synchronously with a camera, an audio recorder and similar equipment, and the audio data is converted into text data with a speech-to-text program, so that the audio data, video data and text data correspond to each other one-to-one.
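The embodiment does not name a specific speech-to-text program. As one hedged illustration only, an off-the-shelf ASR model such as openai-whisper could produce the text samples from the recorded audio:

```python
# Hypothetical sketch of the audio-to-text step; openai-whisper is an
# assumption, not a tool disclosed by this embodiment.
import whisper

def transcribe_interview(audio_path: str) -> str:
    """Convert one recorded open-question answer into text sample data."""
    model = whisper.load_model("base")                    # small general-purpose ASR model
    result = model.transcribe(audio_path, language="zh")  # answers are spoken in Chinese
    return result["text"]                                 # transcript paired 1:1 with the audio
```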
The psychological assessment personality evaluation data can be measured with a preset Chinese Offender Psychological Assessment - Personality Inventory (COPA-PI) scale, so as to obtain a prisoner's evaluation scores on 14 dimensions: lying, honesty, extraversion, intelligence, empathy, subordination, emotional fluctuation, impulsivity, guardedness, self-abasement, anxiety, violence tendency, abnormal psychology and criminal thinking.
Specifically, the method is not limited to prisoners: the psychological state knowledge of other groups of people can be acquired in the same way, and the psychological state knowledge base construction method provided by this embodiment can be applied selectively according to the actual application scenario.
According to a specific implementation manner of the embodiment of the present application, the acquiring an initial multi-modal sample data set and psychological assessment personality assessment data of a prisoner includes:
sampling all prisoners hierarchically by identity characteristic information to obtain a target prisoner queue, wherein the identity characteristic information comprises offense type, age and sentence duration;
acquiring an initial multi-modal sample data set and psychological assessment personality evaluation data of each prisoner in the target prisoner queue, wherein the psychological assessment personality evaluation data comprises evaluation scores for a lying dimension, an honesty dimension, an extraversion dimension, an intelligence dimension, an empathy dimension, a subordination dimension, an emotional fluctuation dimension, an impulsivity dimension, a guardedness dimension, a self-abasement dimension, an anxiety dimension, a violence tendency dimension, an abnormal psychology dimension and a criminal thinking dimension.
In a specific embodiment, when acquiring the relevant data of the prisoners, the prisoners may first be grouped according to their identity characteristic information.
By hierarchically sampling from a prisoner database with preset offense type information, age information and sentence duration information, a target prisoner queue covering multiple offense types, age groups and sentence durations can be obtained.
Acquiring such a target prisoner queue lets the psychological state knowledge base cover a wide range of prisoners while remaining specific to each kind of identity information, which facilitates subsequent monitoring and research of the psychological state knowledge of prisoners with a specified offense type, age group or sentence duration.
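A minimal sketch of the hierarchical (stratified) sampling, assuming a hypothetical pandas table with 'offense', 'age_group' and 'sentence_years' columns; the embodiment specifies only the three identity attributes, not a schema:

```python
import pandas as pd

def stratified_prisoner_queue(df: pd.DataFrame, frac: float = 0.1,
                              seed: int = 42) -> pd.DataFrame:
    """Sample the same fraction from every (offense, age group, sentence) stratum
    so the target queue covers all identity combinations present in the database."""
    strata = ["offense", "age_group", "sentence_years"]  # hypothetical column names
    return (df.groupby(strata, group_keys=False)
              .apply(lambda g: g.sample(frac=frac, random_state=seed)))
```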
Step S102, performing data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set, wherein the target multi-modal sample data set is a multi-modal aligned data set with vocabulary as the basic granularity;
in a specific embodiment, after the initial multi-modal sample data set of each prisoner is obtained, since the initial multi-modal sample data set includes an audio sample, a video sample, and a text sample, the embodiment further performs data processing of vocabulary segmentation and sample alignment on each modal sample to obtain a multi-modal aligned data set based on vocabulary.
According to a specific implementation manner of the embodiment of the present application, the initial multi-modal sample data includes text sample data, audio sample data, and video sample data, and the performing data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set includes:
performing text cutting on the text sample data to obtain all vocabularies in the text sample data;
acquiring vocabulary time periods corresponding to all vocabularies based on the starting time and the ending time of each vocabulary;
and performing data alignment on the text sample data, the audio sample data and the video sample data based on the vocabulary time period to obtain the target multi-modal sample data set.
In a specific embodiment, any word segmentation tool may be used to perform word segmentation on the text sample in the initial multi-modal sample data set, so as to obtain each word of each sentence of each open question answer by a prisoner.
For example, as shown in fig. 2, for the open question "what happened today", a prisoner answers "It was okay, just breakfast was a bit ordinary, and then I quarreled with a friend in the morning". Cutting this text sample yields the vocabularies "okay", "just", "breakfast", "a bit", "ordinary", "then", "morning", "with a friend" and "quarreled".
In this embodiment, after all vocabularies corresponding to a text sample are obtained, the multi-modal data is time-aligned at vocabulary granularity through machine-assisted time segmentation and cutting.
Specifically, based on each vocabulary, the audio data and the video data are combined to obtain the start time and the end time corresponding to any vocabulary, and then the start time and the end time of each vocabulary are utilized to perform time dotting in the audio data and the video data so as to align the modal data. The starting time is a time point for starting to express the vocabulary, and the ending time is a time point for stopping expressing the vocabulary.
This embodiment may employ a forced aligner such as the open-source speech-aligner tool to perform the vocabulary time point alignment.
According to a specific implementation manner of the embodiment of the present application, after the multi-modal sample data is aligned with the speech aligner, the aligned data can be returned to a user interface so that a user can manually fine-tune the start time and end time of each vocabulary, ensuring that the vocabulary between its start time and end time can be heard clearly.
And performing data preprocessing on the initial multi-modal sample data to obtain a target multi-modal sample data set with vocabulary as basic granularity. In the target multi-modal sample data, voice data and video data of corresponding time segments can be obtained through each word of each sentence spoken by a prisoner.
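A minimal sketch of the vocabulary-granularity alignment, assuming the per-word start/end times are already available from the forced aligner, and that the audio is a 1-D sample array and the video a list of frames (both assumptions, since the embodiment does not fix a data layout):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WordSegment:
    word: str
    start: float  # time point at which the vocabulary starts being expressed (s)
    end: float    # time point at which the vocabulary stops being expressed (s)

def align_modalities(words: List[WordSegment], audio, video_frames,
                     sr: int = 16000, fps: int = 25) -> list:
    """Slice audio samples and video frames to each vocabulary time period,
    producing per-word aligned multi-modal records."""
    aligned = []
    for w in words:
        a = audio[int(w.start * sr):int(w.end * sr)]           # audio slice for this word
        v = video_frames[int(w.start * fps):int(w.end * fps)]  # frame slice for this word
        aligned.append({"word": w.word, "audio": a, "video": v})
    return aligned
```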
Step S103, extracting interpretable features from the target multi-modal sample data set along a text time sequence dimension, a speech time sequence dimension, an image time sequence dimension and a global dimension respectively, to obtain the multi-modal features of each vocabulary time period and the multi-modal features of the global time period, wherein the multi-modal features comprise text features, speech features and image features;
Specifically, interpretable features are features whose meanings are directly understandable, which makes the resulting psychological state knowledge clearly interpretable and facilitates subsequent research on it.
The embodiment extracts interpretable features of each vocabulary time period and the global time period from multiple dimensions respectively to obtain various interpretable features so as to acquire mental state knowledge.
According to a specific implementation manner of the embodiment of the present application, the extracting interpretable features in the target multimodal sample data set from a text time sequence dimension, a speech time sequence dimension, an image time sequence dimension, and a global dimension respectively to obtain multimodal features of each vocabulary time segment and multimodal features of a global time segment includes:
respectively acquiring text interpretable features, voice interpretable features and image interpretable features in the target multi-modal sample data set in each vocabulary time period;
acquiring text interpretable feature change conditions of all vocabulary time periods based on the text interpretable features of each current vocabulary time period and the next vocabulary time period;
acquiring voice interpretable feature change conditions of all vocabulary time periods based on the voice interpretable features of each current vocabulary time period and the next vocabulary time period;
acquiring image interpretable feature change conditions of all vocabulary time periods based on the image interpretable features of each current vocabulary time period and the next vocabulary time period;
respectively acquiring global text features, global voice features and global image features of the target multi-modal sample data set based on a global time period;
and obtaining the multi-modal characteristics of each vocabulary time period according to the text interpretable characteristic and the change situation thereof, the voice interpretable characteristic and the change situation thereof, the image interpretable characteristic and the change situation thereof, and obtaining the multi-modal characteristics of the global time period according to the global text characteristic, the global voice characteristic and the global image characteristic.
In a specific embodiment, the method for obtaining interpretable features from dimensions can be split into the following steps:
Step one: obtain interpretable features in the target multi-modal sample data set from the text time sequence dimension.
In order to construct an interpretable knowledge base, corresponding text interpretable features are extracted for each vocabulary in the target multi-modal sample data set. The text interpretable features comprise the vocabulary topic and the vocabulary polarity: the vocabulary topics are obtained by kmeans clustering of open-source word vectors into 100 topic categories, and the vocabulary polarity has 3 categories (positive, negative and neutral), giving 100 + 3 = 103 text interpretable features.
Meanwhile, text time sequence features formed by two consecutive vocabularies are further extracted. These text time sequence features, also called text interpretable feature change conditions, comprise topic changes and polarity changes, 1000 + 3 features in total. The number of topic-change features can be chosen adaptively for the actual application scenario; this embodiment selects the 1000 most frequent topic changes from the full data.
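A minimal sketch of the text interpretable feature extraction, assuming scikit-learn's KMeans over pre-trained word vectors and a hypothetical polarity lexicon (the embodiment names kmeans and open-source word vectors but no specific libraries):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_topic_clusters(word_vectors: np.ndarray, n_topics: int = 100) -> KMeans:
    """Cluster open-source word vectors into 100 vocabulary topic categories."""
    return KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit(word_vectors)

def text_features(word: str, vec: np.ndarray, km: KMeans,
                  polarity_lexicon: dict) -> tuple:
    """Return (topic id, polarity) for one vocabulary; `polarity_lexicon`
    mapping words to positive/negative/neutral is a hypothetical stand-in."""
    topic = int(km.predict(vec.reshape(1, -1))[0])    # one of the 100 topics
    polarity = polarity_lexicon.get(word, "neutral")  # one of the 3 polarity classes
    return topic, polarity
```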
And secondly, acquiring interpretable features in the target multi-modal sample data set from a voice time sequence dimension.
In order to construct an interpretable knowledge base, corresponding speech interpretable features are extracted for the speech of each vocabulary time period in the target multi-modal sample data set. The speech interpretable features comprise 12 features: root mean square energy, attack time, zero-crossing rate, autocorrelation, spectral centroid, Mel-frequency cepstrum coefficients (MFCC), spectral flatness, spectral flux, fundamental frequency f0, detuning degree, loudness and sharpness.
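A hedged sketch of the per-period speech feature extraction using librosa, covering only the descriptors that library exposes directly; attack time, detuning degree, loudness and sharpness would need dedicated psychoacoustic tooling and are omitted here:

```python
import numpy as np
import librosa

def speech_features(wav_slice: np.ndarray, sr: int) -> dict:
    """Extract a subset of the 12 speech interpretable features for one
    vocabulary time period (averaged over frames within the period)."""
    f0 = librosa.yin(wav_slice, fmin=50, fmax=400, sr=sr)  # fundamental frequency track
    return {
        "rms_energy":        float(librosa.feature.rms(y=wav_slice).mean()),
        "zero_crossing":     float(librosa.feature.zero_crossing_rate(wav_slice).mean()),
        "autocorrelation":   float(librosa.autocorrelate(wav_slice).mean()),
        "spectral_centroid": float(librosa.feature.spectral_centroid(y=wav_slice, sr=sr).mean()),
        "mfcc":              librosa.feature.mfcc(y=wav_slice, sr=sr, n_mfcc=13).mean(axis=1),
        "spectral_flatness": float(librosa.feature.spectral_flatness(y=wav_slice).mean()),
        "spectral_flux":     float(np.mean(librosa.onset.onset_strength(y=wav_slice, sr=sr))),
        "f0":                float(np.mean(f0)),
    }
```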
According to a specific implementation manner of the embodiment of the present application, the obtaining of the voice-interpretable feature change of each vocabulary time period based on the voice-interpretable feature of each current vocabulary time period and the next vocabulary time period comprises:
carrying out normalization and grade classification processing on the voice interpretable features of each vocabulary time period to obtain a voice grade of each vocabulary time period;
and acquiring the voice interpretable feature change condition of each vocabulary time period based on the voice grade corresponding to the voice interpretable feature of each current vocabulary time period and the next vocabulary time period.
In an embodiment, for each of the speech interpretable features in a single vocabulary time period, a corresponding normalization and level classification process is performed to normalize each of the speech interpretable features to a predetermined level. In this embodiment, the level classification process may be 5 levels. The level classification process may be adaptively replaced according to an actual application scenario, and is not limited herein.
After normalization and level classification of the speech interpretable features, 12 feature levels are obtained: a root mean square energy level, an attack time level, a zero-crossing rate level, an autocorrelation level, a spectral centroid level, a Mel-frequency cepstrum coefficient (MFCC) level, a spectral flatness level, a spectral flux level, a fundamental frequency f0 level, a detuning degree level, a loudness level and a sharpness level.
For the speech interpretable features of two consecutive vocabulary time periods, the speech interpretable feature change conditions are obtained, comprising the change of each of the above 12 levels: root mean square energy level change, attack time level change, zero-crossing rate level change, autocorrelation level change, spectral centroid level change, MFCC level change, spectral flatness level change, spectral flux level change, fundamental frequency f0 level change, detuning degree level change, loudness level change and sharpness level change.
In this embodiment, the obtained speech interpretable features and their change conditions comprise 300 features in total.
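A minimal sketch of the normalization, 5-level classification and cross-period change encoding described above; the min-max normalization is an assumption, as the embodiment does not specify the normalization scheme:

```python
import numpy as np

def to_levels(values: np.ndarray, n_levels: int = 5) -> np.ndarray:
    """Min-max normalize one feature over all vocabulary periods, then bin
    each value into one of 5 levels (1..5), per this embodiment's setting."""
    v = (values - values.min()) / (values.max() - values.min() + 1e-9)
    return np.minimum((v * n_levels).astype(int) + 1, n_levels)

def level_changes(levels: np.ndarray) -> list:
    """Cross-period change features, e.g. 'loudness 3-2' for level 3 -> level 2."""
    return [f"{a}-{b}" for a, b in zip(levels[:-1], levels[1:])]

# Example: loudness over four consecutive vocabulary periods
# levels = to_levels(np.array([0.2, 0.5, 0.9, 0.4]))  # -> [1, 3, 5, 2]
# level_changes(levels)                               # -> ['1-3', '3-5', '5-2']
```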
And thirdly, acquiring interpretable features in the target multi-modal sample data set from an image time sequence dimension.
In order to construct an interpretable knowledge base, corresponding image interpretable features are extracted aiming at video samples in a target multi-modal sample data set of each vocabulary time period, wherein the image interpretable features comprise 201 relative positions of key points of a face, 8 key area areas, 8 key area sizes and 9 emotion indexes, and 226 features are total.
Among these, the 9 emotion indicators may include anger, disgust, fear, happiness, sadness, surprise, pouting, grimacing and no emotion.
The relative positions of the face key points comprise the farthest distance, the nearest distance and the average angle of all key points relative to the central axis of the face.
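A minimal sketch of the relative-position computation for face key points, assuming an (N, 2) landmark array from any face landmark detector; the vertical central-axis definition is an assumption:

```python
import numpy as np

def keypoint_relative_features(landmarks: np.ndarray, axis_x: float) -> dict:
    """Farthest distance, nearest distance and mean angle of all key points
    relative to the facial central axis x = axis_x (hypothetical convention)."""
    dx = landmarks[:, 0] - axis_x                          # signed distance to the axis
    dist = np.abs(dx)
    angles = np.degrees(np.arctan2(landmarks[:, 1], dx))   # angle of each point w.r.t. the axis
    return {
        "max_dist":   float(dist.max()),
        "min_dist":   float(dist.min()),
        "mean_angle": float(angles.mean()),
    }
```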
According to a specific implementation manner of the embodiment of the present application, the obtaining of the image interpretable feature change of each vocabulary time period based on the image interpretable feature of each current vocabulary time period and the next vocabulary time period comprises:
normalizing and grade classifying the image interpretable features of each vocabulary time period to obtain an image grade of each vocabulary time period;
and acquiring the image interpretable feature change condition of each vocabulary time period based on the image grade corresponding to the image interpretable feature of each current vocabulary time period and the next vocabulary time period.
In a particular embodiment, similar to the above-described speech-interpretable feature, normalization and rank-classification processing is also performed on the image-interpretable feature, and an extraction of image-variation features is formed based on two consecutive vocabulary periods.
For the method for obtaining the grade division and the variation of the interpretable feature of the image, reference may be made to the description of the voice interpretable feature, which is not repeated herein.
It should be noted that in this embodiment, the obtained image interpretable features and their change conditions comprise 5 × 226 = 1130 features.
In addition, the grade range of the grade classification processing division of the voice interpretable features and the grade range of the grade classification processing division of the image interpretable features can be the same or different, and a user can carry out self-adaptive setting according to an actual application scene.
And fourthly, acquiring interpretable features in the target multi-modal sample data set from the global dimension.
In order to construct an interpretable knowledge base, for a target multi-modal sample data set in a global time period, the embodiment further performs global feature extraction from three feature dimensions of audio, video and text respectively.
Specifically, in the global feature extraction process, for continuous dimensions the average value is taken as the global feature; for discrete dimensions, the discrete value with the largest occurrence count is taken as the global feature.
As shown in fig. 3, the global text feature is a character feature in a full time period, the global voice feature is a voice feature in a full time period, and the global image feature is an image feature in a full time period.
It should be noted that the feature length of the global feature of the same dimension is equal to the sum of the feature lengths of the features of all the vocabulary time periods of the same dimension.
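A minimal sketch of the global feature aggregation rule just described (mean for continuous dimensions, mode for discrete dimensions); the scipy mode call is one possible realization:

```python
import numpy as np
from scipy import stats

def global_feature(values: np.ndarray, is_discrete: bool):
    """Aggregate one feature dimension over the global time period:
    discrete dimensions take the most frequent value, continuous the mean."""
    if is_discrete:
        return stats.mode(values, keepdims=False).mode  # value with largest count
    return float(values.mean())                         # average over all periods
```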
Step S104, inputting the psychological assessment personality evaluation data, the multi-modal features of the global time period and the multi-modal features of each vocabulary time period into an attention weight recognition model in time sequence order to obtain a psychological state assessment result of the prisoner;
In a specific embodiment, an attention weight recognition model capable of analyzing the multi-modal features obtained in the above embodiment needs to be constructed in advance; the attention weight recognition model is shown in fig. 3 and 4.
As shown in fig. 3, the CLS, the multi-modal features of the global time segment, the SEQ, and the multi-modal features of each vocabulary time segment are input to the attention weight recognition model, and then a corresponding psychological assessment personality result can be obtained through a sigmoid function.
Combining the attention weight recognition model with the sigmoid function yields the psychological assessment personality results recognized from all the features, covering lying, honesty, extraversion, intelligence, empathy, subordination, emotional fluctuation, impulsivity, guardedness, self-abasement, anxiety, violence tendency, abnormal psychology and criminal thinking.
Specifically, the CLS input may use the psychological assessment personality evaluation data obtained in the above embodiments to provide a basic characterization of the corresponding prisoner.
In fig. 3 and 4, the multi-modal features of the global time period include full-time speech features, full-time image features and full-time text features. Each vocabulary time period is denoted T+n, where n is an integer.
The single-period and cross-period speech features are the speech interpretable features and their change conditions; the single-period and cross-period image features are the image interpretable features and their change conditions; and the single-period and cross-period text features are the text interpretable features and their change conditions.
According to a specific implementation manner of the embodiment of the present application, the psychological state assessment result includes a psychological assessment personality score for each modal dimension and a vocabulary time period weight corresponding to each psychological assessment personality score;
In a specific embodiment, as shown in fig. 4, the attention weight recognition model may incorporate a softmax function to output, for each vocabulary time period, a psychological assessment personality score for each modal dimension together with a vocabulary time period weight.
For example, when the output result is a lying score of 8, the corresponding vocabulary time period weights might be 50% for time period T+0, 30% for the global time period and 20% for time period T+3.
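As an illustrative sketch only: the embodiment discloses the attention mechanism, the CLS/SEQ inputs, sigmoid scoring and softmax weights, but not the exact architecture, so the single-layer Transformer encoder and all layer sizes below are assumptions:

```python
import torch
import torch.nn as nn

class AttentionWeightRecognizer(nn.Module):
    """Sketch of the attention weight recognition model: a CLS position carrying
    the COPA-PI evaluation data, followed by one multi-modal feature vector per
    vocabulary time period (the global time period is one of the positions)."""

    def __init__(self, feat_dim: int, n_dims: int = 14, d_model: int = 128):
        super().__init__()
        self.cls_proj = nn.Linear(n_dims, d_model)    # COPA-PI scores -> CLS embedding
        self.feat_proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.score_head = nn.Linear(d_model, n_dims)  # one score per personality dimension
        self.weight_head = nn.Linear(d_model, n_dims) # per-period weight logits

    def forward(self, copa_pi, periods):
        # copa_pi: (batch, 14); periods: (batch, n_periods, feat_dim)
        cls = self.cls_proj(copa_pi).unsqueeze(1)               # (batch, 1, d_model)
        seq = torch.cat([cls, self.feat_proj(periods)], dim=1)
        h = self.encoder(seq)
        scores = torch.sigmoid(self.score_head(h[:, 0]))        # sigmoid personality scores
        weights = torch.softmax(self.weight_head(h[:, 1:]), dim=1)  # softmax period weights
        return scores, weights
```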
Step S105, mining high-frequency and low-frequency frequent items in the psychological state assessment result according to a preset frequent item mining rule, and constructing a psychological state knowledge base based on the high-frequency and low-frequency frequent items.
In particular embodiments, the attention weight recognition model may identify different psychological assessment personality scores for different dimensions of input.
Based on preset screening rules, the psychological assessment personality score in the corresponding range can be selected as the acquisition range of the psychological state knowledge, so that targeted frequent item mining is performed, and accurate psychological state knowledge is obtained.
The mining of the high-frequency and low-frequency frequent items in the psychological state assessment result according to the preset frequent item mining rule comprises the following steps:
acquiring the vocabulary time period weights corresponding to psychological assessment personality scores that are smaller than a first score threshold or larger than a second score threshold;
dividing the vocabulary time periods whose weights are larger than a preset weight threshold into target vocabulary time periods;
and mining multi-modal feature frequent items in the target vocabulary time periods based on a preset Apriori algorithm, dividing the multi-modal feature frequent items whose psychological assessment personality scores are smaller than the first score threshold into low-frequency frequent items, and dividing the multi-modal feature frequent items whose psychological assessment personality scores are larger than the second score threshold into high-frequency frequent items.
In this embodiment, the first score threshold may be set to 3 and the second score threshold to 6; both thresholds may also be set adaptively for the actual application scenario, with the second score threshold always greater than the first.
With these threshold settings, the vocabulary time period weights corresponding to psychological assessment personality scores below 3 or above 6 are screened out.
All the obtained vocabulary time period weights are then sorted in descending order, and the vocabulary time periods whose weights are larger than the preset weight threshold become the target vocabulary time periods, i.e. the highly correlated time periods.
Frequent item mining is then performed based on the Apriori algorithm, yielding a frequent item mining result as shown in fig. 5.
As shown in fig. 5, the speech frequent items mined in the target vocabulary time periods are loudness 3 and loudness 3-2, where loudness 3-2 means the loudness changes from level 3 to level 2 across consecutive vocabulary time periods; the image frequent items are a positive eye-corner angle and an eye-corner angle changing from flat to positive; the text frequent items are "breakfast" and breakfast-negative, where breakfast-negative means the polarity of the vocabulary "breakfast" is negative; and the multi-modal frequent item is the combination of a positive eye-corner angle and loudness 3.
Specifically, during mining, the frequent items mined from psychological assessment personality scores below 3 and their vocabulary time period weights are the low-frequency frequent items, and the frequent items mined from psychological assessment personality scores above 6 and their vocabulary time period weights are the high-frequency frequent items.
A low-frequency frequent item can serve as low-score psychological state knowledge for a given dimension, and a high-frequency frequent item as high-score psychological state knowledge for a given dimension; combining all frequent items from the mining results yields all the psychological state knowledge used to construct the psychological state knowledge base.
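A minimal sketch of the frequent item mining step, assuming the mlxtend library's Apriori implementation (the embodiment specifies only the Apriori algorithm, not a library) and a hypothetical transaction encoding of the discretized features:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

def mine_frequent_items(transactions: list, min_support: float = 0.3) -> pd.DataFrame:
    """Apriori mining over the multi-modal features of target vocabulary periods.

    Each transaction is the set of discretized features observed in one
    high-weight vocabulary period, e.g.
    {'loudness 3', 'loudness 3-2', 'breakfast-negative', 'eye_corner positive'}.
    """
    te = TransactionEncoder()
    onehot = te.fit(transactions).transform(transactions)  # one-hot item matrix
    df = pd.DataFrame(onehot, columns=te.columns_)
    return apriori(df, min_support=min_support, use_colnames=True)
```

Running this separately on the transactions of low-score (below the first threshold) and high-score (above the second threshold) periods would yield the low-frequency and high-frequency frequent items described above.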
In a specific embodiment, after the psychological state knowledge shown in fig. 5 is obtained, it is stored in a preset database, which completes the construction of the psychological state knowledge base.
This embodiment does not limit the construction steps of the psychological state knowledge base; they can be adapted to the actual application scenario.
In summary, this embodiment provides a psychological state knowledge base construction method. Constructing a target multi-modal sample data set with vocabulary as the basic unit guarantees the completeness and fineness of the obtained multi-modal data and refines knowledge mining to vocabulary granularity. By building the multi-modal knowledge base from vocabulary time period attention screening and frequent item mining, a construction scheme is obtained that has a wide application range and provides accurate psychological state knowledge of prisoners, which supports further criminological research based on the psychological state knowledge base.
Referring to fig. 6, a schematic module diagram of a psychological state knowledge base construction device 600 provided in an embodiment of the present application is shown. As shown in fig. 6, the psychological state knowledge base construction device 600 includes:
an acquisition module 601, used for acquiring an initial multi-modal sample data set and psychological assessment personality evaluation data of prisoners;
a preprocessing module 602, configured to perform data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set, where the target multi-modal sample data set is a multi-modal aligned data set with a vocabulary as a basic granularity;
a feature extraction module 603, configured to extract interpretable features in the target multimodal sample data set from a text time sequence dimension, a voice time sequence dimension, an image time sequence dimension, and a global dimension, respectively, to obtain multimodal features of each vocabulary time period and multimodal features of the global time period, where the multimodal features include text features, voice features, and image features;
an attention recognition module 604, configured to input the psychological assessment personality evaluation data, the multi-modal features of the global time period and the multi-modal features of each vocabulary time period into an attention weight recognition model in time sequence order, so as to obtain a psychological state assessment result of the prisoner;
a knowledge base construction module 605, configured to mine the high-frequency and low-frequency frequent items in the psychological state assessment result according to a preset frequent item mining rule, and construct a psychological state knowledge base based on the high-frequency and low-frequency frequent items.
In addition, the present application further provides a computer device, the computer device including a processor and a memory, wherein the memory stores a computer program which, when run on the processor, executes the psychological state knowledge base construction method of the above embodiments.
An embodiment of the present application further provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when the computer program runs on a processor, it executes the psychological state knowledge base construction method of the above embodiments.
In addition, for the specific implementation processes of the psychological state knowledge base construction device, the computer device and the computer-readable storage medium mentioned in the above embodiments, reference may be made to the specific implementation processes of the above method embodiments, which are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part of the technical solution that contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (10)

1. A psychological state knowledge base construction method, characterized by comprising the following steps:
acquiring an initial multi-modal sample data set and psychological assessment personality evaluation data of prisoners;
performing data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set, wherein the target multi-modal sample data set is a multi-modal aligned data set with vocabulary as the basic granularity;
extracting interpretable features from the target multi-modal sample data set along a text time sequence dimension, a speech time sequence dimension, an image time sequence dimension and a global dimension respectively, to obtain the multi-modal features of each vocabulary time period and the multi-modal features of the global time period, wherein the multi-modal features comprise text features, speech features and image features;
inputting the psychological assessment personality evaluation data, the multi-modal features of the global time period and the multi-modal features of each vocabulary time period into an attention weight recognition model in time sequence order to obtain a psychological state assessment result of the prisoner;
and mining high-frequency and low-frequency frequent items in the psychological state assessment result according to a preset frequent item mining rule, and constructing a psychological state knowledge base based on the high-frequency and low-frequency frequent items.
2. The method according to claim 1, wherein the acquiring of an initial multi-modal sample data set and psychological assessment personality evaluation data of prisoners comprises:
sampling all prisoners hierarchically by identity characteristic information to obtain a target prisoner queue, wherein the identity characteristic information comprises offense type, age and sentence duration;
acquiring an initial multi-modal sample data set and psychological assessment personality evaluation data of each prisoner in the target prisoner queue, wherein the psychological assessment personality evaluation data comprises evaluation scores for a lying dimension, an honesty dimension, an extraversion dimension, an intelligence dimension, an empathy dimension, a subordination dimension, an emotional fluctuation dimension, an impulsivity dimension, a guardedness dimension, a self-abasement dimension, an anxiety dimension, a violence tendency dimension, an abnormal psychology dimension and a criminal thinking dimension.
3. The method according to claim 1, wherein the initial multi-modal sample data comprise text sample data, audio sample data and video sample data, and said performing data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set comprises:
performing word segmentation on the text sample data to obtain all vocabularies in the text sample data;
acquiring the vocabulary time period corresponding to each vocabulary based on the start time and end time of that vocabulary;
and performing data alignment on the text sample data, the audio sample data and the video sample data based on the vocabulary time periods to obtain the target multi-modal sample data set.
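
One way to read claim 3: each recognized vocabulary carries a start and end time (e.g. from an ASR with word-level timestamps), and those spans are used to slice the audio stream and video frames, so that all three modalities end up aligned at vocabulary granularity. A sketch under those assumptions; the input formats, sample rate, and frame rate are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AlignedSample:
    word: str
    start: float   # seconds
    end: float
    audio: list    # audio samples falling inside [start, end)
    frames: list   # video frames falling inside [start, end)

def align(words, audio, video, audio_rate=16000, fps=25):
    """Slice audio/video by each word's time span -> vocabulary-granularity data."""
    aligned = []
    for word, start, end in words:
        a0, a1 = int(start * audio_rate), int(end * audio_rate)
        f0, f1 = int(start * fps), int(end * fps)
        aligned.append(AlignedSample(word, start, end, audio[a0:a1], video[f0:f1]))
    return aligned

samples = align([("hello", 0.0, 0.4), ("world", 0.4, 0.9)],
                audio=[0.0] * 16000, video=[None] * 25)
```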
4. The method according to claim 3, wherein said extracting interpretable features in the target multi-modal sample data set from a text time-series dimension, a speech time-series dimension, an image time-series dimension and a global dimension, respectively, to obtain the multi-modal features of each vocabulary time period and the multi-modal features of the global time period comprises:
respectively acquiring the text-interpretable features, speech-interpretable features and image-interpretable features in the target multi-modal sample data set in each vocabulary time period;
acquiring the text-interpretable feature change condition of each vocabulary time period based on the text-interpretable features of each current vocabulary time period and its next vocabulary time period;
acquiring the speech-interpretable feature change condition of each vocabulary time period based on the speech-interpretable features of each current vocabulary time period and its next vocabulary time period;
acquiring the image-interpretable feature change condition of each vocabulary time period based on the image-interpretable features of each current vocabulary time period and its next vocabulary time period;
respectively acquiring the global text features, global speech features and global image features of the target multi-modal sample data set based on the global time period;
and obtaining the multi-modal features of each vocabulary time period from the text-interpretable features and their change conditions, the speech-interpretable features and their change conditions, and the image-interpretable features and their change conditions, and obtaining the multi-modal features of the global time period from the global text features, the global speech features and the global image features.
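
Claim 4 derives a "change condition" by comparing each vocabulary time period's interpretable features with those of the next period. The claim does not define the comparison; a minimal sketch assuming the features are numeric vectors and the change is their element-wise next-minus-current difference:

```python
import numpy as np

def feature_changes(per_period_feats: np.ndarray) -> np.ndarray:
    """Change condition between each vocabulary time period and the next.

    per_period_feats: (T, D) array, one row of interpretable features per period.
    Returns a (T-1, D) array of next-minus-current differences.
    """
    return np.diff(per_period_feats, axis=0)

text_feats = np.array([[0.1, 0.5], [0.3, 0.4], [0.2, 0.9]])
print(feature_changes(text_feats))  # approximately [[0.2, -0.1], [-0.1, 0.5]]
```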
5. The method according to claim 4, wherein said acquiring the speech-interpretable feature change condition of each vocabulary time period based on the speech-interpretable features of each current vocabulary time period and its next vocabulary time period comprises:
normalizing and grade-classifying the speech-interpretable features of each vocabulary time period to obtain the speech grade of each vocabulary time period;
and acquiring the speech-interpretable feature change condition of each vocabulary time period based on the speech grades corresponding to the speech-interpretable features of each current vocabulary time period and its next vocabulary time period.
6. The method according to claim 4, wherein said acquiring the image-interpretable feature change condition of each vocabulary time period based on the image-interpretable features of each current vocabulary time period and its next vocabulary time period comprises:
normalizing and grade-classifying the image-interpretable features of each vocabulary time period to obtain the image grade of each vocabulary time period;
and acquiring the image-interpretable feature change condition of each vocabulary time period based on the image grades corresponding to the image-interpretable features of each current vocabulary time period and its next vocabulary time period.
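
Claims 5 and 6 share one step: normalize the per-period speech (or image) features and bucket them into grades, so the change condition becomes a discrete grade transition rather than a raw numeric delta. A sketch assuming min-max normalization and three equal-width grades; neither choice is fixed by the claims.

```python
import numpy as np

def to_grades(values: np.ndarray, n_grades: int = 3) -> np.ndarray:
    """Min-max normalize, then bucket into integer grades 0..n_grades-1."""
    lo, hi = values.min(), values.max()
    norm = (values - lo) / (hi - lo) if hi > lo else np.zeros_like(values)
    return np.minimum((norm * n_grades).astype(int), n_grades - 1)

pitch = np.array([110.0, 180.0, 240.0, 121.0])    # e.g. per-period mean pitch (Hz)
grades = to_grades(pitch)                         # [0, 1, 2, 0]
transitions = list(zip(grades[:-1], grades[1:]))  # grade transitions between periods
```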
7. The method according to claim 1, wherein the psychological state assessment result comprises a psychometric personality score for each modal dimension and a vocabulary time period weight for each psychometric personality score;
said mining high-frequency frequent items and low-frequency frequent items in the psychological state assessment result according to a preset frequent item mining rule comprises:
acquiring the vocabulary time period weights corresponding to psychometric personality scores smaller than a first score threshold or larger than a second score threshold;
dividing the vocabulary time periods whose weights are larger than a preset weight threshold into target vocabulary time periods;
and mining multi-modal feature frequent items in the target vocabulary time periods based on a preset Apriori algorithm, dividing the multi-modal feature frequent items whose psychometric personality scores are smaller than the first score threshold into low-frequency frequent items, and dividing the multi-modal feature frequent items whose psychometric personality scores are larger than the second score threshold into high-frequency frequent items.
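
Claim 7's mining step can be pictured as follows: each target vocabulary time period becomes a transaction of discretized multi-modal features, and Apriori returns the feature combinations whose support clears a threshold. A compact self-contained sketch; the support threshold, feature encoding, and example transactions are illustrative assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Plain Apriori: all itemsets whose support >= min_support."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    current = [frozenset([i]) for i in items]
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        kept = {c: v / n for c, v in counts.items() if v / n >= min_support}
        frequent.update(kept)
        # Join step: build candidates one item larger from the survivors.
        current = list({a | b for a, b in combinations(list(kept), 2)
                        if len(a | b) == len(a) + 1})
    return frequent

# Each transaction: discretized multi-modal features of one target vocabulary time period.
transactions = [frozenset(t) for t in (
    {"pitch_high", "gaze_down", "negative_word"},
    {"pitch_high", "gaze_down"},
    {"pitch_high", "negative_word"},
    {"gaze_down", "negative_word"},
)]
print(apriori(transactions, min_support=0.5))
```

Splitting the mined itemsets into high-frequency and low-frequency frequent items then only requires checking which score band (below the first threshold or above the second) the originating time periods belonged to.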
8. A mental state knowledge base construction apparatus, characterized by comprising:
an acquisition module, configured to acquire an initial multi-modal sample data set and psychometric personality assessment data of prisoners;
a preprocessing module, configured to perform data preprocessing on each initial multi-modal sample data set to obtain a target multi-modal sample data set, wherein the target multi-modal sample data set is a multi-modal aligned data set with vocabulary as the basic granularity;
a feature extraction module, configured to extract interpretable features in the target multi-modal sample data set from a text time-series dimension, a speech time-series dimension, an image time-series dimension and a global dimension, respectively, to obtain the multi-modal features of each vocabulary time period and the multi-modal features of the global time period, wherein the multi-modal features comprise text features, speech features and image features;
an attention recognition module, configured to input the psychometric personality assessment data, the multi-modal features of the global time period and the multi-modal features of each vocabulary time period into an attention weight recognition model in time order, to obtain a psychological state assessment result of the prisoners;
and a knowledge base construction module, configured to mine high-frequency frequent items and low-frequency frequent items in the psychological state assessment result according to a preset frequent item mining rule, and construct a psychological state knowledge base based on the high-frequency frequent items and the low-frequency frequent items.
9. A computer device, characterized in that the computer device comprises a processor and a memory, the memory storing a computer program which, when run on the processor, performs the mental state knowledge base construction method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when run on a processor, performs the mental state knowledge base construction method of any one of claims 1 to 7.
CN202211688048.7A 2022-12-28 2022-12-28 Psychological state knowledge base construction method and device, computer equipment and storage medium Active CN115658933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211688048.7A CN115658933B (en) 2022-12-28 2022-12-28 Psychological state knowledge base construction method and device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115658933A 2023-01-31
CN115658933B 2023-04-07

Family

ID=85022367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211688048.7A Active CN115658933B (en) 2022-12-28 2022-12-28 Psychological state knowledge base construction method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115658933B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007147166A2 (en) * 2006-06-16 2007-12-21 Quantum Leap Research, Inc. Consilence of data-mining
US20120290515A1 (en) * 2011-05-11 2012-11-15 Affectivon Ltd. Affective response predictor trained on partial data
CN105760852A (en) * 2016-03-14 2016-07-13 江苏大学 Driver emotion real time identification method fusing facial expressions and voices
CN107087431A (en) * 2014-05-09 2017-08-22 谷歌公司 System and method for distinguishing ocular signal and continuous bio-identification
US20180158165A1 (en) * 2016-12-01 2018-06-07 Global Tel*Link Corp. System and method for unified inmate information and provisioning
US20180189691A1 (en) * 2017-01-04 2018-07-05 Richard Oehrle Analytical system for assessing certain characteristics of organizations
CN110751208A (en) * 2018-10-29 2020-02-04 山东大学 Criminal emotion recognition method for multi-mode feature fusion based on self-weight differential encoder
US20200245949A1 (en) * 2019-02-01 2020-08-06 Mindstrong Health Forecasting Mood Changes from Digital Biomarkers
CN111507592A (en) * 2020-04-08 2020-08-07 山东大学 Evaluation method for active modification behaviors of prisoners
CN113076770A (en) * 2019-12-18 2021-07-06 广州捷世高信息科技有限公司 Intelligent figure portrait terminal based on dialect recognition
CN113505310A (en) * 2021-07-07 2021-10-15 辽宁工程技术大学 Campus user next position recommendation method based on space-time attention network
CN114171198A (en) * 2021-11-26 2022-03-11 智恩陪心(北京)科技有限公司 Multi-mode mental health analysis method based on sand table images, texts and videos
US20220114273A1 (en) * 2020-10-14 2022-04-14 Philip Chidi Njemanze Method and System for Mental Performance Computing Using Artificial Intelligence and Blockchain
CN114496162A (en) * 2022-01-12 2022-05-13 北京数字众智科技有限公司 Diet behavior information acquisition system and method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
孙静 (Sun Jing): "Research on Facial Emotion Recognition and Behavior Analysis Techniques for Prisoners" *
朱耀文 (Zhu Yaowen): "Research on Key Technologies of Psychological Measurement Based on Multi-Source Information Fusion" *
李婷 (Li Ting): "Research on Emotion Analysis and Intervention from an Interaction Perspective in Online Learning Environments" *

Also Published As

Publication number Publication date
CN115658933B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
US11538472B2 (en) Processing speech signals in voice-based profiling
CN113094578A (en) Deep learning-based content recommendation method, device, equipment and storage medium
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
Boishakhi et al. Multi-modal hate speech detection using machine learning
CN111401105B (en) Video expression recognition method, device and equipment
CN110473571A (en) Emotion identification method and device based on short video speech
CN111859010A (en) Semi-supervised audio event identification method based on depth mutual information maximization
CN117115581A (en) Intelligent misoperation early warning method and system based on multi-mode deep learning
CN108831456A (en) It is a kind of by speech recognition to the method, apparatus and system of video marker
CN116563829A (en) Driver emotion recognition method and device, electronic equipment and storage medium
CN106710588B (en) Speech data sentence recognition method, device and system
CN109002529A (en) Audio search method and device
CN110348482A (en) A kind of speech emotion recognition system based on depth model integrated architecture
Birla A robust unsupervised pattern discovery and clustering of speech signals
Vydana et al. Detection of emotionally significant regions of speech for emotion recognition
CN115658933B (en) Psychological state knowledge base construction method and device, computer equipment and storage medium
Harimi et al. Anger or joy? Emotion recognition using nonlinear dynamics of speech
Sharma et al. Speech Emotion Recognition System using SVD algorithm with HMM Model
CN114881668A (en) Multi-mode-based deception detection method
Pálfy et al. Pattern search in dysfluent speech
Bansod et al. Speaker Recognition using Marathi (Varhadi) Language
Wang et al. MFCC-based deep convolutional neural network for audio depression recognition
JP7096296B2 (en) Information processing equipment, information processing methods and programs
Weninger et al. Speaker trait characterization in web videos: Uniting speech, language, and facial features
CN117612535A (en) Voiceprint recognition method and system based on automatic voiceprint identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant