CN112597271B - Method for predicting the attitude of the defendant in a criminal case during the court trial process - Google Patents


Info

Publication number: CN112597271B
Application number: CN202011103541.9A
Authority: CN (China)
Other versions: CN112597271A (Chinese)
Inventors: 杨亮, 曾景杰, 李树群, 林鸿飞
Current and original assignee: Dalian University of Technology
Application filed by Dalian University of Technology
Priority to CN202011103541.9A
Publication of CN112597271A; application granted and published as CN112597271B
Legal status: Active (granted)

Classifications

    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; classification of unstructured textual data
    • G06F16/951 Indexing; web crawling techniques
    • G06F18/2411 Classification based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/253 Fusion techniques of extracted features


Abstract

A method for predicting the attitude of the defendant in a criminal case during the court trial process, belonging to the field of smart justice and used for predicting the defendant's attitude, characterized by comprising: S1, acquisition and parsing of court trial data; S2, data normalization and label alignment; S3, extraction and expansion of multimodal information from the court trial process; S4, prediction of the defendant's attitude using the multimodal information. The method builds, for the court trial process in smart justice, a complete pipeline from raw data acquisition through feature engineering to model construction, and achieves good performance on the task of predicting the defendant's attitude in criminal case trials.

Description

Method for predicting the attitude of the defendant in a criminal case during the court trial process
Technical Field
The invention belongs to the field of data mining and relates to an application of smart justice that can predict the attitude of the defendant in a criminal case during the court trial process.
Background
With the continuous development of natural language processing, smart justice has emerged in recent years as an important research direction with a gradually rising trend. Such research can free legal professionals from tedious and repetitive work, give practitioners convenient and fast legal advice, reduce the workload of lawyers, and help popularize legal knowledge among the many ordinary people who lack a legal background.
Generally, smart justice comprises specific subtasks such as fine prediction, sentencing prediction, and law article prediction. Current research mostly treats these as text prediction tasks in natural language processing: researchers mine semantic information from the text to solve fine or law article prediction tasks.
However, in the actual court trial process the situation is complex and changeable. As the saying goes, "leniency to those who confess, severity to those who resist": a defendant's behavior during the trial, such as posture, tone of voice, and state of mind, affects the final sentencing result to some extent, and the law even defines offenses such as contempt of court. The attitude and behavior of the defendant can be said to reflect the degree of his admission of guilt, remorse, and resistance. At present, however, there is little research analyzing the defendant's attitude during the court trial process, and multimodal court trial datasets are lacking. Therefore, constructing such a multimodal dataset and analyzing the defendant's attitude on it can effectively improve the accuracy of sentencing prediction.
Disclosure of Invention
To solve the problem of predicting the attitude of the defendant in criminal cases, the invention provides the following technical scheme. A method for predicting the defendant's attitude in a criminal case during the court trial process comprises the following steps:
S1, acquiring and parsing court trial data;
S2, data normalization and label alignment;
S3, extracting and expanding multimodal information from the court trial process;
S4, predicting the defendant's attitude using the multimodal information.
Further, step S1, acquiring and parsing court trial data, specifically comprises:
A1. Data range screening: data are collected from China Judgments Online (http://wenshu.court.gov.cn/Index) and China Court Trial Online (http://tingshen.court.gov.cn/).
A2. Data acquisition: the case numbers to acquire are determined, and the specified trial videos and judgment documents of those cases are crawled from the target courts with a web crawler program;
A3. Web page parsing: the crawled pages are parsed with the BeautifulSoup tool, and the case-related information is extracted from HTML tags and attributes.
Further, step S2, data normalization and label alignment, specifically comprises:
B1. Data selection;
B2. Data processing: the collected video data are clipped, the audio matching each video clip is extracted as audio data, the matching text is transcribed from the audio, and the text information is stored in TXT format;
B3. Data annotation: several annotators watch the video clips and annotate them; the final attitude label of the defendant is determined by the annotators' votes.
Further, step S3, extraction and expansion of multimodal information from the court trial process, specifically comprises:
C1. Text feature extraction: features are extracted with BERT. For each sentence L, a 768-dimensional sentence vector representation is obtained after processing: the parameters of the last three Transformer layers are fused and averaged, and the [CLS] token prepended to each sentence is taken as its representation. [CLS] is a special token placed before the sentence; because it carries no meaning of its own, its output is treated as the sentence vector and learns the semantic representation of the sentence. The text is further fine-tuned starting from the BERT-base-Chinese parameters, so each sentence is finally represented as a unique 768-dimensional vector;
C2. Audio feature extraction: audio features are extracted from the defendant's speech with the pyAudioAnalysis toolkit. The audio data are loaded first, the input signal is divided into short frames, several features are computed per frame, and a sequence of feature vectors is produced for the whole recording. The input is a time-series signal with a sampling rate of 16,000 Hz, a frame size of 50 milliseconds, and a frame step of 25 milliseconds, finally yielding a NumPy matrix of 34 rows and 800 columns, where 800 is the number of short-term frames in the input recording;
C3. Video feature extraction: visual features are extracted from the video with the PySlowFast framework and a pre-trained SlowFast model. The height and width of each extracted frame are set to 256 x 256, 32 sampling windows are placed at the center of each frame, and the d_v = 2304-dimensional feature vectors U_i obtained per frame are averaged; the frame rate is 30 FPS.
Further, step S4, predicting the defendant's attitude using the multimodal information, specifically comprises:
D1. After the processing of step S2, the resulting dataset comprises: video, audio, text, judgment documents, and annotation information;
D2. Multimodal information fusion: the features of the three modalities, video, audio, and text, are used as input. The text feature is the 768-dimensional vector extracted from the BERT [CLS] token; the audio feature is a 34 x 800 matrix, flattened and projected by an MLP into a 768-dimensional feature vector; the video feature is a 2304-dimensional vector, converted by an MLP into a 768-dimensional vector. The three 768-dimensional vectors are concatenated into a 2304-dimensional vector, which, after passing through an MLP, is used to classify the defendant's attitude;
D3. Evaluation metric: the F1 value, defined as F1 = 2 x precision x recall / (precision + recall).
Beneficial effects: the invention creatively provides, in the field of smart justice, a model that fuses multimodal information to predict the defendant's attitude in criminal case trials. The method overcomes, to a certain extent, the shortcomings of prior work, which underuses the available information and cannot analyze the defendant's attitude across multiple dimensions. Moreover, the method effectively identifies the defendant's attitude and thus improves the accuracy of attitude classification. Compared with prior methods, it achieves the best value of the main evaluation metric, F1-score, on the intentional-injury dataset of the embodiment, verifying the effectiveness of the method for classifying the defendant's attitude during the court trial process.
Drawings
FIG. 1 is a flow chart of dataset construction.
FIG. 2 is a schematic diagram of the model of the invention for predicting the defendant's attitude in criminal case trials.
Detailed Description
The invention aims to provide a complete pipeline, i.e. a prediction model, for predicting the attitude of the defendant in a criminal case during the court trial process. It belongs to the technical field of smart justice and is used to predict the defendant's attitude. The specific scheme of the invention is as follows: the framework for predicting the defendant's attitude during the judicial court trial process comprises the following steps:
S1, acquiring and parsing court trial data: a crawler program is written, and an HTML parsing tool is used to extract the useful data from the web pages, as follows:
A1. Data range screening: data were collected from China Judgments Online (http://wenshu.court.gov.cn/Index) and China Court Trial Online (http://tingshen.court.gov.cn/). Both sites belong to the Supreme People's Court of the People's Republic of China. China Judgments Online records the judgment documents of cases, and China Court Trial Online records videos of the court trial process. Owing to geographic and economic differences, the basic equipment of some local courts varies, so the quality of the trial videos obtained from China Court Trial Online is far below that of the judgment documents on China Judgments Online. If a case is recorded on both sites, it has a unique case ID number, indicating a one-to-one correspondence between the judgment document and the trial recording. Because of certain objective factors (such as screen-recording quality, audio input quality, background noise, local dialects, and case sensitivity), some videos have low resolution and the defendant's face cannot be seen clearly, so information such as the defendant's attitude cannot be accurately analyzed; such data are not added to the dataset.
A2. Data acquisition: the case numbers to acquire are determined, and the specified trial videos and judgment documents of those cases are crawled from the target courts with a web crawler program.
A3. Web page parsing: the crawled pages are parsed with the BeautifulSoup tool, and the case-related information, such as the trial video and audio and the corresponding judgment document, is extracted from HTML tags and attributes.
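As an illustration of step A3, the parsing can be sketched with BeautifulSoup roughly as follows. The HTML fragment, tag names, CSS classes, and case ID below are hypothetical stand-ins; the patent does not specify the actual markup of the source pages.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; real tag names and classes on the court
# websites are not given in the patent and will differ.
html = """
<div class="case" data-id="2020-XS-1234">
  <span class="title">Intentional injury case</span>
  <a class="video" href="/video/2020-XS-1234.mp4">trial video</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
cases = []
for div in soup.find_all("div", class_="case"):
    cases.append({
        # the unique case ID that links the judgment document to the trial video
        "case_id": div["data-id"],
        "title": div.find("span", class_="title").get_text(strip=True),
        "video_url": div.find("a", class_="video")["href"],
    })
```

The extracted `case_id` is what allows the one-to-one matching of judgment documents and trial recordings described in step A1.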
S2, data normalization and label alignment: this step covers data selection and the alignment and partitioning of annotations, handled as follows:
B1. Data selection: to ensure accuracy, the invention establishes strict data selection standards; the data must satisfy strict conditions such as the clarity, smoothness, and intelligibility of the video and whether the speech is clearly audible. In addition, the screened data must clearly show the defendant's facial expression and standing or sitting posture. Finally, 68 courts were selected from thousands of courts nationwide, and 1,510 judgment documents matched with trial videos were screened as the preliminary dataset of the invention.
B2. Data processing: the dataset is used to study whether the defendant's attitude influences sentencing. On this basis, the invention clips the video data collected from China Court Trial Online into segments, which makes the defendant's attitude convenient to analyze. The segments in which the defendant speaks are selected; each segment is about 20 seconds long and clearly shows the defendant's facial expression, speech, and body posture. The invention employs FFMPEG (https://github.com/FFmpeg/FFmpeg), a collection of libraries and tools for processing multimedia content (such as audio, video, subtitles, and related metadata), as the video cutting tool; it is widely used in video analysis engineering as an important tool for video cutting and the like. The audio matching each video clip is then extracted as the audio data; the separation of video and audio also uses FFMPEG. The audio is extracted as the input of another modality, and all features are finally combined. The audio is stored in WAV format to facilitate subsequent processing, and finally the matching text is transcribed from the audio and stored in TXT format.
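The FFMPEG-based clipping and audio separation described above can be sketched as command construction in Python. The file names, timestamps, and flag choices below are assumptions for illustration, not taken from the patent; only the 16 kHz target rate comes from the audio feature extraction step.

```python
def ffmpeg_clip_cmds(src, start, duration, video_out, audio_out):
    """Build two FFMPEG command lines: one cuts a defendant-speech segment,
    the other separates its audio track as 16 kHz mono WAV."""
    clip = ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration),
            "-i", src, "-c", "copy", video_out]          # stream-copy the ~20 s clip
    wav = ["ffmpeg", "-y", "-i", video_out, "-vn",       # -vn drops the video stream
           "-ar", "16000", "-ac", "1", audio_out]        # resample to 16 kHz mono
    return clip, wav

clip_cmd, wav_cmd = ffmpeg_clip_cmds("trial.mp4", 125.0, 20.0,
                                     "segment01.mp4", "segment01.wav")
# Each command would then be executed, e.g. subprocess.run(clip_cmd, check=True).
```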
B3. Data annotation: the video information is used to analyze the defendant's gestures and movements, the audio information to analyze the defendant's intonation, and the text information to analyze the defendant's statements. After data processing, the invention has three graduate students watch the video clips and annotate the defendant's attitude (good, medium, or poor); the final label is determined by vote. The results of the three annotators must be consistent; if consistency cannot be met, the data are discarded (if a clip is annotated medium, medium, medium, the defendant's attitude in that trial is labeled medium; if it is annotated good, medium, good, the clip is discarded). Before annotation, all participating annotators were trained with distributed annotated examples to guarantee the accuracy of the labels.
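The unanimity rule of step B3 can be sketched as follows; the English label strings are illustrative translations of the three attitude classes.

```python
from collections import Counter

LABELS = {"good", "medium", "poor"}   # the three attitude classes

def adjudicate(votes):
    """Keep a clip only when all three annotators agree; otherwise
    discard it (return None), as the annotation protocol requires."""
    assert len(votes) == 3 and set(votes) <= LABELS
    label, count = Counter(votes).most_common(1)[0]
    return label if count == 3 else None

kept = adjudicate(["medium", "medium", "medium"])   # unanimous -> kept
dropped = adjudicate(["good", "medium", "good"])    # disagreement -> discarded
```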
S3, extraction and expansion of multimodal information from the court trial process: the three kinds of features corresponding to the trial process, namely video, audio, and text, are extracted and expanded in combination with the temporal change characteristics and the domain characteristics of the court trial.
C1. Text feature extraction: features are extracted with BERT, which has achieved the best results on many NLP tasks. For each sentence L, a 768-dimensional sentence vector representation is obtained after processing: the invention fuses the parameters of the last three Transformer layers, averages them, and takes the [CLS] token prepended to each sentence as its representation. [CLS] is a special token placed before the sentence; because it carries no meaning of its own, its output is treated as the sentence vector and learns the semantic representation of the sentence. The invention fine-tunes on its text starting from the BERT-base-Chinese parameters. Each sentence is finally represented as a unique 768-dimensional vector.
C2. Audio feature extraction: to obtain the defendant's emotional intonation and other speech characteristics, the invention extracts audio features from the defendant's speech using the currently popular open-source audio analysis toolkit pyAudioAnalysis. The audio data are loaded first; the input signal is then divided into short frames, several features are computed per frame, and a sequence of feature vectors is produced for the whole recording. The input is a time-series signal with a sampling rate of 16,000 Hz, a frame size of 50 milliseconds, and a frame step of 25 milliseconds (a 50% overlap). Finally a NumPy matrix of 34 rows and 800 columns is obtained, where 800 is the number of short-term frames in the input recording.
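The framing parameters above determine the 800-column width of the feature matrix. A minimal sketch, assuming the common convention that a frame is emitted whenever a full window still fits in the signal:

```python
import numpy as np

SR = 16_000                # sampling rate (Hz)
WIN = int(0.050 * SR)      # 50 ms frame  -> 800 samples
STEP = int(0.025 * SR)     # 25 ms step   -> 400 samples (50% overlap)

def num_short_frames(n_samples):
    """Number of short-term frames produced by sliding a WIN-sample
    window over the signal in STEP-sample hops (assumed convention)."""
    return 1 + (n_samples - WIN) // STEP

# A clip of just over 20 s (320,400 samples here) yields the 800 frames
# behind the 34 x 800 matrix; 34 is pyAudioAnalysis's short-term feature count.
signal = np.zeros(320_400)
n_frames = num_short_frames(len(signal))  # 800
```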
C3. Video feature extraction: for the video features, the invention uses the PySlowFast framework to extract visual features from the video with a pre-trained SlowFast model. PySlowFast is FAIR's open-source video understanding codebase, providing state-of-the-art video classification models. The invention sets the height and width of each extracted frame to 256 x 256 and places 32 sampling windows at the center of each frame. Finally, the invention averages the d_v = 2304-dimensional feature vectors U_i obtained per frame; the frame rate is 30 FPS.
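The averaging of the per-window SlowFast feature vectors U_i into one clip-level vector can be sketched as below; the random values are stand-ins for real model outputs, and the assumption is that the 32 sampling windows each yield one d_v-dimensional vector.

```python
import numpy as np

D_V = 2304   # SlowFast clip feature dimension, as used in the fusion step

def pool_video_features(per_window_feats):
    """Average the per-window feature vectors U_i into a single
    clip-level representation, as the extraction step describes."""
    return per_window_feats.mean(axis=0)

rng = np.random.default_rng(0)
U = rng.normal(size=(32, D_V))     # 32 sampling windows -> 32 vectors U_i
clip_vec = pool_video_features(U)  # one 2304-dim vector per clip
```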
S4, predicting the defendant's attitude using multimodal information: experiments are carried out on the constructed court-trial dataset with a multimodal deep learning method to predict the defendant's attitude.
D1. After the sample data are processed by S2, the resulting data take the following form. The dataset comprises video, audio, text, judgment documents, and annotation information (attitude labels). Each case contains one to three video segments in which the defendant speaks, each about 20 s long; the audio information and its corresponding text are extracted from the video, together with the judgment document of the case and, finally, the annotated attitude of the defendant.
D2. Multimodal information fusion: the features of the three modalities (video, audio, text) are used as input. The text feature is the 768-dimensional vector extracted from the BERT [CLS] token, which the invention keeps trainable for fine-tuning. The audio feature is a 34 x 800 matrix, flattened and projected by an MLP into a 768-dimensional feature vector. The video feature is a 2304-dimensional vector, which the invention also converts to a 768-dimensional vector with an MLP. The three 768-dimensional vectors are concatenated into a 2304-dimensional vector, and after this vector passes through an MLP, the defendant's attitude is classified.
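The dimension bookkeeping of the fusion step can be sketched with NumPy; the random features and single-layer "MLP" are stand-ins (the patent does not specify layer counts or activations), but the shapes follow the text: 34 x 800 audio flattened to 768, video 2304 projected to 768, then a 3 x 768 = 2304-dimensional concatenation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w, b):
    """A single linear layer with ReLU, standing in for the per-modality
    projection MLPs described in the fusion step (assumed structure)."""
    return np.maximum(0.0, x @ w + b)

text = rng.normal(size=768)                  # BERT [CLS] sentence vector
audio = rng.normal(size=(34, 800)).ravel()   # 34 x 800 matrix, flattened
video = rng.normal(size=2304)                # pooled SlowFast clip feature

audio768 = mlp(audio, rng.normal(size=(audio.size, 768)) * 0.01, np.zeros(768))
video768 = mlp(video, rng.normal(size=(2304, 768)) * 0.01, np.zeros(768))

fused = np.concatenate([text, audio768, video768])  # 3 x 768 = 2304 dims
```

A final MLP over `fused` would then produce the three-way attitude classification.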
D3. Evaluation metric: the experiments adopt the F1 value. To evaluate the merits of different algorithms, the F1 value is built on the concepts of Precision and Recall and evaluates them jointly. It is defined as F1 = 2 x precision x recall / (precision + recall); higher precision and higher recall are both better.
In one preferred embodiment of the invention, to minimize judicial bias, the constructed dataset is restricted to the intentional-injury case type. To date, on China Judgments Online (wenshu.court.gov.cn), the total number of documents is 9,804,023, of which injury offenses account for 2,128,382, i.e. 21.7% of all cases. Intentional injury ranks among the top three case types by volume, which the invention considers sufficient to effectively avoid selection bias. To ensure classification accuracy, the invention screens the data strictly: strict conditions are imposed on quality, including the clarity, smoothness, and intelligibility of the video and whether the speech is clearly audible. It is also required that the defendant's facial expression and body posture be fully captured. Then, through a strict annotation process, three researchers watch the trial clips and annotate the defendant's attitude (good, medium, or poor); the final label is decided by vote, the results of the three annotators must be consistent, and data that fail this condition are discarded.
Finally, 68 courts were selected from thousands of courts nationwide, and 517 multimodal intentional-injury cases were obtained after screening. The data are divided into a training set of 417 cases and a test set of 100 cases; the specific distribution is shown in Table 1.
Table 1. Size of the dataset
The classification method of the model for predicting the defendant's attitude during the court trial process specifically comprises the following steps:
1. After the dataset is screened, the invention refines the data along several modality dimensions, applying audio, text, and video processing techniques to the trial videos of the defendant, and applies the corresponding text processing to the judgment document matched with each trial video.
2. According to prior studies, a person's attitude is closely related to his standing or sitting posture, so the invention focuses on analyzing the defendant's posture in court, for example head up or head down, slouching or standing properly. To achieve this goal, the invention introduces the PySlowFast framework to extract visual features with a pre-trained SlowFast model. PySlowFast is FAIR's open-source video understanding codebase, providing state-of-the-art video classification models. The invention sets the height and width of each extracted frame to 256 x 256 and places 32 sampling windows at the center of each frame. Finally, the invention averages the d_v = 2304-dimensional feature vectors U_i obtained per frame; the frame rate is 30 FPS.
3. Meanwhile, the invention analyzes the defendant's speech intonation, which is closely related to attitude. To obtain the intonation and other speech characteristics, the invention extracts audio features from the defendant's speech using the currently popular open-source toolkit pyAudioAnalysis. The audio data are loaded first; the input signal is then divided into short frames, several features are computed per frame, and a sequence of feature vectors is produced for the whole recording. The input is a time-series signal with a sampling rate of 16,000 Hz, a frame size of 50 milliseconds, and a frame step of 25 milliseconds (a 50% overlap). Finally a NumPy matrix of 34 rows and 800 columns is obtained, where 800 is the number of short-term frames in the input recording.
4. For text processing, the invention fine-tunes BERT and uses it for feature extraction. For each sentence L, a 768-dimensional sentence vector representation is obtained after processing: the invention fuses the parameters of the last three Transformer layers, averages them, and takes the [CLS] token prepended to each sentence as its representation. [CLS] is a special token placed before the sentence; because it carries no meaning of its own, its output is treated as the sentence vector and learns the semantic representation of the sentence. The invention fine-tunes on its text starting from the BERT-base-Chinese parameters. Each sentence is finally represented as a unique 768-dimensional vector.
5. The invention fully fuses the features, combining the multimodal information to analyze the defendant so that no information is lost and the analysis is accurate. The audio feature is flattened and then projected to a 768-dimensional vector by a neural network. For the video, sampled at 30 FPS with 2304-dimensional per-frame features, the feature vectors are averaged and the result is projected to a 768-dimensional vector by a neural network. Since the text feature is already 768-dimensional, it is left unchanged. Finally, the invention concatenates the three modalities into the feature vector representing the defendant's attitude during the trial and feeds it into a neural network. The final result is a probability distribution over attitude predictions: a logistic regression classifier with a softmax function is used as the classifier, whose output is o = softmax(W_s h + b_s), representing the probability distribution over the different attitude class labels, where h is the fused feature vector, W_s is a learned weight matrix, and b_s is a learned bias vector. The invention selects the class with the highest probability as the defendant's attitude; the specific flow is shown in FIG. 2.
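The softmax classification head can be sketched as follows; the weights are random stand-ins for the learned W_s and b_s, and the English class names are illustrative translations of the three attitude labels.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

CLASSES = ["good", "medium", "poor"]   # the three attitude classes
rng = np.random.default_rng(1)

h = rng.normal(size=2304)                 # fused multimodal feature vector
W_s = rng.normal(size=(3, 2304)) * 0.01   # learned weight matrix (stand-in)
b_s = np.zeros(3)                         # learned bias vector (stand-in)

probs = softmax(W_s @ h + b_s)            # o = softmax(W_s h + b_s)
pred = CLASSES[int(np.argmax(probs))]     # class with the highest probability
```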
Notably, the attitude prediction model for criminal case trial subjects in the court trial process is an end-to-end model, which effectively reduces the workload and improves the efficiency of attitude prediction.
In order to verify the effectiveness of the present invention, the following methods were chosen for comparison:
Majority principle (Majority): this baseline assigns all instances to the most frequent class.
Random principle: this baseline randomly generates predicted labels on the test set; the procedure is repeated 20 times and the results are averaged.
Document prediction: this baseline uses the judgment documents as input to distinguish the attitudes of criminal case trial subjects.
Single-modality prediction: this baseline uses the information of each single modality as input.
The evaluation index is the F1-score, a standard evaluation metric frequently adopted for relation extraction in the text field, defined as follows:
Micro_P = Σ_i TP_i / (Σ_i TP_i + Σ_i FP_i), Micro_R = Σ_i TP_i / (Σ_i TP_i + Σ_i FN_i), Micro_F1 = 2 × Micro_P × Micro_R / (Micro_P + Micro_R), where Micro_P denotes precision, Micro_R denotes recall, TP_i (true positives) is the number of instances correctly predicted by the classifier as class i, FP_i (false positives) is the number of instances incorrectly predicted by the classifier as class i, and FN_i (false negatives) is the number of class-i instances the classifier predicted as other classes. Micro_P and Micro_R measure the precision and recall of the algorithm respectively; however, neither metric alone fully reflects the performance of a classification system, so the Micro_F1 value, which balances precision Micro_P against recall Micro_R, is typically used to evaluate the overall performance of an algorithm.
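The micro-averaged metrics defined above can be computed directly from the per-class counts:

```python
# Sketch of the micro-averaged precision, recall and F1 defined above,
# computed from per-class TP_i / FP_i / FN_i counts.

def micro_prf(tp, fp, fn):
    p = sum(tp) / (sum(tp) + sum(fp))     # Micro_P
    r = sum(tp) / (sum(tp) + sum(fn))     # Micro_R
    f1 = 2 * p * r / (p + r)              # Micro_F1
    return p, r, f1

# Toy counts for three attitude classes (illustrative numbers only).
p, r, f1 = micro_prf(tp=[30, 20, 10], fp=[5, 10, 5], fn=[10, 5, 5])
# -> p = 0.75, r = 0.75, f1 = 0.75
```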
The experimental results are shown in table 2:
Table 2: F1 values of the comparison methods and of the scheme of the present invention.
The experimental results show that document prediction performs worst; this baseline reaches an F1 value of only 18.25%. The invention finds that when only the judgment document is used as input for judging the attitudes of criminal case trial subjects, the model cannot converge and the effect is poor. Analysis shows that, to ensure objectivity, judicial documents contain no subjective record of the trial subjects' attitudes; a method based on document prediction therefore cannot effectively capture the attitudes of criminal case trial subjects. The audio features also score low: the features obtained from audio form a time series, and operating on them with an MLP after simple flattening may lose the timing information, degrading the effect. The pre-trained video features give the best performance among the single modalities. Since a video is a series of spatial snapshots of the real world, the video features capture more of the trial subject's posture, which reflects his or her attitude to a great extent: for example, if the trial subject stands in a slack, sprawling posture while speaking, this largely indicates a poor attitude, and such information can be intuitively captured from the video to assist attitude prediction. The method that fuses the three modal features achieves the best effect, which also verifies the effectiveness of the multi-modal framework for predicting the attitudes of criminal case trial subjects during court trials.
The above is a further detailed description of the invention in connection with specific preferred embodiments, and it is not to be construed as limiting the practice of the invention to these descriptions. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (4)

1. A method for predicting the attitudes of criminal case trial reported persons during the court trial process, characterized by comprising the following steps:
S1, acquiring and analyzing court trial data;
S2, data regularization and label alignment;
S3, extracting and expanding multi-modal information in the court trial process;
S4, predicting the attitudes of criminal case trial reported persons by using the multi-modal information, wherein the attitudes comprise good, medium and poor; three researchers watch the video clips and annotate the data, marking the attitude of the criminal case trial reported person, and the final result is determined by voting; the results of the three annotators are required to be consistent, and if agreement cannot be reached, the data is discarded;
The step S4 specifically comprises the following steps:
D1, after processing the data in step S2, the obtained data set includes: video, audio, text, referee document, annotation information;
D2, multi-modal information fusion: the features of the three modalities, video, audio and text, are used as input; the text feature is the BERT CLS vector, a 768-dimensional vector; the audio feature is a 34-row, 800-column matrix, which is flattened and mapped by an MLP into a 768-dimensional feature vector; the video feature is a 2304-dimensional vector, which is converted by an MLP into a 768-dimensional vector; the three 768-dimensional vectors are concatenated into a 2304-dimensional vector, which is passed through an MLP to classify the attitude of the criminal case trial reported person;
D3, evaluation index: the evaluation index used is the F1 value, defined as follows: F1 = 2 × precision × recall / (precision + recall).
2. The method for predicting the attitudes of criminal case trial reported persons during the court trial process as claimed in claim 1, wherein step S1, the acquisition and analysis of court trial data, specifically comprises the following steps:
A1. Data range screening: data are collected from the China Judgements Online website (http://wenshu.court.gov.cn/Index) and the China Court Trial Online website (http://tingshen.court.gov.cn/).
A2. Data acquisition: the case numbers of the court trials to be collected are determined, and a web crawler program is written to crawl the specified court trial videos and judgment documents of those cases from the target court's websites;
A3. Web page parsing: the data in the crawled web pages are parsed with the BeautifulSoup tool, and the relevant information of the court trial cases is extracted from the HTML tags and attribute information.
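The parsing step in A3 can be sketched as follows. The tag names, class attributes and page snippet here are illustrative assumptions, not the real markup of the court websites.

```python
from bs4 import BeautifulSoup

# Sketch of extracting case information from a crawled page via HTML tags and
# attributes, as described in A3.  The HTML below is a made-up example.
html = """
<div class="case">
  <span class="case-no">(2020)X123</span>
  <a class="video" href="/v/123">trial video</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
case_no = soup.find("span", class_="case-no").get_text(strip=True)
video_url = soup.find("a", class_="video")["href"]
```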
3. The method for predicting the attitudes of criminal case trial reported persons during the court trial process as claimed in claim 1, wherein step S2, data regularization and label alignment, specifically comprises the following steps:
B1. Selecting data;
B2. Data processing: the collected video data are clipped, the audio matching each video clip is extracted as audio data, the matching words are transcribed from the audio, and the text information is stored in TXT format;
B3. Data annotation: several annotators watch the video clips and annotate the data, and the final attitude label of the criminal case trial reported person is obtained from the annotators' votes.
4. The method for predicting the attitudes of criminal case trial reported persons during the court trial process as claimed in claim 1, wherein step S3, the extraction and expansion of multi-modal information in the court trial process, specifically comprises the following steps:
C1. Text feature extraction: BERT is used to extract features of the text; for each sentence L, a 768-dimensional sentence vector representation is obtained after processing by fusing the parameters of the last three Transformer layers, taking their average, and selecting the CLS token in front of each sentence as the representation of the sentence; the CLS is a special marker symbol placed at the front of the sentence, and because it carries no meaning of its own it is taken as the sentence vector information of the sentence, learning a semantic representation of the sentence; the text information is fine-tuned from the BERT-base-Chinese parameters, and each sentence is ultimately represented as a single 768-dimensional vector;
C2. Audio feature extraction: audio features are extracted from the speech of the criminal case trial reported person using the open-source toolkit pyAudioAnalysis; the audio data are first loaded, the input signal is then divided into short frames and several features are computed for each frame, producing a sequence of feature vectors from the whole audio, which is treated as a time-series signal; the sampling rate is 16000 Hz, the frame size is 50 milliseconds and the frame step is 25 milliseconds, finally yielding a Numpy matrix of 34 rows and 800 columns, where 800 is the number of short-term frames of the input audio recording;
C3. Video feature extraction: the PySlowFast framework with a pre-trained SlowFast model is used to extract visual features from the video; the height and width of each extracted frame are set to 256 × 256, a sampling window of 32 frames is centered on each clip, and the average of the d_v = 2304-dimensional feature vectors U_i obtained per frame is computed, at a frame rate of 30 FPS.
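The short-term framing described in C2 can be sketched as follows. Energy and zero-crossing rate stand in for pyAudioAnalysis's 34 short-term features, and the 20-second clip is mocked with random noise; only the framing arithmetic (16 kHz, 50 ms window, 25 ms step, roughly 800 frames) follows the claim.

```python
import numpy as np

# Sketch of the short-term framing in C2: a 16 kHz signal is split into 50 ms
# frames with a 25 ms step, and per-frame features form a
# (n_features, n_frames) matrix, analogous to the 34 x ~800 matrix described.
FS = 16000
WIN, STEP = int(0.050 * FS), int(0.025 * FS)   # 800 and 400 samples

def short_term_features(signal):
    frames = [signal[i:i + WIN] for i in range(0, len(signal) - WIN + 1, STEP)]
    energy = [float(np.mean(f ** 2)) for f in frames]                 # frame energy
    zcr = [float(np.mean(np.abs(np.diff(np.sign(f)))) / 2) for f in frames]  # zero-crossing rate
    return np.vstack([energy, zcr])            # shape (2, n_frames)

rng = np.random.default_rng(2)
clip = rng.normal(size=20 * FS)                # mock 20-second clip
feats = short_term_features(clip)
# 20 s at a 25 ms step -> (20*16000 - 800) // 400 + 1 = 799 frames
assert feats.shape == (2, 799)
```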
CN202011103541.9A 2020-10-15 2020-10-15 Method for predicting attitudes of criminal case trial and appraisal persons in court trial process Active CN112597271B (en)
