CN116596709B - Auxiliary judging method, device, equipment and storage medium - Google Patents


Info

Publication number: CN116596709B
Application number: CN202310886908.6A
Authority: CN (China)
Prior art keywords: trial; court trial; court; recording; data
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN116596709A
Inventor: 王敏 (Wang Min)
Current Assignee: Beijing Babel Technology Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Babel Technology Co ltd
Priority date: an assumption and not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed
Application filed by Beijing Babel Technology Co ltd
Priority to CN202310886908.6A
Publication of CN116596709A, and of CN116596709B upon grant

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique


Abstract

The invention relates to the field of audio detection and discloses a court-trial-audio-based auxiliary judgment method, device, equipment and storage medium. The method comprises the following steps: acquiring the court trial recording data, case trial type and court trial statement data of a case to be tried, and performing spectral conversion on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording spectrum segments; extracting the spectral feature vector corresponding to each recording spectrum segment, and performing attribution clustering of court trial roles on the spectral feature vectors based on the case trial type to obtain a clustering result; extracting the court trial features of each court trial role from the clustering result, and constructing the recording logic data of the case to be tried based on those features; and matching the regulation features of the case to be tried based on the court trial statement data and the recording logic data, and generating an auxiliary judgment result. The method and device improve the accuracy with which an online court extracts relevant features from court trial recording data, so as to generate a more accurate judgment result.

Description

Auxiliary judging method, device, equipment and storage medium
Technical Field
The invention relates to the field of audio detection, in particular to a method, a device, equipment and a storage medium for auxiliary judgment based on court trial audio.
Background
With the development of the social economy and the improvement of living standards, economic and social interactions between people have increased. Alongside this increase, disputes inevitably arise among some parties, producing a large number of civil cases to adjudicate. To better protect the lawful rights of injured parties, this large volume of civil cases needs to be adjudicated in a timely manner. However, because cases are often complicated, involve a great deal of related information, and are subject to timeliness requirements, not all cases can be processed in time even when adjudicators work overtime, and under such time constraints the judgments in some cases may contain flaws.
At present, some civil cases are adjudicated through automated trial, which accelerates case processing and reduces the workload of adjudicators. However, civil cases often involve audio data such as recordings and videos, and this audio data must be examined and relevant evidential data extracted from it. Automated trial cannot reliably detect and distinguish the audio features of different speakers in the audio data or extract case features, so the final judgment is inaccurate and the parties' satisfaction with it is poor. In other words, in the automated court trials of existing online courts, the detection of relevant voice evidence and the extraction of relevant features are poor, which makes the auxiliary judgment result inaccurate.
Disclosure of Invention
The main purpose of the invention is to solve the problem that the auxiliary judgment result is inaccurate because, in the automated court trials of existing online courts, the detection of relevant voice evidence and the extraction of relevant features are poor.
The first aspect of the invention provides a court-trial-audio-based auxiliary judgment method, which comprises the following steps: acquiring the court trial recording data, case trial type and court trial statement data of a case to be tried, and performing spectral conversion on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording spectrum segments; extracting the spectral feature vector corresponding to each recording spectrum segment, and performing attribution clustering of court trial roles on the spectral feature vectors based on the case trial type to obtain a clustering result; extracting the court trial features of each court trial role from the clustering result to obtain a plurality of court trial features, and constructing the recording logic data of the case to be tried based on the court trial features; and matching the regulation features of the case to be tried based on the court trial statement data and the recording logic data, and generating an auxiliary judgment result.
Optionally, in a first implementation manner of the first aspect of the invention, the performing spectral conversion on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording spectrum segments comprises: performing time-domain waveform conversion on the court trial recording data to obtain time-domain court trial recording data, and performing time-frequency conversion on the time-domain court trial recording data to obtain frequency-domain court trial recording data; and performing multi-frame segmentation on the frequency-domain court trial recording data according to the preset segmentation parameters to obtain the multi-frame recording spectrum segments.
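The time-frequency conversion plus multi-frame segmentation described above amounts to a short-time Fourier transform. The patent does not specify an implementation; the following is a minimal sketch in which the frame length, hop size, and Hann window are illustrative assumptions:

```python
import numpy as np

def spectrum_segments(audio, frame_len=512, hop=256):
    """Split a 1-D waveform into overlapping frames and convert each
    frame to a magnitude spectrum (a simple short-time Fourier transform)."""
    n_frames = 1 + max(0, len(audio) - frame_len) // hop
    window = np.hanning(frame_len)
    segments = []
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + frame_len]
        # rfft gives the one-sided frequency-domain representation
        segments.append(np.abs(np.fft.rfft(frame * window)))
    return np.stack(segments)  # shape: (n_frames, frame_len // 2 + 1)

# toy "recording": a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
audio = np.sin(2 * np.pi * 440 * t)
spec = spectrum_segments(audio)
print(spec.shape)  # (61, 257)
```

Each row of `spec` is one "recording spectrum segment" in the patent's terms; the frame length and frame shift play the role of the preset segmentation parameters.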
Optionally, in a second implementation manner of the first aspect of the invention, the extracting the spectral feature vector corresponding to each recording spectrum segment comprises: performing covariance matrix calculation on each recording spectrum segment to obtain the eigenvalues and eigenvectors corresponding to each recording spectrum segment, and sorting the eigenvalues in descending order to obtain sorted eigenvalues; selecting, in sorted order, the eigenvectors corresponding to a preset number of eigenvalues to obtain selected eigenvectors, and performing a rotation transformation on each recording spectrum segment using the selected eigenvectors to obtain dimension-reduced recording spectrum segments; and extracting audio dynamic features from each dimension-reduced recording spectrum segment to obtain the spectral feature vectors.
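The covariance/eigenvector reduction above is essentially principal component analysis. A minimal sketch, assuming numpy and hypothetical segment shapes (the number of retained components `k` is an assumed parameter, not a value from the patent):

```python
import numpy as np

def reduce_segments(segments, k=8):
    """PCA-style reduction: eigendecompose the covariance of the spectral
    frames, keep the eigenvectors of the k largest eigenvalues, and
    project (the "rotation transformation") onto that basis."""
    centered = segments - segments.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # sort descending, keep top k
    basis = eigvecs[:, order]
    return centered @ basis

rng = np.random.default_rng(0)
frames = rng.normal(size=(61, 257))          # stand-in for spectrum segments
reduced = reduce_segments(frames, k=8)
print(reduced.shape)  # (61, 8)
```

Note that `eigh` returns eigenvalues in ascending order, which is why the explicit descending sort is needed before selecting the top components.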
Optionally, in a third implementation manner of the first aspect of the invention, the performing attribution clustering of court trial roles on the spectral feature vectors to obtain a clustering result comprises: determining the court trial roles in the case to be tried and their number, and determining the number of spectral feature vectors; constructing an initial clustering matrix from the number of court trial roles and the number of vectors, calculating the similarity distances between the spectral feature vectors, and constructing a distance metric matrix based on the similarity distances; and calculating the mean center vector corresponding to each spectral feature vector using the initial clustering matrix, and iteratively updating the clustering of each spectral feature vector in the initial clustering matrix using the mean center vectors and the distance metric matrix until the convergence measure of the initial clustering matrix falls below a preset convergence threshold, yielding the clustering result.
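The iterative mean-center update with a distance matrix and convergence threshold reads like a k-means-style procedure. The following sketch works under that assumption; the farthest-point initialization, Euclidean distance metric, and convergence test on center movement are illustrative choices, not the patent's:

```python
import numpy as np

def init_centers(vectors, n_roles):
    """Deterministic farthest-point initialization for the cluster centers."""
    centers = [vectors[0]]
    for _ in range(n_roles - 1):
        d = np.min([np.linalg.norm(vectors - c, axis=1) for c in centers], axis=0)
        centers.append(vectors[np.argmax(d)])
    return np.stack(centers)

def cluster_roles(vectors, n_roles, tol=1e-6, max_iter=100):
    """k-means-style attribution clustering: assign each spectral feature
    vector to the nearest of n_roles mean-center vectors, then update the
    centers, until they move less than the convergence threshold tol."""
    centers = init_centers(vectors, n_roles)
    for _ in range(max_iter):
        # distance-metric matrix: distance of every vector to every center
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            vectors[labels == r].mean(axis=0) if np.any(labels == r) else centers[r]
            for r in range(n_roles)
        ])
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return labels, centers

# two well-separated synthetic "speakers"
rng = np.random.default_rng(1)
a = rng.normal(0.0, 0.1, size=(20, 8))
b = rng.normal(5.0, 0.1, size=(20, 8))
labels, centers = cluster_roles(np.vstack([a, b]), n_roles=2)
print(labels)
```

With well-separated speaker clusters the labels partition the frames by role, which is what the attribution-clustering step needs downstream.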
Optionally, in a fourth implementation manner of the first aspect of the invention, the extracting the court trial features of each court trial role from the clustering result to obtain a plurality of court trial features comprises: dividing the clustering result by court trial role to obtain division results; performing audio noise reduction on the division results to obtain noise-reduced division results; and extracting statement features from the noise-reduced division results based on the case trial type of the case to be tried, obtaining the court trial features corresponding to each court trial role.
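The patent does not say how the audio noise reduction is done. One crude stand-in, working on the magnitude spectra from earlier steps, is spectral gating: estimate a per-frequency noise floor and subtract it. The quantile used for the floor is an assumed parameter:

```python
import numpy as np

def denoise_spectra(segments, floor_quantile=0.2):
    """Crude spectral-gating noise reduction: estimate a per-frequency
    noise floor from the quietest frames and subtract it, clipping at 0."""
    noise_floor = np.quantile(segments, floor_quantile, axis=0)
    return np.clip(segments - noise_floor, 0.0, None)

rng = np.random.default_rng(2)
# synthetic magnitude spectra: broadband noise plus a constant offset
noisy = np.abs(rng.normal(0.0, 0.05, size=(61, 257))) + 0.1
clean = denoise_spectra(noisy)
print(clean.shape)
```

Statement-feature extraction would then operate on `clean` per role; that part depends on the (unspecified) feature definitions and is not sketched here.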
Optionally, in a fifth implementation manner of the first aspect of the invention, the constructing the recording logic data of the case to be tried based on the court trial features comprises: calculating the audio similarity between the court trial features; performing logic association calculation on the audio similarities using a preset logic calculation model, and constructing the statement logic sequence corresponding to each court trial role based on the calculation result; and constructing the recording logic data of the case to be tried from the court trial features of each court trial role according to the statement logic sequence.
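The "preset logic calculation model" is not specified. Purely for illustration, a statement logic sequence could be approximated by greedily chaining each feature to its most similar unvisited neighbor; the cosine metric and greedy strategy are assumptions, not the patent's model:

```python
import numpy as np

def argument_order(features):
    """Greedy sketch of the logic-sequence step: start from the first
    feature vector and repeatedly append the unvisited feature most
    similar (cosine similarity) to the last one, chaining related
    statements together."""
    norms = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = norms @ norms.T                      # pairwise cosine similarity
    order, remaining = [0], set(range(1, len(features)))
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: sim[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(argument_order(feats))  # [0, 1, 3, 2]
```

The recording logic data would then be the court trial features of each role laid out in this order.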
Optionally, in a sixth implementation manner of the first aspect of the invention, the matching the regulation features of the case to be tried based on the court trial statement data and the recording logic data and generating an auxiliary judgment result comprises: extracting a plurality of key court trial features from the court trial statement data and the recording logic data based on a preset court trial logic sequence; and matching a plurality of regulation features of the case to be tried based on the key court trial features, and generating the auxiliary judgment result corresponding to each court trial role based on the regulation features.
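As a toy sketch of matching regulation features against extracted key court trial features, a keyword-overlap score could look like the following; the rule table, its keywords, and the scoring scheme are entirely hypothetical:

```python
# Hypothetical rule table: regulation name -> indicative keywords
RULES = {
    "contract breach": ["contract", "breach", "compensation"],
    "labor dispute": ["wage", "overtime", "dismissal"],
}

def match_rules(key_features, rules=RULES):
    """Score each rule by how many of its keywords appear among the
    extracted key court-trial features; return (score, rule) pairs,
    best match first, dropping rules with no overlap."""
    scores = {name: sum(kw in key_features for kw in kws)
              for name, kws in rules.items()}
    return sorted(((s, n) for n, s in scores.items() if s > 0), reverse=True)

matches = match_rules({"contract", "breach", "wage"})
print(matches)  # best-scoring rule first
```

A real system would match structured regulation features rather than bare keywords, but the shape of the step (score candidate regulations against extracted features, rank, report) is the same.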
The second aspect of the invention provides a court-trial-audio-based auxiliary judgment device, comprising: a spectral conversion module, configured to acquire the court trial recording data, case trial type and court trial statement data of a case to be tried, and to perform spectral conversion on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording spectrum segments; a clustering module, configured to extract the spectral feature vector corresponding to each recording spectrum segment, and to perform attribution clustering of court trial roles on the spectral feature vectors based on the case trial type to obtain a clustering result; a logic construction module, configured to extract the court trial features of each court trial role from the clustering result to obtain a plurality of court trial features, and to construct the recording logic data of the case to be tried based on the court trial features; and a regulation matching module, configured to match the regulation features of the case to be tried based on the court trial statement data and the recording logic data, and to generate an auxiliary judgment result.
Optionally, in a first implementation manner of the second aspect of the invention, the spectral conversion module comprises: a time-frequency conversion unit, configured to perform time-domain waveform conversion on the court trial recording data to obtain time-domain court trial recording data, and to perform time-frequency conversion on the time-domain court trial recording data to obtain frequency-domain court trial recording data; and a multi-frame segmentation unit, configured to perform multi-frame segmentation on the frequency-domain court trial recording data according to the preset segmentation parameters to obtain the multi-frame recording spectrum segments.
Optionally, in a second implementation manner of the second aspect of the invention, the clustering module comprises: a matrix calculation unit, configured to perform covariance matrix calculation on each recording spectrum segment to obtain the eigenvalues and eigenvectors corresponding to each recording spectrum segment, and to sort the eigenvalues in descending order to obtain sorted eigenvalues; a rotation transformation unit, configured to select, in sorted order, the eigenvectors corresponding to a preset number of eigenvalues to obtain selected eigenvectors, and to perform a rotation transformation on each recording spectrum segment using the selected eigenvectors to obtain dimension-reduced recording spectrum segments; and a dynamic extraction unit, configured to extract audio dynamic features from each dimension-reduced recording spectrum segment to obtain the spectral feature vectors.
Optionally, in a third implementation manner of the second aspect of the invention, the clustering module further comprises: a number determination unit, configured to determine the court trial roles in the case to be tried and their number, and to determine the number of spectral feature vectors; a distance calculation unit, configured to construct an initial clustering matrix from the number of court trial roles and the number of vectors, to calculate the similarity distances between the spectral feature vectors, and to construct a distance metric matrix based on the similarity distances; and a clustering update unit, configured to calculate the mean center vector corresponding to each spectral feature vector using the initial clustering matrix, and to iteratively update the clustering of each spectral feature vector in the initial clustering matrix using the mean center vectors and the distance metric matrix until the convergence measure of the initial clustering matrix falls below a preset convergence threshold, yielding the clustering result.
Optionally, in a fourth implementation manner of the second aspect of the invention, the logic construction module comprises: a clustering division unit, configured to divide the clustering result by court trial role to obtain division results; an audio noise reduction unit, configured to perform audio noise reduction on the division results to obtain noise-reduced division results; and a statement extraction unit, configured to extract statement features from the noise-reduced division results based on the case trial type of the case to be tried, obtaining the court trial features corresponding to each court trial role.
Optionally, in a fifth implementation manner of the second aspect of the invention, the logic construction module further comprises: a similarity calculation unit, configured to calculate the audio similarity between the court trial features; an association calculation unit, configured to perform logic association calculation on the audio similarities using a preset logic calculation model, and to construct the statement logic sequence corresponding to each court trial role based on the calculation result; and a logic construction unit, configured to construct the recording logic data of the case to be tried from the court trial features of each court trial role according to the statement logic sequence.
Optionally, in a sixth implementation manner of the second aspect of the invention, the regulation matching module comprises: a feature extraction unit, configured to extract a plurality of key court trial features from the court trial statement data and the recording logic data based on a preset court trial logic sequence; and a regulation matching unit, configured to match a plurality of regulation features of the case to be tried based on the key court trial features, and to generate the auxiliary judgment result corresponding to each court trial role based on the regulation features.
The third aspect of the invention provides court-trial-audio-based auxiliary judgment equipment, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the equipment to perform the steps of the court-trial-audio-based auxiliary judgment method described above.
The fourth aspect of the invention provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the steps of the court-trial-audio-based auxiliary judgment method described above.
According to the technical scheme provided by the invention, the court trial recording data, case trial type and court trial statement data of a case to be tried are acquired, and spectral conversion is performed on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording spectrum segments; the spectral feature vector corresponding to each recording spectrum segment is extracted, and attribution clustering of court trial roles is performed on the spectral feature vectors based on the case trial type to obtain a clustering result; the court trial features of each court trial role are extracted from the clustering result to obtain a plurality of court trial features, and the recording logic data of the case to be tried is constructed based on the court trial features; and the regulation features of the case to be tried are matched based on the court trial statement data and the recording logic data to generate an auxiliary judgment result. Compared with the prior art, the invention extracts, from the audio data submitted by the relevant court trial roles in an online court, the speech feature data of the required court trial roles, constructs the logic data of each court trial role's statements in the audio data, and then matches the corresponding regulation features based on each role's logic data and the court trial statement data, so as to generate the auxiliary judgment result for each court trial role. The invention thus detects and extracts the feature data of the relevant persons in the audio evidence submitted by court trial participants, improving the accuracy with which the online court extracts features from court trial recording data and generating a more accurate judgment result.
Drawings
Fig. 1 is a schematic diagram of a first embodiment of the court-trial-audio-based auxiliary judgment method according to an embodiment of the invention;
Fig. 2 is a schematic diagram of a second embodiment of the court-trial-audio-based auxiliary judgment method according to an embodiment of the invention;
Fig. 3 is a schematic diagram of a third embodiment of the court-trial-audio-based auxiliary judgment method according to an embodiment of the invention;
Fig. 4 is a schematic diagram of an embodiment of the court-trial-audio-based auxiliary judgment device according to an embodiment of the invention;
Fig. 5 is a schematic diagram of another embodiment of the court-trial-audio-based auxiliary judgment device according to an embodiment of the invention;
Fig. 6 is a schematic diagram of an embodiment of the court-trial-audio-based auxiliary judgment equipment according to an embodiment of the invention.
Description of the embodiments
Embodiments of the invention provide a court-trial-audio-based auxiliary judgment method, device, equipment and storage medium. The method comprises the following steps: acquiring the court trial recording data, case trial type and court trial statement data of a case to be tried, and performing spectral conversion on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording spectrum segments; extracting the spectral feature vector corresponding to each recording spectrum segment, and performing attribution clustering of court trial roles on the spectral feature vectors based on the case trial type to obtain a clustering result; extracting the court trial features of each court trial role from the clustering result, and constructing the recording logic data of the case to be tried based on those features; and matching the regulation features of the case to be tried based on the court trial statement data and the recording logic data, and generating an auxiliary judgment result. The embodiments improve the accuracy with which an online court extracts relevant features from court trial recording data, so as to generate a more accurate judgment result.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the invention is described below. Referring to fig. 1, a first embodiment of the court-trial-audio-based auxiliary judgment method in the embodiment of the invention comprises:
101. Acquiring the court trial recording data, case trial type and court trial statement data of a case to be tried, and performing spectral conversion on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording spectrum segments;
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
In this embodiment, the court trial recording data refers to the audio data that the relevant case roles (i.e., the parties making statements in the case) of an online court provide as court trial evidence during the trial (this audio data being audio recorded by a statement-making party itself or by others while the disputed event occurred, while the two parties negotiated, and so on); the case trial type refers to the type of the case currently being tried in the online court (such as civil cases, economic cases, family cases, labor dispute cases, and the like); the court trial statement data refers to the statements made during the trial by the plaintiff, the defendant, the judge and other participants; and the segmentation parameters refer to parameters such as frame length (frame number), frame shift and cluster number. This application takes the frame number as the running example.
In practical application, the recorded audio evidence provided by each party of a case in an online court trial is acquired, together with the trial type of the current case and the court trial statement data stated by all relevant participants after the initial hearing ends. The court trial recording data is then processed as follows: time-domain waveform conversion is performed on the court trial recording data to obtain time-domain court trial recording data, and time-frequency conversion is performed on the time-domain data to obtain frequency-domain court trial recording data; the frequency-domain court trial recording data is then segmented into multiple frames according to the preset segmentation parameters, yielding the multi-frame recording spectrum segments.
102. Extracting the spectral feature vector corresponding to each recording spectrum segment, and performing attribution clustering of court trial roles on the spectral feature vectors based on the case trial type to obtain a clustering result;
In this embodiment, a spectral feature vector refers to a vector describing the relevant audio features of the corresponding recording spectrum segment; attribution clustering refers to clustering the recording spectrum segments corresponding to different spectral feature vectors according to the correlation of the corresponding audio data.
In practical application, covariance matrix calculation is performed on each recording spectrum segment to obtain the eigenvalues and eigenvectors corresponding to each segment, and the eigenvalues are sorted in descending order to obtain sorted eigenvalues. Next, the eigenvectors corresponding to a preset number of the sorted eigenvalues are selected in order to obtain selected eigenvectors, a rotation transformation is performed on each recording spectrum segment using the selected eigenvectors to obtain dimension-reduced segments, and audio dynamic features are then extracted from each dimension-reduced segment to obtain the spectral feature vectors. The court trial roles in the case to be tried and their number are then determined, along with the number of spectral feature vectors; an initial clustering matrix is constructed from the number of court trial roles and the number of vectors, the similarity distances between the spectral feature vectors are calculated, and a distance metric matrix is constructed based on these similarity distances. Finally, the mean center vector corresponding to each spectral feature vector is calculated using the initial clustering matrix, and the clustering of each spectral feature vector in the initial clustering matrix is iteratively updated using the mean center vectors and the distance metric matrix until the convergence measure of the initial clustering matrix falls below a preset convergence threshold, yielding the clustering result.
103. Extracting court trial features of each court trial role from the clustering result to obtain a plurality of court trial features, and constructing recording logic data of a case to be judged based on the court trial features;
in this embodiment, the court trial roles herein refer to each person among the plaintiff and defendant parties and the stakeholders related to those persons; the court trial features refer to voice feature data related to the current court trial case and used for case trial; the recording logic data refers to the logical development sequence of the disputed events described by the relevant personnel in the whole recorded audio data, together with the corresponding data, wherein the logical development sequence is related to the court trial case statements in the relevant recorded audio data.
In practical application, clustering division is carried out on each clustering result according to the court trial roles to obtain a division result, audio noise reduction processing is carried out on the division result to obtain a noise-reduced division result, and the court trial features corresponding to each court trial role are then obtained by extracting dialect features from the noise-reduced division result based on the court trial type of the case to be judged. Further, based on the court trial features, the audio similarity of each court trial feature is calculated, logic association degree calculation is performed on each audio similarity by using a preset logic calculation model, and a dialect logic sequence corresponding to each court trial role is constructed based on the calculation result; based on the dialect logic sequence, the recording logic data of the case to be judged is constructed by utilizing the court trial features of each court trial role.
104. Based on the court trial dialect data and the recording logic data, the legal characteristics of the to-be-judged case are matched, and an auxiliary judgment result is generated.
In this embodiment, the rule feature refers to feature information related to the rules applicable to the trial result in a court trial process, i.e. attributes or features that have unique identification and play an important role in the rule and law text, such as trial type features, rule application scope features, and rule detail and penalty measure features (such as administrative penalty, criminal penalty, civil compensation, etc.).
In practical application, a plurality of key court trial features in court trial dialect data and recording logic data are extracted based on a preset court trial logic sequence, a plurality of rule features of a case to be judged are matched based on the key court trial features, and an auxiliary judging result corresponding to a court trial role is generated based on the rule features.
In the embodiment of the invention, the multi-frame recording frequency spectrum fragment is obtained by acquiring the court trial recording data, the court trial type and the court trial dialect data of the case to be judged and carrying out frequency spectrum conversion on the court trial recording data according to the preset segmentation parameters; extracting spectral feature vectors corresponding to the spectral segments of the sound recording, and carrying out attribution clustering of court trial roles on the spectral feature vectors based on the court trial types of the cases to obtain clustering results; extracting court trial features of each court trial role from the clustering result to obtain a plurality of court trial features, and constructing recording logic data of a case to be judged based on the court trial features; based on the court trial dialect data and the recording logic data, the legal characteristics of the to-be-judged case are matched, and an auxiliary judgment result is generated. Compared with the prior art, the method and the device have the advantages that the speaking characteristic data of the court trial roles required in the audio data are extracted through the audio data submitted by the relevant court trial roles in the online court, the logic data stated in the audio data by each court trial role are constructed, and then the corresponding rule characteristics are matched based on the logic data of each court trial role and the court trial dialect data, so that the auxiliary judging result of each court trial role is generated. Therefore, the characteristic data extraction of related personnel in the audio evidence data submitted by the court trial personnel is realized, so that the accuracy of extracting related trial characteristics from the court trial record data by the online court is improved, and a more accurate trial result is generated.
Referring to fig. 2, a second embodiment of the method for assisting trial based on audio in the embodiment of the present invention includes:
201. performing time domain waveform conversion on the court trial recording data to obtain time domain court trial recording data, and performing time-frequency conversion on the time domain court trial recording data to obtain frequency domain court trial recording data;
in this embodiment, the time domain waveform conversion refers to uniformly converting signals of different formats in the court trial recording data into analog audio signals having a time domain relation; the time-frequency transform herein refers to converting the time-series audio signal into a frequency-domain audio signal by discrete Fourier transform or fast Fourier transform.
In practical application, after the trial recording data, the trial type and the trial dialect data of the case to be judged are obtained, the recorded audio data may be stored in different data formats. The trial recording data of the different format types are first converted into time-series audio signals to obtain time domain trial recording data, and the time domain trial recording data is then Fourier-transformed into frequency domain trial recording data.
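As an illustrative sketch of this time-frequency conversion step (the function name and the pure-tone input are invented for illustration; the patent does not fix an implementation), a time-series signal can be moved into the frequency domain with a fast Fourier transform:

```python
import numpy as np

def to_frequency_domain(time_domain_samples):
    """Apply a real-input FFT to a time-series audio signal and return
    the magnitude spectrum (the frequency-domain recording data)."""
    spectrum = np.fft.rfft(time_domain_samples)
    return np.abs(spectrum)

# Example: a 440 Hz tone sampled at 8 kHz for one second.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)
freq_data = to_frequency_domain(signal)
peak_bin = int(np.argmax(freq_data))  # with a 1 s window, bin index == Hz
```

With a one-second window the frequency resolution is 1 Hz, so the spectral peak lands exactly in bin 440.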
202. According to preset segmentation parameters, carrying out multi-frame segmentation on the frequency domain trial recording data to obtain multi-frame recording frequency spectrum fragments;
In this embodiment, multi-frame segmentation is performed on the frequency domain trial recording data according to preset segmentation parameters (such as frame number), that is, the frequency domain trial recording data is divided into multiple segments according to the corresponding frame number, so as to obtain multi-frame recording frequency spectrum segments.
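A minimal sketch of the multi-frame segmentation, under two stated assumptions (the segmentation parameter is simply a frame count, and any trailing remainder is discarded; the patent leaves both open):

```python
import numpy as np

def segment_frames(freq_data, frames, frame_len):
    """Split frequency-domain recording data into `frames` segments of
    `frame_len` points each; a trailing remainder is dropped."""
    usable = frames * frame_len
    return np.reshape(freq_data[:usable], (frames, frame_len))

# 1000 spectrum points split into 4 recording spectrum segments.
segments = segment_frames(np.arange(1000.0), frames=4, frame_len=250)
```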
203. Calculating covariance matrixes of the recording frequency spectrum fragments to obtain frame rate characteristic values and frame rate characteristic vectors corresponding to the recording frequency spectrum fragments, and sorting the frame rate characteristic values in descending order to obtain sorted frame rate characteristic values;
in this embodiment, the number of occurrences of each recording spectrum segment in the segment signal and the corresponding mean and variance thereof are calculated to determine the mean and standard deviation of each recording spectrum segment; the value of each recording spectrum segment is then reduced by its mean and divided by its standard deviation to obtain a standardized data set, the standardized data set is transposed, and the transpose is multiplied by the standardized data set to obtain a covariance matrix, where the covariance matrix formula is C = (1/n) X^T X, in which C is the covariance matrix, n is the number of samples, and X is the standardized data set; further, eigenvalue decomposition is performed on the covariance matrix to obtain the frame rate characteristic values and frame rate characteristic vectors, where a frame rate characteristic value represents the variance of the corresponding recording spectrum segments in the direction of its characteristic vector, and a frame rate characteristic vector indicates the directions along which a variable is related to the other variables; the frame rate characteristic values are then sorted in descending order to obtain the sorted frame rate characteristic values.
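The standardization, covariance, eigendecomposition and descending sort described above can be sketched as follows (`covariance_eig` and the random test data are invented for illustration):

```python
import numpy as np

def covariance_eig(segments):
    """Standardize each recording spectrum segment, form the covariance
    matrix C = (1/n) X^T X, and return its eigenvalues and eigenvectors
    sorted in descending eigenvalue order, as the embodiment requires."""
    X = (segments - segments.mean(axis=0)) / segments.std(axis=0)
    n = X.shape[0]
    C = (X.T @ X) / n                 # covariance matrix of standardized data
    vals, vecs = np.linalg.eigh(C)    # eigh: for symmetric matrices, ascending
    order = np.argsort(vals)[::-1]    # re-order from large to small
    return vals[order], vecs[:, order]

rng = np.random.default_rng(0)
vals, vecs = covariance_eig(rng.normal(size=(100, 5)))
```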
204. Based on the sequenced frame rate characteristic values, frame rate characteristic vectors corresponding to a preset number of frame rate characteristic values are sequentially selected to obtain selected frame rate characteristic vectors, and rotation transformation is carried out on each recording frequency spectrum segment by using the selected frame rate characteristic vectors to obtain the recording frequency spectrum segments with reduced dimensions;
in this embodiment, based on the sorted frame rate feature values, the frame rate feature vectors corresponding to a preset number of frame rate feature values are selected in sequence, that is, the feature vectors corresponding to the first k feature values, ranked by magnitude, are selected as the principal components of the recording spectrum segments, so as to obtain the selected frame rate feature vectors; further, rotation transformation is performed on each recording spectrum segment by using the selected frame rate feature vectors, that is, a linear transformation is performed according to the selected first k feature vectors and the values of each recording spectrum segment, so as to obtain the dimension-reduced recording spectrum segments.
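A sketch of the selection and rotation transformation; the eigenvectors are recomputed inline so the fragment is self-contained, and k and the data sizes are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(1)
segments = rng.normal(size=(200, 8))   # 200 recording spectrum segments, 8 bins each

# Eigenvectors of the segments' covariance matrix, sorted by descending
# eigenvalue (the "sorted frame rate feature values" of the previous step).
C = np.cov(segments, rowvar=False)
vals, vecs = np.linalg.eigh(C)
vecs = vecs[:, np.argsort(vals)[::-1]]

k = 3                                  # preset number of feature values
selected = vecs[:, :k]                 # selected frame rate feature vectors
reduced = segments @ selected          # rotation: project onto the top-k directions
```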
205. Extracting audio dynamic characteristics from each dimension-reduced recording frequency spectrum segment to obtain frequency spectrum characteristic vectors;
in this embodiment, a Mel filter bank is used to decompose each dimension-reduced recording spectrum segment into subband signals to obtain logarithmic recording spectrum segments, and a DCT (discrete cosine transform) is then used to convert the logarithmic recording spectrum segments into a set of MFCCs (Mel-frequency cepstral coefficients), so as to obtain the spectrum feature vector.
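A crude stand-in for this step, under stated simplifications: the spectrum is averaged into equal-width bands instead of Mel-spaced triangular filters, and the DCT-II is written out by hand; all names are invented for illustration.

```python
import numpy as np

def dct2(x):
    """Type-II DCT, the transform used to turn log filter-bank
    energies into cepstral coefficients."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * (n + 0.5) * k / N))
                     for k in range(N)])

def cepstral_features(power_segment, n_filters=12, n_coeffs=6):
    """Average the power spectrum into bands (a real implementation
    would use Mel-spaced triangular filters), take the log, then apply
    the DCT and keep the first few coefficients as the feature vector."""
    bands = np.array_split(power_segment, n_filters)
    log_energy = np.log(np.array([b.mean() for b in bands]) + 1e-10)
    return dct2(log_energy)[:n_coeffs]

feat = cepstral_features(np.abs(np.random.default_rng(2).normal(size=128)) + 1.0)
```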
206. Determining a plurality of court trial roles in a case to be judged and the number of corresponding court trial roles, and determining the number of vectors corresponding to the frequency spectrum feature vectors;
in this embodiment, the tone color and the number of corresponding trial roles corresponding to the plurality of trial roles in the current case to be examined are determined, and the number of corresponding vectors in the spectral feature vectors is determined.
207. Constructing an initial clustering matrix by utilizing the number of the court trial roles and the number of the vectors, calculating the similarity distance between each spectrum feature vector, and constructing a distance measurement matrix based on each similarity distance;
in this embodiment, an initial clustering matrix (μ matrix) is constructed by using the number of court trial roles and the number of vectors, where the dimension of the initial clustering matrix is K×N, K being the number of court trial roles and N being the number of vectors. Each element μ_ij of the μ matrix indicates the degree of membership of data point j in cluster i. The distance D between the spectral feature vectors is calculated (for example by the Euclidean distance formula), and the similarity distance between the spectral feature vectors is then computed as the reciprocal of the distance D plus 1, i.e. 1/(1+D); a distance measurement matrix is further constructed based on the similarity distances, yielding an N×N distance matrix D (the distance measurement matrix), where D_ij represents the distance between the i-th spectral feature vector and the j-th spectral feature vector.
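The two matrices can be sketched directly from the description (the random membership initialization and the feature data are illustrative; the patent does not specify how μ is seeded):

```python
import numpy as np

def build_matrices(features, K):
    """Build a K x N initial clustering (membership) matrix and an
    N x N distance-metric matrix, with similarity = 1 / (1 + distance)
    as described in the embodiment."""
    N = len(features)
    rng = np.random.default_rng(0)
    mu = rng.random((K, N))
    mu /= mu.sum(axis=0)          # memberships of each data point sum to 1
    # Pairwise Euclidean distances between spectrum feature vectors.
    D = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    S = 1.0 / (1.0 + D)           # similarity distances
    return mu, D, S

features = np.random.default_rng(3).normal(size=(10, 4))
mu, D, S = build_matrices(features, K=3)
```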
208. Calculating a mean center vector corresponding to each spectrum feature vector by using the initial clustering matrix, and carrying out clustering update on each spectrum feature vector in the initial clustering matrix by using the mean center vector and the distance measurement matrix until the convergence degree of the initial clustering matrix is lower than a preset convergence threshold value to obtain a clustering result;
in this embodiment, based on the initial clustering matrix, K spectral feature vectors are randomly selected as the initial cluster center points; each spectral feature vector is assigned, based on the distance metric matrix, to the cluster whose center point is closest to it; the mean center vector of the spectral feature vectors in each cluster is then calculated as the new cluster center point, and each spectral feature vector in the initial clustering matrix is cluster-updated by using the mean center vectors and the distance metric matrix until the convergence degree of the initial clustering matrix is lower than a preset convergence threshold (i.e., until the cluster center points no longer change or the maximum number of iterations is reached), so as to obtain the clustering result.
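The iterative update described here is a k-means-style loop; a minimal sketch under that reading (function name, seeds and the toy two-speaker data are invented for illustration):

```python
import numpy as np

def cluster(features, K, max_iter=100, tol=1e-6):
    """Randomly pick K initial centers, assign each spectrum feature
    vector to its nearest center, recompute mean center vectors, and
    stop when the centers move less than `tol` (the convergence
    threshold) or `max_iter` is reached."""
    rng = np.random.default_rng(0)
    centers = features[rng.choice(len(features), K, replace=False)]
    for _ in range(max_iter):
        d = np.linalg.norm(features[:, None] - centers[None, :], axis=-1)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return labels, centers

# Two well-separated groups standing in for two court trial roles.
pts = np.vstack([np.zeros((20, 2)), 10 + np.zeros((20, 2))])
pts = pts + np.random.default_rng(4).normal(scale=0.1, size=pts.shape)
labels, centers = cluster(pts, K=2)
```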
209. Extracting court trial features of each court trial role from the clustering result to obtain a plurality of court trial features, and constructing recording logic data of a case to be judged based on the court trial features;
210. Based on the court trial dialect data and the recording logic data, the legal characteristics of the to-be-judged case are matched, and an auxiliary judgment result is generated.
In the embodiment of the invention, the speaking characteristic data of the court trial roles required in the audio data are extracted through the audio data submitted by the relevant court trial roles in the online court, and the logic data stated in the audio data by each court trial role are constructed, so that the corresponding rule characteristics are matched based on the logic data and the court trial dialect data of each court trial role, and the auxiliary judging result of each court trial role is generated. Therefore, the characteristic data extraction of related personnel in the audio evidence data submitted by the court trial personnel is realized, so that the accuracy of extracting related trial characteristics from the court trial record data by the online court is improved, and a more accurate trial result is generated.
Referring to fig. 3, a third embodiment of the auxiliary trial method based on the court trial audio in the embodiment of the present invention includes:
301. acquiring trial recording data of a case to be judged, the type of the case trial and the data of the trial, and performing spectrum conversion on the trial recording data according to preset segmentation parameters to obtain multi-frame recording spectrum fragments;
302. Extracting spectral feature vectors corresponding to the spectral segments of the sound recording, and carrying out attribution clustering of court trial roles on the spectral feature vectors based on the court trial types of the cases to obtain clustering results;
303. clustering and dividing each clustering result according to the court trial roles to obtain a dividing result;
in this embodiment, according to the court trial roles, the recording spectrum segments of different clusters in each cluster result are divided into corresponding court trial roles, so as to obtain the division result.
304. Performing audio noise reduction processing on the division result to obtain a noise-reduced division result;
in this embodiment, noise reduction processing is performed on the recording spectrum segments corresponding to the different court trial roles in the division result; for example, the environmental noise signal segments are suppressed, and, according to the tone colors of the different court trial roles, the audio belonging exclusively to a given court trial role is extracted from audio segments in which other court trial roles are mixed, so as to obtain the noise-reduced division result.
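One common way to realize such noise reduction is spectral subtraction; the following is a minimal sketch of that technique standing in for the patent's unspecified step (the function name and toy spectra are illustrative):

```python
import numpy as np

def spectral_subtract(segment_power, noise_power):
    """Subtract an estimated environmental-noise spectrum from a
    recording spectrum segment, clamping negative values to zero."""
    return np.maximum(segment_power - noise_power, 0.0)

# Toy power spectra: a noisy segment and a flat noise estimate.
clean = spectral_subtract(np.array([5.0, 1.0, 3.0]), np.array([2.0, 2.0, 2.0]))
```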
305. Based on the court trial type of the case to be judged, extracting dialect features from the noise-reduced division result to obtain court trial features corresponding to each court trial role;
in this embodiment, based on the court trial type of the current case to be judged (such as administrative penalty, criminal penalty, civil compensation, etc.), the dialect features related to the trial of a case of that court trial type are extracted from the recorded audio of the corresponding court trial role in the noise-reduced division result, so as to obtain the court trial features corresponding to each court trial role.
306. Calculating the audio similarity of each court trial feature based on the court trial features;
in this embodiment, based on the court trial features, the audio similarity of each court trial role corresponding to the court trial features is calculated, for example, the audio similarity between each segment of dialogue features in the event development statement in the whole recording dialogue.
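The patent does not fix the similarity metric; a plausible sketch uses cosine similarity between court trial feature vectors (names are illustrative):

```python
import numpy as np

def audio_similarity(a, b):
    """Cosine similarity between two court trial feature vectors:
    1.0 for identical directions, 0.0 for orthogonal ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = audio_similarity(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```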
307. Performing logic association degree calculation on the audio frequency similarity by using a preset logic calculation model, and constructing a dialect logic sequence corresponding to each court trial role based on a calculation result;
in this embodiment, the logic calculation model refers to a machine learning or deep learning model constructed to identify the degree of logical association between audio segments.
In practical application, logic association degree calculation is performed on each audio similarity by using the preset logic calculation model. For example, a logic calculation model built from a plurality of decision trees is used to construct an association network of historical recording events, describing the influences and relations between events; the logic calculation model then combines the events with known recording information to predict and analyze the current audio segments, that is, to identify the dialogue subject, the conversing parties, the emotional tendency and other information in the current audio segments, and the segments in which the various corresponding pieces of information are identified are converted into a structured logical relation. The statement logic sequence of the dialect recording corresponding to each court trial role is further constructed based on the calculation result.
308. Based on the dialect logic sequence, utilizing the court trial characteristics of each court trial role to construct recording logic data of the to-be-judged case;
in this embodiment, the recording logic data refers to voice data with logic statement relations of things described by personnel of each party in the recording audio data.
In practical application, based on the dialect logic sequence, the recording logic data of the case to be judged is constructed by utilizing the court trial features of each court trial role, i.e. recorded audio data that carries the logical dialect of the corresponding court trial personnel.
309. Extracting a plurality of key court trial features in court trial dialect data and recording logic data based on a preset court trial logic sequence;
in this embodiment, based on a preset court trial logic sequence (i.e., an on-line court trial flow), a plurality of key court trial features in the court trial dialect data and the recording logic data are extracted, that is, the court trial dialect feature data of each court trial role is extracted at each stage in the trial flow, so as to obtain a plurality of key court trial features.
310. Based on the key court trial characteristics, a plurality of legal characteristics of the to-be-trial cases are matched, and based on the legal characteristics, auxiliary trial results corresponding to the court trial roles are generated.
In this embodiment, based on the key court trial features, a plurality of rule features of the current case to be judged are matched by combining the historical trial records and the rules and laws corresponding to the court trial type, and an auxiliary trial result document corresponding to the court trial role is generated based on the rule features. In this way, the online court performs audio detection on the recorded audio data submitted by court trial personnel, classifies the audio segments of the relevant personnel in the audio data, and extracts the features of the relevant case, so that the final auxiliary trial result document is generated more accurately and automatically, the legal interests of the court trial roles of all parties are safeguarded, and a better auxiliary trial result is provided.
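A hypothetical sketch of the final matching step: key court trial features are matched against a small rule-feature table. The table contents and category names are invented for illustration; a real system would query a statute database.

```python
# Illustrative rule-feature table keyed by trial type (invented data).
RULE_FEATURES = {
    "administrative penalty": {"fine", "license revocation"},
    "criminal penalty": {"sentencing", "detention"},
    "civil compensation": {"damages", "restitution"},
}

def match_rule_features(key_features):
    """Return every rule category whose feature set overlaps the
    extracted key court trial features, in sorted order."""
    keys = set(key_features)
    return sorted(cat for cat, feats in RULE_FEATURES.items() if keys & feats)

matched = match_rule_features({"damages", "fine"})
```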
In the embodiment of the invention, the speaking characteristic data of the court trial roles required in the audio data are extracted through the audio data submitted by the relevant court trial roles in the online court, and the logic data stated in the audio data by each court trial role are constructed, so that the corresponding rule characteristics are matched based on the logic data and the court trial dialect data of each court trial role, and the auxiliary judging result of each court trial role is generated. Therefore, the characteristic data extraction of related personnel in the audio evidence data submitted by the court trial personnel is realized, so that the accuracy of extracting related trial characteristics from the court trial record data by the online court is improved, and a more accurate trial result is generated.
The above describes the auxiliary judging method based on the court trial audio in the embodiment of the present invention, and the following describes the auxiliary judging device based on the court trial audio in the embodiment of the present invention, referring to fig. 4, one embodiment of the auxiliary judging device based on the court trial audio in the embodiment of the present invention includes:
the frequency spectrum conversion module 401 is used for obtaining court trial recording data, a court trial type and court trial dialect data of a case to be judged, and carrying out frequency spectrum conversion on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording frequency spectrum fragments;
the clustering module 402 is configured to extract spectral feature vectors corresponding to the audio recording spectral segments, and perform attribution clustering of court trial roles on each spectral feature vector based on the case court trial type, so as to obtain a clustering result;
the logic construction module 403 is configured to extract a court trial feature of each court trial role from the clustering result to obtain a plurality of court trial features, and construct recording logic data of the to-be-examined case based on the court trial features;
and the rule matching module 404 is configured to match rule features of the case to be judged based on the court trial dialect data and the recording logic data, and generate an auxiliary trial result.
In the embodiment of the invention, the multi-frame recording frequency spectrum fragment is obtained by acquiring the court trial recording data, the court trial type and the court trial dialect data of the case to be judged and carrying out frequency spectrum conversion on the court trial recording data according to the preset segmentation parameters; extracting spectral feature vectors corresponding to the spectral segments of the sound recording, and carrying out attribution clustering of court trial roles on the spectral feature vectors based on the court trial types of the cases to obtain clustering results; extracting court trial features of each court trial role from the clustering result to obtain a plurality of court trial features, and constructing recording logic data of a case to be judged based on the court trial features; based on the court trial dialect data and the recording logic data, the legal characteristics of the to-be-judged case are matched, and an auxiliary judgment result is generated. Compared with the prior art, the method and the device have the advantages that the speaking characteristic data of the court trial roles required in the audio data are extracted through the audio data submitted by the relevant court trial roles in the online court, the logic data stated in the audio data by each court trial role are constructed, and then the corresponding rule characteristics are matched based on the logic data of each court trial role and the court trial dialect data, so that the auxiliary judging result of each court trial role is generated. Therefore, the characteristic data extraction of related personnel in the audio evidence data submitted by the court trial personnel is realized, so that the accuracy of extracting related trial characteristics from the court trial record data by the online court is improved, and a more accurate trial result is generated.
Referring to fig. 5, another embodiment of the auxiliary trial apparatus based on the court trial audio according to the embodiment of the present invention includes:
the frequency spectrum conversion module 401 is used for obtaining court trial recording data, a court trial type and court trial dialect data of a case to be judged, and carrying out frequency spectrum conversion on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording frequency spectrum fragments;
the clustering module 402 is configured to extract spectral feature vectors corresponding to the audio recording spectral segments, and perform attribution clustering of court trial roles on each spectral feature vector based on the case court trial type, so as to obtain a clustering result;
the logic construction module 403 is configured to extract a court trial feature of each court trial role from the clustering result to obtain a plurality of court trial features, and construct recording logic data of the to-be-examined case based on the court trial features;
and the rule matching module 404 is configured to match rule features of the case to be judged based on the court trial dialect data and the recording logic data, and generate an auxiliary trial result.
Further, the spectrum conversion module 401 includes:
the time-frequency conversion unit 4011 is configured to perform time domain waveform conversion on the court trial recording data to obtain time domain court trial recording data, and perform time-frequency conversion on the time domain court trial recording data to obtain frequency domain court trial recording data; the multi-frame segmentation unit 4012 is configured to perform multi-frame segmentation on the frequency domain trial recording data according to preset segmentation parameters, so as to obtain multi-frame recording frequency spectrum segments.
Further, the clustering module 402 includes:
a matrix calculating unit 4021, configured to perform covariance matrix calculation on each recording spectrum segment to obtain a frame rate feature value and a frame rate feature vector corresponding to each recording spectrum segment, and sort the frame rate feature values in descending order, so as to obtain sorted frame rate feature values; the rotation transformation unit 4022 is configured to sequentially select frame rate feature vectors corresponding to a preset number of frame rate feature values based on the sorted frame rate feature values, obtain selected frame rate feature vectors, and perform rotation transformation on each recording frequency spectrum segment by using the selected frame rate feature vectors, so as to obtain a recording frequency spectrum segment after dimension reduction; the dynamic extraction unit 4023 is configured to extract audio dynamic features of each dimension-reduced recording frequency spectrum segment, so as to obtain a frequency spectrum feature vector.
Further, the clustering module 402 further includes:
the number determining unit 4024 is configured to determine a plurality of court trial roles in the case to be examined and a number of corresponding court trial roles, and determine a number of vectors corresponding to the spectral feature vectors; the distance calculating unit 4025 is configured to construct an initial clustering matrix by using the number of trial characters and the number of vectors, calculate a similarity distance between the spectral feature vectors, and construct a distance metric matrix based on the similarity distances; the cluster updating unit 4026 is configured to calculate a mean center vector corresponding to each spectrum feature vector by using the initial cluster matrix, and perform cluster updating on each spectrum feature vector in the initial cluster matrix by using the mean center vector and the distance metric matrix until the convergence degree of the initial cluster matrix is lower than a preset convergence threshold value, so as to obtain a cluster result.
Further, the logic building module 403 includes:
the clustering division unit 4031 is configured to perform clustering division on each clustering result according to the court trial role to obtain a division result; an audio noise reduction unit 4032, configured to perform audio noise reduction processing on the division result, to obtain a noise-reduced division result; and a dialect extraction unit 4033, configured to extract dialect features of the noise-reduced division result based on the court trial type of the to-be-examined case, so as to obtain court trial features corresponding to each court trial role.
Further, the logic building module 403 further includes:
a similarity calculating unit 4034, configured to calculate an audio similarity of each of the court trial features based on the court trial features; the association degree calculating unit 4035 is configured to perform logic association degree calculation on each audio frequency similarity by using a preset logic calculation model, and construct a dialect logic sequence corresponding to each court trial role based on a calculation result; the logic construction unit 4036 is configured to construct the recording logic data of the to-be-judged case by using the court trial features of each court trial role based on the dialect logic sequence.
Further, the rule matching module 404 includes:
A feature extraction unit 4041, configured to extract a plurality of key court trial features in the court trial dialect data and the recording logic data based on a preset court trial logic sequence; and the rule matching unit 4042 is configured to match a plurality of rule features of the to-be-judged case based on the key court trial features, and generate an auxiliary trial result corresponding to the court trial role based on the rule features.
In the embodiment of the invention, the multi-frame recording frequency spectrum fragment is obtained by acquiring the court trial recording data, the court trial type and the court trial dialect data of the case to be judged and carrying out frequency spectrum conversion on the court trial recording data according to the preset segmentation parameters; extracting spectral feature vectors corresponding to the spectral segments of the sound recording, and carrying out attribution clustering of court trial roles on the spectral feature vectors based on the court trial types of the cases to obtain clustering results; extracting court trial features of each court trial role from the clustering result to obtain a plurality of court trial features, and constructing recording logic data of a case to be judged based on the court trial features; based on the court trial dialect data and the recording logic data, the legal characteristics of the to-be-judged case are matched, and an auxiliary judgment result is generated. Compared with the prior art, the method and the device have the advantages that the speaking characteristic data of the court trial roles required in the audio data are extracted through the audio data submitted by the relevant court trial roles in the online court, the logic data stated in the audio data by each court trial role are constructed, and then the corresponding rule characteristics are matched based on the logic data of each court trial role and the court trial dialect data, so that the auxiliary judging result of each court trial role is generated. Therefore, the characteristic data extraction of related personnel in the audio evidence data submitted by the court trial personnel is realized, so that the accuracy of extracting related trial characteristics from the court trial record data by the online court is improved, and a more accurate trial result is generated.
Fig. 4 and fig. 5 above describe the auxiliary trial device based on court trial audio in the embodiment of the present invention in detail from the point of view of modularized functional entities; the following describes it from the point of view of hardware processing.
Fig. 6 is a schematic structural diagram of a secondary trial device based on court trial audio according to an embodiment of the present invention, where the secondary trial device 600 based on court trial audio may have relatively large differences due to different configurations or performances, and may include one or more processors (central processing units, CPU) 610 (e.g., one or more processors) and a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing application programs 633 or data 632. Wherein the memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations on the trial audio-based auxiliary trial apparatus 600. Still further, the processor 610 may be configured to communicate with the storage medium 630 to execute a series of instruction operations in the storage medium 630 on the trial audio-based auxiliary trial device 600.
The trial audio-based auxiliary trial device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the trial audio-based auxiliary trial device structure shown in fig. 6 does not constitute a limitation and may include more or fewer components than shown, or combine certain components, or arrange the components differently.
The invention also provides auxiliary trial equipment based on the court trial audio, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the processor executes the steps of the auxiliary trial method based on the court trial audio in the above embodiments.
The invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the steps of the trial audio-based auxiliary trial method.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. The auxiliary judgment method based on the court trial audio is characterized by comprising the following steps of: acquiring trial recording data, trial type and trial dialect data of a case to be judged, and performing spectrum conversion on the trial recording data according to preset segmentation parameters to obtain multi-frame recording spectrum fragments;
extracting spectral feature vectors corresponding to the recording spectral fragments, constructing a corresponding distance measurement matrix and an initial clustering matrix based on the spectral feature vectors and the number of court trial roles corresponding to the case to be judged, and performing attribution clustering on the spectral feature vectors in the initial clustering matrix by utilizing the distance measurement matrix and a mean value center vector corresponding to the spectral feature vectors to obtain a clustering result;
extracting court trial features of each court trial role from the clustering result to obtain a plurality of court trial features, calculating the logic association degree of each court trial feature by using a preset logic calculation model based on the court trial features, calculating the dialect logic sequence corresponding to each court trial role based on the logic association degree, and constructing the recording logic data of the case to be judged;
And matching the rule features of the to-be-judged case based on the court trial dialect data and the recording logic data, and generating an auxiliary judgment result.
2. The method for assisting trial based on trial audio according to claim 1, wherein the performing spectral transformation on the trial audio recording data according to preset segmentation parameters to obtain multi-frame audio recording spectral fragments comprises:
performing time domain waveform conversion on the court trial recording data to obtain time domain court trial recording data, and performing time-frequency conversion on the time domain court trial recording data to obtain frequency domain court trial recording data;
and carrying out multi-frame segmentation on the court trial recording data of the frequency domain according to preset segmentation parameters to obtain multi-frame recording frequency spectrum fragments.
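As a rough illustration of the two steps in claim 2 — time-to-frequency conversion followed by multi-frame segmentation — the short-time Fourier transform below does both at once, framing the signal with a hop and taking one magnitude spectrum per frame. Frame length, hop, and window are assumed parameters for the sketch, not values from the patent.

```python
import numpy as np

def stft_segments(samples, frame_len=256, hop=128):
    """Windowed short-time spectra: one magnitude-spectrum row per recording frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    out = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = samples[i * hop : i * hop + frame_len] * window
        out[i] = np.abs(np.fft.rfft(frame))   # time-to-frequency conversion per frame
    return out

t = np.arange(4096) / 8000.0                     # half a second at 8 kHz
spec = stft_segments(np.sin(2 * np.pi * 440 * t))
print(spec.shape)                                # (frames, frequency bins)
```

With a 256-sample frame at 8 kHz, each bin spans 31.25 Hz, so the 440 Hz tone concentrates around bin 14.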
3. The method for assisting trial based on court trial audio according to claim 1, wherein the extracting of the spectral feature vectors corresponding to the recording spectrum fragments comprises:
performing covariance matrix calculation on each recording frequency spectrum segment to obtain a frame rate feature value and a frame rate feature vector corresponding to each recording frequency spectrum segment, and sequencing the frame rate feature values in descending order to obtain sequenced frame rate feature values;
Sequentially selecting frame rate feature vectors corresponding to a preset number of frame rate feature values based on the sequenced frame rate feature values to obtain selected frame rate feature vectors, and performing rotary transformation on each recording frequency spectrum segment by using the selected frame rate feature vectors to obtain the recording frequency spectrum segments with reduced dimensions;
and extracting audio dynamic characteristics from each dimension-reduced recording frequency spectrum segment to obtain frequency spectrum characteristic vectors.
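Claim 3 reads like a per-segment principal component analysis: eigendecompose the segment's covariance matrix, sort the eigenvalues in descending order, keep the eigenvectors of the largest ones, and project ("rotary transformation") the segment onto them to reduce its dimension. A minimal sketch under that assumption:

```python
import numpy as np

def reduce_segment(segment, k):
    """segment: (frames, bins) spectral slice -> (frames, k) dimension-reduced features."""
    centered = segment - segment.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # covariance matrix of the segment
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:k]         # re-sort descending, keep top-k
    return centered @ eigvecs[:, order]           # "rotary transformation" / projection

rng = np.random.default_rng(1)
seg = rng.normal(size=(50, 8))                    # toy spectral segment
reduced = reduce_segment(seg, k=3)
print(reduced.shape)
```

The retained columns come out ordered by explained variance, which is what sorting the eigenvalues in descending order buys you.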
4. The method for assisting trial based on court trial audio according to claim 1, wherein the constructing a corresponding distance metric matrix and an initial clustering matrix based on the spectral feature vectors and the number of court trial roles corresponding to the case to be judged, and performing attribution clustering on each spectral feature vector in the initial clustering matrix by using the distance metric matrix and the mean center vector corresponding to the spectral feature vectors to obtain a clustering result comprises:
determining a plurality of court trial roles and the number of corresponding court trial roles in the case to be judged, and determining the number of vectors corresponding to the frequency spectrum feature vectors;
constructing an initial clustering matrix by utilizing the number of the court trial roles and the number of the vectors, calculating the similarity distance between the frequency spectrum feature vectors, and constructing a distance measurement matrix based on the similarity distance;
And calculating a mean center vector corresponding to each spectrum feature vector by using the initial clustering matrix, and carrying out clustering update on each spectrum feature vector in the initial clustering matrix by using the mean center vector and the distance measurement matrix until the convergence degree of the initial clustering matrix is lower than a preset convergence threshold value to obtain a clustering result.
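One plausible reading of claim 4, sketched below: the initial clustering matrix is a one-hot assignment of feature vectors to roles, the mean center vectors are cluster averages computed from it, and the distance metric matrix holds vector-to-center distances; assignments are updated until they stop changing (a stand-in for the claim's convergence threshold). This is a k-means-style interpretation, not the patented algorithm.

```python
import numpy as np

def cluster_with_matrices(vecs, n_roles, max_iter=50):
    n = len(vecs)
    assign = np.eye(n_roles)[np.arange(n) % n_roles]   # initial clustering matrix (one-hot)
    for _ in range(max_iter):
        counts = np.maximum(assign.sum(axis=0), 1.0)[:, None]
        centers = (assign.T @ vecs) / counts           # mean center vector per role
        dist = np.linalg.norm(vecs[:, None] - centers[None], axis=2)  # distance metric matrix
        new_assign = np.eye(n_roles)[dist.argmin(axis=1)]
        if np.array_equal(new_assign, assign):         # assignments stable -> converged
            break
        assign = new_assign
    return assign.argmax(axis=1)

# Two well-separated toy "speakers": five vectors near (0,0), five near (10,10).
pts = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10.0])
labels = cluster_with_matrices(pts, n_roles=2)
print(labels)
```

Even from a deliberately bad alternating initial assignment, the update converges in two passes on this toy data.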
5. The trial audio-based auxiliary judgment method according to claim 1, wherein the step of extracting the trial features of each of the trial roles from the clustering result to obtain a plurality of trial features includes:
clustering and dividing each clustering result according to the court trial roles to obtain a dividing result;
performing audio noise reduction processing on the division result to obtain a noise-reduced division result;
and extracting dialect features from the noise-reduced division results based on the court trial type of the case to be judged, and obtaining the court trial features corresponding to each court trial role.
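The audio noise reduction step in claim 5 is not specified further; spectral subtraction is one common choice and is used below purely for illustration: estimate a noise floor from the quietest frames and subtract it from every frame's magnitude spectrum.

```python
import numpy as np

def denoise_spectra(frames, noise_quantile=0.1):
    """frames: (n_frames, bins) magnitude spectra -> noise-reduced spectra."""
    energy = frames.sum(axis=1)
    floor_idx = energy <= np.quantile(energy, noise_quantile)  # quietest frames
    noise_floor = frames[floor_idx].mean(axis=0)               # per-bin noise estimate
    return np.clip(frames - noise_floor, 0.0, None)            # subtract, clamp at zero

rng = np.random.default_rng(2)
noisy = np.abs(rng.normal(0.1, 0.01, size=(100, 16)))  # low-level noise frames
noisy[50] += 5.0                                       # one loud "speech" frame
clean = denoise_spectra(noisy)
print(clean.shape)
```

After subtraction the noise-only frames sit near zero while the speech frame keeps almost all of its energy, which is the point of the step before dialect-feature extraction.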
6. The method for assisting trial based on court trial audio according to claim 5, wherein the calculating the logic association degree of each court trial feature by using a preset logic calculation model based on the court trial features, calculating the dialect logic sequence corresponding to each court trial role based on the logic association degree, and constructing the recording logic data of the case to be judged comprises: calculating the audio similarity of each court trial feature based on the court trial features;
Performing logic association degree calculation on the audio similarity by using a preset logic calculation model, and constructing a dialect logic sequence corresponding to each court trial role based on a calculation result;
and constructing the recording logic data of the case to be judged by utilizing the court trial features of each court trial role based on the dialect logic sequence.
7. The trial audio-based auxiliary judgment method according to claim 1, wherein the matching the rule features of the case to be judged based on the court trial dialect data and the recording logic data to generate the auxiliary judgment result comprises: extracting a plurality of key court trial features in the court trial dialect data and the recording logic data based on a preset court trial logic sequence;
and matching a plurality of rule features of the case to be judged based on the key court trial features, and generating an auxiliary judgment result corresponding to each court trial role based on the rule features.
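Claim 7's matching step can be pictured as a keyword lookup: each candidate rule lists trigger features, and the extracted key court trial features are scored against every rule. The rule names and keywords below are invented placeholders, not data from the patent.

```python
# Hypothetical rule table: rule name -> trigger keywords (invented examples).
RULES = {
    "contract_dispute": {"contract", "breach", "payment"},
    "loan_dispute": {"loan", "interest", "repayment"},
}

def match_rules(key_features):
    """Score each rule by how many of its trigger keywords appear in the key features."""
    return {name: len(keywords & key_features) for name, keywords in RULES.items()}

hits = match_rules({"contract", "payment", "delivery"})
best = max(hits, key=hits.get)
print(hits, best)
```

The best-scoring rule (or the whole score dict) would then seed the per-role auxiliary judgment result.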
8. The auxiliary trial device based on the court trial audio is characterized by comprising:
the frequency spectrum conversion module is used for acquiring court trial recording data, the court trial type and the court trial dialect data of the to-be-judged case, and carrying out frequency spectrum conversion on the court trial recording data according to preset segmentation parameters to obtain multi-frame recording frequency spectrum fragments;
The clustering module is used for extracting the spectral feature vectors corresponding to the recording spectrum fragments, constructing a corresponding distance measurement matrix and an initial clustering matrix based on the spectral feature vectors and the number of court trial roles corresponding to the case to be judged, and performing attribution clustering on the spectral feature vectors in the initial clustering matrix by utilizing the distance measurement matrix and the mean value center vector corresponding to the spectral feature vectors to obtain a clustering result;
the logic construction module is used for extracting court trial features of each court trial role from the clustering result to obtain a plurality of court trial features, calculating the logic association degree of each court trial feature by using a preset logic calculation model based on the court trial features, calculating the dialect logic sequence corresponding to each court trial role based on the logic association degree, and constructing the recording logic data of the case to be judged;
and the rule matching module is used for matching the rule characteristics of the to-be-judged case based on the court trial dialect data and the recording logic data and generating an auxiliary judgment result.
9. Auxiliary trial equipment based on court trial audio is characterized in that the auxiliary trial equipment based on the court trial audio comprises: a memory and at least one processor, the memory having instructions stored therein;
The at least one processor invokes the instructions in the memory to cause the trial audio-based auxiliary trial apparatus to perform the steps of the trial audio-based auxiliary trial method of any one of claims 1-7.
10. A computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the steps of the trial audio-based auxiliary trial method of any of claims 1-7.
CN202310886908.6A 2023-07-19 2023-07-19 Auxiliary judging method, device, equipment and storage medium Active CN116596709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310886908.6A CN116596709B (en) 2023-07-19 2023-07-19 Auxiliary judging method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116596709A (en) 2023-08-15
CN116596709B (en) 2024-02-06

Family

ID=87594203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310886908.6A Active CN116596709B (en) 2023-07-19 2023-07-19 Auxiliary judging method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116596709B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833980A (en) * 2009-03-12 2010-09-15 新奥特硅谷视频技术有限责任公司 Voice recognition-based court hearing audio file real-time indexing system
KR20180024256A (en) * 2016-08-29 2018-03-08 주식회사 케이티 Speaker classification apparatus and speaker identifying apparatus
CN108922559A (en) * 2018-07-06 2018-11-30 华南理工大学 Recording terminal clustering method based on voice time-frequency conversion feature and integral linear programming
WO2021031383A1 (en) * 2019-08-16 2021-02-25 平安科技(深圳)有限公司 Intelligent auxiliary judgment method and apparatus, and computer device and storage medium
CN116342332A (en) * 2023-05-31 2023-06-27 合肥工业大学 Auxiliary judging method, device, equipment and storage medium based on Internet

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Intonation representation and functions of defense lawyers' emotional expression in court trials; 马泽军; 李潇然; 法制博览 (No. 033) *

Also Published As

Publication number Publication date
CN116596709A (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
Lopes et al. Automatic bird species identification for large number of species
JPH06222791A (en) Method and apparatus for measurement of similar point between sound samples
CN113436634B (en) Voice classification method and device based on voiceprint recognition and related equipment
CN116342332B (en) Auxiliary judging method, device, equipment and storage medium based on Internet
CN113807103B (en) Recruitment method, device, equipment and storage medium based on artificial intelligence
CN117116290B (en) Method and related equipment for positioning defects of numerical control machine tool parts based on multidimensional characteristics
CN111429943B (en) Joint detection method for music and relative loudness of music in audio
Aliaskar et al. Human voice identification based on the detection of fundamental harmonics
CN116596709B (en) Auxiliary judging method, device, equipment and storage medium
CN117037840A (en) Abnormal sound source identification method, device, equipment and readable storage medium
Bang et al. Recognition of bird species from their sounds using data reduction techniques
Xie et al. Acoustic feature extraction using perceptual wavelet packet decomposition for frog call classification
CN115759085A (en) Information prediction method and device based on prompt model, electronic equipment and medium
CN111488448B (en) Method and device for generating machine reading annotation data
Ramashini et al. A Novel Approach of Audio Based Feature Optimisation for Bird Classification.
CN113035230A (en) Authentication model training method and device and electronic equipment
CN113421590A (en) Abnormal behavior detection method, device, equipment and storage medium
Xie et al. Feature extraction based on bandpass filtering for frog call classification
CN116756324B (en) Association mining method, device, equipment and storage medium based on court trial audio
Manor et al. Voice trigger system using fuzzy logic
CN116758947B (en) Auxiliary judgment method, device, equipment and storage medium based on audio emotion
Nagajyothi et al. Voice recognition based on vector quantization using LBG
Abou-Zleikha et al. Non-linguistic vocal event detection using online random forest
CN118035411A (en) Customer service voice quality inspection method, customer service voice quality inspection device, customer service voice quality inspection equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant