WO2022127042A1 - Examination cheating recognition method and apparatus based on speech recognition, and computer device


Info

Publication number
WO2022127042A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
audio
dimensionality reduction
pronunciation
voiceprint verification
Application number
PCT/CN2021/097100
Other languages
French (fr)
Chinese (zh)
Inventor
苏雪琦
王健宗
程宁
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022127042A1

Classifications

    • G: PHYSICS
      • G10: MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L 15/00: Speech recognition
            • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
            • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
            • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
            • G10L 15/26: Speech to text systems
          • G10L 17/00: Speaker identification or verification techniques
            • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
            • G10L 17/04: Training, enrolment or model building
            • G10L 17/22: Interactive procedures; Man-machine interfaces
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of artificial intelligence technology, and belongs to the application scenario of judging cheating in exams based on speech recognition in smart cities, and in particular, relates to a method, device and computer equipment for recognizing cheating in exams based on speech recognition.
  • More and more selection and evaluation processes are carried out by examination to ensure the fairness of selection or evaluation, such as civil servant selection, CET-4 and CET-6 test evaluation, driver's license test evaluation, etc.
  • to prevent candidates from cheating during the examination and affecting its fairness, invigilators will be arranged in the examination room to invigilate the examination.
  • the invigilators cannot monitor each candidate at all times, resulting in unsatisfactory invigilation results.
  • video surveillance is used to assist invigilators in invigilating the exam, and the video is analyzed to determine the specific location of the cheating examinee.
  • however, analyzing surveillance video can only determine after the fact whether an examinee cheated, so real-time judgment cannot be guaranteed; moreover, surveillance video can only analyze, through images, whether an examinee's body movements indicate cheating. If examinees communicate with each other using only small body movements, the obtained surveillance video cannot accurately identify the cheating. The prior art method therefore cannot make real-time and accurate judgments on communication cheating between candidates.
  • the embodiments of the present application provide a method, device, and computer equipment for recognizing cheating in an exam based on speech recognition, which aim to solve the prior-art problem that communication cheating between candidates cannot be judged in real time and accurately.
  • an embodiment of the present application provides a method for recognizing cheating in an exam based on speech recognition, which includes:
  • verifying, according to the preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed, to obtain a voiceprint verification result;
  • an embodiment of the present application provides a speech recognition-based examination cheating recognition device, which includes:
  • the voice feature parameter acquisition unit is used to acquire the basic voice information corresponding to each candidate collected by the voice collection terminal, and to obtain, according to a preset extraction rule, the voice feature parameters corresponding to the pronunciation of each sentence from the pronunciations of the multiple sentences contained in each piece of basic voice information;
  • a dimensionality reduction processing unit configured to perform dimensionality reduction processing on each of the speech feature parameters according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to the pronunciation of each of the sentences;
  • a model training unit configured to perform iterative training on the initialized voiceprint verification model according to the dimensionality reduction feature parameters of the pronunciation of each said sentence and the preset model training rules to obtain the trained voiceprint verification model;
  • the target dimensionality reduction feature parameter acquisition unit is used to obtain, if to-be-analyzed voice information sent by any of the voice collection terminals is received, the target dimensionality reduction feature parameter corresponding to the to-be-analyzed voice information according to the extraction rule and the feature vector matrix;
  • the voiceprint verification result obtaining unit is used for verifying whether the target dimension reduction feature parameter is consistent with the dimension reduction feature parameter of the examinee corresponding to the voice information to be analyzed according to the preset scoring threshold and the voiceprint verification model to obtain the voiceprint verification result;
  • a target text information acquisition unit configured to perform speech recognition on the to-be-analyzed speech information according to a pre-stored speech recognition model to obtain target text information corresponding to the to-be-analyzed speech information;
  • a text judgment result obtaining unit configured to judge whether the target text information contains cheating words according to a preset text judgment model to obtain a text judgment result; and
  • a prompt information sending unit configured to determine that cheating behavior exists and issue an alarm prompt message if the voiceprint verification result is inconsistent or the text judgment result contains cheating words.
  • an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein, when executing the computer program, the processor implements the speech recognition-based exam cheating recognition method described in the first aspect.
  • an embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when executed by a processor, the computer program causes the processor to execute the speech recognition-based exam cheating recognition method described in the first aspect above.
  • the embodiments of the present application provide a method, device and computer equipment for detecting cheating in an exam based on speech recognition.
  • the initialized voiceprint verification model is trained, and the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameters of the voice information to be analyzed are consistent with the dimensionality reduction feature parameters of the corresponding candidate; it is also judged whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words. If the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is sent.
  • in this way, the to-be-recognized speech information of the examinees is recognized based on speech recognition, realizing real-time and accurate judgment of communication cheating between examinees.
  • FIG. 1 is a schematic flowchart of a speech recognition-based exam cheating recognition method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application scenario of a speech recognition-based exam cheating recognition method provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a sub-flow of a speech recognition-based exam cheating recognition method provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 8 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a speech recognition-based examination cheating recognition device provided by an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • Referring to FIG. 1 and FIG. 2, the exam cheating recognition method based on speech recognition is applied in the user terminal 10 and executed by application software installed in the user terminal 10; the user terminal 10 is connected to the voice collection terminal 20 of each candidate through the network to perform data analysis.
  • the user terminal 10 is a terminal device used to execute the speech recognition-based examination cheating recognition method so as to judge whether an examinee cheats in the examination, such as a desktop computer, a notebook computer, a tablet computer or a mobile phone.
  • the voice collection terminal 20 is a terminal used for real-time collection of the voice information uttered by the examinee, such as a microphone; each examinee at the test site is equipped with a corresponding voice collection terminal 20.
  • S110: Acquire the basic voice information corresponding to each candidate collected by the voice collection terminal, and obtain the voice feature parameters corresponding to the pronunciation of each sentence from the pronunciations of the multiple sentences contained in each piece of basic voice information according to a preset extraction rule.
  • specifically, the basic voice information corresponding to each candidate can be collected through that candidate's voice collection terminal, and the basic voice information of each examinee includes the pronunciation of multiple sentences.
  • Each speech pronunciation corresponds to a sentence spoken by a candidate.
  • the corresponding speech feature parameters can be obtained from the pronunciation of each sentence according to the extraction rules.
  • the speech feature parameters can quantify the audio features of the pronunciation of a sentence.
  • the parameters include audio coefficient information and perceptual coefficient information of the pronunciation of a sentence.
  • the audio coefficient information can be the Mel Frequency Cepstrum Coefficient (MFCC) corresponding to the pronunciation of the sentence, and the perceptual coefficient information can be the perceptual linear prediction (PLP) coefficient corresponding to the pronunciation of the sentence.
  • the extraction rules include spectral conversion rules, audio coefficient extraction rules, and perceptual coefficient extraction rules.
  • the pronunciation of each sentence can be spectrum-converted according to the spectrum conversion rule, the audio frequency spectrum obtained after the conversion can be analyzed according to the audio coefficient extraction rule to obtain the audio coefficient information, and the audio frequency spectrum can be analyzed according to the perceptual coefficient extraction rule to obtain the perceptual coefficient information.
  • step S110 includes sub-steps S111, S112, S113 and S114.
  • Sentence pronunciation is represented in the computer by the spectrogram containing the audio track.
  • the spectrogram contains many frames, each frame corresponds to a time unit, and the audio information of each frame can be obtained from the spectrogram of the sentence pronunciation.
  • each frame of audio information corresponds to the audio information contained in a time unit in the spectrogram.
  • S112: Convert the multi-frame audio information corresponding to the pronunciation of each sentence into audio frequency spectrums by Fast Fourier Transform (FFT).
  • S113 Acquire audio coefficient information corresponding to each of the audio frequency spectrums according to the audio coefficient extraction rule.
  • the audio coefficient information can be extracted from each audio frequency spectrum through the audio coefficient extraction rule.
  • the audio coefficient extraction rule includes a frequency conversion formula and an inverse conversion calculation formula.
  • step S113 includes sub-steps S1131 and S1132.
  • the linearly expressed audio spectrum is converted into a nonlinear audio spectrum.
  • the human auditory system is a special nonlinear system, and its sensitivity to different frequency signals is different.
  • because of this frequency-dependent sensitivity, the nonlinear audio spectrum can simulate the human auditory system's characterization of the audio signal, so that features consistent with human auditory perception are obtained.
  • Both the audio spectrum and the nonlinear audio spectrum can be represented by a spectrum curve, and the spectrum curve is composed of multiple continuous spectrum values.
  • the frequency conversion formula can be expressed by formula (1), which is the standard Mel-scale conversion: mel(f) = 2595 × log10(1 + f/700) (1), where mel(f) is the spectral value of the converted nonlinear audio spectrum and f is the frequency value of the audio spectrum.
  • Each nonlinear audio spectrum can be inversely transformed according to the inverse transform calculation formula: specifically, a discrete cosine transform (Discrete Cosine Transform, DCT) is performed after taking the logarithm of the obtained nonlinear audio spectrum, and the 2nd to 13th DCT coefficients are combined to obtain the audio coefficients corresponding to that nonlinear audio spectrum; the audio coefficient information corresponding to each audio frequency spectrum is obtained by acquiring the audio coefficients corresponding to each nonlinear audio spectrum.
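To make steps S111 to S113 concrete, the following is a minimal sketch using librosa, whose MFCC routine internally performs the framing, FFT, Mel warping, logarithm and DCT described above; the helper name, the 16 kHz sample rate and the use of librosa itself are assumptions for illustration, not the patent's reference implementation:

```python
import librosa

def audio_coefficient_info(wav_path):
    """Hypothetical helper: per-frame audio coefficients as in S111-S113."""
    y, sr = librosa.load(wav_path, sr=16000)     # one sentence pronunciation
    # librosa frames the signal, applies the FFT, warps the spectrum onto the
    # Mel scale, takes the logarithm and applies the DCT internally.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Dropping coefficient 0 keeps the 2nd-13th DCT coefficients that the
    # description above combines into the audio coefficient information.
    return mfcc[1:13, :].T                       # shape: (n_frames, 12)
```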
  • the perceptual coefficient information can be extracted from each audio frequency spectrum through the perceptual coefficient extraction rule.
  • the perceptual coefficient extraction rule includes a frequency array and an inverse transform calculation formula.
  • step S114 includes sub-steps S1141 and S1142.
  • the frequency array contains multiple frequency values, and an equal loudness curve filter can be performed on an audio spectrum according to the multiple frequency values, so as to obtain a frequency band energy vector corresponding to the audio spectrum and each frequency value.
  • the frequency array can be expressed as ⁇ 250, 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500, 2900, 3400 ⁇ .
  • S1142 Compress the frequency band energy vector corresponding to each of the audio frequency spectrums, and then perform inverse transformation according to the inverse transform calculation formula to obtain perceptual coefficient information of each of the audio frequency spectrums.
  • the inverse fast Fourier transform of 30 points can be performed on the band energy vector after the compression calculation to obtain the coefficient values corresponding to the 30 points, and the first 15 coefficient values are obtained as the perceptual coefficient information of the corresponding audio spectrum.
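The perceptual coefficient extraction of S1141 and S1142 can be sketched as follows for a single audio frequency spectrum; the rectangular bands standing in for the equal loudness curve filter and the cubic-root compression (as in classic PLP analysis) are assumptions, since the exact filter and compression are not reproduced in the text above:

```python
import numpy as np

# The frequency array from step S1141.
CENTER_FREQS = [250, 350, 450, 570, 700, 840, 1000, 1170,
                1370, 1600, 1850, 2150, 2500, 2900, 3400]

def perceptual_coefficients(spectrum, sr, n_fft):
    """Sketch of S1141-S1142 for one audio frequency spectrum (one frame)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # S1141: one band energy per frequency value; rectangular bands around each
    # centre frequency stand in for the equal loudness curve filter (assumed).
    mids = [(a + b) / 2.0 for a, b in zip(CENTER_FREQS[:-1], CENTER_FREQS[1:])]
    edges = [200.0] + mids + [4000.0]            # 16 edges -> 15 bands
    band_energy = np.array([np.sum(spectrum[(freqs >= lo) & (freqs < hi)] ** 2)
                            for lo, hi in zip(edges[:-1], edges[1:])])
    # S1142: compress the band energies (cubic root, as in classic PLP, is
    # assumed), then a 30-point inverse FFT; the first 15 values are kept.
    compressed = np.cbrt(band_energy)
    coeffs = np.fft.ifft(compressed, n=30).real
    return coeffs[:15]
```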
  • by combining the audio coefficient information and the perceptual coefficient information of an audio frequency spectrum, the speech feature parameters of the corresponding sentence pronunciation can be obtained.
  • Each of the obtained speech feature parameters contains multiple parameter values corresponding to multiple dimensions.
  • the multiple parameter values of some dimensions in the speech feature parameters are too concentrated, so these parameter values cannot clearly reflect the differences between the pronunciations of multiple sentences in the corresponding dimensions; that is, it is difficult for them to highlight the characteristics of the corresponding sentence pronunciations in those dimensions. Each speech feature parameter is therefore subjected to dimensionality reduction processing to remove the dimensions that cannot highlight the differences between sentence pronunciations, obtaining the dimensionality reduction feature parameters so that the speech features can be analyzed more efficiently. After each speech feature parameter is subjected to dimensionality reduction processing, a dimensionality reduction feature parameter is obtained, and the number of dimensions contained in the dimensionality reduction feature parameter is smaller than the number of dimensions contained in the speech feature parameter.
  • step S120 includes sub-steps S121, S122, S123 and S124.
  • Each voice feature parameter contains parameter values of the same set of dimensions, so the voice feature parameters can be integrated to obtain a parameter matrix. If the number of voice feature parameters is expressed as m and the number of dimensions in each voice feature parameter as n, the parameter matrix obtained by combination is X m×n, a matrix with m rows and n columns whose values are the parameter values contained in each speech feature parameter.
  • the covariance matrix of the parameter matrix can then be calculated, and the calculation can be expressed by formula (2); a standard form, assuming the parameter matrix X is column-centered, is C = (1/m) × XᵀX (2), which yields an n × n covariance matrix C.
  • the covariance matrix represents the feature distribution of the multiple speech feature parameters in n directions of the n-dimensional space. By solving the covariance matrix, the specific directions in which the features of the multiple speech feature parameters are concentrated can be determined from the n-dimensional space; the size of an eigenvalue reflects the feature difference of the multiple speech feature parameters in the direction corresponding to that eigenvalue. Deleting the dimensions corresponding to the directions with smaller eigenvalues and keeping the dimensions corresponding to the main directions reduces the dimensionality of the speech feature parameters, achieving the purpose of dimensionality reduction.
  • The QR decomposition algorithm (orthogonal triangular decomposition), the Jacobi iteration algorithm, the singular value decomposition (SVD) algorithm, or other mathematical calculation methods can be used to solve the n covariance eigenvalues of the covariance matrix and the corresponding n covariance eigenvectors; one covariance eigenvalue corresponds to one covariance eigenvector.
  • if the dimensionality reduction value is k (k < n), and each covariance eigenvector is a vector with n rows and 1 column, then the k covariance eigenvectors corresponding to the largest covariance eigenvalues are selected from the n covariance eigenvectors and combined, and the obtained eigenvector matrix can be expressed as W n×k.
  • multiplying the parameter matrix X m×n by the eigenvector matrix W n×k gives the matrix calculation result, from which the dimensionality reduction feature parameters corresponding to the pronunciation of each sentence can be obtained: the k parameter values of the i-th row are the dimensionality reduction feature parameters corresponding to the i-th input speech feature parameter, and each dimensionality reduction feature parameter contains k-dimensional parameter values. A sketch of this procedure follows.
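Sub-steps S121 to S124 amount to a principal component analysis of the parameter matrix; a minimal sketch (centering the matrix before computing the covariance is assumed, as is usual for PCA):

```python
import numpy as np

def reduce_dimensions(X, k):
    """Sketch of S121-S124: X is the m x n parameter matrix,
    k the dimensionality reduction value (k < n)."""
    m = X.shape[0]
    Xc = X - X.mean(axis=0)                # centering assumed, as usual for PCA
    C = (Xc.T @ Xc) / m                    # S122: n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # S123: eigh handles the symmetric C
    order = np.argsort(eigvals)[::-1][:k]  # the k largest covariance eigenvalues
    W = eigvecs[:, order]                  # eigenvector matrix W, n x k
    reduced = X @ W                        # S124: row i holds the k-dimensional
    return reduced, W                      # parameters of the i-th pronunciation
```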
  • S130 Perform iterative training on the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain a trained voiceprint verification model.
  • the initialized voiceprint verification model is iteratively trained according to the dimensionality reduction feature parameters of the pronunciation of each sentence and the preset model training rules to obtain the trained voiceprint verification model.
  • the model training rule includes a loss value calculation formula, a gradient calculation formula and a loss threshold.
  • the pre-stored initialized voiceprint verification model can be trained, and the trained voiceprint verification model can be used for voiceprint verification to improve the accuracy of verification; the model training rules are the specific rules for conducting this training.
  • step S130 includes sub-steps S131, S132, S133, S134, S135, S136 and S137.
  • S131: Randomly select two dimensionality reduction feature parameters of the same examinee from the dimensionality reduction feature parameters as a positive sample. S132: Randomly select two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample.
  • the obtained dimensionality reduction feature parameters of the pronunciation of each sentence can be used as sample data to train the initialized voiceprint verification model: two dimensionality reduction feature parameters corresponding to two sentence pronunciations of the same examinee are taken from the dimensionality reduction feature parameters as a positive sample, and two dimensionality reduction feature parameters of different examinees are taken as a negative sample.
  • a positive sample or a negative sample can be used to train the initialized voiceprint verification model once.
  • the voiceprint verification model is a neural network model constructed based on artificial intelligence.
  • the voiceprint verification model consists of an input layer, multiple intermediate layers and an output layer.
  • the input layer contains multiple input nodes, and the number of input nodes is equal to the total number of dimensions contained in the two dimensionality reduction feature parameters: if a dimensionality reduction feature parameter contains parameter values of k dimensions, the input layer contains 2k input nodes and the output layer contains two output nodes. The input layer and the first intermediate layer, adjacent intermediate layers, and the last intermediate layer and the output layer are related by association formulas, and each association formula contains corresponding parameters; the process of training the voiceprint verification model is the process of adjusting the parameter values of the parameters in the association formulas.
  • the model output information includes the output node values of the two output nodes.
  • the output node value of the first output node is the predicted probability that the two dimensionality reduction feature parameters are consistent, and the output node value of the second output node is the predicted probability that the two dimensionality reduction feature parameters are inconsistent.
  • the value of each output node lies in the range [0, 1].
  • the loss value corresponding to the model output information can be calculated according to the loss value calculation formula: one form of the formula is used if a positive sample was input into the voiceprint verification model, and another form is used if a negative sample was input, where f 1 is the output node value of the first output node in the model output information and f 2 is the output node value of the second output node in the model output information.
  • S135: Determine whether the loss value is less than the loss threshold. S136: If the loss value is not less than the loss threshold, calculate the update value of each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value so as to update the parameter value of that parameter, and return to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information. S137: If the loss value is less than the loss threshold, determine the voiceprint verification model as the trained voiceprint verification model.
  • the loss value is less than the loss threshold, if the loss value is less than the loss threshold, it means that the currently obtained voiceprint verification model can meet the needs of use, and the currently obtained voiceprint verification model is determined as the trained voiceprint verification model; If the loss value is not less than the loss threshold, it indicates that the currently obtained voiceprint verification model cannot meet the usage requirements, and the parameter values of the parameters in the voiceprint verification model need to be adjusted, and the voiceprint verification model based on the adjusted parameter values is calculated again. The new loss value, and repeatedly judge whether the new loss value is less than the loss threshold, until the obtained voiceprint verification model meets the needs of use.
  • the updated value of each parameter in the voiceprint verification model can be calculated according to the gradient calculation formula to update the original parameter value of each parameter.
  • the calculated value obtained for a parameter in the voiceprint verification model from a positive sample or a negative sample is input into the gradient calculation formula, and, combined with the loss value obtained from the above calculation, the update value corresponding to the parameter can be calculated; this calculation process is also known as gradient descent calculation.
  • the gradient calculation formula can be expressed as the standard gradient descent update ω r ′ = ω r − η × (∂L/∂ω r ), where ω r ′ is the updated value of the parameter r, ω r is the original parameter value of the parameter r, η is the preset learning rate in the gradient calculation formula, and ∂L/∂ω r is the gradient of the loss value with respect to the parameter r. A training sketch follows.
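The following is a hedged PyTorch sketch of the model structure and of one training iteration (S131 to S137); the hidden-layer sizes, the softmax output and the -log loss on the relevant output node are assumptions, since the exact association formulas and loss formulas are not reproduced in the text above:

```python
import torch
import torch.nn as nn

class VoiceprintVerifier(nn.Module):
    """Sketch of the voiceprint verification model: 2k input nodes and two
    output nodes whose values lie in [0, 1]; hidden sizes are assumptions."""
    def __init__(self, k, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * k, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Softmax(dim=-1))

    def forward(self, pair):          # pair: two concatenated k-dim parameters
        return self.net(pair)         # (f1, f2)

def train_step(model, optimizer, pair, is_positive, loss_threshold=1e-2):
    """One iteration of S133-S137; -log of the node that should be large is
    an assumed stand-in for the patent's loss formulas."""
    f = model(pair)
    loss = -torch.log(f[0] if is_positive else f[1])
    if loss.item() < loss_threshold:  # S135/S137: model meets the needs of use
        return loss.item(), True
    optimizer.zero_grad()
    loss.backward()                   # gradients feed the update formula
    optimizer.step()                  # S136: omega_r <- omega_r - eta * dL/d(omega_r)
    return loss.item(), False
```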
  • the target dimension reduction feature parameter corresponding to the to-be-analyzed speech information is acquired according to the extraction rule and the feature vector matrix.
  • the process of training the initialized voiceprint verification model can be completed before the start of the test. After the test is started, the voices around each candidate are collected through each voice collection terminal.
  • the target dimension reduction feature parameters corresponding to the speech information to be analyzed can be obtained according to the extraction rule and the feature vector matrix.
  • the target speech feature parameters are obtained from the speech information to be analyzed according to the extraction rules (the specific method is the same as that for obtaining the speech feature parameters corresponding to the pronunciation of a sentence, and is not repeated here); the target dimensionality reduction feature parameters corresponding to the speech information to be analyzed are then obtained by multiplying the target speech feature parameters by the feature vector matrix.
  • according to the preset scoring threshold and the voiceprint verification model, it is possible to verify whether the target dimensionality reduction feature parameters are consistent with the dimensionality reduction feature parameters of the corresponding examinee, and to obtain the voiceprint verification result.
  • any dimensionality reduction feature parameter of the candidate corresponding to the voice collection terminal is obtained, and the target dimensionality reduction feature parameter and that dimensionality reduction feature parameter are combined and input into the voiceprint verification model to obtain the corresponding output information. Based on the output information, the corresponding verification score can be calculated and compared with the score threshold: if the verification score is greater than the score threshold, the voiceprint verification result is consistent; if the verification score is not greater than the score threshold, the voiceprint verification result is inconsistent.
  • step S150 includes sub-steps S151, S152 and S153.
  • the target dimensionality reduction feature parameter and a dimensionality reduction feature parameter of the corresponding candidate both contain parameter values of k dimensions; they are combined to obtain parameter values of 2k dimensions and input into the trained voiceprint verification model, and the output information is obtained through calculation of the association formulas in the voiceprint verification model. The output information includes the output node values of the two output nodes.
  • S152: Calculate the verification score corresponding to the target dimensionality reduction feature parameter according to the output information. S153: Determine whether the verification score is greater than the score threshold to obtain the voiceprint verification result of whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter.
  • the verification score corresponding to the target dimensionality reduction feature parameter is calculated from the output information according to a scoring formula, where f 1 is the output node value of the first output node in the output information and f 2 is the output node value of the second output node in the output information.
  • the score threshold may be set to 50. If the verification score is greater than 50, the voiceprint verification result is consistent; otherwise, the voiceprint verification result is inconsistent.
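A sketch of S151 to S153 follows; because the scoring formula over f1 and f2 is not reproduced in the text above, score = 100 × f1 is assumed here so that the threshold of 50 is meaningful:

```python
import torch

def voiceprint_verify(model, target_param, enrolled_param, score_threshold=50.0):
    """Sketch of S151-S153; the 100 * f1 score mapping is an assumption."""
    pair = torch.cat([target_param, enrolled_param])  # 2k-dimensional input
    with torch.no_grad():
        f1, f2 = model(pair)                          # S151: model output
    score = 100.0 * f1.item()                         # S152: verification score
    # S153: compare with the score threshold
    return "consistent" if score > score_threshold else "inconsistent"
```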
  • S160 Perform speech recognition on the speech information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the speech information to be analyzed.
  • the speech information to be analyzed can be recognized according to the speech recognition model to obtain corresponding target text information, wherein the speech recognition model includes an acoustic model, a speech feature dictionary and a semantic analysis model.
  • the to-be-analyzed speech information is segmented according to the acoustic model in the speech recognition model to obtain a plurality of phonemes included in the to-be-analyzed speech information.
  • the speech information to be analyzed is composed of phonemes of the pronunciation of a plurality of characters, and the phoneme of a character includes the frequency and timbre of the pronunciation of the character.
  • the acoustic model contains the phonemes of all character pronunciations.
  • by matching against the phonemes contained in the acoustic model, the phonemes of the single characters in the speech information to be analyzed can be segmented, and the multiple phonemes contained in the speech information to be analyzed are finally obtained through segmentation.
  • the phonemes are matched according to the speech feature dictionary in the speech recognition model to convert the phonemes into pinyin information.
  • the phoneme information corresponding to all character pinyin is included in the phonetic feature dictionary.
  • semantic analysis is performed on the pinyin information according to the semantic analysis model in the speech recognition model to obtain target text information corresponding to the speech information to be analyzed.
  • the semantic parsing model includes the mapping relationship between pinyin information and text information, and the obtained pinyin information can be semantically parsed through the mapping relationship included in the semantic parsing model to convert the pinyin information into the corresponding target text information, as sketched below.
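The three sub-steps of S160 can be sketched as plain lookups; feature_dict (phoneme to pinyin) and parse_model (pinyin sequence to text) are assumed stand-ins for the speech feature dictionary and the semantic parsing model, which in practice would be far richer:

```python
def speech_to_text(phonemes, feature_dict, parse_model):
    """Sketch of S160 after the acoustic model has segmented the phonemes."""
    # Match each segmented phoneme against the speech feature dictionary
    # to convert the phonemes into pinyin information.
    pinyin = [feature_dict[p] for p in phonemes]
    # The semantic parsing model maps the pinyin information to the
    # corresponding target text information.
    return parse_model(" ".join(pinyin))
```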
  • S170 Determine whether the target text information contains cheating words according to a preset text judgment model to obtain a text judgment result.
  • a text judgment result is obtained by judging whether the target text information contains cheating words according to a preset text judgment model.
  • the text judgment model includes a plurality of cheating keywords and a text judgment neural network; if the target text information contains any of the cheating keywords, the text judgment result is that it contains cheating words.
  • Cheating keywords can be "how to", "tell", "feel", etc.
  • the target text information can be converted into text codes, and the text codes can be input into the text judgment neural network for recognition, so as to obtain the output results of the text judgment neural network.
  • the output result can be judged to determine whether the target text information has a cheating tendency: if it is judged that the target text information has a cheating tendency, the text judgment result is that cheating words are contained; if it is judged that the target text information does not have a cheating tendency, the text judgment result is that cheating words are not contained.
  • the target text information can be converted according to the pre-stored conversion dictionary: the encoding value corresponding to each character in the target text information is obtained and the values are combined to obtain a text code, so that the obtained text code represents the features of the target text information numerically.
  • the structure of the text judgment neural network is similar to that of the voiceprint verification model.
  • the text encoding of the target text information is input into the text judgment neural network to obtain the corresponding output results.
  • the output result can be represented by a numerical value, and it is judged whether the output result is greater than the preset cheating score value: if it is greater, the target text information is judged to have a cheating tendency; if the output result is not greater than the preset cheating score value, the target text information is judged not to have a cheating tendency. A sketch follows.
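A sketch of S170 combining the keyword screen with the neural score; conversion_dict, judge_net and the 0.5 cheating score value are assumed stand-ins for the pre-stored components, not taken from the patent:

```python
CHEATING_KEYWORDS = ["how to", "tell", "feel"]   # examples from the description

def text_judgement(target_text, conversion_dict, judge_net, cheating_score=0.5):
    """Sketch of S170: keyword screen plus a neural score over the text code."""
    # Keyword screen: any cheating keyword decides the result directly.
    if any(word in target_text for word in CHEATING_KEYWORDS):
        return "contains cheating words"
    # Convert each character to its code value to form the text code,
    # then let the text judgment network score it (a single numeric output).
    codes = [conversion_dict.get(ch, 0) for ch in target_text]
    output = judge_net(codes)
    return ("contains cheating words" if output > cheating_score
            else "does not contain cheating words")
```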
  • if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is issued, as in the sketch below. If the voiceprint verification result is inconsistent, it means that the voice collected by the current candidate's voice collection terminal contains the voice of another candidate, that is, the current candidate is passively communicating with other candidates; it is determined that cheating exists, and an alarm prompt message is issued to remind the user of the user terminal to deal with the cheating behavior in time. If the text judgment result is that cheating words are contained, it means that the current examinee is actively communicating with other examinees; it is likewise determined that cheating behavior exists, and an alarm prompt message is issued to prompt the user of the user terminal to deal with the cheating behavior in time.
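The final decision is a simple disjunction of the two signals, sketched here with the string labels used in the sketches above (assumed, for illustration):

```python
def cheating_decision(voiceprint_result, text_result):
    """Either signal alone triggers the alarm prompt message."""
    if (voiceprint_result == "inconsistent"
            or text_result == "contains cheating words"):
        return "cheating detected: issue alarm prompt message"
    return "no cheating detected"
```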
  • the technical methods in this application can be applied to application scenarios such as smart government affairs/smart education, etc., which include judging cheating in exams based on speech recognition, so as to promote the construction of smart cities.
  • in the speech recognition-based exam cheating recognition method provided by the embodiments of the present application, the speech feature parameters of each sentence pronunciation are obtained from the basic speech information of each examinee and dimensionality reduction is performed to obtain the dimensionality reduction feature parameters; the dimensionality reduction feature parameters are used to train the initialized voiceprint verification model; the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameters of the voice information to be analyzed are consistent with the dimensionality reduction feature parameters of the corresponding candidate, and it is judged whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words; if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating exists and an alarm prompt message is sent.
  • in this way, the to-be-recognized speech information of the examinees is recognized based on speech recognition, realizing real-time and accurate judgment of communication cheating between examinees.
  • Embodiments of the present application further provide a speech recognition-based exam cheating recognition device, which is used to execute any of the foregoing speech recognition-based exam cheating recognition methods.
  • FIG. 9 is a schematic block diagram of an apparatus for recognizing exam cheating based on speech recognition provided by an embodiment of the present application.
  • the apparatus for recognizing exam cheating based on speech recognition can be configured in the user terminal 10.
  • the apparatus 100 for recognizing exam cheating based on speech recognition includes a speech feature parameter obtaining unit 110, a dimensionality reduction processing unit 120, a model training unit 130, a target dimensionality reduction feature parameter obtaining unit 140, a voiceprint verification result obtaining unit 150, a target text information acquisition unit 160, a text judgment result acquisition unit 170 and a prompt information sending unit 180.
  • the voice feature parameter obtaining unit 110 is used to acquire the basic voice information corresponding to each candidate collected by the voice collection terminal, and to obtain, according to a preset extraction rule, the speech feature parameters corresponding to the pronunciation of each sentence from the pronunciations of the multiple sentences contained in each piece of basic voice information.
  • the speech feature parameter acquisition unit 110 includes subunits: an audio information acquisition unit, an audio information conversion unit, an audio coefficient information acquisition unit, and a perceptual coefficient information acquisition unit.
  • the audio information acquisition unit is used to perform framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information; the audio information conversion unit is used to convert the multi-frame audio information corresponding to each sentence pronunciation into audio frequency spectrums according to the spectrum conversion rule; the audio coefficient information acquisition unit is used to acquire the audio coefficient information corresponding to each audio frequency spectrum according to the audio coefficient extraction rule; and the perceptual coefficient information acquisition unit is used to acquire the perceptual coefficient information corresponding to each audio frequency spectrum according to the perceptual coefficient extraction rule.
  • the audio coefficient information acquisition unit includes subunits: a frequency conversion unit and an inverse conversion processing unit.
  • a frequency conversion unit, configured to convert each audio frequency spectrum into a corresponding nonlinear audio frequency spectrum according to the frequency conversion formula; and an inverse transform processing unit, configured to inversely transform each nonlinear audio frequency spectrum according to the inverse transform calculation formula to obtain a plurality of audio coefficients corresponding to each nonlinear audio frequency spectrum as the audio coefficient information of each audio frequency spectrum.
  • the perceptual coefficient information obtaining unit includes subunits: a frequency band energy vector obtaining unit and an inverse transform processing unit.
  • a frequency band energy vector obtaining unit, configured to filter each audio frequency spectrum according to the multiple frequency values contained in the frequency array to obtain the frequency band energy vector corresponding to each audio frequency spectrum; and an inverse transform processing unit, configured to perform inverse transformation according to the inverse transform calculation formula after compressing the frequency band energy vector corresponding to each audio frequency spectrum, so as to obtain the perceptual coefficient information of each audio frequency spectrum.
  • the dimension reduction processing unit 120 is configured to perform dimension reduction processing on each of the speech feature parameters according to a preset dimension reduction value to obtain a feature vector matrix and a dimension reduction feature parameter corresponding to the pronunciation of each sentence.
  • the dimensionality reduction processing unit 120 includes subunits: a covariance matrix obtaining unit, a covariance matrix solving unit, an eigenvector matrix obtaining unit, and a matrix calculating unit.
  • a covariance matrix obtaining unit, used for integrating all the speech feature parameters into a parameter matrix and calculating the covariance matrix of the parameter matrix; a covariance matrix solving unit, used for solving the covariance eigenvalues of the covariance matrix and the covariance eigenvector corresponding to each covariance eigenvalue; an eigenvector matrix acquisition unit, used for selecting a number of covariance eigenvectors, equal to the dimensionality reduction value, that correspond to the largest covariance eigenvalues, and combining them to obtain the eigenvector matrix; and a matrix calculation unit, used for multiplying the parameter matrix by the eigenvector matrix to obtain the dimensionality reduction feature parameter corresponding to the pronunciation of each sentence.
  • the model training unit 130 is configured to perform iterative training on the initialized voiceprint verification model according to the dimensionality reduction feature parameters of the pronunciation of each sentence and the preset model training rules to obtain a trained voiceprint verification model.
  • the model training unit 130 includes subunits: a positive sample acquisition unit, a negative sample acquisition unit, a model output information acquisition unit, a loss value calculation unit, a loss value judgment unit, a parameter value update unit, and a voiceprint verification model determination unit.
  • the positive sample acquisition unit is used to randomly select two dimensionality reduction feature parameters of the same candidate from the dimensionality reduction feature parameters as a positive sample; the negative sample acquisition unit is used to randomly select two dimensionality reduction feature parameters of different candidates from the dimensionality reduction feature parameters as a negative sample;
  • a model output information acquisition unit is used to input the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information;
  • a loss value calculation unit is used to calculate a loss value from the model output information according to the loss value calculation formula;
  • a loss value judgment unit is used to judge whether the loss value is less than the loss threshold value;
  • a parameter value update unit is used to calculate, if the loss value is not less than the loss threshold, the update value of each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value so as to update the parameter value of that parameter, and to return to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information; and a voiceprint verification model determination unit is used to determine, if the loss value is less than the loss threshold, the voiceprint verification model as the trained voiceprint verification model.
  • the target dimension reduction feature parameter obtaining unit 140 is configured to obtain the target corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix if the voice information to be analyzed is received from any of the voice collection terminals. Dimensionality reduction feature parameters.
  • the voiceprint verification result obtaining unit 150 is configured to verify whether the target dimension reduction feature parameter is consistent with the dimension reduction feature parameter of the examinee corresponding to the voice information to be analyzed according to the preset scoring threshold and the voiceprint verification model to obtain a voiceprint Validation results.
  • the voiceprint verification result acquisition unit 150 includes subunits: an output information acquisition unit, a verification score acquisition unit, and a verification score judgment unit.
  • the output information acquisition unit is used for inputting the target dimensionality reduction feature parameter and any dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain corresponding output information;
  • the verification score acquisition unit is used to calculate the verification score corresponding to the target dimensionality reduction feature parameter according to the output information; and the verification score judgment unit is used to judge whether the verification score is greater than the score threshold to obtain the voiceprint verification result of whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter.
  • the target text information acquisition unit 160 is configured to perform speech recognition on the to-be-analyzed speech information according to a pre-stored speech recognition model to obtain target text information corresponding to the to-be-analyzed speech information.
  • the text judgment result obtaining unit 170 is configured to judge whether the target text information contains cheating words according to a preset text judgment model to obtain a text judgment result.
  • the prompt information sending unit 180 is configured to determine that cheating behavior exists and issue an alarm prompt message if the voiceprint verification result is inconsistent or the text judgment result is that the text contains cheating words.
  • the speech recognition-based exam cheating recognition device applies the above speech recognition-based exam cheating recognition method: it obtains the speech feature parameters of each sentence pronunciation from the basic speech information of each examinee and performs dimensionality reduction to obtain the dimensionality reduction feature parameters, trains the initialized voiceprint verification model according to the dimensionality reduction feature parameters, uses the voiceprint verification model to verify whether the target dimensionality reduction feature parameters of the voice information to be analyzed are consistent with the dimensionality reduction feature parameters of the corresponding candidate, and judges whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words.
  • in this way, the to-be-recognized speech information of the examinees is recognized based on speech recognition, realizing real-time and accurate judgment of communication cheating between examinees.
  • the above-mentioned apparatus for recognizing exam cheating based on speech recognition can be implemented in the form of a computer program, and the computer program can be executed on a computer device as shown in FIG. 10 .
  • FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • the computer device may be a user terminal 10 for executing a speech recognition-based exam cheating recognition method to judge exam cheating based on speech recognition.
  • the computer device 500 includes a processor 502 , a memory and a network interface 505 connected by a system bus 501 , wherein the memory may include a non-volatile storage medium 503 and an internal memory 504 .
  • the nonvolatile storage medium 503 can store an operating system 5031 and a computer program 5032 .
  • the computer program 5032 can cause the processor 502 to execute a speech recognition-based exam cheating recognition method.
  • the processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500 .
  • the internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503.
  • the processor 502 can execute the speech recognition-based exam cheating recognition method.
  • the network interface 505 is used for network communication, such as providing transmission of data information.
  • FIG. 10 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer device 500 may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
  • the processor 502 is configured to run the computer program 5032 stored in the memory, so as to realize the corresponding functions in the above-mentioned speech recognition-based examination cheating recognition method.
  • the embodiment of the computer device shown in FIG. 10 does not constitute a limitation on the specific structure of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the computer device may only include a memory and a processor.
  • the structures and functions of the memory and the processor are the same as those of the embodiment shown in FIG. 10 , which will not be repeated here.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.
  • a computer-readable storage medium may be a non-volatile computer-readable storage medium, or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, wherein when the computer program is executed by the processor, the steps included in the above-mentioned speech recognition-based exam cheating recognition method are implemented.
  • the disclosed device, apparatus, and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other division methods, units with the same function may be grouped into one unit, and multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium; the storage medium includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned computer-readable storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

An examination cheating recognition method and apparatus based on speech recognition, and a computer device. The method comprises: acquiring, from the basic speech information of each examinee, a speech feature parameter for each sentence pronunciation, and performing dimensionality reduction to obtain dimension-reduced feature parameters; training an initialized voiceprint verification model according to the dimension-reduced feature parameters; using the voiceprint verification model to verify whether the target dimension-reduced feature parameter of speech information to be analyzed is consistent with the dimension-reduced feature parameters of the corresponding examinee; determining whether a cheating word is included in the target text information obtained by performing speech recognition on the speech information to be analyzed; and, if the voiceprint verification result indicates inconsistency or the text judgment result indicates that a cheating word is included, determining that cheating behavior exists and sending alarm prompt information. In the method, the speech information to be recognized of an examinee is recognized on the basis of speech recognition, so as to realize real-time and accurate determination of communication cheating behavior between examinees.

Description

Examination cheating recognition method and apparatus based on speech recognition, and computer device
This application claims priority to the Chinese patent application No. 202011490833.2, filed with the China Patent Office on December 16, 2020 and entitled "Examination Cheating Recognition Method, Apparatus and Computer Device Based on Speech Recognition", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence technology and belongs to the application scenario of judging examination cheating based on speech recognition in smart cities; in particular, it relates to an examination cheating recognition method and apparatus based on speech recognition, and a computer device.
Background
More and more selection and evaluation processes are carried out by examination to ensure fairness, for example civil servant selection, CET-4 and CET-6 evaluation, and driver's license testing. To prevent examinees from cheating during an examination and thereby compromising its fairness, invigilators are arranged in the examination room; however, invigilators cannot monitor every examinee at every moment, so the invigilation effect is unsatisfactory. Traditional technical methods use video surveillance to assist invigilators, analyzing the video to determine the specific location of a cheating examinee. However, the inventors found that analyzing surveillance video can only judge after the event whether an examinee has cheated, so real-time judgment cannot be guaranteed; moreover, surveillance video can only analyze from images whether an examinee's body movements indicate cheating, and if examinees communicate with each other while making only small body movements, the surveillance video cannot accurately identify the cheating. Therefore, the prior-art methods cannot make real-time, accurate judgments on communication cheating between examinees.
Summary of the Invention
The embodiments of the present application provide an examination cheating recognition method and apparatus based on speech recognition, and a computer device, aiming to solve the problem in the prior-art methods that communication cheating between examinees cannot be judged in real time and accurately.
In a first aspect, an embodiment of the present application provides an examination cheating recognition method based on speech recognition, which includes:
acquiring the basic voice information corresponding to each examinee collected by the voice collection terminals, and acquiring, according to a preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
performing dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
iteratively training an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
if voice information to be analyzed is received from any of the voice collection terminals, acquiring, according to the extraction rule and the feature vector matrix, the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed;
verifying, according to a preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed, to obtain a voiceprint verification result;
performing speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
judging, according to a preset text judgment model, whether the target text information contains cheating words, to obtain a text judgment result;
if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determining that cheating behavior exists and issuing alarm prompt information.
In a second aspect, an embodiment of the present application provides an examination cheating recognition apparatus based on speech recognition, which includes:
a speech feature parameter acquisition unit, configured to acquire the basic voice information corresponding to each examinee collected by the voice collection terminals, and to acquire, according to a preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
a dimensionality reduction processing unit, configured to perform dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
a model training unit, configured to iteratively train an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
a target dimensionality reduction feature parameter acquisition unit, configured to, if voice information to be analyzed is received from any of the voice collection terminals, acquire, according to the extraction rule and the feature vector matrix, the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed;
a voiceprint verification result acquisition unit, configured to verify, according to a preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed, to obtain a voiceprint verification result;
a target text information acquisition unit, configured to perform speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
a text judgment result acquisition unit, configured to judge, according to a preset text judgment model, whether the target text information contains cheating words, to obtain a text judgment result;
a prompt information sending unit, configured to determine that cheating behavior exists and issue alarm prompt information if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained.
In a third aspect, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the examination cheating recognition method based on speech recognition described in the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the examination cheating recognition method based on speech recognition described in the first aspect.
The embodiments of the present application provide an examination cheating recognition method and apparatus based on speech recognition, and a computer device. The speech feature parameter of each sentence pronunciation is acquired from each examinee's basic voice information and reduced in dimensionality to obtain dimensionality reduction feature parameters; the initialized voiceprint verification model is trained according to the dimensionality reduction feature parameters; the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameter of the voice information to be analyzed is consistent with the dimensionality reduction feature parameters of the corresponding examinee; whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words is judged; and if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and alarm prompt information is sent. Through the above method, an examinee's voice information to be recognized is recognized based on speech recognition, so that communication cheating between examinees can be judged in real time and accurately.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of the examination cheating recognition apparatus based on speech recognition provided by an embodiment of the present application;
FIG. 10 is a schematic block diagram of the computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this specification of the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
Please refer to FIG. 1 and FIG. 2. FIG. 1 is a schematic flowchart of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application, and FIG. 2 is a schematic diagram of an application scenario of the method. The method is applied in a user terminal 10 and executed by application software installed in the user terminal 10; the user terminal 10 is connected to the voice collection terminal 20 of each examinee through a network for the transmission of data information. The user terminal 10 is the terminal device, such as a desktop computer, a notebook computer, a tablet computer or a mobile phone, used to execute the examination cheating recognition method based on speech recognition so as to judge whether an examinee is cheating; the voice collection terminal 20 is a terminal, such as a microphone, used to collect in real time the voice information uttered by an examinee. Accordingly, each examinee at the examination site wears a voice collection terminal 20, or a voice collection terminal 20 is arranged on each examinee's examination desk. As shown in FIG. 1, the method includes steps S110 to S180.
S110. Acquire the basic voice information corresponding to each examinee collected by the voice collection terminals, and acquire, according to a preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information.
When entering the answering interface, an examinee needs to read the examination agreement, examination room instructions and similar content; this reading is completed before the examination begins. While each examinee reads the above content, the corresponding basic voice information can be collected through that examinee's voice collection terminal, and each examinee's basic voice information contains multiple sentence pronunciations. Each sentence pronunciation corresponds to one sentence spoken by the examinee, and the corresponding speech feature parameter can be acquired from each sentence pronunciation according to the extraction rule. The speech feature parameter quantifies the audio features of a sentence pronunciation and includes the audio coefficient information and the perceptual coefficient information of the sentence pronunciation: the audio coefficient information may be the Mel Frequency Cepstrum Coefficients (MFCC) of the sentence pronunciation, and the perceptual coefficient information may be the perceptual Linear Prediction Coefficients (LPC) of the sentence pronunciation. The extraction rule includes a spectrum conversion rule, an audio coefficient extraction rule and a perceptual coefficient extraction rule: each sentence pronunciation can be spectrum-converted according to the spectrum conversion rule, the resulting audio spectrum is analyzed according to the audio coefficient extraction rule to obtain the audio coefficient information, and the audio spectrum is analyzed according to the perceptual coefficient extraction rule to obtain the perceptual coefficient information.
In one embodiment, as shown in FIG. 3, step S110 includes sub-steps S111, S112, S113 and S114.
S111. Perform framing processing on each sentence pronunciation to obtain the corresponding multiple frames of audio information.
A sentence pronunciation is represented in the computer as a spectrogram containing the audio track. The spectrogram contains many frames, each frame corresponding to one time unit, so each frame of audio information can be obtained from the spectrogram of the sentence pronunciation; each frame of audio information is the audio information contained in one time unit of the spectrogram.
S112. Convert the multiple frames of audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule.
A Fast Fourier Transform (FFT) can be performed on the multiple frames of audio information contained in each sentence pronunciation according to the spectrum conversion rule, and the result rotated by 90 degrees, to obtain the audio spectrum corresponding to each sentence pronunciation; the spectrum in the audio spectrum represents the relationship between frequency and energy.
S113. Acquire the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule.
The audio coefficient information can be extracted from each audio spectrum through the audio coefficient extraction rule. Specifically, the audio coefficient extraction rule includes a frequency conversion formula and an inverse transform calculation formula.
In one embodiment, as shown in FIG. 4, step S113 includes sub-steps S1131 and S1132.
S1131. Convert each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula. S1132. Perform an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain multiple audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
The linearly represented audio spectrum is converted into a nonlinear audio spectrum according to the frequency conversion formula. The human auditory system is a special nonlinear system whose sensitivity to signals of different frequencies varies. To simulate how the human auditory system perceives audio signals, the nonlinear audio spectrum can model the human auditory system's representation of the audio signal, and features consistent with human hearing can then be obtained from it. Both the audio spectrum and the nonlinear audio spectrum can be represented by a spectrum curve composed of multiple continuous spectrum values.
Specifically, the frequency conversion formula can be expressed as formula (1):
mel(f) = 2595 × log(1 + f/700)   (1);
where mel(f) is the spectrum value of the converted nonlinear audio spectrum and f is the frequency value of the audio spectrum.
Each nonlinear audio spectrum can be inversely transformed according to the inverse transform calculation formula. Specifically, the logarithm of a nonlinear audio spectrum is taken and a Discrete Cosine Transform (DCT) is performed, and the 2nd to 13th DCT coefficients are combined to obtain the audio coefficients corresponding to that nonlinear audio spectrum; acquiring the audio coefficients corresponding to each nonlinear audio spectrum yields the audio coefficient information of each audio spectrum.
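By way of illustration, the framing, FFT, nonlinear warping of formula (1), logarithm and DCT described above can be sketched in NumPy as below. The frame length, hop size, window, filterbank construction and function name are assumptions of the sketch rather than values fixed by the present application, and the computation follows the standard per-frame MFCC layout rather than the per-sentence spectrum wording of the text.

```python
import numpy as np
from scipy.fftpack import dct

def audio_coefficients(signal, sample_rate=16000, frame_len=400, hop=160, n_filters=26):
    """Per-frame sketch of S111-S1132: frame the sentence pronunciation, take an
    FFT per frame, warp onto the nonlinear scale of formula (1), take logs,
    apply a DCT and keep coefficients 2-13."""
    signal = np.asarray(signal, dtype=np.float64)

    # S111: split the pronunciation into frames (one frame = one time unit).
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)

    # S112: FFT per frame -> audio spectrum (frequency vs. energy).
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # S1131: nonlinear warping with mel(f) = 2595*log10(1 + f/700), realized
    # here as a triangular filterbank spaced evenly on the warped scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(mel(0.0), mel(sample_rate / 2.0), n_filters + 2))
    bins = np.floor((frame_len + 1) * edges / sample_rate).astype(int)
    fbank = np.zeros((n_filters, spectrum.shape[1]))
    for j in range(1, n_filters + 1):
        fbank[j - 1, bins[j - 1]:bins[j]] = np.linspace(0, 1, bins[j] - bins[j - 1], endpoint=False)
        fbank[j - 1, bins[j]:bins[j + 1]] = np.linspace(1, 0, bins[j + 1] - bins[j], endpoint=False)

    # S1132: log + DCT, keeping the 2nd-13th coefficients of each frame.
    log_energy = np.log(spectrum @ fbank.T + 1e-10)
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, 1:13]
```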
S114. Acquire the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
The perceptual coefficient information can be extracted from each audio spectrum through the perceptual coefficient extraction rule. Specifically, the perceptual coefficient extraction rule includes a frequency array and an inverse transform calculation formula.
In one embodiment, as shown in FIG. 5, step S114 includes sub-steps S1141 and S1142.
S1141. Filter each audio spectrum according to the multiple frequency values contained in the frequency array to obtain the frequency band energy vector corresponding to each audio spectrum.
Specifically, the frequency array contains multiple frequency values, and an audio spectrum can be filtered with an equal-loudness curve at each of these frequency values to obtain the frequency band energy vector of that audio spectrum corresponding to each frequency value.
For example, for the frequency range of the human voice, 15 frequency values can be selected and combined into the frequency array, which can then be expressed as {250, 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500, 2900, 3400}.
S1142. Compress the frequency band energy vector corresponding to each audio spectrum, and then perform an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
A cube root is taken of each audio spectrum's frequency band energy vector to compress it, and an Inverse Fast Fourier Transform (IFFT) is performed on the compressed frequency band energy vector according to the inverse transform calculation formula to obtain multiple coefficient values corresponding to each audio spectrum, from which the leading coefficient values are taken as the perceptual coefficient information of each audio spectrum.
For example, a 30-point inverse fast Fourier transform can be performed on the compressed frequency band energy vector to obtain 30 coefficient values, of which the first 15 are taken as the perceptual coefficient information of the corresponding audio spectrum.
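Similarly, sub-steps S1141 and S1142 can be sketched as below for one frame's power spectrum; sampling the spectrum at the 15 preset frequencies is an assumed simplification of equal-loudness-curve filtering, and the function name is illustrative.

```python
import numpy as np

# The 15 preset frequency values of the frequency array (Hz).
FREQ_ARRAY = np.array([250, 350, 450, 570, 700, 840, 1000, 1170,
                       1370, 1600, 1850, 2150, 2500, 2900, 3400])

def perceptual_coefficients(power_spectrum, sample_rate=16000):
    """Sketch of S1141-S1142: band energies at the 15 preset frequencies,
    cube-root compression, 30-point IFFT, first 15 values kept."""
    freqs = np.linspace(0.0, sample_rate / 2.0, len(power_spectrum))
    # S1141: frequency band energy vector, one energy per preset frequency.
    band_energy = np.interp(FREQ_ARRAY, freqs, power_spectrum)
    # S1142: cube-root compression, then a 30-point inverse FFT.
    compressed = np.cbrt(band_energy)
    coeffs = np.fft.ifft(compressed, n=30).real
    return coeffs[:15]  # the first 15 values are the perceptual coefficients
```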
Combining the audio coefficient information and the perceptual coefficient information of the same audio spectrum yields the speech feature parameter of that audio spectrum.
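Reusing the two functions defined in the sketches above, the combination amounts to concatenating the two coefficient vectors for the same frame; the stand-in input data below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
utterance = rng.standard_normal(4000)                          # stand-in sentence pronunciation
frame_spectrum = np.abs(np.fft.rfft(utterance[:400])) ** 2     # one frame's power spectrum

audio_coeffs = audio_coefficients(utterance)[0]                # 12 audio coefficients (first frame)
percept_coeffs = perceptual_coefficients(frame_spectrum)       # 15 perceptual coefficients
speech_feature_parameter = np.concatenate([audio_coeffs, percept_coeffs])  # 27-dimensional parameter
```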
S120. Perform dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation.
Each of the obtained speech feature parameters contains multiple parameter values corresponding to multiple dimensions. If the parameter values of some dimensions are too concentrated, those values cannot clearly reflect the differences between multiple sentence pronunciations in the corresponding dimensions; that is, they can hardly highlight the features of the corresponding sentence pronunciations in those dimensions. To make the speech feature parameters represent the features of sentence pronunciations more prominently, dimensionality reduction processing can be performed on each speech feature parameter, removing the dimensions that cannot highlight the pronunciation differences between multiple sentences, to obtain the dimensionality reduction feature parameters; analyzing the dimensionality reduction feature parameters allows the features of each sentence pronunciation to be analyzed more accurately. Each speech feature parameter yields one dimensionality reduction feature parameter after dimensionality reduction processing, and the number of dimensions contained in the dimensionality reduction feature parameter is smaller than the number contained in the speech feature parameter.
In one embodiment, as shown in FIG. 6, step S120 includes sub-steps S121, S122, S123 and S124.
S121. Integrate all the speech feature parameters into one parameter matrix and calculate the covariance matrix of the parameter matrix.
Each speech feature parameter contains multiple parameter values of the same dimensions, so the speech feature parameters can be integrated to obtain a parameter matrix. Specifically, if the number of speech feature parameters is denoted m and the number of dimensions in each speech feature parameter is denoted n, the combined parameter matrix X is an m×n matrix (m rows, n columns) whose entries are the parameter values contained in each speech feature parameter. The covariance matrix of the parameter matrix can then be calculated; the specific calculation can be expressed by formula (2):
S = (1/m) · Σ_{i=1}^{m} (x_i − x̄)(x_i − x̄)^T   (2);
where x_i is the i-th row of the parameter matrix X, x̄ is the mean vector composed of the means of the parameter values of each dimension of the parameter matrix X, and T denotes the transpose; the resulting covariance matrix S is an n×n matrix.
S122. Solve for the covariance eigenvalues of the covariance matrix and the covariance eigenvector corresponding to each covariance eigenvalue.
The covariance matrix represents the feature distribution of the multiple speech feature parameters along the directions of the n-dimensional space. By solving the covariance matrix, specific directions can be determined in the n-dimensional space along which the features of the multiple speech feature parameters are concentrated. The magnitude of an eigenvalue reflects the feature differences of the multiple speech feature parameters in the direction corresponding to that eigenvalue; deleting the dimensions corresponding to the directions with smaller eigenvalues and retaining the dimensions corresponding to the main directions achieves the purpose of reducing the dimensionality of the speech feature parameters. Mathematical methods such as the orthogonal-triangular decomposition algorithm (QR decomposition), the Jacobi iteration algorithm, or the singular value decomposition algorithm (SVD) can be used to solve for the n covariance eigenvalues of the covariance matrix and the corresponding n covariance eigenvectors, with one covariance eigenvector per covariance eigenvalue.
S123. Obtain the feature vector matrix by combining the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which equals the dimensionality reduction value. S124. Multiply the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
The obtained covariance eigenvalues are sorted from largest to smallest, and, according to the sorting result, the covariance eigenvectors corresponding to a number of the largest covariance eigenvalues equal to the dimensionality reduction value are selected and combined to obtain the feature vector matrix.
For example, if the dimensionality reduction value is k (k < n) and each covariance eigenvector is a vector of n rows and 1 column, k of the n covariance eigenvectors are selected and combined, and the resulting feature vector matrix W is an n×k matrix.
Multiplying the parameter matrix by the obtained feature vector matrix then gives the dimensionality reduction feature parameter corresponding to each sentence pronunciation. The calculation can be expressed as Z = X·W, where the result Z is an m×k matrix; the parameter values of each row of Z are the dimensionality reduction feature parameters of the sentence pronunciations, the k parameter values of the i-th row being the dimensionality reduction feature parameter of the i-th input speech feature parameter, which thus contains parameter values of k dimensions.
S130. Iteratively train the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain the trained voiceprint verification model.
The initialized voiceprint verification model is iteratively trained according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain the trained voiceprint verification model, where the model training rule includes a loss value calculation formula, a gradient calculation formula and a loss threshold. Before the voiceprint verification model is used, the pre-stored initialized voiceprint verification model can be trained, and the trained voiceprint verification model is then used for voiceprint verification to improve verification accuracy; the model training rule is the specific rule for training the initialized voiceprint verification model.
In one embodiment, as shown in FIG. 7, step S130 includes sub-steps S131, S132, S133, S134, S135, S136 and S137.
S131. Randomly select two dimensionality reduction feature parameters of the same examinee from the dimensionality reduction feature parameters as a positive sample. S132. Randomly select two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample.
The obtained dimensionality reduction feature parameter of each sentence pronunciation can be used as sample data for training the initialized voiceprint verification model: two dimensionality reduction feature parameters corresponding to two sentence pronunciations of the same examinee are taken as a positive sample, and two dimensionality reduction feature parameters corresponding to two sentence pronunciations of different examinees are taken as a negative sample, so that multiple positive samples and multiple negative samples can be selected from the dimensionality reduction feature parameters. One positive sample or one negative sample is used for one training pass of the initialized voiceprint verification model.
S133. Input the positive sample or the negative sample into the voiceprint verification model to obtain the corresponding model output information.
The voiceprint verification model is a neural network model constructed based on artificial intelligence, consisting of an input layer, multiple intermediate layers and an output layer. The input layer contains multiple input nodes, the number of which equals the total number of dimensions contained in two dimensionality reduction feature parameters: if one dimensionality reduction feature parameter contains parameter values of k dimensions, the input layer contains 2k input nodes. The output layer contains two output nodes. The input layer and the intermediate layers, adjacent intermediate layers, and the intermediate layers and the output layer are connected by association formulas, each containing corresponding parameters; training the voiceprint verification model is the process of adjusting the parameter values of the parameters in the association formulas. The two dimensionality reduction feature parameters contained in a positive sample or a negative sample are input into the voiceprint verification model for calculation to obtain the model output information, which contains the output node values of the two output nodes: the output node value of the first output node is the predicted probability that the two dimensionality reduction feature parameters are consistent, the output node value of the second output node is the predicted probability that they are inconsistent, and each output node value lies in the range [0, 1].
S134. Calculate the model output information according to the loss value calculation formula to obtain the loss value.
The loss value corresponding to the model output information can be calculated according to the loss value calculation formula. Specifically, if a positive sample is input into the voiceprint verification model, the loss value is computed with the loss value calculation formula corresponding to positive samples; if a negative sample is input, the loss value is computed with the formula corresponding to negative samples. In both cases the loss value is a function of the two output node values, where f1 is the output node value of the first output node in the model output information and f2 is the output node value of the second output node.
S135. Judge whether the loss value is smaller than the loss threshold. S136. If the loss value is not smaller than the loss threshold, calculate an updated value for each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value so as to update the parameter values, and return to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain the corresponding model output information. S137. If the loss value is smaller than the loss threshold, determine the voiceprint verification model as the trained voiceprint verification model.
Whether the loss value is smaller than the loss threshold is judged. If it is, the currently obtained voiceprint verification model meets the usage requirements and is determined as the trained voiceprint verification model. If it is not, the currently obtained voiceprint verification model cannot yet meet the usage requirements: the parameter values of the parameters in the voiceprint verification model are adjusted, a new loss value is calculated based on the model with the adjusted parameter values, and the judgment of whether the new loss value is smaller than the loss threshold is repeated until the obtained voiceprint verification model meets the usage requirements. The updated value of each parameter in the voiceprint verification model can be calculated according to the gradient calculation formula to update the parameter's original parameter value. Specifically, the intermediate value calculated by a parameter of the voiceprint verification model for a positive or negative sample is input into the gradient calculation formula and combined with the loss value calculated above to obtain the updated value corresponding to that parameter; this calculation process is gradient descent.
Specifically, the gradient calculation formula can be expressed as:
ω_r′ = ω_r − η · ∂L/∂ω_r;
where ω_r′ is the calculated updated value of parameter r, ω_r is the original parameter value of parameter r, η is the learning rate preset in the gradient calculation formula, and ∂L/∂ω_r is the partial derivative of the loss value L with respect to parameter r, computed from the loss value and the intermediate value produced by parameter r (that intermediate value is required in this calculation).
S140. If voice information to be analyzed is received from any of the voice collection terminals, acquire, according to the extraction rule and the feature vector matrix, the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed.
The process of training the initialized voiceprint verification model can be completed before the examination starts. After the examination starts, the speech around each examinee is collected through that examinee's voice collection terminal. If voice information to be analyzed is received from any voice collection terminal, the target dimensionality reduction feature parameter corresponding to it can be acquired according to the extraction rule and the feature vector matrix. Specifically, the target speech feature parameter is acquired from the voice information to be analyzed according to the extraction rule, by the same method as acquiring the speech feature parameter corresponding to a sentence pronunciation, which is not repeated here; the target speech feature parameter is then multiplied by the feature vector matrix to obtain the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed.
S150. Verify, according to the preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed, to obtain the voiceprint verification result.
Whether the target dimensionality reduction feature parameter is consistent with the corresponding examinee's dimensionality reduction feature parameters can be verified according to the preset scoring threshold and the voiceprint verification model to obtain the voiceprint verification result. Specifically, based on the voice collection terminal corresponding to the voice information to be analyzed, any one dimensionality reduction feature parameter of the examinee corresponding to that voice collection terminal is acquired; the target dimensionality reduction feature parameter and that dimensionality reduction feature parameter are combined and input into the voiceprint verification model to obtain the corresponding output information, from which the corresponding verification score can be calculated. If the verification score is greater than the scoring threshold, the voiceprint verification result is consistent; if the verification score is not greater than the scoring threshold, the voiceprint verification result is inconsistent.
In one embodiment, as shown in FIG. 8, step S150 includes sub-steps S151, S152 and S153.
S151. Input the target dimensionality reduction feature parameter and any one dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain the corresponding output information.
The target dimensionality reduction feature parameter and one dimensionality reduction feature parameter of the corresponding examinee each contain parameter values of k dimensions, so combining them gives parameter values of 2k dimensions, which are input into the trained voiceprint verification model; the output information, obtained through the calculation of the association formulas in the voiceprint verification model, includes the output node values of the two output nodes.
S152. Calculate the verification score corresponding to the target dimensionality reduction feature parameter according to the output information. S153. Judge whether the verification score is greater than the scoring threshold to obtain the voiceprint verification result of whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter.
Specifically, the verification score corresponding to the target dimensionality reduction feature parameter is calculated from the output information by a preset formula over the two output node values, where f1 is the output node value of the first output node in the output information and f2 is the output node value of the second output node. Judging whether the verification score is greater than the scoring threshold yields the voiceprint verification result of whether the target dimensionality reduction feature parameter and the examinee's dimensionality reduction feature parameter are consistent.
For example, the scoring threshold can be set to 50: if the verification score is greater than 50, the voiceprint verification result is consistent; otherwise, the voiceprint verification result is inconsistent.
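A sketch of sub-steps S151 to S153, reusing the model from the training sketch above, might look as follows. The rule score = 100 · f1 / (f1 + f2) is an assumed stand-in for the preset scoring formula; only the example threshold of 50 comes from the text.

```python
import torch

def voiceprint_result(model, target_reduced, enrolled_reduced, score_threshold=50.0):
    """S151-S153: combine the target parameter with one enrolled parameter,
    run the trained model, score the two output node values, and compare
    the score with the scoring threshold."""
    x = torch.cat([torch.as_tensor(target_reduced, dtype=torch.float32),
                   torch.as_tensor(enrolled_reduced, dtype=torch.float32)])
    with torch.no_grad():
        f1, f2 = model(x).tolist()      # the two output node values
    score = 100.0 * f1 / (f1 + f2)      # assumed scoring rule: higher = more alike
    return 'consistent' if score > score_threshold else 'inconsistent'
```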
S160. Perform speech recognition on the voice information to be analyzed according to the pre-stored speech recognition model to obtain the target text information corresponding to the voice information to be analyzed.
The voice information to be analyzed can be recognized according to the speech recognition model to obtain the corresponding target text information, where the speech recognition model includes an acoustic model, a speech feature dictionary and a semantic parsing model. First, the voice information to be analyzed is segmented according to the acoustic model in the speech recognition model to obtain the multiple phonemes contained in it. The voice information to be analyzed is composed of the phonemes of the pronunciations of multiple characters, and the phoneme of a character includes the frequency and timbre of that character's pronunciation. The acoustic model contains the phonemes of all character pronunciations; by matching the voice information to be analyzed against all the phonemes in the acoustic model, the phonemes of individual characters in the voice information can be segmented, finally yielding the multiple phonemes contained in the voice information to be analyzed.
Next, the phonemes are matched according to the speech feature dictionary in the speech recognition model to convert the phonemes into pinyin information. The speech feature dictionary contains the phoneme information corresponding to the pinyin of all characters; by matching an obtained phoneme against the phoneme information corresponding to character pinyin, the phoneme of a single character can be converted into the character pinyin in the speech feature dictionary that matches it, so that all phonemes contained in the voice information are converted into pinyin information.
Finally, semantic parsing is performed on the pinyin information according to the semantic parsing model in the speech recognition model to obtain the target text information corresponding to the voice information to be analyzed. The semantic parsing model contains the mapping relationships between pinyin information and text information; through these mapping relationships the obtained pinyin information can be semantically parsed to convert it into the corresponding target text information.
S170. Judge, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result.
Whether the target text information contains cheating words is judged according to the preset text judgment model to obtain a text judgment result. Specifically, the text judgment model contains multiple cheating keywords and a text judgment neural network. First, it can be judged whether text matching any cheating keyword exists in the target text information; if so, the text judgment result is that cheating words are contained. For example, the cheating keywords may be "怎么做" ("how to do"), "告诉" ("tell"), "觉得" ("think"), and so on. If no text matching a cheating keyword exists in the target text information, the target text information can be converted into a text encoding, and the text encoding can be input into the text judgment neural network for recognition to obtain the network's output result; from the output result it can be judged whether the target text information has a cheating tendency. If the target text information is judged to have a cheating tendency, the text judgment result is that cheating words are contained; otherwise, the text judgment result is that no cheating words are contained.
Specifically, the target text information can be converted according to a pre-stored conversion dictionary: the encoding value corresponding to each character in the target text information is obtained, and the values are combined into a text encoding, which represents the features of the target text information as a numerical sequence. The structure of the text judgment neural network is similar to that of the voiceprint verification model. The text encoding of the target text information is input into the text judgment neural network to obtain a corresponding output result, which can be expressed as a numerical value. Whether the output result is greater than a preset cheating score value is then judged: if it is greater, the target text information is judged to have a cheating tendency; if not, the target text information is judged not to have a cheating tendency.
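A minimal sketch of this two-stage text judgment follows. The keyword list mirrors the examples above, while the per-character code conversion and the `score_with_network` placeholder stand in for the pre-stored conversion dictionary and the text judgment neural network, whose internal details are not given here.

```python
def judge_text(text: str,
               cheat_keywords=("怎么做", "告诉", "觉得"),
               cheat_score_threshold: float = 0.5) -> str:
    """Two-stage judgment: keyword match first, then a network score."""
    # Stage 1: direct cheating-keyword match.
    if any(k in text for k in cheat_keywords):
        return "contains cheating words"

    # Stage 2: encode the text and score it with the network.
    encoding = [ord(ch) for ch in text]   # stand-in conversion dictionary
    score = score_with_network(encoding)  # hypothetical network call
    if score > cheat_score_threshold:
        return "contains cheating words"
    return "does not contain cheating words"

def score_with_network(encoding) -> float:
    # Placeholder: a real implementation would run the text judgment
    # neural network on the encoding; here we return a dummy low score.
    return 0.0

print(judge_text("第三题怎么做"))  # -> contains cheating words
```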
S180. If the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determine that cheating behavior exists and issue an alarm prompt message.
If the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is issued. If the voiceprint verification result is inconsistent, this indicates that a voice other than the current examinee's has been picked up by the examinee's voice collection terminal, i.e., the current examinee is passively communicating with other examinees; it is therefore determined that cheating behavior exists, and an alarm prompt message is issued so that the user of the user terminal can handle the cheating behavior in time. If the text judgment result is that cheating words are contained, this indicates that the current examinee is actively communicating with other examinees; it is likewise determined that cheating behavior exists, and an alarm prompt message is issued so that the user of the user terminal can handle the cheating behavior in time.
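The resulting decision rule is a simple disjunction of the two checks; a brief sketch, with result strings assumed to match the earlier examples:

```python
def detect_cheating(voiceprint_result: str, text_result: str) -> bool:
    """Raise an alarm if either check indicates cheating."""
    return (voiceprint_result == "inconsistent"
            or text_result == "contains cheating words")

if detect_cheating("inconsistent", "does not contain cheating words"):
    print("ALERT: possible cheating - please review this examinee")
```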
The technical method in this application can be applied to application scenarios such as smart government affairs and smart education that involve judging examination cheating based on speech recognition, thereby promoting the construction of smart cities.
In the speech recognition-based examination cheating recognition method provided by the embodiments of this application, the speech feature parameter of each sentence pronunciation is obtained from the basic voice information of each examinee and reduced in dimensionality to obtain dimensionality reduction feature parameters; the initialized voiceprint verification model is trained according to the dimensionality reduction feature parameters; the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameter of the voice information to be analyzed is consistent with the dimensionality reduction feature parameters of the corresponding examinee; whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words is judged; and if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is sent. Through the above method, the voice information of examinees can be recognized based on speech recognition, so that communication-based cheating between examinees can be judged in real time and accurately.
Embodiments of this application further provide a speech recognition-based examination cheating recognition apparatus, which is configured to execute any embodiment of the foregoing speech recognition-based examination cheating recognition method. Specifically, referring to FIG. 9, FIG. 9 is a schematic block diagram of the speech recognition-based examination cheating recognition apparatus provided by an embodiment of this application. The apparatus may be configured in the user terminal 10.
As shown in FIG. 9, the speech recognition-based examination cheating recognition apparatus 100 includes a speech feature parameter acquisition unit 110, a dimensionality reduction processing unit 120, a model training unit 130, a target dimensionality reduction feature parameter acquisition unit 140, a voiceprint verification result acquisition unit 150, a target text information acquisition unit 160, a text judgment result acquisition unit 170, and a prompt information sending unit 180.
The speech feature parameter acquisition unit 110 is configured to acquire the basic voice information corresponding to each examinee collected by the voice collection terminal, and to acquire, according to a preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information.
In an embodiment, the speech feature parameter acquisition unit 110 includes the following subunits: an audio information acquisition unit, an audio information conversion unit, an audio coefficient information acquisition unit, and a perceptual coefficient information acquisition unit.
The audio information acquisition unit is configured to perform framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information; the audio information conversion unit is configured to convert the multi-frame audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule; the audio coefficient information acquisition unit is configured to acquire the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule; and the perceptual coefficient information acquisition unit is configured to acquire the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
In an embodiment, the audio coefficient information acquisition unit includes the following subunits: a frequency conversion unit and an inverse transform processing unit.
The frequency conversion unit is configured to convert each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula; the inverse transform processing unit is configured to perform an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain multiple audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
In an embodiment, the perceptual coefficient information acquisition unit includes the following subunits: a band energy vector acquisition unit and an inverse transform processing unit.
The band energy vector acquisition unit is configured to filter each audio spectrum according to the multiple frequency values contained in the frequency array to obtain the band energy vector corresponding to each audio spectrum; the inverse transform processing unit is configured to compress the band energy vector corresponding to each audio spectrum and then perform an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
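Taken together, these subunits describe a feature pipeline of framing, spectrum conversion, and two coefficient extractions. The sketch below follows that outline with numpy; since the embodiment's conversion formulas, frame sizes, and filter bank are not specified in this passage, standard MFCC/PLP-style choices are assumed throughout, and all names are illustrative.

```python
import numpy as np
from scipy.fftpack import dct

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (sizes assumed)."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n)])

def to_spectrum(frames):
    """Magnitude spectrum of each frame (the spectrum conversion rule)."""
    return np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), axis=1))

def audio_coefficients(spectrum, n_coeff=13):
    """Nonlinear spectrum followed by an inverse transform.

    The embodiment warps the spectrum with a frequency conversion
    formula (e.g. a Mel-style warp); a simple log compression stands in
    for the nonlinear spectrum here, and a DCT plays the inverse transform.
    """
    nonlinear = np.log(spectrum + 1e-8)
    return dct(nonlinear, axis=1, norm="ortho")[:, :n_coeff]

def perceptual_coefficients(spectrum, n_bands=20, n_coeff=13):
    """Band energies from a filter bank, compression, inverse transform."""
    edges = np.linspace(0, spectrum.shape[1], n_bands + 1, dtype=int)
    energies = np.stack([spectrum[:, a:b].sum(axis=1)
                         for a, b in zip(edges[:-1], edges[1:])], axis=1)
    compressed = np.cbrt(energies + 1e-8)   # assumed compression step
    return dct(compressed, axis=1, norm="ortho")[:, :n_coeff]

signal = np.random.randn(16000)             # one second of placeholder audio
spec = to_spectrum(frame_signal(signal))
features = np.hstack([audio_coefficients(spec), perceptual_coefficients(spec)])
print(features.shape)                        # (98, 26)
```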
The dimensionality reduction processing unit 120 is configured to perform dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation.
In an embodiment, the dimensionality reduction processing unit 120 includes the following subunits: a covariance matrix acquisition unit, a covariance matrix solving unit, a feature vector matrix acquisition unit, and a matrix calculation unit.
The covariance matrix acquisition unit is configured to integrate all the speech feature parameters into one parameter matrix and calculate the covariance matrix of the parameter matrix; the covariance matrix solving unit is configured to solve for the covariance eigenvalues of the covariance matrix and the covariance eigenvector corresponding to each covariance eigenvalue; the feature vector matrix acquisition unit is configured to select the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which equals the dimensionality reduction value, and combine them to obtain the feature vector matrix; and the matrix calculation unit is configured to multiply the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
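These four subunits amount to principal component analysis. A minimal numpy sketch under that reading follows; the variable names are illustrative, and the mean-centering step is an added convention that PCA ordinarily requires before the covariance computation.

```python
import numpy as np

def reduce_dimensions(feature_params, k):
    """feature_params: (n_samples, n_dims); k: preset reduction value."""
    X = np.asarray(feature_params, dtype=float)
    X = X - X.mean(axis=0)                  # centre before covariance
    cov = np.cov(X, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:k]   # indices of k largest eigenvalues
    V = eigvecs[:, order]                   # feature vector matrix
    return X @ V, V                         # reduced parameters + matrix

params = np.random.randn(50, 26)            # 50 utterances, 26-dim features
reduced, V = reduce_dimensions(params, k=8)
print(reduced.shape, V.shape)                # (50, 8) (26, 8)
```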
The model training unit 130 is configured to iteratively train the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model.
In an embodiment, the model training unit 130 includes the following subunits: a positive sample acquisition unit, a negative sample acquisition unit, a model output information acquisition unit, a loss value calculation unit, a loss value judgment unit, a parameter value update unit, and a voiceprint verification model determination unit.
The positive sample acquisition unit is configured to randomly select two dimensionality reduction feature parameters of the same examinee from the dimensionality reduction feature parameters as a positive sample; the negative sample acquisition unit is configured to randomly select two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample; the model output information acquisition unit is configured to input the positive sample or the negative sample into the voiceprint verification model to obtain the corresponding model output information; the loss value calculation unit is configured to calculate the model output information according to the loss value calculation formula to obtain a loss value; the loss value judgment unit is configured to judge whether the loss value is less than the loss threshold; the parameter value update unit is configured to, if the loss value is not less than the loss threshold, calculate an updated value for each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value so as to update the parameter values, and to return to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain the corresponding model output information; and the voiceprint verification model determination unit is configured to, if the loss value is less than the loss threshold, determine the voiceprint verification model as the trained voiceprint verification model.
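A compressed sketch of this pair-based training loop is given below. The real voiceprint verification model is a neural network with its own loss value and gradient calculation formulas; here a logistic model over the element-wise difference of a feature pair stands in for it, so the specific loss, gradient, data, and names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, loss_threshold, lr = 8, 0.05, 0.1
w, b = np.zeros(dim), 0.0                   # stand-in model parameters

def sample_pair(features_by_examinee):
    """Return (x1, x2, label): same examinee -> 1, different -> 0."""
    ids = list(features_by_examinee)
    if rng.random() < 0.5:                   # positive sample
        a = rng.choice(features_by_examinee[rng.choice(ids)], 2, replace=False)
        return a[0], a[1], 1.0
    i, j = rng.choice(ids, 2, replace=False)  # negative sample
    return (rng.choice(features_by_examinee[i]),
            rng.choice(features_by_examinee[j]), 0.0)

# Toy enrolment data: two examinees, ten reduced feature vectors each.
data = {e: [rng.normal(loc=e, size=dim) for _ in range(10)] for e in (0, 3)}

loss = float("inf")
while loss >= loss_threshold:                # iterate until below threshold
    x1, x2, y = sample_pair(data)
    z = w @ np.abs(x1 - x2) + b              # model output information
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    grad = p - y                             # gradient of the assumed loss
    w -= lr * grad * np.abs(x1 - x2)         # parameter value update
    b -= lr * grad
print("trained; final loss", float(loss))
```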
The target dimensionality reduction feature parameter acquisition unit 140 is configured to, if voice information to be analyzed is received from any of the voice collection terminals, acquire the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix.
The voiceprint verification result acquisition unit 150 is configured to verify, according to a preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result.
In an embodiment, the voiceprint verification result acquisition unit 150 includes the following subunits: an output information acquisition unit, a verification score acquisition unit, and a verification score judgment unit.
The output information acquisition unit is configured to input the target dimensionality reduction feature parameter together with any one dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain corresponding output information; the verification score acquisition unit is configured to calculate, according to the output information, the verification score corresponding to the target dimensionality reduction feature parameter; and the verification score judgment unit is configured to judge whether the verification score is greater than the score threshold to obtain a voiceprint verification result indicating whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters.
The target text information acquisition unit 160 is configured to perform speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain the target text information corresponding to the voice information to be analyzed.
The text judgment result acquisition unit 170 is configured to judge, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result.
The prompt information sending unit 180 is configured to, if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determine that cheating behavior exists and issue an alarm prompt message.
The speech recognition-based examination cheating recognition apparatus provided by the embodiments of this application applies the above speech recognition-based examination cheating recognition method: the speech feature parameter of each sentence pronunciation is obtained from the basic voice information of each examinee and reduced in dimensionality to obtain dimensionality reduction feature parameters; the initialized voiceprint verification model is trained according to the dimensionality reduction feature parameters; the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameter of the voice information to be analyzed is consistent with the dimensionality reduction feature parameters of the corresponding examinee; whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words is judged; and if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is sent. Through the above method, the voice information of examinees can be recognized based on speech recognition, so that communication-based cheating between examinees can be judged in real time and accurately.
The above speech recognition-based examination cheating recognition apparatus can be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 10.
Referring to FIG. 10, FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of this application. The computer device may be the user terminal 10 configured to execute the speech recognition-based examination cheating recognition method so as to judge examination cheating based on speech recognition.
Referring to FIG. 10, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, wherein the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to execute the speech recognition-based examination cheating recognition method.
The processor 502 is configured to provide computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, it can cause the processor 502 to execute the speech recognition-based examination cheating recognition method.
The network interface 505 is used for network communication, such as the transmission of data information. Those skilled in the art can understand that the structure shown in FIG. 10 is merely a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device 500 to which the solution of this application is applied; a specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory so as to implement the corresponding functions in the above speech recognition-based examination cheating recognition method.
Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 10 does not constitute a limitation on the specific structure of the computer device; in other embodiments, the computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are the same as those in the embodiment shown in FIG. 10 and are not repeated here.
It should be understood that, in the embodiments of this application, the processor 502 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Another embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps included in the above speech recognition-based examination cheating recognition method are implemented.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Those of ordinary skill in the art can appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
In the several embodiments provided in this application, it should be understood that the disclosed devices, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a division by logical function, and there may be other divisions in actual implementation; units with the same function may also be combined into one unit; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may also be electrical, mechanical, or other forms of connection.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; i.e., they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of this application.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a computer-readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned computer-readable storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.
The above are only specific embodiments of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art could easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions shall all fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. An examination cheating recognition method based on speech recognition, applied to a user terminal, wherein the user terminal is connected to a voice collection terminal of each examinee through a network to transmit data information, and the method comprises:
    acquiring basic voice information corresponding to each examinee collected by the voice collection terminal, and acquiring, according to a preset extraction rule, a speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
    performing dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
    iteratively training an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
    if voice information to be analyzed is received from any of the voice collection terminals, acquiring a target dimensionality reduction feature parameter corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix;
    verifying, according to a preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result;
    performing speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
    judging, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result; and
    if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determining that cheating behavior exists and issuing an alarm prompt message.
  2. The examination cheating recognition method based on speech recognition according to claim 1, wherein the extraction rule comprises a spectrum conversion rule, an audio coefficient extraction rule, and a perceptual coefficient extraction rule, each speech feature parameter contains audio coefficient information and perceptual coefficient information of one sentence pronunciation, and acquiring, according to the preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information comprises:
    performing framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information;
    converting the multi-frame audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule;
    acquiring audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule; and
    acquiring perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
  3. The examination cheating recognition method based on speech recognition according to claim 2, wherein the audio coefficient extraction rule comprises a frequency conversion formula and an inverse transform calculation formula, and acquiring the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule comprises:
    converting each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula; and
    performing an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain multiple audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
  4. The examination cheating recognition method based on speech recognition according to claim 2, wherein the perceptual coefficient extraction rule comprises a frequency array and an inverse transform calculation formula, and acquiring the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule comprises:
    filtering each audio spectrum according to the multiple frequency values contained in the frequency array to obtain a band energy vector corresponding to each audio spectrum; and
    compressing the band energy vector corresponding to each audio spectrum and then performing an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
  5. The examination cheating recognition method based on speech recognition according to claim 1, wherein performing dimensionality reduction processing on each speech feature parameter according to the preset dimensionality reduction value to obtain the feature vector matrix and the dimensionality reduction feature parameter corresponding to each sentence pronunciation comprises:
    integrating all the speech feature parameters into one parameter matrix and calculating a covariance matrix of the parameter matrix;
    solving for covariance eigenvalues of the covariance matrix and a covariance eigenvector corresponding to each covariance eigenvalue;
    selecting the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which equals the dimensionality reduction value, and combining them to obtain the feature vector matrix; and
    multiplying the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
  6. The examination cheating recognition method based on speech recognition according to claim 1, wherein the model training rule comprises a loss value calculation formula, a gradient calculation formula, and a loss threshold, and iteratively training the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain the trained voiceprint verification model comprises:
    randomly selecting two dimensionality reduction feature parameters of a same examinee from the dimensionality reduction feature parameters as a positive sample;
    randomly selecting two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample;
    inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information;
    calculating the model output information according to the loss value calculation formula to obtain a loss value;
    judging whether the loss value is less than the loss threshold;
    if the loss value is not less than the loss threshold, calculating an updated value of each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value to update the parameter values, and returning to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information; and
    if the loss value is less than the loss threshold, determining the voiceprint verification model as the trained voiceprint verification model.
  7. The examination cheating recognition method based on speech recognition according to claim 1, wherein verifying, according to the preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain the voiceprint verification result comprises:
    inputting the target dimensionality reduction feature parameter together with any one dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain corresponding output information;
    calculating, according to the output information, a verification score corresponding to the target dimensionality reduction feature parameter; and
    judging whether the verification score is greater than the score threshold to obtain a voiceprint verification result indicating whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters.
  8. An examination cheating recognition apparatus based on speech recognition, comprising:
    a speech feature parameter acquisition unit, configured to acquire basic voice information corresponding to each examinee collected by the voice collection terminal, and to acquire, according to a preset extraction rule, a speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
    a dimensionality reduction processing unit, configured to perform dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
    a model training unit, configured to iteratively train an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
    a target dimensionality reduction feature parameter acquisition unit, configured to, if voice information to be analyzed is received from any of the voice collection terminals, acquire a target dimensionality reduction feature parameter corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix;
    a voiceprint verification result acquisition unit, configured to verify, according to a preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result;
    a target text information acquisition unit, configured to perform speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
    a text judgment result acquisition unit, configured to judge, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result; and
    a prompt information sending unit, configured to, if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determine that cheating behavior exists and issue an alarm prompt message.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
    acquiring basic voice information corresponding to each examinee collected by the voice collection terminal, and acquiring, according to a preset extraction rule, a speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
    performing dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
    iteratively training an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
    if voice information to be analyzed is received from any of the voice collection terminals, acquiring a target dimensionality reduction feature parameter corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix;
    verifying, according to a preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result;
    performing speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
    judging, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result; and
    if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determining that cheating behavior exists and issuing an alarm prompt message.
  10. The computer device according to claim 9, wherein the extraction rule comprises a spectrum conversion rule, an audio coefficient extraction rule, and a perceptual coefficient extraction rule, each speech feature parameter contains audio coefficient information and perceptual coefficient information of one sentence pronunciation, and acquiring, according to the preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information comprises:
    performing framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information;
    converting the multi-frame audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule;
    acquiring audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule; and
    acquiring perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
  11. The computer device according to claim 10, wherein the audio coefficient extraction rule comprises a frequency conversion formula and an inverse transform calculation formula, and acquiring the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule comprises:
    converting each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula; and
    performing an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain multiple audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
  12. The computer device according to claim 10, wherein the perceptual coefficient extraction rule comprises a frequency array and an inverse transform calculation formula, and acquiring the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule comprises:
    filtering each audio spectrum according to the multiple frequency values contained in the frequency array to obtain a band energy vector corresponding to each audio spectrum; and
    compressing the band energy vector corresponding to each audio spectrum and then performing an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
  13. The computer device according to claim 9, wherein performing dimensionality reduction processing on each speech feature parameter according to the preset dimensionality reduction value to obtain the feature vector matrix and the dimensionality reduction feature parameter corresponding to each sentence pronunciation comprises:
    integrating all the speech feature parameters into one parameter matrix and calculating a covariance matrix of the parameter matrix;
    solving for covariance eigenvalues of the covariance matrix and a covariance eigenvector corresponding to each covariance eigenvalue;
    selecting the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which equals the dimensionality reduction value, and combining them to obtain the feature vector matrix; and
    multiplying the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
  14. The computer device according to claim 9, wherein the model training rule comprises a loss value calculation formula, a gradient calculation formula, and a loss threshold, and iteratively training the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain the trained voiceprint verification model comprises:
    randomly selecting two dimensionality reduction feature parameters of a same examinee from the dimensionality reduction feature parameters as a positive sample;
    randomly selecting two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample;
    inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information;
    calculating the model output information according to the loss value calculation formula to obtain a loss value;
    judging whether the loss value is less than the loss threshold;
    if the loss value is not less than the loss threshold, calculating an updated value of each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value to update the parameter values, and returning to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information; and
    if the loss value is less than the loss threshold, determining the voiceprint verification model as the trained voiceprint verification model.
  15. The computer device according to claim 9, wherein verifying, according to the preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain the voiceprint verification result comprises:
    inputting the target dimensionality reduction feature parameter together with any one dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain corresponding output information;
    calculating, according to the output information, a verification score corresponding to the target dimensionality reduction feature parameter; and
    judging whether the verification score is greater than the score threshold to obtain a voiceprint verification result indicating whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters.
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following operations:
    acquiring basic voice information corresponding to each examinee collected by a voice collection terminal, and acquiring, according to a preset extraction rule, a voice feature parameter corresponding to each sentence pronunciation from a plurality of sentence pronunciations contained in each piece of basic voice information;
    performing dimensionality reduction processing on each voice feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
    iteratively training an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
    if voice information to be analyzed is received from any one of the voice collection terminals, acquiring, according to the extraction rule and the feature vector matrix, a target dimensionality reduction feature parameter corresponding to the voice information to be analyzed;
    verifying, according to a preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result;
    performing voice recognition on the voice information to be analyzed according to a pre-stored voice recognition model to obtain target text information corresponding to the voice information to be analyzed;
    determining, according to a preset text judgment model, whether the target text information contains cheating vocabulary to obtain a text judgment result; and
    if the voiceprint verification result is inconsistent or the text judgment result is that cheating vocabulary is contained, determining that cheating behavior exists and issuing an alarm prompt message.
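Taken together, the operations of claim 16 amount to the following control flow. This is a sketch, not the disclosed implementation: the callables are injected so the code stays self-contained, and each one (feature extraction, voiceprint verification, transcription, alarm) is a hypothetical stand-in for the preset rules and models the claim refers to; the keyword-membership test likewise stands in for the unspecified text judgment model.

```python
from typing import Callable, Iterable, Tuple

def analyze_utterance(audio: bytes,
                      examinee_id: str,
                      extract: Callable,      # extraction rule + feature vector matrix
                      verify: Callable,       # trained voiceprint verification model
                      transcribe: Callable,   # pre-stored speech recognition model
                      cheat_words: Iterable[str],
                      alarm: Callable) -> Tuple[bool, bool]:
    target_param = extract(audio)                       # target dimensionality-reduced parameter
    consistent = verify(examinee_id, target_param)      # voiceprint verification result
    text = transcribe(audio)                            # target text information
    has_cheat_words = any(w in text for w in cheat_words)  # stand-in text judgment
    if not consistent or has_cheat_words:               # either failure counts as cheating
        alarm(examinee_id)                              # issue the alarm prompt message
    return consistent, has_cheat_words
```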
  17. The computer-readable storage medium according to claim 16, wherein the extraction rule comprises a spectrum conversion rule, an audio coefficient extraction rule and a perceptual coefficient extraction rule, each voice feature parameter contains audio coefficient information and perceptual coefficient information of one sentence pronunciation, and the acquiring, according to the preset extraction rule, the voice feature parameter corresponding to each sentence pronunciation from the plurality of sentence pronunciations contained in each piece of basic voice information comprises:
    performing framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information;
    converting the multi-frame audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule;
    acquiring audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule; and
    acquiring perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
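One conventional reading of the framing and spectrum-conversion steps of claim 17, assuming 16 kHz audio, 25 ms frames with a 10 ms hop, and a Hamming window plus FFT as the spectrum conversion rule; the claim fixes none of these values.

```python
import numpy as np

def frame_and_spectrum(signal, frame_len=400, hop=160):
    """Split one sentence pronunciation into overlapping frames, then take a
    per-frame power spectrum as the 'audio spectrum'."""
    assert len(signal) >= frame_len  # sketch assumes at least one full frame
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)           # taper frame edges before the FFT
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2   # power spectrum, one row per frame
```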
  18. The computer-readable storage medium according to claim 17, wherein the audio coefficient extraction rule comprises a frequency conversion formula and an inverse transform calculation formula, and the acquiring the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule comprises:
    converting each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula; and
    performing an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain a plurality of audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
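Claim 18 reads like a mel-frequency cepstral coefficient (MFCC) computation. The sketch below assumes the mel scale as the frequency conversion formula and a DCT as the inverse transform calculation formula, with the mel filterbank supplied by the caller; the publication does not confirm these specific choices.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f_hz):
    # a common "frequency conversion formula" mapping Hz onto the nonlinear mel scale
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def audio_coefficients(power_spectrum, mel_filterbank, n_coeffs=13):
    """power_spectrum: (n_frames, n_bins); mel_filterbank: (n_mels, n_bins)."""
    nonlinear = np.log(power_spectrum @ mel_filterbank.T + 1e-10)  # nonlinear audio spectrum
    # the inverse transform (DCT-II here) yields the audio coefficients per frame
    return dct(nonlinear, type=2, axis=-1, norm='ortho')[:, :n_coeffs]
```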
  19. The computer-readable storage medium according to claim 17, wherein the perceptual coefficient extraction rule comprises a frequency array and an inverse transform calculation formula, and the acquiring the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule comprises:
    filtering each audio spectrum according to a plurality of frequency values contained in the frequency array to obtain a frequency band energy vector corresponding to each audio spectrum; and
    compressing the frequency band energy vector corresponding to each audio spectrum and then performing an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
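Claim 19 resembles perceptual linear prediction (PLP)-style band analysis. In this sketch the frequency array is read as band edges in Hz, cube-root compression stands in for the unspecified compression, and an inverse DCT plays the inverse transform; all three are assumptions.

```python
import numpy as np
from scipy.fftpack import idct

def perceptual_coefficients(power_spectrum, bin_freqs, band_edges, n_coeffs=13):
    """power_spectrum: (n_bins,) one frame; bin_freqs: (n_bins,) Hz per FFT bin;
    band_edges: ascending Hz values taken from the 'frequency array'."""
    energies = np.array([
        power_spectrum[(bin_freqs >= lo) & (bin_freqs < hi)].sum()  # band energy vector
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ])
    compressed = np.cbrt(energies)  # cube-root compression, as in PLP analysis
    return idct(compressed, type=2, norm='ortho')[:n_coeffs]  # perceptual coefficients
```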
  20. The computer-readable storage medium according to claim 16, wherein the performing dimensionality reduction processing on each voice feature parameter according to the preset dimensionality reduction value to obtain the feature vector matrix and the dimensionality reduction feature parameter corresponding to each sentence pronunciation comprises:
    integrating all the voice feature parameters into one parameter matrix and calculating a covariance matrix of the parameter matrix;
    solving covariance eigenvalues of the covariance matrix and a covariance eigenvector corresponding to each covariance eigenvalue;
    selecting the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which is equal to the dimensionality reduction value, and combining them to obtain the feature vector matrix; and
    multiplying the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
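Claim 20 describes a standard principal component analysis (PCA); a minimal NumPy sketch follows. Mean-centering is a conventional PCA step that the claim does not spell out, and the variable names are illustrative.

```python
import numpy as np

def pca_reduce(params, k):
    """params: (n_pronunciations, n_features) parameter matrix; k: dimensionality value."""
    X = params - params.mean(axis=0)        # center columns before the covariance step
    cov = np.cov(X, rowvar=False)           # covariance matrix of the parameter matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # covariance eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    eigvec_matrix = eigvecs[:, order]       # feature vector matrix, (n_features, k)
    return X @ eigvec_matrix, eigvec_matrix  # reduced parameters + matrix for later reuse
```

The returned eigenvector matrix is what later projects new (to-be-analyzed) feature parameters into the same reduced space, matching how claim 16 reuses the feature vector matrix at inference time.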
PCT/CN2021/097100 2020-12-16 2021-05-31 Examination cheating recognition method and apparatus based on speech recognition, and computer device WO2022127042A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011490833.2 2020-12-16
CN202011490833.2A CN112669820B (en) 2020-12-16 2020-12-16 Examination cheating recognition method and device based on voice recognition and computer equipment

Publications (1)

Publication Number Publication Date
WO2022127042A1

Family

ID=75404281

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097100 WO2022127042A1 (en) 2020-12-16 2021-05-31 Examination cheating recognition method and apparatus based on speech recognition, and computer device

Country Status (2)

Country Link
CN (1) CN112669820B (en)
WO (1) WO2022127042A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669820B (en) * 2020-12-16 2023-08-04 平安科技(深圳)有限公司 Examination cheating recognition method and device based on voice recognition and computer equipment
CN113345434A (en) * 2021-05-31 2021-09-03 平安科技(深圳)有限公司 Network appointment vehicle user alarm method and device, computer equipment and storage medium
CN116153312A (en) * 2023-03-05 2023-05-23 广州网才信息技术有限公司 Online pen test method and device using voice recognition

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847292B (en) * 2017-02-16 2018-06-19 平安科技(深圳)有限公司 Method for recognizing sound-groove and device
CN107993071A (en) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 Electronic device, auth method and storage medium based on vocal print
CN108305633B (en) * 2018-01-16 2019-03-29 平安科技(深圳)有限公司 Speech verification method, apparatus, computer equipment and computer readable storage medium
CN108777146A (en) * 2018-05-31 2018-11-09 平安科技(深圳)有限公司 Speech model training method, method for distinguishing speek person, device, equipment and medium
CN109192202B (en) * 2018-09-21 2023-05-16 平安科技(深圳)有限公司 Voice safety recognition method, device, computer equipment and storage medium
CN110335608B (en) * 2019-06-17 2023-11-28 平安科技(深圳)有限公司 Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
CN111128160B (en) * 2019-12-19 2024-04-09 中国平安财产保险股份有限公司 Receipt modification method and device based on voice recognition and computer equipment
CN111370032B (en) * 2020-02-20 2023-02-14 厦门快商通科技股份有限公司 Voice separation method, system, mobile terminal and storage medium
CN111695352A (en) * 2020-05-28 2020-09-22 平安科技(深圳)有限公司 Grading method and device based on semantic analysis, terminal equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080120109A1 (en) * 2006-11-16 2008-05-22 Institute For Information Industry Speech recognition device, method, and computer readable medium for adjusting speech models with selected speech data
CN107610707A (en) * 2016-12-15 2018-01-19 平安科技(深圳)有限公司 A kind of method for recognizing sound-groove and device
CN107368724A (en) * 2017-06-14 2017-11-21 广东数相智能科技有限公司 Anti- cheating network research method, electronic equipment and storage medium based on Application on Voiceprint Recognition
CN108922515A (en) * 2018-05-31 2018-11-30 平安科技(深圳)有限公司 Speech model training method, audio recognition method, device, equipment and medium
CN111833884A (en) * 2020-05-27 2020-10-27 北京三快在线科技有限公司 Voiceprint feature extraction method and device, electronic equipment and storage medium
CN112669820A (en) * 2020-12-16 2021-04-16 平安科技(深圳)有限公司 Examination cheating recognition method and device based on voice recognition and computer equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253474A (en) * 2023-07-06 2023-12-19 北京梦见星科技有限公司 Online examination cheating behavior detection system and detection method based on voice recognition
CN117253474B (en) * 2023-07-06 2024-02-13 北京梦见星科技有限公司 Online examination cheating behavior detection system and detection method based on voice recognition

Also Published As

Publication number Publication date
CN112669820B (en) 2023-08-04
CN112669820A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
WO2022127042A1 (en) Examination cheating recognition method and apparatus based on speech recognition, and computer device
US10176811B2 (en) Neural network-based voiceprint information extraction method and apparatus
CN109215632B (en) Voice evaluation method, device and equipment and readable storage medium
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
Morrison A comparison of procedures for the calculation of forensic likelihood ratios from acoustic–phonetic data: Multivariate kernel density (MVKD) versus Gaussian mixture model–universal background model (GMM–UBM)
WO2017133165A1 (en) Method, apparatus and device for automatic evaluation of satisfaction and computer storage medium
TWI396184B (en) A method for speech recognition on all languages and for inputing words using speech recognition
CN111311327A (en) Service evaluation method, device, equipment and storage medium based on artificial intelligence
CN108520753B (en) Voice lie detection method based on convolution bidirectional long-time and short-time memory network
CN112233698B (en) Character emotion recognition method, device, terminal equipment and storage medium
Darabkh et al. An efficient speech recognition system for arm‐disabled students based on isolated words
CN108766415B (en) Voice evaluation method
Kelley et al. A comparison of four vowel overlap measures
CN113243918B (en) Risk detection method and device based on multi-mode hidden information test
JP2021152682A (en) Voice processing device, voice processing method and program
CN112017694A (en) Voice data evaluation method and device, storage medium and electronic device
Nirjon et al. sMFCC: exploiting sparseness in speech for fast acoustic feature extraction on mobile devices--a feasibility study
Khanna et al. Application of vector quantization in emotion recognition from human speech
US20230368777A1 (en) Method And Apparatus For Processing Audio, Electronic Device And Storage Medium
JP5091202B2 (en) Identification method that can identify any language without using samples
JP6996627B2 (en) Information processing equipment, control methods, and programs
CN110675858A (en) Terminal control method and device based on emotion recognition
Herrera-Camacho et al. Design and testing of a corpus for forensic speaker recognition using MFCC, GMM and MLE
Hughes et al. Variability in analyst decisions during the computation of numerical likelihood ratios
CN114822557A (en) Method, device, equipment and storage medium for distinguishing different sounds in classroom

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21904977

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21904977

Country of ref document: EP

Kind code of ref document: A1