WO2022127042A1 - Examination cheating recognition method and apparatus based on speech recognition, and computer device


Info

Publication number
WO2022127042A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
audio
dimensionality reduction
pronunciation
voiceprint verification
Application number
PCT/CN2021/097100
Other languages
French (fr)
Chinese (zh)
Inventor
苏雪琦
王健宗
程宁
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022127042A1

Classifications

    • G: PHYSICS
      • G10: MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L 15/00: Speech recognition
            • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
            • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
            • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
            • G10L 15/26: Speech to text systems
          • G10L 17/00: Speaker identification or verification techniques
            • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
            • G10L 17/04: Training, enrolment or model building
            • G10L 17/22: Interactive procedures; Man-machine interfaces
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of artificial intelligence technology, and belongs to the application scenario of judging cheating in exams based on speech recognition in smart cities, and in particular, relates to a method, device and computer equipment for recognizing cheating in exams based on speech recognition.
  • More and more selection and evaluation processes are carried out by examination to ensure the fairness of selection or evaluation, such as civil servant selection, CET-4 and CET-6 test evaluation, driver's license test evaluation, etc.
  • to prevent candidates from cheating during the examination and affecting its fairness, invigilators will be arranged in the examination room to invigilate the examination.
  • the invigilators cannot monitor each candidate at all times, resulting in unsatisfactory invigilation results.
  • video surveillance is used to assist invigilators in invigilating the exam, and the video is analyzed to determine the specific location of the cheating examinee.
  • however, analyzing surveillance video can only determine after the fact whether an examinee cheated, so real-time judgment cannot be guaranteed; moreover, surveillance video can only analyze, through images, whether an examinee's body movements indicate cheating. If examinees communicate with each other using only small body movements, the obtained surveillance video cannot accurately identify the cheating. The prior art method therefore cannot make real-time and accurate judgments on communication cheating between candidates.
  • the embodiments of the present application provide a method, device, and computer equipment for recognizing cheating in an exam based on speech recognition, which aim to solve the prior-art problem that communication cheating between candidates cannot be judged in real time and accurately.
  • an embodiment of the present application provides a method for recognizing cheating in an exam based on speech recognition, which includes:
  • verifying, according to the preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed, to obtain a voiceprint verification result;
  • an embodiment of the present application provides a speech recognition-based examination cheating recognition device, which includes:
  • the voice feature parameter acquisition unit is used to acquire the basic voice information corresponding to each candidate collected by the voice collection terminal, and to obtain, according to a preset extraction rule, the voice feature parameters corresponding to the pronunciation of each sentence from the pronunciations of the multiple sentences contained in each piece of basic voice information;
  • a dimensionality reduction processing unit configured to perform dimensionality reduction processing on each of the speech feature parameters according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to the pronunciation of each of the sentences;
  • a model training unit configured to perform iterative training on the initialized voiceprint verification model according to the dimensionality reduction feature parameters of the pronunciation of each said sentence and the preset model training rules to obtain the trained voiceprint verification model;
  • the target dimensionality reduction feature parameter acquisition unit is used to obtain, if to-be-analyzed voice information sent by any of the voice collection terminals is received, the target dimensionality reduction feature parameter corresponding to the to-be-analyzed voice information according to the extraction rule and the feature vector matrix;
  • the voiceprint verification result obtaining unit is used for verifying whether the target dimension reduction feature parameter is consistent with the dimension reduction feature parameter of the examinee corresponding to the voice information to be analyzed according to the preset scoring threshold and the voiceprint verification model to obtain the voiceprint verification result;
  • a target text information acquisition unit configured to perform speech recognition on the to-be-analyzed speech information according to a pre-stored speech recognition model to obtain target text information corresponding to the to-be-analyzed speech information;
  • a text judgment result obtaining unit configured to judge whether the target text information contains cheating words according to a preset text judgment model to obtain a text judgment result; and
  • a prompt information sending unit configured to determine that cheating behavior exists and issue an alarm prompt message if the voiceprint verification result is inconsistent or the text judgment result contains cheating words.
  • an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein, when executing the computer program, the processor implements the speech recognition-based exam cheating recognition method described in the first aspect.
  • an embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when executed by a processor, the computer program causes the processor to execute the speech recognition-based exam cheating recognition method described in the first aspect above.
  • the embodiments of the present application provide a method, device and computer equipment for detecting cheating in an exam based on speech recognition.
  • the initialized voiceprint verification model is trained, and the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameters of the voice information to be analyzed are consistent with the dimensionality reduction feature parameters of the corresponding candidate; it is also judged whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words. If the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is sent.
  • in this way, the to-be-recognized speech information of the examinees is recognized based on speech recognition, realizing real-time and accurate judgment of communication cheating between examinees.
  • FIG. 1 is a schematic flowchart of a speech recognition-based exam cheating recognition method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application scenario of a speech recognition-based exam cheating recognition method provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a sub-flow of a speech recognition-based exam cheating recognition method provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 8 is a schematic diagram of another sub-flow of the speech recognition-based exam cheating recognition method provided by the embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a speech recognition-based examination cheating recognition device provided by an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • Referring to FIG. 1 and FIG. 2, the exam cheating recognition method based on speech recognition is applied in the user terminal 10 and executed by application software installed in the user terminal 10; the user terminal 10 is connected to the voice collection terminal 20 of each candidate through the network to perform data analysis.
  • the user terminal 10 is a terminal device used to execute the speech recognition-based examination cheating recognition method so as to judge whether an examinee cheats in the examination, such as a desktop computer, a notebook computer, a tablet computer or a mobile phone.
  • the voice collection terminal 20 is a terminal used for real-time collection of the voice information uttered by the examinee, such as a microphone; each examinee at the test site is equipped with a corresponding voice collection terminal 20.
  • S110: Acquire the basic voice information corresponding to each candidate collected by the voice collection terminal, and obtain the voice feature parameters corresponding to the pronunciation of each sentence from the pronunciations of the multiple sentences contained in each piece of basic voice information according to a preset extraction rule.
  • specifically, the basic voice information corresponding to each candidate can be collected through that candidate's voice collection terminal, and the basic voice information of each examinee includes the pronunciation of multiple sentences.
  • Each speech pronunciation corresponds to a sentence spoken by a candidate.
  • the corresponding speech feature parameters can be obtained from the pronunciation of each sentence according to the extraction rules.
  • the speech feature parameters can quantify the audio features of the pronunciation of a sentence.
  • the parameters include audio coefficient information and perceptual coefficient information of the pronunciation of a sentence.
  • the audio coefficient information can be the Mel Frequency Cepstrum Coefficient (MFCC) corresponding to the pronunciation of the sentence, and the perceptual coefficient information can be the perceptual linear prediction (PLP) coefficient corresponding to the pronunciation of the sentence.
  • the extraction rules include spectral conversion rules, audio coefficient extraction rules, and perceptual coefficient extraction rules.
  • the pronunciation of each sentence can be spectrum-converted according to the spectrum conversion rule, the audio frequency spectrum obtained after the conversion can be analyzed according to the audio coefficient extraction rule to obtain the audio coefficient information, and the audio frequency spectrum can be analyzed according to the perceptual coefficient extraction rule to obtain the perceptual coefficient information.
  • step S110 includes sub-steps S111, S112, S113 and S114.
  • Sentence pronunciation is represented in the computer by the spectrogram containing the audio track.
  • the spectrogram contains many frames, each frame corresponds to a time unit, and the audio information of each frame can be obtained from the spectrogram of the sentence pronunciation.
  • each frame of audio information corresponds to the audio information contained in a time unit in the spectrogram.
  • S112: Convert the multi-frame audio information corresponding to the pronunciation of each sentence into audio frequency spectrums by Fast Fourier Transform (FFT).
  • S113 Acquire audio coefficient information corresponding to each of the audio frequency spectrums according to the audio coefficient extraction rule.
  • the audio coefficient information can be extracted from each audio frequency spectrum through the audio coefficient extraction rule.
  • the audio coefficient extraction rule includes a frequency conversion formula and an inverse conversion calculation formula.
  • step S113 includes sub-steps S1131 and S1132.
  • the linearly expressed audio spectrum is converted into a nonlinear audio spectrum.
  • the human auditory system is a special nonlinear system, and its sensitivity to different frequency signals is different.
  • because of this frequency-dependent sensitivity, the nonlinear audio spectrum can simulate the human auditory system's characterization of the audio signal, so that features consistent with human auditory perception are obtained.
  • Both the audio spectrum and the nonlinear audio spectrum can be represented by a spectrum curve, and the spectrum curve is composed of multiple continuous spectrum values.
  • the frequency conversion formula can be expressed by formula (1), which is the standard Mel-scale conversion: mel(f) = 2595 × log10(1 + f/700) (1), where mel(f) is the spectral value of the converted nonlinear audio spectrum and f is the frequency value of the audio spectrum.
  • Each nonlinear audio spectrum can be inversely transformed according to the inverse transform calculation formula: specifically, a discrete cosine transform (Discrete Cosine Transform, DCT) is performed after taking the logarithm of the obtained nonlinear audio spectrum, and the 2nd to 13th DCT coefficients are combined to obtain the audio coefficients corresponding to that nonlinear audio spectrum; the audio coefficient information corresponding to each audio frequency spectrum is obtained by acquiring the audio coefficients corresponding to each nonlinear audio spectrum.
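To make steps S111 to S113 concrete, the following is a minimal sketch using librosa, whose MFCC routine internally performs the framing, FFT, Mel warping, logarithm and DCT described above; the helper name, the 16 kHz sample rate and the use of librosa itself are assumptions for illustration, not the patent's reference implementation:

```python
import librosa

def audio_coefficient_info(wav_path):
    """Hypothetical helper: per-frame audio coefficients as in S111-S113."""
    y, sr = librosa.load(wav_path, sr=16000)     # one sentence pronunciation
    # librosa frames the signal, applies the FFT, warps the spectrum onto the
    # Mel scale, takes the logarithm and applies the DCT internally.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Dropping coefficient 0 keeps the 2nd-13th DCT coefficients that the
    # description above combines into the audio coefficient information.
    return mfcc[1:13, :].T                       # shape: (n_frames, 12)
```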
  • the perceptual coefficient information can be extracted from each audio frequency spectrum through the perceptual coefficient extraction rule.
  • the perceptual coefficient extraction rule includes a frequency array and an inverse transform calculation formula.
  • step S114 includes sub-steps S1141 and S1142.
  • the frequency array contains multiple frequency values, and an equal loudness curve filter can be performed on an audio spectrum according to the multiple frequency values, so as to obtain a frequency band energy vector corresponding to the audio spectrum and each frequency value.
  • the frequency array can be expressed as ⁇ 250, 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500, 2900, 3400 ⁇ .
  • S1142 Compress the frequency band energy vector corresponding to each of the audio frequency spectrums, and then perform inverse transformation according to the inverse transform calculation formula to obtain perceptual coefficient information of each of the audio frequency spectrums.
  • the inverse fast Fourier transform of 30 points can be performed on the band energy vector after the compression calculation to obtain the coefficient values corresponding to the 30 points, and the first 15 coefficient values are obtained as the perceptual coefficient information of the corresponding audio spectrum.
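The perceptual coefficient extraction of S1141 and S1142 can be sketched as follows for a single audio frequency spectrum; the rectangular bands standing in for the equal loudness curve filter and the cubic-root compression (as in classic PLP analysis) are assumptions, since the exact filter and compression are not reproduced in the text above:

```python
import numpy as np

# The frequency array from step S1141.
CENTER_FREQS = [250, 350, 450, 570, 700, 840, 1000, 1170,
                1370, 1600, 1850, 2150, 2500, 2900, 3400]

def perceptual_coefficients(spectrum, sr, n_fft):
    """Sketch of S1141-S1142 for one audio frequency spectrum (one frame)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # S1141: one band energy per frequency value; rectangular bands around each
    # centre frequency stand in for the equal loudness curve filter (assumed).
    mids = [(a + b) / 2.0 for a, b in zip(CENTER_FREQS[:-1], CENTER_FREQS[1:])]
    edges = [200.0] + mids + [4000.0]            # 16 edges -> 15 bands
    band_energy = np.array([np.sum(spectrum[(freqs >= lo) & (freqs < hi)] ** 2)
                            for lo, hi in zip(edges[:-1], edges[1:])])
    # S1142: compress the band energies (cubic root, as in classic PLP, is
    # assumed), then a 30-point inverse FFT; the first 15 values are kept.
    compressed = np.cbrt(band_energy)
    coeffs = np.fft.ifft(compressed, n=30).real
    return coeffs[:15]
```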
  • by combining the audio coefficient information and the perceptual coefficient information of an audio frequency spectrum, the speech feature parameters of the corresponding sentence pronunciation can be obtained.
  • Each of the obtained speech feature parameters contains multiple parameter values corresponding to multiple dimensions.
  • the multiple parameter values of some dimensions in the speech feature parameters are too concentrated, so these parameter values cannot clearly reflect the differences between the pronunciations of multiple sentences in the corresponding dimensions; that is, it is difficult for them to highlight the characteristics of the corresponding sentence pronunciations in those dimensions. Each speech feature parameter is therefore subjected to dimensionality reduction processing to remove the dimensions that cannot highlight the differences between sentence pronunciations, obtaining the dimensionality reduction feature parameters so that the speech features can be analyzed more efficiently. After each speech feature parameter is subjected to dimensionality reduction processing, a dimensionality reduction feature parameter is obtained, and the number of dimensions contained in the dimensionality reduction feature parameter is smaller than the number of dimensions contained in the speech feature parameter.
  • step S120 includes sub-steps S121, S122, S123 and S124.
  • Each voice feature parameter contains parameter values of the same set of dimensions, so the voice feature parameters can be integrated to obtain a parameter matrix. If the number of voice feature parameters is expressed as m and the number of dimensions in each voice feature parameter as n, the parameter matrix obtained by combination is X m×n, a matrix with m rows and n columns whose values are the parameter values contained in each speech feature parameter.
  • the covariance matrix of the parameter matrix can then be calculated, and the calculation can be expressed by formula (2); a standard form, assuming the parameter matrix X is column-centered, is C = (1/m) × XᵀX (2), which yields an n × n covariance matrix C.
  • the covariance matrix represents the feature distribution of the multiple speech feature parameters in n directions of the n-dimensional space. By solving the covariance matrix, the specific directions in which the features of the multiple speech feature parameters are concentrated can be determined from the n-dimensional space; the size of an eigenvalue reflects the feature difference of the multiple speech feature parameters in the direction corresponding to that eigenvalue. Deleting the dimensions corresponding to the directions with smaller eigenvalues and keeping the dimensions corresponding to the main directions reduces the dimensionality of the speech feature parameters, achieving the purpose of dimensionality reduction.
  • The QR decomposition algorithm (orthogonal triangular decomposition), the Jacobi iteration algorithm, the singular value decomposition (SVD) algorithm, or other mathematical calculation methods can be used to solve the n covariance eigenvalues of the covariance matrix and the corresponding n covariance eigenvectors; one covariance eigenvalue corresponds to one covariance eigenvector.
  • if the dimensionality reduction value is k (k < n), and each covariance eigenvector is a vector with n rows and 1 column, then the k covariance eigenvectors corresponding to the largest covariance eigenvalues are selected from the n covariance eigenvectors and combined, and the obtained eigenvector matrix can be expressed as W n×k.
  • multiplying the parameter matrix X m×n by the eigenvector matrix W n×k gives the matrix calculation result, from which the dimensionality reduction feature parameters corresponding to the pronunciation of each sentence can be obtained: the k parameter values of the i-th row are the dimensionality reduction feature parameters corresponding to the i-th input speech feature parameter, and each dimensionality reduction feature parameter contains k-dimensional parameter values. A sketch of this procedure follows.
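Sub-steps S121 to S124 amount to a principal component analysis of the parameter matrix; a minimal sketch (centering the matrix before computing the covariance is assumed, as is usual for PCA):

```python
import numpy as np

def reduce_dimensions(X, k):
    """Sketch of S121-S124: X is the m x n parameter matrix,
    k the dimensionality reduction value (k < n)."""
    m = X.shape[0]
    Xc = X - X.mean(axis=0)                # centering assumed, as usual for PCA
    C = (Xc.T @ Xc) / m                    # S122: n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # S123: eigh handles the symmetric C
    order = np.argsort(eigvals)[::-1][:k]  # the k largest covariance eigenvalues
    W = eigvecs[:, order]                  # eigenvector matrix W, n x k
    reduced = X @ W                        # S124: row i holds the k-dimensional
    return reduced, W                      # parameters of the i-th pronunciation
```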
  • S130 Perform iterative training on the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain a trained voiceprint verification model.
  • the initialized voiceprint verification model is iteratively trained according to the dimensionality reduction feature parameters of the pronunciation of each sentence and the preset model training rules to obtain the trained voiceprint verification model.
  • the model training rule includes a loss value calculation formula, a gradient calculation formula and a loss threshold.
  • the pre-stored initialized voiceprint verification model can be trained, and the trained voiceprint verification model can be used for voiceprint verification to improve the accuracy of verification; the model training rules are the specific rules for conducting this training.
  • step S130 includes sub-steps S131, S132, S133, S134, S135, S136 and S137.
  • S131: Randomly select two dimensionality reduction feature parameters of the same examinee from the dimensionality reduction feature parameters as a positive sample. S132: Randomly select two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample.
  • the obtained dimensionality reduction feature parameters of the pronunciation of each sentence can be used as sample data to train the initialized voiceprint verification model: two dimensionality reduction feature parameters corresponding to two sentence pronunciations of the same examinee are taken from the dimensionality reduction feature parameters as a positive sample, and two dimensionality reduction feature parameters of different examinees are taken as a negative sample.
  • a positive sample or a negative sample can be used to train the initialized voiceprint verification model once.
  • the voiceprint verification model is a neural network model constructed based on artificial intelligence.
  • the voiceprint verification model consists of an input layer, multiple intermediate layers and an output layer.
  • the input layer contains multiple input nodes, and the number of input nodes is equal to the total number of dimensions contained in the two dimensionality reduction feature parameters: if a dimensionality reduction feature parameter contains parameter values of k dimensions, the input layer contains 2k input nodes and the output layer contains two output nodes. The input layer and the first intermediate layer, adjacent intermediate layers, and the last intermediate layer and the output layer are related by association formulas, and each association formula contains corresponding parameters; the process of training the voiceprint verification model is the process of adjusting the parameter values of the parameters in the association formulas.
  • the model output information includes the output node values of the two output nodes.
  • the output node value of the first output node is the predicted probability that the two dimensionality reduction feature parameters are consistent, and the output node value of the second output node is the predicted probability that the two dimensionality reduction feature parameters are inconsistent.
  • the value of each output node lies in the range [0, 1].
  • the loss value corresponding to the model output information can be calculated according to the loss value calculation formula: one form of the formula is used if a positive sample was input into the voiceprint verification model, and another form is used if a negative sample was input, where f 1 is the output node value of the first output node in the model output information and f 2 is the output node value of the second output node in the model output information.
  • S135: Determine whether the loss value is less than the loss threshold. S136: If the loss value is not less than the loss threshold, calculate the update value of each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value so as to update the parameter value of that parameter, and return to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information. S137: If the loss value is less than the loss threshold, determine the voiceprint verification model as the trained voiceprint verification model.
  • the loss value is less than the loss threshold, if the loss value is less than the loss threshold, it means that the currently obtained voiceprint verification model can meet the needs of use, and the currently obtained voiceprint verification model is determined as the trained voiceprint verification model; If the loss value is not less than the loss threshold, it indicates that the currently obtained voiceprint verification model cannot meet the usage requirements, and the parameter values of the parameters in the voiceprint verification model need to be adjusted, and the voiceprint verification model based on the adjusted parameter values is calculated again. The new loss value, and repeatedly judge whether the new loss value is less than the loss threshold, until the obtained voiceprint verification model meets the needs of use.
  • the updated value of each parameter in the voiceprint verification model can be calculated according to the gradient calculation formula to update the original parameter value of each parameter.
  • the calculated value obtained for a parameter in the voiceprint verification model from a positive sample or a negative sample is input into the gradient calculation formula, and, combined with the loss value obtained from the above calculation, the update value corresponding to the parameter can be calculated; this calculation process is also known as gradient descent calculation.
  • the gradient calculation formula can be expressed as the standard gradient descent update ω r ′ = ω r − η × (∂L/∂ω r ), where ω r ′ is the updated value of the parameter r, ω r is the original parameter value of the parameter r, η is the preset learning rate in the gradient calculation formula, and ∂L/∂ω r is the gradient of the loss value with respect to the parameter r. A training sketch follows.
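The following is a hedged PyTorch sketch of the model structure and of one training iteration (S131 to S137); the hidden-layer sizes, the softmax output and the -log loss on the relevant output node are assumptions, since the exact association formulas and loss formulas are not reproduced in the text above:

```python
import torch
import torch.nn as nn

class VoiceprintVerifier(nn.Module):
    """Sketch of the voiceprint verification model: 2k input nodes and two
    output nodes whose values lie in [0, 1]; hidden sizes are assumptions."""
    def __init__(self, k, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * k, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Softmax(dim=-1))

    def forward(self, pair):          # pair: two concatenated k-dim parameters
        return self.net(pair)         # (f1, f2)

def train_step(model, optimizer, pair, is_positive, loss_threshold=1e-2):
    """One iteration of S133-S137; -log of the node that should be large is
    an assumed stand-in for the patent's loss formulas."""
    f = model(pair)
    loss = -torch.log(f[0] if is_positive else f[1])
    if loss.item() < loss_threshold:  # S135/S137: model meets the needs of use
        return loss.item(), True
    optimizer.zero_grad()
    loss.backward()                   # gradients feed the update formula
    optimizer.step()                  # S136: omega_r <- omega_r - eta * dL/d(omega_r)
    return loss.item(), False
```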
  • the target dimension reduction feature parameter corresponding to the to-be-analyzed speech information is acquired according to the extraction rule and the feature vector matrix.
  • the process of training the initialized voiceprint verification model can be completed before the start of the test. After the test is started, the voices around each candidate are collected through each voice collection terminal.
  • the target dimension reduction feature parameters corresponding to the speech information to be analyzed can be obtained according to the extraction rule and the feature vector matrix.
  • the target speech feature parameters are obtained from the speech information to be analyzed according to the extraction rules (the specific method is the same as that for obtaining the speech feature parameters corresponding to the pronunciation of a sentence, and is not repeated here); the target dimensionality reduction feature parameters corresponding to the speech information to be analyzed are then obtained by multiplying the target speech feature parameters by the feature vector matrix.
  • according to the preset scoring threshold and the voiceprint verification model, it is possible to verify whether the target dimensionality reduction feature parameters are consistent with the dimensionality reduction feature parameters of the corresponding examinee, and to obtain the voiceprint verification result.
  • any dimensionality reduction feature parameter of the candidate corresponding to the voice collection terminal is obtained, and the target dimensionality reduction feature parameter and that dimensionality reduction feature parameter are combined and input into the voiceprint verification model to obtain the corresponding output information. Based on the output information, the corresponding verification score can be calculated and compared with the score threshold: if the verification score is greater than the score threshold, the voiceprint verification result is consistent; if the verification score is not greater than the score threshold, the voiceprint verification result is inconsistent.
  • step S150 includes sub-steps S151, S152 and S153.
  • the target dimensionality reduction feature parameter and a dimensionality reduction feature parameter of the corresponding candidate both contain parameter values of k dimensions; they are combined to obtain parameter values of 2k dimensions and input into the trained voiceprint verification model, and the output information is obtained through calculation of the association formulas in the voiceprint verification model. The output information includes the output node values of the two output nodes.
  • S152: Calculate the verification score corresponding to the target dimensionality reduction feature parameter according to the output information. S153: Determine whether the verification score is greater than the score threshold to obtain the voiceprint verification result of whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter.
  • the verification score corresponding to the target dimensionality reduction feature parameter is calculated from the output information according to a scoring formula, where f 1 is the output node value of the first output node in the output information and f 2 is the output node value of the second output node in the output information.
  • the score threshold may be set to 50. If the verification score is greater than 50, the voiceprint verification result is consistent; otherwise, the voiceprint verification result is inconsistent.
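A sketch of S151 to S153 follows; because the scoring formula over f1 and f2 is not reproduced in the text above, score = 100 × f1 is assumed here so that the threshold of 50 is meaningful:

```python
import torch

def voiceprint_verify(model, target_param, enrolled_param, score_threshold=50.0):
    """Sketch of S151-S153; the 100 * f1 score mapping is an assumption."""
    pair = torch.cat([target_param, enrolled_param])  # 2k-dimensional input
    with torch.no_grad():
        f1, f2 = model(pair)                          # S151: model output
    score = 100.0 * f1.item()                         # S152: verification score
    # S153: compare with the score threshold
    return "consistent" if score > score_threshold else "inconsistent"
```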
  • S160 Perform speech recognition on the speech information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the speech information to be analyzed.
  • the speech information to be analyzed can be recognized according to the speech recognition model to obtain corresponding target text information, wherein the speech recognition model includes an acoustic model, a speech feature dictionary and a semantic analysis model.
  • the to-be-analyzed speech information is segmented according to the acoustic model in the speech recognition model to obtain a plurality of phonemes included in the to-be-analyzed speech information.
  • the speech information to be analyzed is composed of phonemes of the pronunciation of a plurality of characters, and the phoneme of a character includes the frequency and timbre of the pronunciation of the character.
  • the acoustic model contains the phonemes of all character pronunciations.
  • by matching against the phonemes contained in the acoustic model, the phonemes of the single characters in the speech information to be analyzed can be segmented, and the multiple phonemes contained in the speech information to be analyzed are finally obtained through segmentation.
  • the phonemes are matched according to the speech feature dictionary in the speech recognition model to convert the phonemes into pinyin information.
  • the phoneme information corresponding to all character pinyin is included in the phonetic feature dictionary.
  • semantic analysis is performed on the pinyin information according to the semantic analysis model in the speech recognition model to obtain target text information corresponding to the speech information to be analyzed.
  • the semantic parsing model includes the mapping relationship between pinyin information and text information, and the obtained pinyin information can be semantically parsed through the mapping relationship included in the semantic parsing model to convert the pinyin information into the corresponding target text information, as sketched below.
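The three sub-steps of S160 can be sketched as plain lookups; feature_dict (phoneme to pinyin) and parse_model (pinyin sequence to text) are assumed stand-ins for the speech feature dictionary and the semantic parsing model, which in practice would be far richer:

```python
def speech_to_text(phonemes, feature_dict, parse_model):
    """Sketch of S160 after the acoustic model has segmented the phonemes."""
    # Match each segmented phoneme against the speech feature dictionary
    # to convert the phonemes into pinyin information.
    pinyin = [feature_dict[p] for p in phonemes]
    # The semantic parsing model maps the pinyin information to the
    # corresponding target text information.
    return parse_model(" ".join(pinyin))
```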
  • S170 Determine whether the target text information contains cheating words according to a preset text judgment model to obtain a text judgment result.
  • a text judgment result is obtained by judging whether the target text information contains cheating words according to a preset text judgment model.
  • the text judgment model includes a plurality of cheating keywords and a text judgment neural network; if the target text information contains any of the cheating keywords, the text judgment result is that it contains cheating words.
  • Cheating keywords can be "how to", "tell", "feel", etc.
  • the target text information can be converted into text codes, and the text codes can be input into the text judgment neural network for recognition, so as to obtain the output results of the text judgment neural network.
  • the output result can be judged to determine whether the target text information has a cheating tendency: if it is judged that the target text information has a cheating tendency, the text judgment result is that cheating words are contained; if it is judged that the target text information does not have a cheating tendency, the text judgment result is that cheating words are not contained.
  • the target text information can be converted according to the pre-stored conversion dictionary: the encoding value corresponding to each character in the target text information is obtained and the values are combined to obtain a text code, so that the obtained text code represents the features of the target text information numerically.
  • the structure of the text judgment neural network is similar to that of the voiceprint verification model.
  • the text encoding of the target text information is input into the text judgment neural network to obtain the corresponding output results.
  • the output result can be represented by a numerical value, and it is judged whether the output result is greater than the preset cheating score value: if it is greater, the target text information is judged to have a cheating tendency; if the output result is not greater than the preset cheating score value, the target text information is judged not to have a cheating tendency. A sketch follows.
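A sketch of S170 combining the keyword screen with the neural score; conversion_dict, judge_net and the 0.5 cheating score value are assumed stand-ins for the pre-stored components, not taken from the patent:

```python
CHEATING_KEYWORDS = ["how to", "tell", "feel"]   # examples from the description

def text_judgement(target_text, conversion_dict, judge_net, cheating_score=0.5):
    """Sketch of S170: keyword screen plus a neural score over the text code."""
    # Keyword screen: any cheating keyword decides the result directly.
    if any(word in target_text for word in CHEATING_KEYWORDS):
        return "contains cheating words"
    # Convert each character to its code value to form the text code,
    # then let the text judgment network score it (a single numeric output).
    codes = [conversion_dict.get(ch, 0) for ch in target_text]
    output = judge_net(codes)
    return ("contains cheating words" if output > cheating_score
            else "does not contain cheating words")
```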
  • if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is issued, as in the sketch below. If the voiceprint verification result is inconsistent, it means that the voice collected by the current candidate's voice collection terminal contains the voice of another candidate, that is, the current candidate is passively communicating with other candidates; it is determined that cheating exists, and an alarm prompt message is issued to remind the user of the user terminal to deal with the cheating behavior in time. If the text judgment result is that cheating words are contained, it means that the current examinee is actively communicating with other examinees; it is likewise determined that cheating behavior exists, and an alarm prompt message is issued to prompt the user of the user terminal to deal with the cheating behavior in time.
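The final decision is a simple disjunction of the two signals, sketched here with the string labels used in the sketches above (assumed, for illustration):

```python
def cheating_decision(voiceprint_result, text_result):
    """Either signal alone triggers the alarm prompt message."""
    if (voiceprint_result == "inconsistent"
            or text_result == "contains cheating words"):
        return "cheating detected: issue alarm prompt message"
    return "no cheating detected"
```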
  • the technical methods in this application can be applied to application scenarios such as smart government affairs/smart education, etc., which include judging cheating in exams based on speech recognition, so as to promote the construction of smart cities.
  • in the speech recognition-based exam cheating recognition method provided by the embodiments of the present application, the speech feature parameters of each sentence pronunciation are obtained from the basic speech information of each examinee and dimensionality reduction is performed to obtain the dimensionality reduction feature parameters; the dimensionality reduction feature parameters are used to train the initialized voiceprint verification model; the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameters of the voice information to be analyzed are consistent with the dimensionality reduction feature parameters of the corresponding candidate, and it is judged whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words; if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating exists and an alarm prompt message is sent.
  • in this way, the to-be-recognized speech information of the examinees is recognized based on speech recognition, realizing real-time and accurate judgment of communication cheating between examinees.
  • Embodiments of the present application further provide a speech recognition-based exam cheating recognition device, which is used to execute any of the foregoing speech recognition-based exam cheating recognition methods.
  • FIG. 9 is a schematic block diagram of an apparatus for recognizing exam cheating based on speech recognition provided by an embodiment of the present application.
  • the apparatus for recognizing exam cheating based on speech recognition can be configured in the user terminal 10.
  • the apparatus 100 for recognizing exam cheating based on speech recognition includes a speech feature parameter obtaining unit 110, a dimensionality reduction processing unit 120, a model training unit 130, a target dimensionality reduction feature parameter obtaining unit 140, a voiceprint verification result obtaining unit 150, a target text information acquisition unit 160, a text judgment result acquisition unit 170 and a prompt information sending unit 180.
  • the voice feature parameter obtaining unit 110 is used to acquire the basic voice information corresponding to each candidate collected by the voice collection terminal, and to obtain, according to a preset extraction rule, the speech feature parameters corresponding to the pronunciation of each sentence from the pronunciations of the multiple sentences contained in each piece of basic voice information.
  • the speech feature parameter acquisition unit 110 includes subunits: an audio information acquisition unit, an audio information conversion unit, an audio coefficient information acquisition unit, and a perceptual coefficient information acquisition unit.
  • the audio information acquisition unit is used to perform framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information; the audio information conversion unit is used to convert the multi-frame audio information corresponding to each sentence pronunciation into audio frequency spectrums according to the spectrum conversion rule; the audio coefficient information acquisition unit is used to acquire the audio coefficient information corresponding to each audio frequency spectrum according to the audio coefficient extraction rule; and the perceptual coefficient information acquisition unit is used to acquire the perceptual coefficient information corresponding to each audio frequency spectrum according to the perceptual coefficient extraction rule.
  • the audio coefficient information acquisition unit includes subunits: a frequency conversion unit and an inverse conversion processing unit.
  • a frequency conversion unit, configured to convert each audio frequency spectrum into a corresponding nonlinear audio frequency spectrum according to the frequency conversion formula; and an inverse transform processing unit, configured to inversely transform each nonlinear audio frequency spectrum according to the inverse transform calculation formula to obtain a plurality of audio coefficients corresponding to each nonlinear audio frequency spectrum as the audio coefficient information of each audio frequency spectrum.
  • the perceptual coefficient information obtaining unit includes subunits: a frequency band energy vector obtaining unit and an inverse transform processing unit.
  • a frequency band energy vector obtaining unit, configured to filter each audio frequency spectrum according to the multiple frequency values contained in the frequency array to obtain the frequency band energy vector corresponding to each audio frequency spectrum; and an inverse transform processing unit, configured to perform inverse transformation according to the inverse transform calculation formula after compressing the frequency band energy vector corresponding to each audio frequency spectrum, so as to obtain the perceptual coefficient information of each audio frequency spectrum.
  • the dimension reduction processing unit 120 is configured to perform dimension reduction processing on each of the speech feature parameters according to a preset dimension reduction value to obtain a feature vector matrix and a dimension reduction feature parameter corresponding to the pronunciation of each sentence.
  • the dimensionality reduction processing unit 120 includes subunits: a covariance matrix obtaining unit, a covariance matrix solving unit, an eigenvector matrix obtaining unit, and a matrix calculating unit.
  • a covariance matrix obtaining unit, used for integrating all the speech feature parameters into a parameter matrix and calculating the covariance matrix of the parameter matrix; a covariance matrix solving unit, used for solving the covariance eigenvalues of the covariance matrix and the covariance eigenvector corresponding to each covariance eigenvalue; an eigenvector matrix acquisition unit, used for selecting a number of covariance eigenvectors, equal to the dimensionality reduction value, that correspond to the largest covariance eigenvalues, and combining them to obtain the eigenvector matrix; and a matrix calculation unit, used for multiplying the parameter matrix by the eigenvector matrix to obtain the dimensionality reduction feature parameter corresponding to the pronunciation of each sentence.
  • the model training unit 130 is configured to perform iterative training on the initialized voiceprint verification model according to the dimensionality reduction feature parameters of the pronunciation of each sentence and the preset model training rules to obtain a trained voiceprint verification model.
  • the model training unit 130 includes subunits: a positive sample acquisition unit, a negative sample acquisition unit, a model output information acquisition unit, a loss value calculation unit, a loss value judgment unit, a parameter value update unit, and a voiceprint verification model determination unit.
  • the positive sample acquisition unit is used to randomly select two dimensionality reduction feature parameters of the same candidate from the dimensionality reduction feature parameters as a positive sample; the negative sample acquisition unit is used to randomly select two dimensionality reduction feature parameters of different candidates from the dimensionality reduction feature parameters as a negative sample;
  • a model output information acquisition unit is used to input the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information;
  • a loss value calculation unit is used to calculate a loss value from the model output information according to the loss value calculation formula;
  • a loss value judgment unit is used to judge whether the loss value is less than the loss threshold value;
  • a parameter value update unit is used to calculate, if the loss value is not less than the loss threshold, the update value of each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value so as to update the parameter value of that parameter, and to return to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information; and a voiceprint verification model determination unit is used to determine, if the loss value is less than the loss threshold, the voiceprint verification model as the trained voiceprint verification model.
  • the target dimension reduction feature parameter obtaining unit 140 is configured to obtain the target corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix if the voice information to be analyzed is received from any of the voice collection terminals. Dimensionality reduction feature parameters.
  • the voiceprint verification result obtaining unit 150 is configured to verify whether the target dimension reduction feature parameter is consistent with the dimension reduction feature parameter of the examinee corresponding to the voice information to be analyzed according to the preset scoring threshold and the voiceprint verification model to obtain a voiceprint Validation results.
  • the voiceprint verification result acquisition unit 150 includes subunits: an output information acquisition unit, a verification score acquisition unit, and a verification score judgment unit.
  • the output information acquisition unit is used for inputting the target dimensionality reduction feature parameter and any dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain corresponding output information;
  • the verification score acquisition unit is used to calculate the verification score corresponding to the target dimensionality reduction feature parameter according to the output information; and the verification score judgment unit is used to judge whether the verification score is greater than the score threshold to obtain the voiceprint verification result of whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter.
  • the target text information acquisition unit 160 is configured to perform speech recognition on the to-be-analyzed speech information according to a pre-stored speech recognition model to obtain target text information corresponding to the to-be-analyzed speech information.
  • the text judgment result obtaining unit 170 is configured to judge whether the target text information contains cheating words according to a preset text judgment model to obtain a text judgment result.
  • the prompt information sending unit 180 is configured to determine that cheating behavior exists and issue an alarm prompt message if the voiceprint verification result is inconsistent or the text judgment result is that the text contains cheating words.
  • the speech recognition-based exam cheating recognition device applies the above speech recognition-based exam cheating recognition method: it obtains the speech feature parameters of each sentence pronunciation from the basic speech information of each examinee and performs dimensionality reduction to obtain the dimensionality reduction feature parameters, trains the initialized voiceprint verification model according to the dimensionality reduction feature parameters, uses the voiceprint verification model to verify whether the target dimensionality reduction feature parameters of the voice information to be analyzed are consistent with the dimensionality reduction feature parameters of the corresponding candidate, and judges whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words.
  • in this way, the to-be-recognized speech information of the examinees is recognized based on speech recognition, realizing real-time and accurate judgment of communication cheating between examinees.
  • the above-mentioned apparatus for recognizing exam cheating based on speech recognition can be implemented in the form of a computer program, and the computer program can be executed on a computer device as shown in FIG. 10 .
  • FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • the computer device may be a user terminal 10 for executing a speech recognition-based exam cheating recognition method to judge exam cheating based on speech recognition.
  • the computer device 500 includes a processor 502 , a memory and a network interface 505 connected by a system bus 501 , wherein the memory may include a non-volatile storage medium 503 and an internal memory 504 .
  • the nonvolatile storage medium 503 can store an operating system 5031 and a computer program 5032 .
  • the computer program 5032 can cause the processor 502 to execute a speech recognition-based exam cheating recognition method.
  • the processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500 .
  • the internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503.
  • the processor 502 can execute the speech recognition-based exam cheating recognition method.
  • the network interface 505 is used for network communication, such as providing transmission of data information.
  • FIG. 10 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer device 500 may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
  • the processor 502 is configured to run the computer program 5032 stored in the memory, so as to realize the corresponding functions in the above-mentioned speech recognition-based examination cheating recognition method.
  • the embodiment of the computer device shown in FIG. 10 does not constitute a limitation on the specific structure of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the computer device may only include a memory and a processor.
  • the structures and functions of the memory and the processor are the same as those of the embodiment shown in FIG. 10 , which will not be repeated here.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.
  • a computer-readable storage medium may be a non-volatile computer-readable storage medium, or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, wherein when the computer program is executed by the processor, the steps included in the above-mentioned speech recognition-based exam cheating recognition method are implemented.
  • the disclosed device, apparatus, and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other division methods, units with the same function may be grouped into one unit, and multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium; the storage medium includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned computer-readable storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

An examination cheating recognition method and apparatus based on speech recognition, and a computer device. The method comprises: acquiring, from the basic speech information of each examinee, a speech feature parameter for each sentence pronunciation, and performing dimensionality reduction to obtain dimension-reduced feature parameters; training an initialized voiceprint verification model according to the dimension-reduced feature parameters; using the voiceprint verification model to verify whether the target dimension-reduced feature parameter of speech information to be analyzed is consistent with the dimension-reduced feature parameters of the corresponding examinee; determining whether a cheating word is included in the target text information obtained by performing speech recognition on the speech information to be analyzed; and, if the voiceprint verification result indicates inconsistency or the text judgment result indicates that a cheating word is included, determining that cheating behavior exists and sending alarm prompt information. In the method, the speech information to be recognized of an examinee is recognized on the basis of speech recognition, so as to realize real-time and accurate determination of communication cheating behavior between examinees.

Description

Examination cheating recognition method and apparatus based on speech recognition, and computer device
This application claims priority to the Chinese patent application No. 202011490833.2, filed with the China Patent Office on December 16, 2020 and entitled "Examination Cheating Recognition Method, Apparatus and Computer Device Based on Speech Recognition", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence technology and belongs to the application scenario of judging examination cheating based on speech recognition in smart cities; in particular, it relates to an examination cheating recognition method and apparatus based on speech recognition, and a computer device.
Background
More and more selection and evaluation processes are carried out by examination to ensure fairness, for example civil servant selection, CET-4 and CET-6 evaluation, and driver's license testing. To prevent examinees from cheating during an examination and thereby compromising its fairness, invigilators are arranged in the examination room; however, invigilators cannot monitor every examinee at every moment, so the invigilation effect is unsatisfactory. Traditional technical methods use video surveillance to assist invigilators, analyzing the video to determine the specific location of a cheating examinee. However, the inventors found that analyzing surveillance video can only judge after the event whether an examinee has cheated, so real-time judgment cannot be guaranteed; moreover, surveillance video can only analyze from images whether an examinee's body movements indicate cheating, and if examinees communicate with each other while making only small body movements, the surveillance video cannot accurately identify the cheating. Therefore, the prior-art methods cannot make real-time, accurate judgments on communication cheating between examinees.
Summary of the Invention
The embodiments of the present application provide an examination cheating recognition method and apparatus based on speech recognition, and a computer device, aiming to solve the problem in the prior-art methods that communication cheating between examinees cannot be judged in real time and accurately.
In a first aspect, an embodiment of the present application provides an examination cheating recognition method based on speech recognition, which includes:
acquiring the basic voice information corresponding to each examinee collected by the voice collection terminals, and acquiring, according to a preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
performing dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
iteratively training an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
if voice information to be analyzed is received from any of the voice collection terminals, acquiring, according to the extraction rule and the feature vector matrix, the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed;
verifying, according to a preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed, to obtain a voiceprint verification result;
performing speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
judging, according to a preset text judgment model, whether the target text information contains cheating words, to obtain a text judgment result;
if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determining that cheating behavior exists and issuing alarm prompt information.
In a second aspect, an embodiment of the present application provides an examination cheating recognition apparatus based on speech recognition, which includes:
a speech feature parameter acquisition unit, configured to acquire the basic voice information corresponding to each examinee collected by the voice collection terminals, and to acquire, according to a preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
a dimensionality reduction processing unit, configured to perform dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
a model training unit, configured to iteratively train an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
a target dimensionality reduction feature parameter acquisition unit, configured to, if voice information to be analyzed is received from any of the voice collection terminals, acquire, according to the extraction rule and the feature vector matrix, the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed;
a voiceprint verification result acquisition unit, configured to verify, according to a preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed, to obtain a voiceprint verification result;
a target text information acquisition unit, configured to perform speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
a text judgment result acquisition unit, configured to judge, according to a preset text judgment model, whether the target text information contains cheating words, to obtain a text judgment result;
a prompt information sending unit, configured to determine that cheating behavior exists and issue alarm prompt information if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained.
In a third aspect, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the examination cheating recognition method based on speech recognition described in the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the examination cheating recognition method based on speech recognition described in the first aspect.
The embodiments of the present application provide an examination cheating recognition method and apparatus based on speech recognition, and a computer device. The speech feature parameter of each sentence pronunciation is acquired from each examinee's basic voice information and reduced in dimensionality to obtain dimensionality reduction feature parameters; the initialized voiceprint verification model is trained according to the dimensionality reduction feature parameters; the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameter of the voice information to be analyzed is consistent with the dimensionality reduction feature parameters of the corresponding examinee; whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words is judged; and if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and alarm prompt information is sent. Through the above method, an examinee's voice information to be recognized is recognized based on speech recognition, so that communication cheating between examinees can be judged in real time and accurately.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of another sub-flow of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of the examination cheating recognition apparatus based on speech recognition provided by an embodiment of the present application;
FIG. 10 is a schematic block diagram of the computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this specification of the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
Please refer to FIG. 1 and FIG. 2. FIG. 1 is a schematic flowchart of the examination cheating recognition method based on speech recognition provided by an embodiment of the present application, and FIG. 2 is a schematic diagram of an application scenario of the method. The method is applied in a user terminal 10 and executed by application software installed in the user terminal 10; the user terminal 10 is connected to the voice collection terminal 20 of each examinee through a network for the transmission of data information. The user terminal 10 is the terminal device, such as a desktop computer, a notebook computer, a tablet computer or a mobile phone, used to execute the examination cheating recognition method based on speech recognition so as to judge whether an examinee is cheating; the voice collection terminal 20 is a terminal, such as a microphone, used to collect in real time the voice information uttered by an examinee. Accordingly, each examinee at the examination site wears a voice collection terminal 20, or a voice collection terminal 20 is arranged on each examinee's examination desk. As shown in FIG. 1, the method includes steps S110 to S180.
S110. Acquire the basic voice information corresponding to each examinee collected by the voice collection terminals, and acquire, according to a preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information.
When entering the answering interface, an examinee needs to read the examination agreement, examination room instructions and similar content; this reading is completed before the examination begins. While each examinee reads the above content, the corresponding basic voice information can be collected through that examinee's voice collection terminal, and each examinee's basic voice information contains multiple sentence pronunciations. Each sentence pronunciation corresponds to one sentence spoken by the examinee, and the corresponding speech feature parameter can be acquired from each sentence pronunciation according to the extraction rule. The speech feature parameter quantifies the audio features of a sentence pronunciation and includes the audio coefficient information and the perceptual coefficient information of the sentence pronunciation: the audio coefficient information may be the Mel Frequency Cepstrum Coefficients (MFCC) of the sentence pronunciation, and the perceptual coefficient information may be the perceptual Linear Prediction Coefficients (LPC) of the sentence pronunciation. The extraction rule includes a spectrum conversion rule, an audio coefficient extraction rule and a perceptual coefficient extraction rule: each sentence pronunciation can be spectrum-converted according to the spectrum conversion rule, the resulting audio spectrum is analyzed according to the audio coefficient extraction rule to obtain the audio coefficient information, and the audio spectrum is analyzed according to the perceptual coefficient extraction rule to obtain the perceptual coefficient information.
In one embodiment, as shown in FIG. 3, step S110 includes sub-steps S111, S112, S113 and S114.
S111. Perform framing processing on each sentence pronunciation to obtain the corresponding multiple frames of audio information.
A sentence pronunciation is represented in the computer as a spectrogram containing the audio track. The spectrogram contains many frames, each frame corresponding to one time unit, so each frame of audio information can be obtained from the spectrogram of the sentence pronunciation; each frame of audio information is the audio information contained in one time unit of the spectrogram.
S112. Convert the multiple frames of audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule.
A Fast Fourier Transform (FFT) can be performed on the multiple frames of audio information contained in each sentence pronunciation according to the spectrum conversion rule, and the result rotated by 90 degrees, to obtain the audio spectrum corresponding to each sentence pronunciation; the spectrum in the audio spectrum represents the relationship between frequency and energy.
S113. Acquire the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule.
The audio coefficient information can be extracted from each audio spectrum through the audio coefficient extraction rule. Specifically, the audio coefficient extraction rule includes a frequency conversion formula and an inverse transform calculation formula.
In one embodiment, as shown in FIG. 4, step S113 includes sub-steps S1131 and S1132.
S1131. Convert each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula. S1132. Perform an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain multiple audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
The linearly represented audio spectrum is converted into a nonlinear audio spectrum according to the frequency conversion formula. The human auditory system is a special nonlinear system whose sensitivity to signals of different frequencies varies. To simulate how the human auditory system perceives audio signals, the nonlinear audio spectrum can model the human auditory system's representation of the audio signal, and features consistent with human hearing can then be obtained from it. Both the audio spectrum and the nonlinear audio spectrum can be represented by a spectrum curve composed of multiple continuous spectrum values.
Specifically, the frequency conversion formula can be expressed as formula (1):
mel(f) = 2595 × log(1 + f/700)   (1);
where mel(f) is the spectrum value of the converted nonlinear audio spectrum and f is the frequency value of the audio spectrum.
Each nonlinear audio spectrum can be inversely transformed according to the inverse transform calculation formula. Specifically, the logarithm of a nonlinear audio spectrum is taken and a Discrete Cosine Transform (DCT) is performed, and the 2nd to 13th DCT coefficients are combined to obtain the audio coefficients corresponding to that nonlinear audio spectrum; acquiring the audio coefficients corresponding to each nonlinear audio spectrum yields the audio coefficient information of each audio spectrum.
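By way of illustration, the framing, FFT, nonlinear warping of formula (1), logarithm and DCT described above can be sketched in NumPy as below. The frame length, hop size, window, filterbank construction and function name are assumptions of the sketch rather than values fixed by the present application, and the computation follows the standard per-frame MFCC layout rather than the per-sentence spectrum wording of the text.

```python
import numpy as np
from scipy.fftpack import dct

def audio_coefficients(signal, sample_rate=16000, frame_len=400, hop=160, n_filters=26):
    """Per-frame sketch of S111-S1132: frame the sentence pronunciation, take an
    FFT per frame, warp onto the nonlinear scale of formula (1), take logs,
    apply a DCT and keep coefficients 2-13."""
    signal = np.asarray(signal, dtype=np.float64)

    # S111: split the pronunciation into frames (one frame = one time unit).
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)

    # S112: FFT per frame -> audio spectrum (frequency vs. energy).
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # S1131: nonlinear warping with mel(f) = 2595*log10(1 + f/700), realized
    # here as a triangular filterbank spaced evenly on the warped scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(mel(0.0), mel(sample_rate / 2.0), n_filters + 2))
    bins = np.floor((frame_len + 1) * edges / sample_rate).astype(int)
    fbank = np.zeros((n_filters, spectrum.shape[1]))
    for j in range(1, n_filters + 1):
        fbank[j - 1, bins[j - 1]:bins[j]] = np.linspace(0, 1, bins[j] - bins[j - 1], endpoint=False)
        fbank[j - 1, bins[j]:bins[j + 1]] = np.linspace(1, 0, bins[j + 1] - bins[j], endpoint=False)

    # S1132: log + DCT, keeping the 2nd-13th coefficients of each frame.
    log_energy = np.log(spectrum @ fbank.T + 1e-10)
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, 1:13]
```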
S114. Acquire the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
The perceptual coefficient information can be extracted from each audio spectrum through the perceptual coefficient extraction rule. Specifically, the perceptual coefficient extraction rule includes a frequency array and an inverse transform calculation formula.
In one embodiment, as shown in FIG. 5, step S114 includes sub-steps S1141 and S1142.
S1141. Filter each audio spectrum according to the multiple frequency values contained in the frequency array to obtain the frequency band energy vector corresponding to each audio spectrum.
Specifically, the frequency array contains multiple frequency values, and an audio spectrum can be filtered with an equal-loudness curve at each of these frequency values to obtain the frequency band energy vector of that audio spectrum corresponding to each frequency value.
For example, for the frequency range of the human voice, 15 frequency values can be selected and combined into the frequency array, which can then be expressed as {250, 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500, 2900, 3400}.
S1142. Compress the frequency band energy vector corresponding to each audio spectrum, and then perform an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
A cube root is taken of each audio spectrum's frequency band energy vector to compress it, and an Inverse Fast Fourier Transform (IFFT) is performed on the compressed frequency band energy vector according to the inverse transform calculation formula to obtain multiple coefficient values corresponding to each audio spectrum, from which the leading coefficient values are taken as the perceptual coefficient information of each audio spectrum.
For example, a 30-point inverse fast Fourier transform can be performed on the compressed frequency band energy vector to obtain 30 coefficient values, of which the first 15 are taken as the perceptual coefficient information of the corresponding audio spectrum.
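Similarly, sub-steps S1141 and S1142 can be sketched as below for one frame's power spectrum; sampling the spectrum at the 15 preset frequencies is an assumed simplification of equal-loudness-curve filtering, and the function name is illustrative.

```python
import numpy as np

# The 15 preset frequency values of the frequency array (Hz).
FREQ_ARRAY = np.array([250, 350, 450, 570, 700, 840, 1000, 1170,
                       1370, 1600, 1850, 2150, 2500, 2900, 3400])

def perceptual_coefficients(power_spectrum, sample_rate=16000):
    """Sketch of S1141-S1142: band energies at the 15 preset frequencies,
    cube-root compression, 30-point IFFT, first 15 values kept."""
    freqs = np.linspace(0.0, sample_rate / 2.0, len(power_spectrum))
    # S1141: frequency band energy vector, one energy per preset frequency.
    band_energy = np.interp(FREQ_ARRAY, freqs, power_spectrum)
    # S1142: cube-root compression, then a 30-point inverse FFT.
    compressed = np.cbrt(band_energy)
    coeffs = np.fft.ifft(compressed, n=30).real
    return coeffs[:15]  # the first 15 values are the perceptual coefficients
```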
Combining the audio coefficient information and the perceptual coefficient information of the same audio spectrum yields the speech feature parameter of that audio spectrum.
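Reusing the two functions defined in the sketches above, the combination amounts to concatenating the two coefficient vectors for the same frame; the stand-in input data below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
utterance = rng.standard_normal(4000)                          # stand-in sentence pronunciation
frame_spectrum = np.abs(np.fft.rfft(utterance[:400])) ** 2     # one frame's power spectrum

audio_coeffs = audio_coefficients(utterance)[0]                # 12 audio coefficients (first frame)
percept_coeffs = perceptual_coefficients(frame_spectrum)       # 15 perceptual coefficients
speech_feature_parameter = np.concatenate([audio_coeffs, percept_coeffs])  # 27-dimensional parameter
```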
S120. Perform dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation.
Each of the obtained speech feature parameters contains multiple parameter values corresponding to multiple dimensions. If the parameter values of some dimensions are too concentrated, those values cannot clearly reflect the differences between multiple sentence pronunciations in the corresponding dimensions; that is, they can hardly highlight the features of the corresponding sentence pronunciations in those dimensions. To make the speech feature parameters represent the features of sentence pronunciations more prominently, dimensionality reduction processing can be performed on each speech feature parameter, removing the dimensions that cannot highlight the pronunciation differences between multiple sentences, to obtain the dimensionality reduction feature parameters; analyzing the dimensionality reduction feature parameters allows the features of each sentence pronunciation to be analyzed more accurately. Each speech feature parameter yields one dimensionality reduction feature parameter after dimensionality reduction processing, and the number of dimensions contained in the dimensionality reduction feature parameter is smaller than the number contained in the speech feature parameter.
In one embodiment, as shown in FIG. 6, step S120 includes sub-steps S121, S122, S123 and S124.
S121. Integrate all the speech feature parameters into one parameter matrix and calculate the covariance matrix of the parameter matrix.
Each speech feature parameter contains multiple parameter values of the same dimensions, so the speech feature parameters can be integrated to obtain a parameter matrix. Specifically, if the number of speech feature parameters is denoted m and the number of dimensions in each speech feature parameter is denoted n, the combined parameter matrix X is an m×n matrix (m rows, n columns) whose entries are the parameter values contained in each speech feature parameter. The covariance matrix of the parameter matrix can then be calculated; the specific calculation can be expressed by formula (2):
S = (1/m) · Σ_{i=1}^{m} (x_i − x̄)(x_i − x̄)^T   (2);
where x_i is the i-th row of the parameter matrix X, x̄ is the mean vector composed of the means of the parameter values of each dimension of the parameter matrix X, and T denotes the transpose; the resulting covariance matrix S is an n×n matrix.
S122. Solve for the covariance eigenvalues of the covariance matrix and the covariance eigenvector corresponding to each covariance eigenvalue.
The covariance matrix represents the feature distribution of the multiple speech feature parameters along the directions of the n-dimensional space. By solving the covariance matrix, specific directions can be determined in the n-dimensional space along which the features of the multiple speech feature parameters are concentrated. The magnitude of an eigenvalue reflects the feature differences of the multiple speech feature parameters in the direction corresponding to that eigenvalue; deleting the dimensions corresponding to the directions with smaller eigenvalues and retaining the dimensions corresponding to the main directions achieves the purpose of reducing the dimensionality of the speech feature parameters. Mathematical methods such as the orthogonal-triangular decomposition algorithm (QR decomposition), the Jacobi iteration algorithm, or the singular value decomposition algorithm (SVD) can be used to solve for the n covariance eigenvalues of the covariance matrix and the corresponding n covariance eigenvectors, with one covariance eigenvector per covariance eigenvalue.
S123. Obtain the feature vector matrix by combining the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which equals the dimensionality reduction value. S124. Multiply the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
The obtained covariance eigenvalues are sorted from largest to smallest, and, according to the sorting result, the covariance eigenvectors corresponding to a number of the largest covariance eigenvalues equal to the dimensionality reduction value are selected and combined to obtain the feature vector matrix.
For example, if the dimensionality reduction value is k (k < n) and each covariance eigenvector is a vector of n rows and 1 column, k of the n covariance eigenvectors are selected and combined, and the resulting feature vector matrix W is an n×k matrix.
Multiplying the parameter matrix by the obtained feature vector matrix then gives the dimensionality reduction feature parameter corresponding to each sentence pronunciation. The calculation can be expressed as Z = X·W, where the result Z is an m×k matrix; the parameter values of each row of Z are the dimensionality reduction feature parameters of the sentence pronunciations, the k parameter values of the i-th row being the dimensionality reduction feature parameter of the i-th input speech feature parameter, which thus contains parameter values of k dimensions.
S130. Iteratively train the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain the trained voiceprint verification model.
The initialized voiceprint verification model is iteratively trained according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain the trained voiceprint verification model, where the model training rule includes a loss value calculation formula, a gradient calculation formula and a loss threshold. Before the voiceprint verification model is used, the pre-stored initialized voiceprint verification model can be trained, and the trained voiceprint verification model is then used for voiceprint verification to improve verification accuracy; the model training rule is the specific rule for training the initialized voiceprint verification model.
In one embodiment, as shown in FIG. 7, step S130 includes sub-steps S131, S132, S133, S134, S135, S136 and S137.
S131. Randomly select two dimensionality reduction feature parameters of the same examinee from the dimensionality reduction feature parameters as a positive sample. S132. Randomly select two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample.
The obtained dimensionality reduction feature parameter of each sentence pronunciation can be used as sample data for training the initialized voiceprint verification model: two dimensionality reduction feature parameters corresponding to two sentence pronunciations of the same examinee are taken as a positive sample, and two dimensionality reduction feature parameters corresponding to two sentence pronunciations of different examinees are taken as a negative sample, so that multiple positive samples and multiple negative samples can be selected from the dimensionality reduction feature parameters. One positive sample or one negative sample is used for one training pass of the initialized voiceprint verification model.
S133. Input the positive sample or the negative sample into the voiceprint verification model to obtain the corresponding model output information.
The voiceprint verification model is a neural network model constructed based on artificial intelligence, consisting of an input layer, multiple intermediate layers and an output layer. The input layer contains multiple input nodes, the number of which equals the total number of dimensions contained in two dimensionality reduction feature parameters: if one dimensionality reduction feature parameter contains parameter values of k dimensions, the input layer contains 2k input nodes. The output layer contains two output nodes. The input layer and the intermediate layers, adjacent intermediate layers, and the intermediate layers and the output layer are connected by association formulas, each containing corresponding parameters; training the voiceprint verification model is the process of adjusting the parameter values of the parameters in the association formulas. The two dimensionality reduction feature parameters contained in a positive sample or a negative sample are input into the voiceprint verification model for calculation to obtain the model output information, which contains the output node values of the two output nodes: the output node value of the first output node is the predicted probability that the two dimensionality reduction feature parameters are consistent, the output node value of the second output node is the predicted probability that they are inconsistent, and each output node value lies in the range [0, 1].
S134. Calculate the model output information according to the loss value calculation formula to obtain the loss value.
The loss value corresponding to the model output information can be calculated according to the loss value calculation formula. Specifically, if a positive sample is input into the voiceprint verification model, the loss value is computed with the loss value calculation formula corresponding to positive samples; if a negative sample is input, the loss value is computed with the formula corresponding to negative samples. In both cases the loss value is a function of the two output node values, where f1 is the output node value of the first output node in the model output information and f2 is the output node value of the second output node.
S135. Judge whether the loss value is smaller than the loss threshold. S136. If the loss value is not smaller than the loss threshold, calculate an updated value for each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value so as to update the parameter values, and return to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain the corresponding model output information. S137. If the loss value is smaller than the loss threshold, determine the voiceprint verification model as the trained voiceprint verification model.
Whether the loss value is smaller than the loss threshold is judged. If it is, the currently obtained voiceprint verification model meets the usage requirements and is determined as the trained voiceprint verification model. If it is not, the currently obtained voiceprint verification model cannot yet meet the usage requirements: the parameter values of the parameters in the voiceprint verification model are adjusted, a new loss value is calculated based on the model with the adjusted parameter values, and the judgment of whether the new loss value is smaller than the loss threshold is repeated until the obtained voiceprint verification model meets the usage requirements. The updated value of each parameter in the voiceprint verification model can be calculated according to the gradient calculation formula to update the parameter's original parameter value. Specifically, the intermediate value calculated by a parameter of the voiceprint verification model for a positive or negative sample is input into the gradient calculation formula and combined with the loss value calculated above to obtain the updated value corresponding to that parameter; this calculation process is gradient descent.
Specifically, the gradient calculation formula can be expressed as:
ω_r′ = ω_r − η · ∂L/∂ω_r;
where ω_r′ is the calculated updated value of parameter r, ω_r is the original parameter value of parameter r, η is the learning rate preset in the gradient calculation formula, and ∂L/∂ω_r is the partial derivative of the loss value L with respect to parameter r, computed from the loss value and the intermediate value produced by parameter r (that intermediate value is required in this calculation).
S140. If voice information to be analyzed is received from any of the voice collection terminals, acquire, according to the extraction rule and the feature vector matrix, the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed.
The process of training the initialized voiceprint verification model can be completed before the examination starts. After the examination starts, the speech around each examinee is collected through that examinee's voice collection terminal. If voice information to be analyzed is received from any voice collection terminal, the target dimensionality reduction feature parameter corresponding to it can be acquired according to the extraction rule and the feature vector matrix. Specifically, the target speech feature parameter is acquired from the voice information to be analyzed according to the extraction rule, by the same method as acquiring the speech feature parameter corresponding to a sentence pronunciation, which is not repeated here; the target speech feature parameter is then multiplied by the feature vector matrix to obtain the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed.
S150. Verify, according to the preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed, to obtain the voiceprint verification result.
Whether the target dimensionality reduction feature parameter is consistent with the corresponding examinee's dimensionality reduction feature parameters can be verified according to the preset scoring threshold and the voiceprint verification model to obtain the voiceprint verification result. Specifically, based on the voice collection terminal corresponding to the voice information to be analyzed, any one dimensionality reduction feature parameter of the examinee corresponding to that voice collection terminal is acquired; the target dimensionality reduction feature parameter and that dimensionality reduction feature parameter are combined and input into the voiceprint verification model to obtain the corresponding output information, from which the corresponding verification score can be calculated. If the verification score is greater than the scoring threshold, the voiceprint verification result is consistent; if the verification score is not greater than the scoring threshold, the voiceprint verification result is inconsistent.
In one embodiment, as shown in FIG. 8, step S150 includes sub-steps S151, S152 and S153.
S151. Input the target dimensionality reduction feature parameter and any one dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain the corresponding output information.
The target dimensionality reduction feature parameter and one dimensionality reduction feature parameter of the corresponding examinee each contain parameter values of k dimensions, so combining them gives parameter values of 2k dimensions, which are input into the trained voiceprint verification model; the output information, obtained through the calculation of the association formulas in the voiceprint verification model, includes the output node values of the two output nodes.
S152. Calculate the verification score corresponding to the target dimensionality reduction feature parameter according to the output information. S153. Judge whether the verification score is greater than the scoring threshold to obtain the voiceprint verification result of whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter.
Specifically, the verification score corresponding to the target dimensionality reduction feature parameter is calculated from the output information by a preset formula over the two output node values, where f1 is the output node value of the first output node in the output information and f2 is the output node value of the second output node. Judging whether the verification score is greater than the scoring threshold yields the voiceprint verification result of whether the target dimensionality reduction feature parameter and the examinee's dimensionality reduction feature parameter are consistent.
For example, the scoring threshold can be set to 50: if the verification score is greater than 50, the voiceprint verification result is consistent; otherwise, the voiceprint verification result is inconsistent.
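A sketch of sub-steps S151 to S153, reusing the model from the training sketch above, might look as follows. The rule score = 100 · f1 / (f1 + f2) is an assumed stand-in for the preset scoring formula; only the example threshold of 50 comes from the text.

```python
import torch

def voiceprint_result(model, target_reduced, enrolled_reduced, score_threshold=50.0):
    """S151-S153: combine the target parameter with one enrolled parameter,
    run the trained model, score the two output node values, and compare
    the score with the scoring threshold."""
    x = torch.cat([torch.as_tensor(target_reduced, dtype=torch.float32),
                   torch.as_tensor(enrolled_reduced, dtype=torch.float32)])
    with torch.no_grad():
        f1, f2 = model(x).tolist()      # the two output node values
    score = 100.0 * f1 / (f1 + f2)      # assumed scoring rule: higher = more alike
    return 'consistent' if score > score_threshold else 'inconsistent'
```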
S160. Perform speech recognition on the voice information to be analyzed according to the pre-stored speech recognition model to obtain the target text information corresponding to the voice information to be analyzed.
The voice information to be analyzed can be recognized according to the speech recognition model to obtain the corresponding target text information, where the speech recognition model includes an acoustic model, a speech feature dictionary and a semantic parsing model. First, the voice information to be analyzed is segmented according to the acoustic model in the speech recognition model to obtain the multiple phonemes contained in it. The voice information to be analyzed is composed of the phonemes of the pronunciations of multiple characters, and the phoneme of a character includes the frequency and timbre of that character's pronunciation. The acoustic model contains the phonemes of all character pronunciations; by matching the voice information to be analyzed against all the phonemes in the acoustic model, the phonemes of individual characters in the voice information can be segmented, finally yielding the multiple phonemes contained in the voice information to be analyzed.
Next, the phonemes are matched according to the speech feature dictionary in the speech recognition model to convert the phonemes into pinyin information. The speech feature dictionary contains the phoneme information corresponding to the pinyin of all characters; by matching an obtained phoneme against the phoneme information corresponding to character pinyin, the phoneme of a single character can be converted into the character pinyin in the speech feature dictionary that matches it, so that all phonemes contained in the voice information are converted into pinyin information.
Finally, semantic parsing is performed on the pinyin information according to the semantic parsing model in the speech recognition model to obtain the target text information corresponding to the voice information to be analyzed. The semantic parsing model contains the mapping relationships between pinyin information and text information; through these mapping relationships the obtained pinyin information can be semantically parsed to convert it into the corresponding target text information.
S170. Judge, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result.
Whether the target text information contains cheating words is judged according to the preset text judgment model to obtain a text judgment result. Specifically, the text judgment model contains multiple cheating keywords and a text judgment neural network. First, it can be judged whether text matching any cheating keyword exists in the target text information; if so, the text judgment result is that cheating words are contained. For example, the cheating keywords may be "怎么做" ("how to do"), "告诉" ("tell"), "觉得" ("think"), and so on. If no text matching a cheating keyword exists in the target text information, the target text information can be converted into a text encoding, and the text encoding can be input into the text judgment neural network for recognition to obtain the network's output result; from the output result it can be judged whether the target text information has a cheating tendency. If the target text information is judged to have a cheating tendency, the text judgment result is that cheating words are contained; otherwise, the text judgment result is that no cheating words are contained.
Specifically, the target text information can be converted according to a pre-stored conversion dictionary: the encoding value corresponding to each character in the target text information is obtained, and the values are combined into a text encoding, which represents the features of the target text information as a numerical sequence. The structure of the text judgment neural network is similar to that of the voiceprint verification model. The text encoding of the target text information is input into the text judgment neural network to obtain a corresponding output result, which can be expressed as a numerical value. Whether the output result is greater than a preset cheating score value is then judged: if it is greater, the target text information is judged to have a cheating tendency; if not, the target text information is judged not to have a cheating tendency.
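A minimal sketch of this two-stage text judgment follows. The keyword list mirrors the examples above, while the per-character code conversion and the `score_with_network` placeholder stand in for the pre-stored conversion dictionary and the text judgment neural network, whose internal details are not given here.

```python
def judge_text(text: str,
               cheat_keywords=("怎么做", "告诉", "觉得"),
               cheat_score_threshold: float = 0.5) -> str:
    """Two-stage judgment: keyword match first, then a network score."""
    # Stage 1: direct cheating-keyword match.
    if any(k in text for k in cheat_keywords):
        return "contains cheating words"

    # Stage 2: encode the text and score it with the network.
    encoding = [ord(ch) for ch in text]   # stand-in conversion dictionary
    score = score_with_network(encoding)  # hypothetical network call
    if score > cheat_score_threshold:
        return "contains cheating words"
    return "does not contain cheating words"

def score_with_network(encoding) -> float:
    # Placeholder: a real implementation would run the text judgment
    # neural network on the encoding; here we return a dummy low score.
    return 0.0

print(judge_text("第三题怎么做"))  # -> contains cheating words
```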
S180. If the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determine that cheating behavior exists and issue an alarm prompt message.
If the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is issued. If the voiceprint verification result is inconsistent, this indicates that a voice other than the current examinee's has been picked up by the examinee's voice collection terminal, i.e., the current examinee is passively communicating with other examinees; it is therefore determined that cheating behavior exists, and an alarm prompt message is issued so that the user of the user terminal can handle the cheating behavior in time. If the text judgment result is that cheating words are contained, this indicates that the current examinee is actively communicating with other examinees; it is likewise determined that cheating behavior exists, and an alarm prompt message is issued so that the user of the user terminal can handle the cheating behavior in time.
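The resulting decision rule is a simple disjunction of the two checks; a brief sketch, with result strings assumed to match the earlier examples:

```python
def detect_cheating(voiceprint_result: str, text_result: str) -> bool:
    """Raise an alarm if either check indicates cheating."""
    return (voiceprint_result == "inconsistent"
            or text_result == "contains cheating words")

if detect_cheating("inconsistent", "does not contain cheating words"):
    print("ALERT: possible cheating - please review this examinee")
```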
The technical method in this application can be applied to application scenarios such as smart government affairs and smart education that involve judging examination cheating based on speech recognition, thereby promoting the construction of smart cities.
In the speech recognition-based examination cheating recognition method provided by the embodiments of this application, the speech feature parameter of each sentence pronunciation is obtained from the basic voice information of each examinee and reduced in dimensionality to obtain dimensionality reduction feature parameters; the initialized voiceprint verification model is trained according to the dimensionality reduction feature parameters; the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameter of the voice information to be analyzed is consistent with the dimensionality reduction feature parameters of the corresponding examinee; whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words is judged; and if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is sent. Through the above method, the voice information of examinees can be recognized based on speech recognition, so that communication-based cheating between examinees can be judged in real time and accurately.
Embodiments of this application further provide a speech recognition-based examination cheating recognition apparatus, which is configured to execute any embodiment of the foregoing speech recognition-based examination cheating recognition method. Specifically, referring to FIG. 9, FIG. 9 is a schematic block diagram of the speech recognition-based examination cheating recognition apparatus provided by an embodiment of this application. The apparatus may be configured in the user terminal 10.
As shown in FIG. 9, the speech recognition-based examination cheating recognition apparatus 100 includes a speech feature parameter acquisition unit 110, a dimensionality reduction processing unit 120, a model training unit 130, a target dimensionality reduction feature parameter acquisition unit 140, a voiceprint verification result acquisition unit 150, a target text information acquisition unit 160, a text judgment result acquisition unit 170, and a prompt information sending unit 180.
The speech feature parameter acquisition unit 110 is configured to acquire the basic voice information corresponding to each examinee collected by the voice collection terminal, and to acquire, according to a preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information.
In an embodiment, the speech feature parameter acquisition unit 110 includes the following subunits: an audio information acquisition unit, an audio information conversion unit, an audio coefficient information acquisition unit, and a perceptual coefficient information acquisition unit.
The audio information acquisition unit is configured to perform framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information; the audio information conversion unit is configured to convert the multi-frame audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule; the audio coefficient information acquisition unit is configured to acquire the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule; and the perceptual coefficient information acquisition unit is configured to acquire the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
In an embodiment, the audio coefficient information acquisition unit includes the following subunits: a frequency conversion unit and an inverse transform processing unit.
The frequency conversion unit is configured to convert each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula; the inverse transform processing unit is configured to perform an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain multiple audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
In an embodiment, the perceptual coefficient information acquisition unit includes the following subunits: a band energy vector acquisition unit and an inverse transform processing unit.
The band energy vector acquisition unit is configured to filter each audio spectrum according to the multiple frequency values contained in the frequency array to obtain the band energy vector corresponding to each audio spectrum; the inverse transform processing unit is configured to compress the band energy vector corresponding to each audio spectrum and then perform an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
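Taken together, these subunits describe a feature pipeline of framing, spectrum conversion, and two coefficient extractions. The sketch below follows that outline with numpy; since the embodiment's conversion formulas, frame sizes, and filter bank are not specified in this passage, standard MFCC/PLP-style choices are assumed throughout, and all names are illustrative.

```python
import numpy as np
from scipy.fftpack import dct

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (sizes assumed)."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n)])

def to_spectrum(frames):
    """Magnitude spectrum of each frame (the spectrum conversion rule)."""
    return np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), axis=1))

def audio_coefficients(spectrum, n_coeff=13):
    """Nonlinear spectrum followed by an inverse transform.

    The embodiment warps the spectrum with a frequency conversion
    formula (e.g. a Mel-style warp); a simple log compression stands in
    for the nonlinear spectrum here, and a DCT plays the inverse transform.
    """
    nonlinear = np.log(spectrum + 1e-8)
    return dct(nonlinear, axis=1, norm="ortho")[:, :n_coeff]

def perceptual_coefficients(spectrum, n_bands=20, n_coeff=13):
    """Band energies from a filter bank, compression, inverse transform."""
    edges = np.linspace(0, spectrum.shape[1], n_bands + 1, dtype=int)
    energies = np.stack([spectrum[:, a:b].sum(axis=1)
                         for a, b in zip(edges[:-1], edges[1:])], axis=1)
    compressed = np.cbrt(energies + 1e-8)   # assumed compression step
    return dct(compressed, axis=1, norm="ortho")[:, :n_coeff]

signal = np.random.randn(16000)             # one second of placeholder audio
spec = to_spectrum(frame_signal(signal))
features = np.hstack([audio_coefficients(spec), perceptual_coefficients(spec)])
print(features.shape)                        # (98, 26)
```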
The dimensionality reduction processing unit 120 is configured to perform dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation.
In an embodiment, the dimensionality reduction processing unit 120 includes the following subunits: a covariance matrix acquisition unit, a covariance matrix solving unit, a feature vector matrix acquisition unit, and a matrix calculation unit.
The covariance matrix acquisition unit is configured to integrate all the speech feature parameters into one parameter matrix and calculate the covariance matrix of the parameter matrix; the covariance matrix solving unit is configured to solve for the covariance eigenvalues of the covariance matrix and the covariance eigenvector corresponding to each covariance eigenvalue; the feature vector matrix acquisition unit is configured to select the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which equals the dimensionality reduction value, and combine them to obtain the feature vector matrix; and the matrix calculation unit is configured to multiply the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
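These four subunits amount to principal component analysis. A minimal numpy sketch under that reading follows; the variable names are illustrative, and the mean-centering step is an added convention that PCA ordinarily requires before the covariance computation.

```python
import numpy as np

def reduce_dimensions(feature_params, k):
    """feature_params: (n_samples, n_dims); k: preset reduction value."""
    X = np.asarray(feature_params, dtype=float)
    X = X - X.mean(axis=0)                  # centre before covariance
    cov = np.cov(X, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:k]   # indices of k largest eigenvalues
    V = eigvecs[:, order]                   # feature vector matrix
    return X @ V, V                         # reduced parameters + matrix

params = np.random.randn(50, 26)            # 50 utterances, 26-dim features
reduced, V = reduce_dimensions(params, k=8)
print(reduced.shape, V.shape)                # (50, 8) (26, 8)
```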
The model training unit 130 is configured to iteratively train the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model.
In an embodiment, the model training unit 130 includes the following subunits: a positive sample acquisition unit, a negative sample acquisition unit, a model output information acquisition unit, a loss value calculation unit, a loss value judgment unit, a parameter value update unit, and a voiceprint verification model determination unit.
The positive sample acquisition unit is configured to randomly select two dimensionality reduction feature parameters of the same examinee from the dimensionality reduction feature parameters as a positive sample; the negative sample acquisition unit is configured to randomly select two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample; the model output information acquisition unit is configured to input the positive sample or the negative sample into the voiceprint verification model to obtain the corresponding model output information; the loss value calculation unit is configured to calculate the model output information according to the loss value calculation formula to obtain a loss value; the loss value judgment unit is configured to judge whether the loss value is less than the loss threshold; the parameter value update unit is configured to, if the loss value is not less than the loss threshold, calculate an updated value for each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value so as to update the parameter values, and to return to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain the corresponding model output information; and the voiceprint verification model determination unit is configured to, if the loss value is less than the loss threshold, determine the voiceprint verification model as the trained voiceprint verification model.
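A compressed sketch of this pair-based training loop is given below. The real voiceprint verification model is a neural network with its own loss value and gradient calculation formulas; here a logistic model over the element-wise difference of a feature pair stands in for it, so the specific loss, gradient, data, and names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, loss_threshold, lr = 8, 0.05, 0.1
w, b = np.zeros(dim), 0.0                   # stand-in model parameters

def sample_pair(features_by_examinee):
    """Return (x1, x2, label): same examinee -> 1, different -> 0."""
    ids = list(features_by_examinee)
    if rng.random() < 0.5:                   # positive sample
        a = rng.choice(features_by_examinee[rng.choice(ids)], 2, replace=False)
        return a[0], a[1], 1.0
    i, j = rng.choice(ids, 2, replace=False)  # negative sample
    return (rng.choice(features_by_examinee[i]),
            rng.choice(features_by_examinee[j]), 0.0)

# Toy enrolment data: two examinees, ten reduced feature vectors each.
data = {e: [rng.normal(loc=e, size=dim) for _ in range(10)] for e in (0, 3)}

loss = float("inf")
while loss >= loss_threshold:                # iterate until below threshold
    x1, x2, y = sample_pair(data)
    z = w @ np.abs(x1 - x2) + b              # model output information
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    grad = p - y                             # gradient of the assumed loss
    w -= lr * grad * np.abs(x1 - x2)         # parameter value update
    b -= lr * grad
print("trained; final loss", float(loss))
```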
The target dimensionality reduction feature parameter acquisition unit 140 is configured to, if voice information to be analyzed is received from any of the voice collection terminals, acquire the target dimensionality reduction feature parameter corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix.
The voiceprint verification result acquisition unit 150 is configured to verify, according to a preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result.
In an embodiment, the voiceprint verification result acquisition unit 150 includes the following subunits: an output information acquisition unit, a verification score acquisition unit, and a verification score judgment unit.
The output information acquisition unit is configured to input the target dimensionality reduction feature parameter together with any one dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain corresponding output information; the verification score acquisition unit is configured to calculate, according to the output information, the verification score corresponding to the target dimensionality reduction feature parameter; and the verification score judgment unit is configured to judge whether the verification score is greater than the score threshold to obtain a voiceprint verification result indicating whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters.
The target text information acquisition unit 160 is configured to perform speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain the target text information corresponding to the voice information to be analyzed.
The text judgment result acquisition unit 170 is configured to judge, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result.
The prompt information sending unit 180 is configured to, if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determine that cheating behavior exists and issue an alarm prompt message.
The speech recognition-based examination cheating recognition apparatus provided by the embodiments of this application applies the above speech recognition-based examination cheating recognition method: the speech feature parameter of each sentence pronunciation is obtained from the basic voice information of each examinee and reduced in dimensionality to obtain dimensionality reduction feature parameters; the initialized voiceprint verification model is trained according to the dimensionality reduction feature parameters; the voiceprint verification model is used to verify whether the target dimensionality reduction feature parameter of the voice information to be analyzed is consistent with the dimensionality reduction feature parameters of the corresponding examinee; whether the target text information obtained by performing speech recognition on the voice information to be analyzed contains cheating words is judged; and if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, it is determined that cheating behavior exists and an alarm prompt message is sent. Through the above method, the voice information of examinees can be recognized based on speech recognition, so that communication-based cheating between examinees can be judged in real time and accurately.
The above speech recognition-based examination cheating recognition apparatus can be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 10.
Referring to FIG. 10, FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of this application. The computer device may be the user terminal 10 configured to execute the speech recognition-based examination cheating recognition method so as to judge examination cheating based on speech recognition.
Referring to FIG. 10, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, wherein the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to execute the speech recognition-based examination cheating recognition method.
The processor 502 is configured to provide computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, it can cause the processor 502 to execute the speech recognition-based examination cheating recognition method.
The network interface 505 is used for network communication, such as the transmission of data information. Those skilled in the art can understand that the structure shown in FIG. 10 is merely a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device 500 to which the solution of this application is applied; a specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory so as to implement the corresponding functions in the above speech recognition-based examination cheating recognition method.
Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 10 does not constitute a limitation on the specific structure of the computer device; in other embodiments, the computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are the same as those in the embodiment shown in FIG. 10 and are not repeated here.
It should be understood that, in the embodiments of this application, the processor 502 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Another embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps included in the above speech recognition-based examination cheating recognition method are implemented.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Those of ordinary skill in the art can appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
In the several embodiments provided in this application, it should be understood that the disclosed devices, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a division by logical function, and there may be other divisions in actual implementation; units with the same function may also be combined into one unit; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may also be electrical, mechanical, or other forms of connection.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; i.e., they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of this application.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a computer-readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned computer-readable storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.
The above are only specific embodiments of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art could easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions shall all fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. An examination cheating recognition method based on speech recognition, applied to a user terminal, wherein the user terminal is connected to a voice collection terminal of each examinee through a network to transmit data information, and the method comprises:
    acquiring basic voice information corresponding to each examinee collected by the voice collection terminal, and acquiring, according to a preset extraction rule, a speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
    performing dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
    iteratively training an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
    if voice information to be analyzed is received from any of the voice collection terminals, acquiring a target dimensionality reduction feature parameter corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix;
    verifying, according to a preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result;
    performing speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
    judging, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result; and
    if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determining that cheating behavior exists and issuing an alarm prompt message.
  2. The examination cheating recognition method based on speech recognition according to claim 1, wherein the extraction rule comprises a spectrum conversion rule, an audio coefficient extraction rule, and a perceptual coefficient extraction rule, each speech feature parameter contains audio coefficient information and perceptual coefficient information of one sentence pronunciation, and acquiring, according to the preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information comprises:
    performing framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information;
    converting the multi-frame audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule;
    acquiring audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule; and
    acquiring perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
  3. The examination cheating recognition method based on speech recognition according to claim 2, wherein the audio coefficient extraction rule comprises a frequency conversion formula and an inverse transform calculation formula, and acquiring the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule comprises:
    converting each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula; and
    performing an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain multiple audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
  4. The examination cheating recognition method based on speech recognition according to claim 2, wherein the perceptual coefficient extraction rule comprises a frequency array and an inverse transform calculation formula, and acquiring the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule comprises:
    filtering each audio spectrum according to the multiple frequency values contained in the frequency array to obtain a band energy vector corresponding to each audio spectrum; and
    compressing the band energy vector corresponding to each audio spectrum and then performing an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
  5. The examination cheating recognition method based on speech recognition according to claim 1, wherein performing dimensionality reduction processing on each speech feature parameter according to the preset dimensionality reduction value to obtain the feature vector matrix and the dimensionality reduction feature parameter corresponding to each sentence pronunciation comprises:
    integrating all the speech feature parameters into one parameter matrix and calculating a covariance matrix of the parameter matrix;
    solving for covariance eigenvalues of the covariance matrix and a covariance eigenvector corresponding to each covariance eigenvalue;
    selecting the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which equals the dimensionality reduction value, and combining them to obtain the feature vector matrix; and
    multiplying the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
  6. The examination cheating recognition method based on speech recognition according to claim 1, wherein the model training rule comprises a loss value calculation formula, a gradient calculation formula, and a loss threshold, and iteratively training the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain the trained voiceprint verification model comprises:
    randomly selecting two dimensionality reduction feature parameters of a same examinee from the dimensionality reduction feature parameters as a positive sample;
    randomly selecting two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample;
    inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information;
    calculating the model output information according to the loss value calculation formula to obtain a loss value;
    judging whether the loss value is less than the loss threshold;
    if the loss value is not less than the loss threshold, calculating an updated value of each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value to update the parameter values, and returning to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information; and
    if the loss value is less than the loss threshold, determining the voiceprint verification model as the trained voiceprint verification model.
  7. The examination cheating recognition method based on speech recognition according to claim 1, wherein verifying, according to the preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain the voiceprint verification result comprises:
    inputting the target dimensionality reduction feature parameter together with any one dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain corresponding output information;
    calculating, according to the output information, a verification score corresponding to the target dimensionality reduction feature parameter; and
    judging whether the verification score is greater than the score threshold to obtain a voiceprint verification result indicating whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters.
  8. An examination cheating recognition apparatus based on speech recognition, comprising:
    a speech feature parameter acquisition unit, configured to acquire basic voice information corresponding to each examinee collected by the voice collection terminal, and to acquire, according to a preset extraction rule, a speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
    a dimensionality reduction processing unit, configured to perform dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
    a model training unit, configured to iteratively train an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
    a target dimensionality reduction feature parameter acquisition unit, configured to, if voice information to be analyzed is received from any of the voice collection terminals, acquire a target dimensionality reduction feature parameter corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix;
    a voiceprint verification result acquisition unit, configured to verify, according to a preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result;
    a target text information acquisition unit, configured to perform speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
    a text judgment result acquisition unit, configured to judge, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result; and
    a prompt information sending unit, configured to, if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determine that cheating behavior exists and issue an alarm prompt message.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
    acquiring basic voice information corresponding to each examinee collected by the voice collection terminal, and acquiring, according to a preset extraction rule, a speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information;
    performing dimensionality reduction processing on each speech feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
    iteratively training an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
    if voice information to be analyzed is received from any of the voice collection terminals, acquiring a target dimensionality reduction feature parameter corresponding to the voice information to be analyzed according to the extraction rule and the feature vector matrix;
    verifying, according to a preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result;
    performing speech recognition on the voice information to be analyzed according to a pre-stored speech recognition model to obtain target text information corresponding to the voice information to be analyzed;
    judging, according to a preset text judgment model, whether the target text information contains cheating words to obtain a text judgment result; and
    if the voiceprint verification result is inconsistent or the text judgment result is that cheating words are contained, determining that cheating behavior exists and issuing an alarm prompt message.
  10. The computer device according to claim 9, wherein the extraction rule comprises a spectrum conversion rule, an audio coefficient extraction rule, and a perceptual coefficient extraction rule, each speech feature parameter contains audio coefficient information and perceptual coefficient information of one sentence pronunciation, and acquiring, according to the preset extraction rule, the speech feature parameter corresponding to each sentence pronunciation from the multiple sentence pronunciations contained in each piece of basic voice information comprises:
    performing framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information;
    converting the multi-frame audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule;
    acquiring audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule; and
    acquiring perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
  11. The computer device according to claim 10, wherein the audio coefficient extraction rule comprises a frequency conversion formula and an inverse transform calculation formula, and acquiring the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule comprises:
    converting each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula; and
    performing an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain multiple audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
  12. The computer device according to claim 10, wherein the perceptual coefficient extraction rule comprises a frequency array and an inverse transform calculation formula, and acquiring the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule comprises:
    filtering each audio spectrum according to the multiple frequency values contained in the frequency array to obtain a band energy vector corresponding to each audio spectrum; and
    compressing the band energy vector corresponding to each audio spectrum and then performing an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
  13. The computer device according to claim 9, wherein performing dimensionality reduction processing on each speech feature parameter according to the preset dimensionality reduction value to obtain the feature vector matrix and the dimensionality reduction feature parameter corresponding to each sentence pronunciation comprises:
    integrating all the speech feature parameters into one parameter matrix and calculating a covariance matrix of the parameter matrix;
    solving for covariance eigenvalues of the covariance matrix and a covariance eigenvector corresponding to each covariance eigenvalue;
    selecting the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which equals the dimensionality reduction value, and combining them to obtain the feature vector matrix; and
    multiplying the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
  14. The computer device according to claim 9, wherein the model training rule comprises a loss value calculation formula, a gradient calculation formula, and a loss threshold, and iteratively training the initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and the preset model training rule to obtain the trained voiceprint verification model comprises:
    randomly selecting two dimensionality reduction feature parameters of a same examinee from the dimensionality reduction feature parameters as a positive sample;
    randomly selecting two dimensionality reduction feature parameters of different examinees from the dimensionality reduction feature parameters as a negative sample;
    inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information;
    calculating the model output information according to the loss value calculation formula to obtain a loss value;
    judging whether the loss value is less than the loss threshold;
    if the loss value is not less than the loss threshold, calculating an updated value of each parameter in the initialized voiceprint verification model according to the gradient calculation formula and the loss value to update the parameter values, and returning to the step of inputting the positive sample or the negative sample into the voiceprint verification model to obtain corresponding model output information; and
    if the loss value is less than the loss threshold, determining the voiceprint verification model as the trained voiceprint verification model.
  15. The computer device according to claim 9, wherein verifying, according to the preset score threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters of the examinee corresponding to the voice information to be analyzed to obtain the voiceprint verification result comprises:
    inputting the target dimensionality reduction feature parameter together with any one dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed into the voiceprint verification model to obtain corresponding output information;
    calculating, according to the output information, a verification score corresponding to the target dimensionality reduction feature parameter; and
    judging whether the verification score is greater than the score threshold to obtain a voiceprint verification result indicating whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameters.
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following operations:
    acquiring basic voice information corresponding to each examinee collected by a voice collection terminal, and acquiring, according to a preset extraction rule, a voice feature parameter corresponding to each sentence pronunciation from a plurality of sentence pronunciations contained in each piece of basic voice information;
    performing dimensionality reduction processing on each voice feature parameter according to a preset dimensionality reduction value to obtain a feature vector matrix and a dimensionality reduction feature parameter corresponding to each sentence pronunciation;
    iteratively training an initialized voiceprint verification model according to the dimensionality reduction feature parameter of each sentence pronunciation and a preset model training rule to obtain a trained voiceprint verification model;
    if voice information to be analyzed is received from any one of the voice collection terminals, acquiring, according to the extraction rule and the feature vector matrix, a target dimensionality reduction feature parameter corresponding to the voice information to be analyzed;
    verifying, according to a preset scoring threshold and the voiceprint verification model, whether the target dimensionality reduction feature parameter is consistent with the dimensionality reduction feature parameter of the examinee corresponding to the voice information to be analyzed to obtain a voiceprint verification result;
    performing voice recognition on the voice information to be analyzed according to a pre-stored voice recognition model to obtain target text information corresponding to the voice information to be analyzed;
    determining, according to a preset text judgment model, whether the target text information contains cheating vocabulary to obtain a text judgment result; and
    if the voiceprint verification result is inconsistent or the text judgment result is that cheating vocabulary is contained, determining that cheating behavior exists and issuing an alarm prompt message.
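Taken together, the operations of claim 16 amount to the following control flow. This is a sketch, not the disclosed implementation: the callables are injected so the code stays self-contained, and each one (feature extraction, voiceprint verification, transcription, alarm) is a hypothetical stand-in for the preset rules and models the claim refers to; the keyword-membership test likewise stands in for the unspecified text judgment model.

```python
from typing import Callable, Iterable, Tuple

def analyze_utterance(audio: bytes,
                      examinee_id: str,
                      extract: Callable,      # extraction rule + feature vector matrix
                      verify: Callable,       # trained voiceprint verification model
                      transcribe: Callable,   # pre-stored speech recognition model
                      cheat_words: Iterable[str],
                      alarm: Callable) -> Tuple[bool, bool]:
    target_param = extract(audio)                       # target dimensionality-reduced parameter
    consistent = verify(examinee_id, target_param)      # voiceprint verification result
    text = transcribe(audio)                            # target text information
    has_cheat_words = any(w in text for w in cheat_words)  # stand-in text judgment
    if not consistent or has_cheat_words:               # either failure counts as cheating
        alarm(examinee_id)                              # issue the alarm prompt message
    return consistent, has_cheat_words
```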
  17. The computer-readable storage medium according to claim 16, wherein the extraction rule comprises a spectrum conversion rule, an audio coefficient extraction rule and a perceptual coefficient extraction rule, each voice feature parameter contains audio coefficient information and perceptual coefficient information of one sentence pronunciation, and the acquiring, according to the preset extraction rule, the voice feature parameter corresponding to each sentence pronunciation from the plurality of sentence pronunciations contained in each piece of basic voice information comprises:
    performing framing processing on each sentence pronunciation to obtain corresponding multi-frame audio information;
    converting the multi-frame audio information corresponding to each sentence pronunciation into an audio spectrum according to the spectrum conversion rule;
    acquiring audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule; and
    acquiring perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule.
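One conventional reading of the framing and spectrum-conversion steps of claim 17, assuming 16 kHz audio, 25 ms frames with a 10 ms hop, and a Hamming window plus FFT as the spectrum conversion rule; the claim fixes none of these values.

```python
import numpy as np

def frame_and_spectrum(signal, frame_len=400, hop=160):
    """Split one sentence pronunciation into overlapping frames, then take a
    per-frame power spectrum as the 'audio spectrum'."""
    assert len(signal) >= frame_len  # sketch assumes at least one full frame
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)           # taper frame edges before the FFT
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2   # power spectrum, one row per frame
```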
  18. The computer-readable storage medium according to claim 17, wherein the audio coefficient extraction rule comprises a frequency conversion formula and an inverse transform calculation formula, and the acquiring the audio coefficient information corresponding to each audio spectrum according to the audio coefficient extraction rule comprises:
    converting each audio spectrum into a corresponding nonlinear audio spectrum according to the frequency conversion formula; and
    performing an inverse transform on each nonlinear audio spectrum according to the inverse transform calculation formula to obtain a plurality of audio coefficients corresponding to each nonlinear audio spectrum as the audio coefficient information of each audio spectrum.
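Claim 18 reads like a mel-frequency cepstral coefficient (MFCC) computation. The sketch below assumes the mel scale as the frequency conversion formula and a DCT as the inverse transform calculation formula, with the mel filterbank supplied by the caller; the publication does not confirm these specific choices.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f_hz):
    # a common "frequency conversion formula" mapping Hz onto the nonlinear mel scale
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def audio_coefficients(power_spectrum, mel_filterbank, n_coeffs=13):
    """power_spectrum: (n_frames, n_bins); mel_filterbank: (n_mels, n_bins)."""
    nonlinear = np.log(power_spectrum @ mel_filterbank.T + 1e-10)  # nonlinear audio spectrum
    # the inverse transform (DCT-II here) yields the audio coefficients per frame
    return dct(nonlinear, type=2, axis=-1, norm='ortho')[:, :n_coeffs]
```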
  19. The computer-readable storage medium according to claim 17, wherein the perceptual coefficient extraction rule comprises a frequency array and an inverse transform calculation formula, and the acquiring the perceptual coefficient information corresponding to each audio spectrum according to the perceptual coefficient extraction rule comprises:
    filtering each audio spectrum according to a plurality of frequency values contained in the frequency array to obtain a frequency band energy vector corresponding to each audio spectrum; and
    compressing the frequency band energy vector corresponding to each audio spectrum and then performing an inverse transform according to the inverse transform calculation formula to obtain the perceptual coefficient information of each audio spectrum.
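Claim 19 resembles perceptual linear prediction (PLP)-style band analysis. In this sketch the frequency array is read as band edges in Hz, cube-root compression stands in for the unspecified compression, and an inverse DCT plays the inverse transform; all three are assumptions.

```python
import numpy as np
from scipy.fftpack import idct

def perceptual_coefficients(power_spectrum, bin_freqs, band_edges, n_coeffs=13):
    """power_spectrum: (n_bins,) one frame; bin_freqs: (n_bins,) Hz per FFT bin;
    band_edges: ascending Hz values taken from the 'frequency array'."""
    energies = np.array([
        power_spectrum[(bin_freqs >= lo) & (bin_freqs < hi)].sum()  # band energy vector
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ])
    compressed = np.cbrt(energies)  # cube-root compression, as in PLP analysis
    return idct(compressed, type=2, norm='ortho')[:n_coeffs]  # perceptual coefficients
```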
  20. The computer-readable storage medium according to claim 16, wherein the performing dimensionality reduction processing on each voice feature parameter according to the preset dimensionality reduction value to obtain the feature vector matrix and the dimensionality reduction feature parameter corresponding to each sentence pronunciation comprises:
    integrating all the voice feature parameters into one parameter matrix and calculating a covariance matrix of the parameter matrix;
    solving covariance eigenvalues of the covariance matrix and a covariance eigenvector corresponding to each covariance eigenvalue;
    selecting the covariance eigenvectors corresponding to the largest covariance eigenvalues, the number of which is equal to the dimensionality reduction value, and combining them to obtain the feature vector matrix; and
    multiplying the parameter matrix by the feature vector matrix to obtain the dimensionality reduction feature parameter corresponding to each sentence pronunciation.
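Claim 20 describes a standard principal component analysis (PCA); a minimal NumPy sketch follows. Mean-centering is a conventional PCA step that the claim does not spell out, and the variable names are illustrative.

```python
import numpy as np

def pca_reduce(params, k):
    """params: (n_pronunciations, n_features) parameter matrix; k: dimensionality value."""
    X = params - params.mean(axis=0)        # center columns before the covariance step
    cov = np.cov(X, rowvar=False)           # covariance matrix of the parameter matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # covariance eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    eigvec_matrix = eigvecs[:, order]       # feature vector matrix, (n_features, k)
    return X @ eigvec_matrix, eigvec_matrix  # reduced parameters + matrix for later reuse
```

The returned eigenvector matrix is what later projects new (to-be-analyzed) feature parameters into the same reduced space, matching how claim 16 reuses the feature vector matrix at inference time.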
PCT/CN2021/097100 2020-12-16 2021-05-31 Examination cheating recognition method and apparatus based on speech recognition, and computer device WO2022127042A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011490833.2 2020-12-16
CN202011490833.2A CN112669820B (en) 2020-12-16 2020-12-16 Examination cheating recognition method and device based on voice recognition and computer equipment

Publications (1)

Publication Number Publication Date
WO2022127042A1

Family

ID=75404281

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097100 WO2022127042A1 (en) 2020-12-16 2021-05-31 Examination cheating recognition method and apparatus based on speech recognition, and computer device

Country Status (2)

Country Link
CN (1) CN112669820B (en)
WO (1) WO2022127042A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669820B (en) * 2020-12-16 2023-08-04 平安科技(深圳)有限公司 Examination cheating recognition method and device based on voice recognition and computer equipment
CN113345434A (en) * 2021-05-31 2021-09-03 平安科技(深圳)有限公司 Network appointment vehicle user alarm method and device, computer equipment and storage medium
CN116153312A (en) * 2023-03-05 2023-05-23 广州网才信息技术有限公司 Online pen test method and device using voice recognition

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847292B (en) * 2017-02-16 2018-06-19 平安科技(深圳)有限公司 Method for recognizing sound-groove and device
CN107993071A (en) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 Electronic device, auth method and storage medium based on vocal print
CN108305633B (en) * 2018-01-16 2019-03-29 平安科技(深圳)有限公司 Speech verification method, apparatus, computer equipment and computer readable storage medium
CN108777146A (en) * 2018-05-31 2018-11-09 平安科技(深圳)有限公司 Speech model training method, method for distinguishing speek person, device, equipment and medium
CN109192202B (en) * 2018-09-21 2023-05-16 平安科技(深圳)有限公司 Voice safety recognition method, device, computer equipment and storage medium
CN110335608B (en) * 2019-06-17 2023-11-28 平安科技(深圳)有限公司 Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
CN111128160B (en) * 2019-12-19 2024-04-09 中国平安财产保险股份有限公司 Receipt modification method and device based on voice recognition and computer equipment
CN111370032B (en) * 2020-02-20 2023-02-14 厦门快商通科技股份有限公司 Voice separation method, system, mobile terminal and storage medium
CN111695352A (en) * 2020-05-28 2020-09-22 平安科技(深圳)有限公司 Grading method and device based on semantic analysis, terminal equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080120109A1 (en) * 2006-11-16 2008-05-22 Institute For Information Industry Speech recognition device, method, and computer readable medium for adjusting speech models with selected speech data
CN107610707A (en) * 2016-12-15 2018-01-19 平安科技(深圳)有限公司 A kind of method for recognizing sound-groove and device
CN107368724A (en) * 2017-06-14 2017-11-21 广东数相智能科技有限公司 Anti- cheating network research method, electronic equipment and storage medium based on Application on Voiceprint Recognition
CN108922515A (en) * 2018-05-31 2018-11-30 平安科技(深圳)有限公司 Speech model training method, audio recognition method, device, equipment and medium
CN111833884A (en) * 2020-05-27 2020-10-27 北京三快在线科技有限公司 Voiceprint feature extraction method and device, electronic equipment and storage medium
CN112669820A (en) * 2020-12-16 2021-04-16 平安科技(深圳)有限公司 Examination cheating recognition method and device based on voice recognition and computer equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253474A (en) * 2023-07-06 2023-12-19 北京梦见星科技有限公司 Online examination cheating behavior detection system and detection method based on voice recognition
CN117253474B (en) * 2023-07-06 2024-02-13 北京梦见星科技有限公司 Online examination cheating behavior detection system and detection method based on voice recognition

Also Published As

Publication number Publication date
CN112669820B (en) 2023-08-04
CN112669820A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
WO2022127042A1 (en) Examination cheating recognition method and apparatus based on speech recognition, and computer device
US10176811B2 (en) Neural network-based voiceprint information extraction method and apparatus
CN109215632B (en) Voice evaluation method, device and equipment and readable storage medium
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
Morrison A comparison of procedures for the calculation of forensic likelihood ratios from acoustic–phonetic data: Multivariate kernel density (MVKD) versus Gaussian mixture model–universal background model (GMM–UBM)
WO2017133165A1 (en) Method, apparatus and device for automatic evaluation of satisfaction and computer storage medium
TWI396184B (en) A method for speech recognition on all languages and for inputing words using speech recognition
CN111311327A (en) Service evaluation method, device, equipment and storage medium based on artificial intelligence
CN108520753B (en) Voice lie detection method based on convolution bidirectional long-time and short-time memory network
CN112233698B (en) Character emotion recognition method, device, terminal equipment and storage medium
Darabkh et al. An efficient speech recognition system for arm‐disabled students based on isolated words
CN108766415B (en) Voice evaluation method
Kelley et al. A comparison of four vowel overlap measures
CN113243918B (en) Risk detection method and device based on multi-mode hidden information test
JP2021152682A (en) Voice processing device, voice processing method and program
CN112017694A (en) Voice data evaluation method and device, storage medium and electronic device
Nirjon et al. sMFCC: exploiting sparseness in speech for fast acoustic feature extraction on mobile devices--a feasibility study
Khanna et al. Application of vector quantization in emotion recognition from human speech
US20230368777A1 (en) Method And Apparatus For Processing Audio, Electronic Device And Storage Medium
JP5091202B2 (en) Identification method that can identify any language without using samples
JP6996627B2 (en) Information processing equipment, control methods, and programs
CN110675858A (en) Terminal control method and device based on emotion recognition
Herrera-Camacho et al. Design and testing of a corpus for forensic speaker recognition using MFCC, GMM and MLE
Hughes et al. Variability in analyst decisions during the computation of numerical likelihood ratios
CN114822557A (en) Method, device, equipment and storage medium for distinguishing different sounds in classroom

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21904977

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21904977

Country of ref document: EP

Kind code of ref document: A1