CN116778422A - Unmanned invigilation method, device, equipment and computer readable storage medium - Google Patents

Unmanned invigilation method, device, equipment and computer readable storage medium

Info

Publication number
CN116778422A
CN116778422A (application CN202310797564.1A)
Authority
CN
China
Prior art keywords
cheating
image
unmanned
audio data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310797564.1A
Other languages
Chinese (zh)
Inventor
郭小璇
朱赵虎
李鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Tosun Intelligent Technology Inc
Original Assignee
Qingdao Tosun Intelligent Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Tosun Intelligent Technology Inc filed Critical Qingdao Tosun Intelligent Technology Inc
Priority to CN202310797564.1A
Publication of CN116778422A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • Educational Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The application discloses an unmanned invigilation method, device and equipment and a computer readable storage medium, which belong to the field of data analysis and invigilate examinations by analysing audio and video data. The processor in the application first acquires video data and audio data of the examination site in real time during the examination; it detects cheating behavior in the audio data by detecting voice keywords, and detects cheating behavior in the video data through image recognition technology. Because a processor processes the real-time data, labor cost is reduced and working efficiency is improved.

Description

Unmanned invigilation method, device, equipment and computer readable storage medium
Technical Field
The application relates to the field of data analysis, in particular to an unmanned invigilation method, and also relates to an unmanned invigilation device, equipment and a computer readable storage medium.
Background
Owing to various objective conditions, the demand for conducting examinations in an unmanned invigilation mode keeps growing. Such examinations are usually single-candidate scenarios, and how to invigilate them reliably is a difficult problem.
Therefore, providing a solution to the above technical problem is an issue that those skilled in the art currently need to address.
Disclosure of Invention
The application aims to provide an unmanned invigilation method, applied to a processor, that can process real-time audio and video data to realize automatic invigilation, thereby reducing labor cost and improving working efficiency. Another object of the application is to provide an unmanned invigilation device, equipment and computer readable storage medium with the same capability.
In order to solve the technical problems, the application provides an unmanned invigilation method, which is applied to a processor and comprises the following steps:
acquiring video data and audio data of an examination site in the examination process in real time;
based on the audio data, detecting cheating behaviors in the examination process by detecting voice keywords;
based on the video data, detecting cheating behaviors in the examination process through an image recognition technology;
wherein, of the detection of cheating based on the audio data and the detection of cheating based on the video data, either detection result takes effect independently.
Preferably, the detecting cheating behavior in the examination process by the image recognition technology based on the video data specifically includes:
preprocessing before feature extraction is carried out on images in the video data;
extracting image characteristics of the preprocessed image;
identifying a preset target in an image through the image characteristics and obtaining a predicted target;
judging whether a specified cheating object exists in the prediction target;
if so, judging that cheating is possible;
judging whether the predicted target contains a preset examinee face image or not;
if it is not contained, judging that cheating is possible;
the preset targets comprise the appointed cheating articles and the preset examinee face image.
Preferably, the preset target further comprises a human body part;
before judging whether the specified cheating object exists in the prediction target, the unmanned invigilation method further comprises the following steps:
judging whether the human body part in the predicted target belongs to a plurality of owners or not;
if they belong to a plurality of owners, judging that cheating is possible;
and if the target is not attributed to a plurality of owners, executing the step of judging whether the appointed cheating article exists in the prediction target.
Preferably, the preset target further comprises two hands;
after the preset target in the image is identified and the predicted target is obtained through the image characteristics, the unmanned invigilation method further comprises the following steps:
judging whether the two hands in the predicted target are positioned in the examination limiting area or not;
if not, it is determined that there is a possibility of cheating.
Preferably, after determining whether the predicted target includes the preset face image of the candidate, the unmanned invigilation method further includes:
determining coordinates of five key points of the examinee face image in the prediction target by means of a scale-aware heatmap;
determining the face deflection angle through the coordinates of the five key points;
judging whether the face deflection angle is larger than a preset threshold value or not;
if it is larger than the threshold value, judging that cheating is possible.
Preferably, the detecting the cheating behavior in the examination process by detecting the voice keyword based on the audio data specifically includes:
preprocessing the audio data so as to strengthen voice characteristics and remove redundant data;
extracting voice characteristics of voice frames in the preprocessed audio data through mel-frequency cepstrum coefficients;
performing feature fusion on the voice features corresponding to the continuous preset number of voice frames in the audio data to obtain voice features to be recognized;
judging whether a preset keyword exists in the voice feature to be identified;
if so, it is determined that there is a possibility of cheating.
Preferably, before the video data and the audio data of the examination site in the examination process are acquired in real time, the unmanned invigilation method further includes:
acquiring a target image of a handheld identity card of an examinee;
acquiring face information in the target image and registering it as the examinee's sign-in face, and acquiring the identity card portrait and the text information on the identity card in the target image;
judging whether the text information is consistent with the pre-recorded examinee information;
if the text information is inconsistent, terminating the examination;
if consistent, judging whether the identity card portrait, the pre-recorded examinee face and the examinee sign-in face correspond to the same person;
if they correspond to the same person, executing the step of acquiring video data and audio data of the examination site in real time during the examination;
if they do not correspond to the same person, terminating the examination.
In order to solve the technical problem, the application also provides an unmanned invigilation device, which is applied to a processor and comprises:
the acquisition module is used for acquiring video data and audio data of an examination site in the examination process in real time;
the voice detection module is used for detecting cheating behaviors in the examination process by detecting voice keywords based on the audio data;
the video detection module is used for detecting cheating behaviors in the examination process through an image recognition technology based on the video data;
wherein, of the detection of cheating based on the audio data and the detection of cheating based on the video data, either detection result takes effect independently.
In order to solve the technical problem, the application also provides unmanned invigilation equipment, which comprises:
a memory for storing a computer program;
and a processor for implementing the steps of the unmanned invigilation method described above when executing the computer program.
To solve the above technical problem, the present application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the unmanned invigilation method as described above.
The application provides an unmanned invigilation method. Considering that a processor can recognize cheating behavior based on various image processing technologies, and that on an examination site an examinee may cheat by means of voice messages, the processor in the application first acquires video data and audio data of the examination site in real time during the examination; it detects cheating behavior in the audio data by detecting voice keywords, and detects cheating behavior in the video data through image recognition technology. Because a processor processes the real-time data, labor cost is reduced and working efficiency is improved.
The application also provides an unmanned invigilation device, equipment and a computer readable storage medium, which have the same beneficial effects as the unmanned invigilation method.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the prior art and the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an unmanned invigilation method provided by the application;
FIG. 2 is a schematic flow chart of another method of unmanned invigilation provided by the application;
FIG. 3 is a schematic flow chart of image object recognition according to the present application;
FIG. 4 is a schematic flow chart of head-turn detection provided by the present application;
FIG. 5 is a schematic flow chart of feature extraction of a single-frame speech frame according to the present application;
FIG. 6 is a schematic structural diagram of a time-delay neural network with frame extension 2 according to the present application;
FIG. 7 is a schematic flow chart of MFCC feature extraction provided by the present application;
FIG. 8 is a schematic diagram of a flow chart of identity card OCR recognition provided by the application;
FIG. 9 is a schematic flow chart of face comparison provided by the application;
fig. 10 is a schematic structural diagram of an unmanned invigilation device provided by the application;
fig. 11 is a schematic structural diagram of unmanned invigilation equipment provided by the application.
Detailed Description
The core of the application is to provide an unmanned invigilation method which is applied to a processor and can process real-time audio and video data so as to realize automatic invigilation, thereby reducing the labor cost and improving the working efficiency; the application further provides an unmanned invigilation device, equipment and a computer readable storage medium, which are applied to a processor and can process real-time audio and video data so as to realize automatic invigilation, thereby reducing labor cost and improving working efficiency.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, fig. 1 is a schematic flow chart of an unmanned invigilation method provided by the application, and the unmanned invigilation method includes:
s101: acquiring video data and audio data of an examination site in the examination process in real time;
specifically, in consideration of the technical problems in the background art, the fact that the processor can conduct cheating behavior recognition based on various image processing technologies is combined, and in consideration of the fact that in an examination site, an examinee can conduct cheating through voice messages is combined, so that the method and the device are used for automatically recognizing cheating conditions based on processing of audio and video data acquired by the processor in real time from the examination site, and video data and audio data of the examination site in the examination process can be acquired in real time.
Whether for students' after-school examinations, social examinations, or examinations of enterprise staff, most examinations are conducted online or on unmanned self-service equipment. However, the fairness and standardization of examinations place great demands on the anti-cheating mechanism of unmanned invigilation. The application can be applied to various unmanned invigilation scenarios, such as single-candidate or multi-candidate examinations; the embodiment of the application is not limited herein.
Specifically, the video data and the audio data may be acquired in multiple ways, for example with a camera equipped with a sound pickup. The number of cameras can be set flexibly; for example, several cameras can be placed in different directions to achieve coverage without blind spots.
S102: based on the audio data, detecting cheating behaviors in the examination process by detecting voice keywords;
specifically, in consideration of the fact that an examinee may cheat by means of voice information transmission in the examination process, for example, cheat by means of voice sent by other cheating devices on the examination site, or cheat by means of speaking by the examinee and a person outside the camera shooting picture, the cheat mode is often ignored, so that in the embodiment of the application, cheat behaviors in the examination process can be detected by means of detecting voice keywords based on audio data.
The keywords may be flexibly set, for example, a "help" keyword, a "check" keyword, etc. may be set, which is not limited herein.
In addition, considering that nothing needs to be spoken during the examination, any captured speech, from any person, can be judged as possible cheating; that is, speaking is not allowed during the examination.
S103: based on the video data, detecting cheating behaviors in the examination process through an image recognition technology;
wherein, the cheating detection based on the audio data and the cheating detection based on the video data take effect independently.
Specifically, video data is the data type containing the most information about the examination site: most types of cheating behavior can be distinguished through video data, and especially when cheating articles or cheating actions are involved, video data allows them to be identified rapidly and accurately. Therefore, cheating behavior during the examination can be detected through image recognition technology based on the video data.
The results of the two detection modes are neither dependent on each other nor in conflict; either detection result takes effect independently. For example, when cheating detection based on the audio data judges that cheating is possible, an alarm can be raised or the examination can even be terminated without considering the result of the other detection (based on the video data).
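The S101–S103 flow above, with independent audio and video detections, can be sketched as a simple polling loop. All function names here (`get_frame`, `detect_audio_cheating`, etc.) are illustrative placeholders, not the patent's actual implementation.

```python
def run_invigilation(get_frame, get_audio_chunk, detect_video_cheating,
                     detect_audio_cheating, on_alarm):
    """Poll the audio and video streams; either detector alone can raise
    an alarm, without consulting the other (the detections are independent)."""
    while True:
        frame = get_frame()        # latest video frame, or None when done
        chunk = get_audio_chunk()  # latest audio chunk, or None when done
        if frame is None and chunk is None:  # examination ended
            break
        if chunk is not None and detect_audio_cheating(chunk):
            on_alarm("audio: suspicious keyword detected")
        if frame is not None and detect_video_cheating(frame):
            on_alarm("video: suspicious object/behaviour detected")
```

Because each detection result takes effect on its own, a positive audio result triggers `on_alarm` even when the video detector sees nothing.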
The application provides an unmanned invigilation method. Considering that a processor can recognize cheating behavior based on various image processing technologies, and that on an examination site an examinee may cheat by means of voice messages, the processor in the application first acquires video data and audio data of the examination site in real time during the examination; it detects cheating behavior in the audio data by detecting voice keywords, and detects cheating behavior in the video data through image recognition technology. Because a processor processes the real-time data, labor cost is reduced and working efficiency is improved.
Based on the above embodiments:
as a preferred embodiment, the detection of cheating behavior in the examination process by image recognition technology based on video data is specifically:
preprocessing before feature extraction is carried out on images in video data;
extracting image characteristics of the preprocessed image;
identifying a preset target in an image through image characteristics and obtaining a predicted target;
judging whether a specified cheating object exists in the prediction target;
if so, judging that cheating is possible;
judging whether the predicted target contains a preset examinee face image or not;
if it is not contained, judging that cheating is possible;
the preset targets comprise the appointed cheating articles and the preset examinee face image.
Specifically, for a better explanation of the embodiments of the present application, please refer to fig. 2 and fig. 3. Fig. 2 is a schematic flow chart of another unmanned invigilation method provided by the application; fig. 3 is a schematic flow chart of image object recognition provided by the application. In fig. 3, preprocessing may include framing of the video stream, Resize (size conversion), Normal (image normalization), and To Tensor (conversion to tensor format). The video stream is framed to obtain single-frame images; Resize first adjusts each image to the unified input size of the subsequent network; Normal then normalizes the image to facilitate subsequent data processing; finally, To Tensor standardizes the various possible input formats and converts them to tensor format for subsequent network inference. Image feature extraction may proceed as follows: the preprocessed image first passes through a CSPDarknet53 network, which can extract image features rapidly. After feature extraction, multi-scale feature fusion (the Neck method in fig. 3) yields a multi-scale fused feature image and improves the recognition performance of the network. Identifying the prediction target corresponds to the prediction and post-processing steps in the figure: the detector predicts the position, category, confidence and other information of the target, and post-processing screens out overlapping redundant detections through Non-Maximum Suppression (NMS) before outputting the final prediction target information, thereby reducing interference.
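The Fig. 3 preprocessing chain (framing, Resize, Normal, To Tensor) can be sketched as below. This is a NumPy-only illustration; the target size and the per-channel standardisation are assumptions, and the network itself (CSPDarknet53, Neck, NMS) is not reproduced.

```python
import numpy as np

def preprocess(frame, size=(640, 640)):
    """Resize -> Normalize -> 'To Tensor' (CHW float array) for one frame."""
    h, w = size
    # Nearest-neighbour resize via index sampling (stand-in for a real Resize).
    ys = np.linspace(0, frame.shape[0] - 1, h).astype(int)
    xs = np.linspace(0, frame.shape[1] - 1, w).astype(int)
    resized = frame[ys][:, xs]
    # Normalize: scale to [0, 1], then standardise each channel.
    x = resized.astype(np.float32) / 255.0
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True) + 1e-6
    x = (x - mean) / std
    # "To Tensor": HWC -> CHW layout expected by most convolutional backbones.
    return np.transpose(x, (2, 0, 1))
```

In a real deployment the resize and normalisation constants would match whatever the detection network was trained with; the values above are placeholders.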
Specifically, various cheating articles or behaviors can be flexibly identified by setting the preset targets; for example, the preset targets can include various cheating articles. Considering that the examinee must not leave during the examination, the preset examinee face image can also be added to the preset targets, and whether the examinee has left can be judged by detecting it.
Specifically, judging whether the prediction target contains the preset examinee face image may be performed as follows: judge whether the prediction target contains a human face; if so, compare, by face comparison, whether that face and the preset examinee face pre-recorded in the system belong to the same person; if they do, judge that the prediction target contains the preset examinee face image. During a normal examination, if the prediction target contains no face, or contains a face different from the preset examinee face, it is judged that cheating is possible.
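The face comparison step above is commonly implemented by comparing face embeddings; the patent does not name a specific model, so the sketch below assumes embeddings from some face-recognition network and an illustrative similarity threshold of 0.5.

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=0.5):
    """Cosine-similarity face comparison: True when two face embeddings are
    close enough to be treated as the same person. The embedding model and
    the 0.5 threshold are assumptions, not values from the patent."""
    a = np.asarray(emb_a, dtype=np.float64)
    b = np.asarray(emb_b, dtype=np.float64)
    sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return sim >= threshold
```

With this helper, "the prediction target contains the preset examinee face image" reduces to `same_person(detected_face_embedding, registered_face_embedding)`.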
The face image may be obtained by a specially configured camera facing the examinee's face; other images may be obtained by other cameras, which is not limited herein.
The cheating articles may be of various types, for example a mobile phone, a small handheld device, a bracelet, an earphone, etc.; the embodiments of the present application are not limited herein.
Of course, the preset target may be various other types besides the above, and the embodiments of the present application are not limited herein.
As a preferred embodiment, the preset target further comprises a human body part;
before judging whether the appointed cheating article exists in the prediction target, the unmanned invigilation method further comprises the following steps:
judging whether human body parts in the prediction target belong to a plurality of owners or not;
if they belong to a plurality of owners, judging that cheating is possible;
if the target is not owned by a plurality of owners, a step of judging whether the specified cheating object exists in the prediction target is executed.
Specifically, most unmanned invigilation scenes are single-candidate examinations, in which multiple persons must be prevented from appearing on the examination site at the same time; therefore, human body parts are added to the preset targets in the embodiment of the application.
In some cases, to avoid being discovered, a second person in the examination room may expose only a certain body part (for example, a hand) to the video image. In such cases, relying on face recognition alone to discover that multiple persons are present is unrealistic, so the "multi-person detection" performed on human body parts in the embodiment of the application improves the accuracy of multi-person detection.
As a preferred embodiment, the preset target further comprises two hands;
after the preset target in the image is identified through the image characteristics and the predicted target is obtained, the unmanned invigilation method further comprises the following steps:
judging whether the two hands in the predicted target are positioned in the examination limiting area or not;
if not, it is determined that there is a possibility of cheating.
Specifically, the detection for the hands may be performed by a separately provided camera for the directions of the hands, which is not limited herein.
Specifically, under normal conditions the examinee's hands stay within a certain area during the examination, and if they move out of that specific area, cheating is considered possible. Therefore, in the embodiment of the application, the two hands can be set as a preset target: whether the hands in the prediction target are located in the examination limiting area is judged, and if they are not, it is judged that cheating is possible.
The examination limiting area may be an area such as an examination console, and the embodiment of the application is not limited herein.
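The hand-position check above can be sketched as a containment test. Testing the centre point of each detected hand box against the limiting area is an assumption; the patent does not specify the exact geometric criterion.

```python
def hand_in_region(hand_box, region):
    """True when the hand box's centre lies inside the permitted exam
    region (e.g. the console/desk area), both given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = hand_box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    rx1, ry1, rx2, ry2 = region
    return rx1 <= cx <= rx2 and ry1 <= cy <= ry2

def hands_cheating_possible(hand_boxes, region):
    """Possible cheating when either detected hand leaves the region."""
    return any(not hand_in_region(h, region) for h in hand_boxes)
```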
As a preferred embodiment, after determining whether the predicted target includes the preset face image of the candidate, the unmanned invigilation method further includes:
determining coordinates of five key points of the face image of the examinee in the prediction target in a scale-aware thermodynamic diagram mode;
determining the face deflection angle through the coordinates of the five key points;
judging whether the face deflection angle is larger than a preset threshold value or not;
if it is larger than the threshold value, judging that cheating is possible.
Specifically, an examinee normally does not turn the head at a large angle during the examination, and a large-angle head turn may accompany cheating behavior, so the embodiment of the application considers detecting large-angle head turns. Note first that during face detection a large-angle head turn causes face comparison to fail, so face comparison itself already substantially deters large-angle head turns.
Specifically, for a better explanation of the embodiment of the present application, please refer to fig. 4, a schematic flow chart of the head-turn detection provided by the application. It includes steps of face extraction, SSH five-key-point determination, and angle calculation, where SSH (Scale-Sensitive Heatmap, a scale-aware heatmap) is a method for determining the five key points of a face image. Through the method in the embodiment of the application, head-turn detection can be performed rapidly and accurately.
The preset threshold may be set autonomously, which is not limited in the embodiment of the present application.
Specifically, the five key points are the left eye center, the right eye center, the nose tip, the left mouth corner and the right mouth corner. The head-turn angle is calculated from the geometry of the five key points, where y_l denotes the ordinate of the left eye center, y_n the ordinate of the nose tip, y_r the ordinate of the right eye center, x_l the abscissa of the left eye center, x_n the abscissa of the nose tip, x_r the abscissa of the right eye center, arctan denotes the arctangent function, and θ denotes the calculated face deflection angle.
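The original formula image is not reproduced here, so the sketch below is only one plausible reconstruction from the variables named above: the deflection angle as the arctangent of the nose tip's horizontal offset from the eye midpoint over its vertical distance. Treat the exact geometry as an assumption.

```python
import math

def face_deflection_angle(left_eye, right_eye, nose):
    """Estimate the face deflection angle (degrees) from the two eye
    centres (x_l, y_l), (x_r, y_r) and the nose tip (x_n, y_n).
    Assumed geometry: angle between the eye-midpoint-to-nose line and
    the vertical axis, via arctan."""
    xl, yl = left_eye
    xr, yr = right_eye
    xn, yn = nose
    mx, my = (xl + xr) / 2.0, (yl + yr) / 2.0  # midpoint between the eyes
    # Horizontal offset of the nose relative to the eye midpoint, over the
    # vertical distance; atan2 handles the degenerate vertical case.
    return math.degrees(math.atan2(xn - mx, yn - my))
```

A frontal face gives an angle near 0°; comparing the absolute value against the preset threshold yields the cheating judgment in the steps above.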
of course, the twist detection may be performed in other ways besides this, and embodiments of the present application are not limited herein.
As a preferred embodiment, the detection of cheating behavior in the examination process by detecting the voice keyword based on the audio data is specifically:
preprocessing the audio data to enhance the speech characteristics and remove redundant data;
extracting voice characteristics of voice frames in the preprocessed audio data through Mel-frequency cepstral coefficients (MFCC);
carrying out feature fusion on voice features corresponding to a continuous preset number of voice frames in the audio data to obtain voice features to be recognized;
judging whether preset keywords exist in the voice features to be identified;
if so, it is determined that there is a possibility of cheating.
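The final judgment step above can be sketched as a simple containment check over the recognized speech text. The keyword list here is an illustrative placeholder; the patent leaves the preset keywords unspecified.

```python
def detect_keyword(recognized_text: str,
                   keywords=("answer", "send it", "question three")) -> bool:
    """Flag possible cheating when any preset keyword appears in the
    text recognized from the fused speech features. The keyword list
    is a hypothetical example, not taken from the patent."""
    return any(k in recognized_text for k in keywords)
```

In practice the keywords would be configured per examination, and a positive result only marks a *possibility* of cheating for later review, as the text emphasizes.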
Specifically, for better explanation of the embodiments of the present application, please refer to fig. 5, fig. 6 and fig. 7. Fig. 5 is a schematic flow chart of single-frame speech feature extraction provided by the present application; fig. 6 is a schematic structural diagram of a time-delay neural network with a time delay of 2; fig. 7 is a schematic flow chart of MFCC feature extraction. Fig. 5 illustrates the processing of one speech frame in the audio data, with the black dots of the input layer representing the speech features of that frame; the hidden layer extracts features from the speech, and the discriminator of the recognition network then yields the characters corresponding to the frame. Meanwhile, to improve the recognition effect, features combining multiple frames (the continuous preset number of speech frames) are used for recognition: in fig. 6, for example, a time delay of 2 is adopted, i.e., the speech features of 3 consecutive frames (the preset number) are combined. The feature fusion of multi-frame speech can be performed by a time-delay neural network, which improves the accuracy of speech recognition.
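The multi-frame fusion described above can be sketched as a simple TDNN-style input layer that concatenates each frame with its neighbours; a time delay of 2 (context of one frame on each side) fuses 3 consecutive frames. This is a minimal sketch of the stacking step only, not the full network.

```python
import numpy as np

def stack_context(frames: np.ndarray, context: int = 1) -> np.ndarray:
    """Fuse each speech frame with its neighbours, TDNN-style.

    frames : (T, D) array of per-frame features.
    With context=1 each output row concatenates 3 consecutive frames
    (the "continuous preset number" of the text), giving an array of
    shape (T - 2*context, (2*context + 1) * D).
    """
    T = frames.shape[0]
    width = 2 * context + 1
    return np.stack(
        [frames[t:t + width].reshape(-1) for t in range(T - width + 1)]
    )
```

The stacked rows would then feed the hidden layers of the recognition network in place of single-frame features.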
Specifically, after the audio sequence (i.e., the audio data) is acquired, audio preprocessing is performed first. Preprocessing includes digitizing, pre-emphasis, framing and windowing the audio data, which makes the speech features more pronounced and removes redundant data. Speech features are then extracted from the preprocessed audio, generally using Mel-frequency cepstral coefficients (Mel-scale Frequency Cepstral Coefficients, MFCC for short). As shown in fig. 7, the preprocessed audio is first converted from the time domain to the frequency domain by FFT (fast Fourier transform), and then passed through a Mel filter bank to obtain a Mel energy spectrum that mimics the human ear, dense at low frequencies and sparse at high frequencies. A logarithmic operation then converts the convolutive speech signal into an additive one and amplifies the differences at low frequencies. Finally, discrete cosine transform (Discrete Cosine Transform, DCT for short) separates the required vocal tract information from the mixture of pitch information and vocal tract information, yielding the static MFCC features of the speech. To improve recognition performance, the dynamic and static features also need to be combined; the dynamics of speech can be described by the differential spectrum of the static features obtained above. Combining the static and dynamic features gives the final MFCC features.
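The FFT → Mel filter bank → log → DCT chain above can be sketched for a single pre-emphasized, windowed frame as follows. The filter-bank construction is a standard textbook version; the sampling rate, filter count and cepstral order are illustrative defaults, not values from the patent.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_frame(frame, sr=16000, n_filters=26, n_ceps=13):
    """Static MFCC of one windowed frame, following the text's steps:
    FFT -> Mel filter bank (dense at low, sparse at high frequencies)
    -> log (convolution becomes addition) -> DCT (separates the
    vocal-tract envelope from pitch). Parameters are assumptions."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2        # FFT power spectrum
    # Triangular filters equally spaced on the Mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fbank[i, k] = (k - l) / max(c - l, 1)  # rising edge
        for k in range(c, r):
            fbank[i, k] = (r - k) / max(r - c, 1)  # falling edge
    logmel = np.log(fbank @ power + 1e-10)         # log Mel energy spectrum
    return dct(logmel, norm='ortho')[:n_ceps]      # static MFCC features
```

Delta (differential) features would then be appended to these static coefficients to form the combined MFCC features the text describes.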
Of course, other methods may be used to extract the speech features besides MFCC, and embodiments of the present application are not limited herein.
As a preferred embodiment, before acquiring video data and audio data of an examination site in real time during an examination, the unmanned invigilation method further includes:
acquiring a target image of the examinee holding an identity card;
acquiring face information in the target image as the examinee check-in face, and acquiring the identity card portrait and the text information on the identity card in the target image;
judging whether the text information is consistent with the pre-recorded examinee information;
if the test is inconsistent, terminating the test;
if so, judging whether the identity card face, the prerecorded examinee face and the check-in face correspond to the same person;
if the video data and the audio data correspond to the same person, executing the step of acquiring the video data and the audio data of the examination site in the examination process in real time;
if the test does not correspond to the same person, the test is terminated.
Specifically, in order to prevent impersonation (proxy test-taking), strict identity authentication of the examinee can be performed before the examination formally starts. In the embodiment of the present application, accurate identity authentication is performed through the target image of the examinee holding the identity card: from the target image, two portraits can be obtained, namely the examinee check-in face and the identity card portrait, along with the text information on the identity card. The text information is then compared for consistency with the pre-recorded examinee information, after which the identity card portrait, the pre-recorded examinee face and the examinee check-in face are compared for consistency. Only when both the text information and the three portraits pass the comparison is the identity of the examinee authenticated, which prevents impersonation.
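The two-stage decision described above (text consistency, then three-way face consistency) can be sketched as follows. The field names and return strings are illustrative, not from the patent, and the three-way face comparison is abstracted into a single boolean.

```python
def verify_identity(ocr_text: dict, registered: dict,
                    faces_match: bool) -> str:
    """Pre-exam identity check sketch. `ocr_text` holds the fields read
    from the identity card; `registered` is the pre-recorded examinee
    record; `faces_match` stands in for the comparison among ID-card
    portrait, pre-recorded face and check-in face. Names are hypothetical."""
    # Step 1: ID-card text must match the pre-recorded examinee record.
    if any(ocr_text.get(k) != v for k, v in registered.items()):
        return "terminate: text mismatch"
    # Step 2: the three face images must all belong to the same person.
    if not faces_match:
        return "terminate: face mismatch"
    # Both checks passed: begin real-time audio/video monitoring.
    return "proceed"
```

Either failure terminates the examination, matching the branches of the method steps above.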
For better explanation of the embodiments of the present application, please refer to fig. 8 and fig. 9. Fig. 8 is a schematic flow chart of OCR recognition of an identity card provided by the present application; fig. 9 is a schematic flow chart of face comparison provided by the present application. The OCR (Optical Character Recognition) in fig. 8 is used to recognize the text information of the identity card. As shown in fig. 8, after the cropped identity card image is obtained, the orientation of the image may be corrected to facilitate subsequent detection; for example, whether the identity card is at 0° or 180° can be judged from the relative positions of the card and the portrait extracted from it, and when the portrait is on the left side of the card, the card image needs to be corrected by a 180° rotation. The portrait is likewise corrected by a 180° rotation to facilitate subsequent face comparison. On the corrected identity card, a DBNet (Differentiable Binarization Network) first detects the text regions, and then a CRNN (Convolutional Recurrent Neural Network) recognizes the text within them, thereby obtaining the information on the identity card.
For the face comparison detection before the examination begins, the face comparison shown in fig. 9 is applied to any two of the three faces involved, namely the face in the examinee information (the pre-recorded face), the identity card portrait and the examinee check-in face. As shown in fig. 9, the input face image is first scaled to a fixed size, and then subjected to brightness equalization and image normalization so that the convolutional neural network (CNN, Convolutional Neural Network) can learn the features better. The preprocessed face image is input into the CNN, and a high-dimensional feature vector of the face image is extracted through multi-layer convolution and pooling operations. The extracted feature vector is L2-normalized, i.e., the vector is divided by its norm, so that distances between different faces are comparable. The corresponding feature vectors of the two face images are extracted respectively, and the distance between the two vectors is calculated using cosine similarity. If the distance between the two vectors is smaller than a preset threshold, they are judged to be the same face; otherwise, different faces.
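The L2-normalization and cosine-distance thresholding just described can be sketched as below. The threshold value is an illustrative placeholder (the patent leaves it as a preset threshold), and the embeddings are assumed to come from the CNN described above.

```python
import numpy as np

def same_face(feat_a: np.ndarray, feat_b: np.ndarray,
              threshold: float = 0.4) -> bool:
    """Compare two CNN face embeddings: L2-normalize each vector, then
    threshold the cosine distance between them. The 0.4 threshold is a
    hypothetical placeholder, not a value from the patent."""
    a = feat_a / np.linalg.norm(feat_a)   # L2 normalization
    b = feat_b / np.linalg.norm(feat_b)
    cosine_distance = 1.0 - float(a @ b)  # 0 means identical direction
    return cosine_distance < threshold
```

After normalization the dot product equals the cosine similarity, so a small cosine distance means the two embeddings point in nearly the same direction, which is why the distance is comparable across different faces.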
Referring to fig. 10, fig. 10 is a schematic structural diagram of an unmanned invigilation device provided by the present application, where the unmanned invigilation device is applied to a processor, and includes:
the acquisition module 101 is used for acquiring video data and audio data of an examination site in the examination process in real time;
the voice detection module 102 is used for detecting cheating behaviors in the examination process by detecting voice keywords based on the audio data;
the video detection module 103 is used for detecting cheating behaviors in the examination process through an image recognition technology based on video data;
wherein, the cheating detection based on the audio data and the cheating detection based on the video data take effect independently.
For the description of the unmanned invigilation device provided in the embodiment of the present application, please refer to the embodiment of the unmanned invigilation method described above, and the embodiment of the present application is not limited herein.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an unmanned invigilation device provided by the present application, where the unmanned invigilation device includes:
a memory 111 for storing a computer program;
a processor 112 for implementing the steps of the unmanned invigilation method in the previous embodiment when executing a computer program.
For the introduction of the unmanned invigilation device provided by the embodiment of the present application, please refer to the embodiment of the unmanned invigilation method described above, and the embodiment of the present application is not limited herein.
The present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of unmanned invigilation as in the previous embodiments.
For the introduction of the computer readable storage medium provided by the embodiment of the present application, please refer to the embodiment of the unmanned invigilation method described above, and the embodiment of the present application is not limited herein.
In the present specification, each embodiment is described in a progressive manner, each embodiment mainly describing its differences from the other embodiments, and identical or similar parts between the embodiments may simply be referred to each other. For the device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method section. It should also be noted that in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An unmanned invigilation method, applied to a processor, comprising:
acquiring video data and audio data of an examination site in the examination process in real time;
based on the audio data, detecting cheating behaviors in the examination process by detecting voice keywords;
based on the video data, detecting cheating behaviors in the examination process through an image recognition technology;
wherein the cheating detection based on the audio data and the cheating detection based on the video data are performed independently, and the detection result of either one takes effect on its own.
2. The method for unmanned invigilating the test according to claim 1, wherein the detecting the cheating behavior in the test process by the image recognition technology based on the video data is specifically:
preprocessing before feature extraction is carried out on images in the video data;
extracting image characteristics of the preprocessed image;
identifying a preset target in an image through the image characteristics and obtaining a predicted target;
judging whether a specified cheating object exists in the prediction target;
if so, judging that cheating is possible;
judging whether the predicted target contains a preset exam face image or not;
if the data is not contained, judging that cheating is possible;
wherein the preset targets comprise the specified cheating articles and the preset examinee face images.
3. The method of unmanned invigilation of claim 2, wherein the predetermined target further comprises a human body part;
before judging whether the specified cheating object exists in the prediction target, the unmanned invigilation method further comprises the following steps:
judging whether the human body part in the predicted target belongs to a plurality of owners or not;
if the system belongs to a plurality of owners, judging that cheating is possible;
and if the target is not attributed to a plurality of owners, executing the step of judging whether the appointed cheating article exists in the prediction target.
4. The method for unmanned invigilating in accordance with claim 3, wherein said predetermined target further comprises two hands;
after the preset target in the image is identified and the predicted target is obtained through the image characteristics, the unmanned invigilation method further comprises the following steps:
judging whether the two hands in the predicted target are positioned in the examination limiting area or not;
if not, it is determined that there is a possibility of cheating.
5. The method for unmanned invigilating in claim 2, wherein after determining whether the predicted target includes the preset face image, the method further comprises:
determining coordinates of five key points of the examinee face image in the predicted target in a scale-sensitive heatmap manner;
determining the face deflection angle through the coordinates of the five key points;
judging whether the face deflection angle is larger than a preset threshold value or not;
if the data is larger than the threshold value, judging that cheating is possible.
6. The method for unmanned invigilating the test according to claim 1, wherein the method for detecting the cheating behavior in the test process by detecting the voice keyword based on the audio data comprises the following steps:
preprocessing the audio data so as to strengthen voice characteristics and remove redundant data;
extracting voice characteristics of voice frames in the preprocessed audio data through mel-frequency cepstrum coefficients;
performing feature fusion on the voice features corresponding to the continuous preset number of voice frames in the audio data to obtain voice features to be recognized;
judging whether a preset keyword exists in the voice feature to be identified;
if so, it is determined that there is a possibility of cheating.
7. The method of unmanned invigilation of any one of claims 1 to 6, wherein prior to said acquiring video data and audio data of the test site during the test in real time, the method further comprises:
acquiring a target image of an examinee holding an identity card;
acquiring face information in the target image as an examinee check-in face, and acquiring an identity card portrait and text information on the identity card in the target image;
judging whether the text information is consistent with the pre-recorded examinee information;
if the test is inconsistent, terminating the test;
if so, judging whether the identity card portrait, the pre-recorded examinee face and the examinee sign-in face correspond to the same person;
if the video data and the audio data of the examination site in the examination process are corresponding to the same person, executing the step of acquiring the video data and the audio data of the examination site in real time;
if the test does not correspond to the same person, the test is terminated.
8. An unmanned invigilator, applied to a processor, comprising:
the acquisition module is used for acquiring video data and audio data of an examination site in the examination process in real time;
the voice detection module is used for detecting cheating behaviors in the examination process by detecting voice keywords based on the audio data;
the video detection module is used for detecting cheating behaviors in the examination process through an image recognition technology based on the video data;
wherein the cheating detection based on the audio data and the cheating detection based on the video data are performed independently, and the detection result of either one takes effect on its own.
9. An unmanned invigilation device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the unmanned invigilation method of any one of claims 1 to 7 when said computer program is executed.
10. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of the method for unmanned invigilation of any of claims 1 to 7.
CN202310797564.1A 2023-06-30 2023-06-30 Unmanned invigilation method, device, equipment and computer readable storage medium Pending CN116778422A (en)

Publications (1)

Publication Number Publication Date
CN116778422A true CN116778422A (en) 2023-09-19



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination