CN115830517B - Video-based examination room abnormal frame extraction method and system - Google Patents

Video-based examination room abnormal frame extraction method and system

Info

Publication number: CN115830517B (granted 2023-06-13); application CN202310110116.XA (filed 2023-02-14); earlier publication CN115830517A (2023-03-21)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: frame, video, examination room, data, abnormal
Legal status: Active (granted; status assumed by Google, not a legal conclusion)
Inventor: 夏迪 (Xia Di)
Assignee (current and original): Jiangxi Yunyan Shijie Technology Co., Ltd.

Abstract

The invention discloses a video-based examination room abnormal frame extraction method and system. The method obtains a first video frame of an examination room, generates examination room background data from the first video frame, captures a plurality of desktops from the background data, obtains a differential image through a differential algorithm, and marks abnormal frames according to the edge data of the test paper and the upper limb data of the target object. Reference objects such as desktops, activity intervals and forbidden intervals are preset in the examination room background data, and a plurality of abnormal frames are determined through the pixel relationships between the test paper, the target objects and these reference objects. The method extracts abnormal frames in the examination more accurately, greatly reduces workload, and provides convenience for the subsequent identification of cheating behavior.

Description

Video-based examination room abnormal frame extraction method and system
Technical Field
The invention relates to video processing technology, in particular to a video-based examination room abnormal frame extraction method and system.
Background
Cheating may occur in examinations and affects examination fairness; with the popularity of multimedia education, the prior art hopes to replace on-site patrol with remote video. However, in a real examination room the monitoring scene is monotonous, the monitoring cameras are all installed at fixed positions in the classroom, and the range of motion of abnormal behaviors such as cheating is small, so remote monitoring staff find it difficult to discover abnormal behaviors in time. CN202010893771.3 describes using a convolutional neural network to determine the region where an examinee's pixels are located and tracking the examinee's key parts to recognize abnormal behaviors. In reality, however, cheating behaviors are varied, tracking all examinees in real time is difficult, and recognition of pixels such as skin color depends on lighting and has a high error rate. The prior art therefore needs a more accurate method for extracting examination room abnormality information, one that extracts video frames possibly containing abnormal behavior from the video stream more accurately.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a video-based examination room abnormal frame extraction method and system. The method defines different reference objects according to the pixel characteristics of the examination room, processes examination room images with a differencing method, and then extracts abnormal frames according to the relationship between the target object and the reference objects. The method extracts abnormal frames more accurately, greatly reduces workload, and provides convenience for identifying cheating.
The technical scheme of the invention is realized as follows:
a video-based examination room abnormal frame extraction method comprises the following steps:
step 1: storing identity data and face data of a plurality of target objects corresponding to any examination room number;
step 2: before an examination starts, at least one first video frame of an examination room is obtained, and examination room background data is generated according to the first video frame;
step 3: capturing a plurality of desktops from the examination room background data, and determining an activity interval corresponding to any identity data and a forbidden interval corresponding to the examination room number according to desktop coordinates;
step 4: after the examination starts, extracting a plurality of second video frames from the video stream at intervals of time t, performing a first differential operation on the second video frames and the examination room background data to obtain differential images, and capturing the target objects and test papers in the differential images;
step 5: extracting upper limb data and head data of a target object and edge data of a test paper, extracting real-time face data from the head data, and entering a step 6 if the real-time face data is matched with the face data;
step 6: if the edge data of the test paper is located outside the desktop or the upper limb data of the target object is located outside the active interval, marking the second video frame as a first abnormal frame;
step 7: extracting a plurality of third video frames from the video stream at intervals of 4t, performing a second differential operation on any third video frame and its two adjacent third video frames to obtain a binary image, and marking the third video frame as a second abnormal frame if the forbidden interval in the binary image is not connected;
step 8: abnormal behavior in the first abnormal frame and the second abnormal frame is identified.
In the present invention, in step 2, pixel points of the first video frames that conform to a Gaussian distribution are used as background pixels, and the background pixels of a plurality of first video frames are combined to generate the examination room background data.
In the present invention, in step 4, the pixel value of pixel point (x, y) in the second video frame at time t is F_t(x, y), and the pixel value of pixel point (x, y) in the differential image is D_t(x, y) = |F_t(x, y) - M(x, y)|, where M(x, y) is the examination room background data.
In the invention, in step 5, if the real-time face data does not match the stored face data, an identity review notification is sent to the corresponding examination room.
In the present invention, in step 7, the second differential operation comprises two first differential operations: the pixel value F_{4(t-1)}(x, y) of pixel point (x, y) in the third video frame at time 4(t-1) and the pixel value F_{4t}(x, y) of pixel point (x, y) in the third video frame at time 4t yield, after a first differential operation and binarization, a first intermediate image R_1(x, y); the pixel value F_{4t}(x, y) at time 4t and the pixel value F_{4(t+1)}(x, y) at time 4(t+1) yield, after a first differential operation and binarization, a second intermediate image R_2(x, y); the binary image is P_{4t}(x, y) = R_1(x, y) ∧ R_2(x, y) (logical AND).

In the present invention,

R_1(x, y) = 1 if |F_{4t}(x, y) - F_{4(t-1)}(x, y)| > T_2, and 0 otherwise;

R_2(x, y) = 1 if |F_{4(t+1)}(x, y) - F_{4t}(x, y)| > T_2, and 0 otherwise,

where T_2 is the binarization threshold of the second differential operation.
In the invention, in step 8, a plurality of first and second abnormal frames are respectively sent to a feature recognition network, and a parameter transfer channel of a feature extraction network from the first abnormal frame to the second abnormal frame is established, and abnormal behaviors in the first and second abnormal frames are recognized based on the feature recognition network.
In the invention, the length of the period t is 0.2s, and the identity data is an identity card number.
An examination room abnormal frame extraction system according to the video-based examination room abnormal frame extraction method, comprising: an object database, a camera shooting unit, a model database, a frame processing unit and a frame identification unit, wherein,
the object database is used for storing identity data and face data of a plurality of target objects;
the camera shooting unit is used for acquiring a video stream of the examination room;
the model database is used for storing desktop coordinates, an active interval and a forbidden interval in the video stream;
the frame processing unit is used for extracting a second video frame and a third video frame from the video stream and marking a first abnormal frame and a second abnormal frame;
the frame identification unit comprises an input layer, a characteristic identification network, a self-adaptive space-time fusion module and a prediction module, wherein the self-adaptive space-time fusion module is connected with the characteristic identification network of the first abnormal frame and the second abnormal frame.
The video-based examination room abnormal frame extraction method and system have the following beneficial effects. Reference objects such as desktops, activity intervals and forbidden intervals are preset in the examination room background data, and a plurality of abnormal frames are determined through the pixel relationships between the test paper, the target objects and these reference objects, which improves the efficiency of extracting abnormal behaviors from the video stream. Background modeling with a Gaussian mixture reduces the influence of environmental factors on the examination room background data. On this basis, the first and second differential operations segment the foreground and background of the image, greatly improving detection accuracy. Furthermore, different extraction frame rates are adopted for different types of regions, which reduces the amount of computation while maintaining recognition efficiency. Combining video frames of different frame rates with the adaptive spatio-temporal fusion module fuses different feature information and improves the accuracy of identifying abnormal behaviors.
Drawings
FIG. 1 is a flow chart of a video-based examination room anomaly frame extraction method of the present invention;
FIG. 2 is a single frame schematic of a video stream of an examination room;
FIG. 3 is a schematic diagram of the extraction of desktop information from FIG. 2;
FIG. 4 is a schematic diagram of the distribution of forbidden and active intervals according to the present invention;
FIG. 5 is a schematic diagram comparing the edge data of the test paper with the desktop according to the present invention;
FIG. 6 is a schematic diagram comparing the upper limb data of the target object with the desktop according to the present invention;
FIG. 7 is a schematic diagram of a first and second anomaly frame store structure of the present invention;
FIG. 8 is a block diagram of a video-based examination room anomaly frame extraction system of the present invention;
fig. 9 is a schematic diagram of a depth convolution algorithm of the frame identification unit of the present invention.
Description of the embodiments
For a clearer understanding of the objects, technical solutions and advantages of the present application, the present application is described and illustrated below with reference to the accompanying drawings and examples.
In the prior art, abnormal behaviors in an examination room are confirmed through analysis of the video stream, reducing the workload of manual invigilation. In practical applications, electronic invigilation does reduce the manual workload, but in existing development cases, such as that of Huang Zhiwei, the recognition accuracy remains low. Users therefore require the system to determine more accurately the video frames in which abnormal behavior occurs, and specifically to determine the attributes of the abnormal behavior and the corresponding examinee.
Embodiment 1
As shown in fig. 1 to 7, in the video-based examination room abnormal frame extraction method of the present invention, reference objects such as desktops, activity intervals and forbidden intervals are preset in the examination room background data, and a plurality of abnormal frames are determined through the pixel relationships between the test paper, the target objects and the reference objects. The examination room abnormal frame extraction method comprises the following steps:
Step 1: storing the identity data and face data of a plurality of target objects corresponding to any examination room number. The identity data in this embodiment is an identity card number. A database is established in the back-end server; the information of examinees who need to take the examination, including their identity card data and face data, is collected in advance and entered into the system database, and the face data of the second-generation identity card is imported into the system, which facilitates subsequent face comparison and recognition. To reduce system redundancy, this embodiment preprocesses the video frames by image graying, i.e., gray values Gray(x, y) ranging from 0 to 255 represent the frame pixels, where Gray(x, y) = 0.299 × R(x, y) + 0.587 × G(x, y) + 0.114 × B(x, y), and red (R), green (G) and blue (B) are the three primary color channels of the pixels in the original video.
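As a minimal illustration of this graying step (a sketch only; the function name and H x W x 3 array layout are assumptions, not part of the patent), the conversion can be written as:

```python
import numpy as np

def to_gray(frame_rgb: np.ndarray) -> np.ndarray:
    # BT.601 luma weights from the formula above: 0.299 R + 0.587 G + 0.114 B
    weights = np.array([0.299, 0.587, 0.114])
    return (frame_rgb[..., :3] @ weights).astype(np.uint8)
```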
Step 2: before the examination starts, at least one first video frame of the examination room is acquired, and examination room background data is generated from the first video frame. After the examination room is arranged, a segment of video recorded before the examination, when no one is in the examination room, is selected, and at least one image, i.e., a first video frame, is acquired. The first video frame contains no picture of any target object, only the scene of the examination room. Pixel points of the first video frames that conform to a Gaussian distribution are taken as background pixels, and the background pixels of a plurality of first video frames are combined to generate the examination room background data. Referring to the schematic diagram of the examination room video frame shown in fig. 2, the fixed pixel features of the examination room background, such as the desktops, aisles and walls, remain stable before, during and after the examination, while the continuously moving pixel features are mainly objects such as examinees and test papers. The video-based examination room abnormal frame extraction method and system can distinguish pre-examination video frames from in-examination video frames, which prevents a target object (examinee) from being identified as background data when the background data is extracted during the examination.
According to the invention, background modeling is performed with a Gaussian mixture, which reduces the influence of factors such as the environment on the image. Specifically, the invention selects N pre-examination first video frames from the video stream and constructs several Gaussian distributions (generally an integer between 3 and 5) for each pixel point. The pixel mean and variance of the pixel values of pixel point (x, y) over the first N first video frames are calculated. If the difference between the pixel value of pixel point (x, y) in the current first video frame and the pixel mean is less than D times the standard deviation, pixel point (x, y) is a background point. The qualifying background points are merged to generate the examination room background data. D is a coefficient, typically 2.5 to 3.
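A minimal single-Gaussian sketch of this background modeling (the full method uses 3 to 5 Gaussians per pixel; the function name and the N x H x W array layout are assumptions):

```python
import numpy as np

def build_background(frames: np.ndarray, d: float = 2.5) -> np.ndarray:
    """frames: stack of N grayscale first video frames, shape (N, H, W).
    A pixel of the newest frame is kept as background when it lies within
    d standard deviations of the per-pixel mean; otherwise the mean is used."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0) + 1e-6        # avoid division-by-zero pixels
    is_background = np.abs(frames[-1] - mean) < d * std
    return np.where(is_background, frames[-1], mean).astype(np.uint8)
```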
Step 3: capturing a plurality of desktops from the examination room background data, and determining, from the desktop coordinates, the activity interval corresponding to each identity datum and the forbidden interval corresponding to the examination room number. In a specific embodiment, the size and color of the examination room desktops remain stable, so the examination room background data can be processed according to preset desktop pixel values; for example, binarizing the examination room background data with the desktop pixel value as the standard threshold yields a binarized image highlighting the desktop regions, so each desktop can be extracted; the extracted desktop information is shown in fig. 3. It should be noted that, because of the viewing angle of the imaging unit, some target objects overlap even in the normal state. This embodiment incorporates forbidden intervals only between target objects that do not overlap in the normal examination state. In another embodiment, the examination room has several imaging units, and the different imaging units each define their corresponding forbidden intervals.
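A rough sketch of this threshold-based desktop capture (the gray-value range, minimum area, and function names are illustrative assumptions):

```python
import cv2
import numpy as np

DESK_LO, DESK_HI = 90, 140     # hypothetical desktop gray-value range

def extract_desktops(background_gray: np.ndarray) -> list:
    """Binarize the examination room background around the preset desktop
    pixel value and return one bounding box (x, y, w, h) per desk region."""
    in_range = (background_gray >= DESK_LO) & (background_gray <= DESK_HI)
    mask = in_range.astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
```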
In the prior art, when the features of a moving target object are extracted, the foreground is easily treated as background during segmentation, because the background is static and the target object's movements are small in amplitude and slow. According to the positional relationship of the desktops, the invention determines the activity interval of the desktop corresponding to each examinee (target object) and the forbidden interval in the examination room, as shown in fig. 4. Identifying the target object within predetermined regions, i.e., the predetermined activity and forbidden intervals, greatly improves accuracy.
Step 4: after the examination starts, a plurality of second video frames are extracted from the video stream at intervals of time t, a first differential operation is performed on the second video frames and the examination room background data to obtain differential images, and the target objects and test papers are captured in the differential images. The frame interval of the video stream is very short; to reduce the amount of system computation, and considering that abnormal behavior is slow, this embodiment extracts only part of the video frames, with a period larger than the frame interval. The first differential operation then processes the second video frame and extracts its foreground pixels. The pixel value of pixel point (x, y) in the second video frame at time t is F_t(x, y), and the pixel value of pixel point (x, y) in the differential image is D_t(x, y) = |F_t(x, y) - M(x, y)|, where M(x, y) is the examination room background data.
The present embodiment does not limit the manner of extracting the target object and the test paper. For the target object, the pixel region can be matched against an Ω-shaped (omega) target covering the various postures of a person's head and shoulders, including long shawl hair and various short hairstyles, so the region where an examinee sits can be identified accurately. For the test paper, the region where it lies can be identified through the concentrated range of test paper pixel values. D_t(x, y) can be binarized with the threshold T_1 to obtain R_t(x, y), and R_t(x, y) is the extracted sample of the target object, where

R_t(x, y) = 1 if D_t(x, y) > T_1, and 0 otherwise,

and T_1 typically takes a value from 12 to 48.
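A minimal sketch of this first differential operation and binarization (T_1 = 24 is one value inside the stated 12-48 range; the names are assumptions):

```python
import numpy as np

def first_differential(frame_t: np.ndarray, background: np.ndarray,
                       t1: int = 24) -> np.ndarray:
    """D_t(x, y) = |F_t(x, y) - M(x, y)|, then binarize at T_1.
    int16 arithmetic prevents uint8 wrap-around in the subtraction."""
    d = np.abs(frame_t.astype(np.int16) - background.astype(np.int16))
    return (d > t1).astype(np.uint8)   # R_t(x, y): 1 = foreground, 0 = background
```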
Step 5: extracting the upper limb data (such as the Ω region) and head data of the target object and the edge data of the test paper, extracting real-time face data from the head data, and entering step 6 if the real-time face data matches the stored face data; otherwise an identity review notification is sent to the corresponding examination room. The present embodiment can detect the positions of the hands, face, etc. through skin-color detection and segmentation. Head features are extracted in combination with hair color: a hair-color model is built for black hair, a hair-color threshold is set, and image points whose R, G and B channel gray values all fall below the threshold are selected as hair-color reference values. Image morphology processing is performed on the differential image: successive opening and closing operations remove isolated white points and fill black holes. When a hair-color region is adjacent to and above a skin-color region, that region is identified as the head region of the target object, and the upper limb data and head data of the target object can then be captured. An image edge is a place where the image gray level changes abruptly; the paper of the test paper is usually white and regularly rectangular, so the edge data of the test paper can be obtained by extracting its edge region based on the paper color with the same method. The matching of real-time face data against the stored face data can be completed with the OpenCV platform, and details are omitted here.
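The opening-and-closing cleanup of the differential image can be sketched as follows (the kernel size is an assumption):

```python
import cv2
import numpy as np

def clean_mask(binary_mask: np.ndarray, ksize: int = 5) -> np.ndarray:
    """Opening removes isolated white points; closing fills black holes,
    as described for the differential image above."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (ksize, ksize))
    opened = cv2.morphologyEx(binary_mask, cv2.MORPH_OPEN, kernel)
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
```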
Step 6: if the edge data of the test paper lies outside the desktop, or the upper limb data of the target object lies outside the activity interval, the second video frame is marked as a first abnormal frame. After the edge data of the test paper is identified, the edge of the test paper is tracked continuously. Fig. 5 compares the edge data of the test paper with the desktop and shows the relationship between the two. If the coordinates of the edge data exceed the desktop region, the test paper is not placed entirely on the desktop, and the second video frame is marked as a corresponding test paper abnormal frame. The invention likewise tracks the limbs of the target object continuously and judges the relationship between the upper limb data and the activity interval. Fig. 6 compares the upper limb data of the target object with the desktop and shows the positional relationship between the upper limb data and the activity interval. If the upper limb data exceeds the coordinate region of the activity interval, the target object may be behaving abnormally, and the second video frame is marked as an object abnormal frame.
Step 7: a plurality of third video frames are extracted from the video stream at intervals of 4t, a second differential operation is performed on any third video frame and its two adjacent third video frames to obtain a binary image, and the third video frame is marked as a second abnormal frame if the forbidden interval in the binary image is not connected. The forbidden interval is a region that target objects may not cross during the examination; it is normally continuous and must not be interrupted. A non-connected forbidden interval shows up as a discontinuity in the interval's background: foreground pixels on the two sides of the interval join up (for example, the limbs of adjacent examinees touch), so abnormal behavior may have occurred. In this embodiment, the relationship between the forbidden interval and the target object is examined even when the target object is judged to be within its activity interval. The logic is that the imaging unit in a real examination room cannot face every examinee head-on, i.e., the limb behavior of an examinee is rendered with some error in the video frame, so checking whether the forbidden interval remains connected is necessary to improve the accuracy of abnormal behavior identification. The second differential operation is further described in Embodiment 2.
Step 8: abnormal behaviors in the first abnormal frames and the second abnormal frames are identified. In this step, the abnormal frames are first stored in order, and the behaviors in them are then identified with a convolutional network. As shown in fig. 7, the data structure of the database of the present application adopts a B+ tree index structure: the root node takes the examination room number, the leaf nodes are identity data, and the index sequence within a storage unit is the period t. This storage structure ensures that the first and second abnormal frames of the same target object are contiguous, and that abnormal frames generated within the same time period are contiguous, which provides convenience for subsequent machine or manual identification.
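A minimal in-memory stand-in for this index (a hedged sketch only: the patent specifies a B+ tree, whereas this uses nested dictionaries to show the room number, then identity, then period-t ordering):

```python
from collections import defaultdict

class AbnormalFrameStore:
    """Groups abnormal frames by examination room number, then identity
    data, keeping them sorted by period t so one examinee's frames stay
    contiguous, as the B+ tree layout above guarantees."""
    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(list))

    def add(self, room_no: str, identity: str, period: int, frame) -> None:
        bucket = self._index[room_no][identity]
        bucket.append((period, frame))
        bucket.sort(key=lambda entry: entry[0])   # order by period t

    def frames_for(self, room_no: str, identity: str) -> list:
        return [frame for _, frame in self._index[room_no][identity]]
```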
Embodiment 2
This embodiment discloses a preferred processing method for the third video frames. The obtained third video frames are first preprocessed, and then two or three consecutive third video frames are subtracted to obtain the corresponding difference values. In this embodiment, the pixel value F_{4(t-1)}(x, y) of pixel point (x, y) in the third video frame at time 4(t-1) and the pixel value F_{4t}(x, y) of pixel point (x, y) in the third video frame at time 4t yield a first intermediate image R_1(x, y) after a first differential operation and binarization; the pixel value F_{4t}(x, y) at time 4t and the pixel value F_{4(t+1)}(x, y) at time 4(t+1) yield a second intermediate image R_2(x, y) after a first differential operation and binarization. The first intermediate image R_1(x, y) and the second intermediate image R_2(x, y) are combined to obtain the binary image P_{4t}(x, y). The intermediate images are obtained through inter-frame differencing and the binarization threshold T_2, which typically takes a value of 12 to 48 (pixel values 0 to 255):

R_1(x, y) = 1 if |F_{4t}(x, y) - F_{4(t-1)}(x, y)| > T_2, and 0 otherwise;

R_2(x, y) = 1 if |F_{4(t+1)}(x, y) - F_{4t}(x, y)| > T_2, and 0 otherwise;

P_{4t}(x, y) = R_1(x, y) ∧ R_2(x, y).
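A sketch of this three-frame second differential operation (the names and T_2 = 24 are assumptions within the stated range):

```python
import numpy as np

def second_differential(f_prev: np.ndarray, f_curr: np.ndarray,
                        f_next: np.ndarray, t2: int = 24) -> np.ndarray:
    """Binarize both inter-frame differences at T_2 and AND them,
    giving the binary image P_4t defined above."""
    r1 = np.abs(f_curr.astype(np.int16) - f_prev.astype(np.int16)) > t2
    r2 = np.abs(f_next.astype(np.int16) - f_curr.astype(np.int16)) > t2
    return (r1 & r2).astype(np.uint8)
```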
The present embodiment can determine whether the forbidden interval of the binary image is connected by the following method. A stack structure is defined; a background pixel point within the forbidden interval is selected as the seed and given a parameter value label. Background pixels adjacent to the seed are then pushed onto the stack. The pixel at the top of the stack is popped and given the same label, and all background pixels adjacent to it are pushed onto the stack. This is repeated until the stack is empty. Scanning then continues over the other background pixels in the forbidden interval, repeating the stack operation until the scan is finished. According to the two basic elements of a connected region (identical pixel values and adjacent coordinates), all background pixels reachable from the seed are merged into one pixel set, and the final pixel set is a connected region. If the forbidden interval is not connected, limbs or other objects bridge the forbidden interval and have exceeded the specified range of movement; for example, this step can extract abnormal behaviors such as two people passing material to each other. In this embodiment, the second abnormal frame corresponds to the several target objects adjacent to the forbidden interval; in another embodiment, it corresponds to all target objects of the examination room to which the forbidden interval belongs.
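A compact flood-fill sketch of this connectivity test (the slice-based region argument and names are assumptions):

```python
import numpy as np

def forbidden_interval_connected(binary: np.ndarray, region) -> bool:
    """Flood-fill the background (value 0) of the forbidden interval from
    one seed; the interval is connected iff the fill reaches every
    background pixel inside it. region: (slice, slice) into the image."""
    sub = binary[region]
    background = np.argwhere(sub == 0)
    if len(background) == 0:
        return False                      # no background pixels remain
    visited = np.zeros(sub.shape, dtype=bool)
    seed = tuple(background[0])
    visited[seed] = True
    stack = [seed]
    h, w = sub.shape
    while stack:
        y, x = stack.pop()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and sub[ny, nx] == 0 and not visited[ny, nx]:
                visited[ny, nx] = True
                stack.append((ny, nx))
    return visited.sum() == len(background)
```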
Embodiment 3
The video-based examination room abnormal frame extraction method of this embodiment further discloses a method for identifying abnormal behaviors in the first abnormal frames and the second abnormal frames. The frame rate of the first abnormal frames is greater than that of the second abnormal frames, and this embodiment adopts the two parallel branches of the SlowFast recognition network, a Slow branch and a Fast branch, to capture spatial and temporal features respectively. By analogy with P-cells and M-cells in the primate retina, the former is sensitive to spatial detail and color but responds slowly to stimuli, whereas the latter is sensitive to rapid temporal change. In this embodiment, a plurality of first abnormal frames and second abnormal frames are fed into the feature recognition network respectively, and a parameter transfer channel of the feature extraction network from the first abnormal frames to the second abnormal frames is established. The output features of the first abnormal frames are projected onto the output features of the second abnormal frames, so that motion information is extracted at fine temporal resolution while spatial semantic information is captured.
In addition, this embodiment can train the SlowFast recognition network to obtain a preset model, making the parameters of the convolution operations more accurate. The sample data set is labeled with the open-source annotation tool CVAT; for example, examination room behaviors are divided into five categories: standing, writing, raising a hand, turning around, and lying on the desk. The sample data set is divided into a training set and a validation set; the training set is used to train the parameters of the SlowFast recognition network, and the validation set is used for inference and computing the relevant weights.
Embodiment 4
As shown in fig. 8, this embodiment discloses an examination room abnormal frame extraction system according to the video-based examination room abnormal frame extraction method, comprising an object database, a camera shooting unit, a model database, a frame processing unit, a detection unit, a tracking unit and a frame identification unit. The object database stores the identity data and face data of a plurality of target objects. The camera shooting unit acquires the video stream of the examination room. The model database stores the desktop coordinates, activity intervals and forbidden intervals in the video stream. The frame processing unit extracts the second and third video frames from the video stream and marks the first and second abnormal frames. The frame identification unit identifies abnormal behaviors from the first and second abnormal frames. The detection unit calibrates moving objects in the video stream, and the tracking unit tracks them, which improves the extraction efficiency of the second and third video frames.
According to the principle of the depth convolution algorithm of the frame identification unit shown in fig. 9, the frame identification unit extracts features of the first abnormal frames with the Fast branch and features of the second abnormal frames with the Slow branch. Input layer: each training iteration takes a 3.2 s (0.2 × 16) video segment as input, with a frame interval of 0.2 s and size C × H × W, corresponding to the number of input channels, the image height and the image width, respectively.
Feature recognition network: the feature recognition network adopts a fast-slow parallel double-branch structure, with 3D-ResNet50 as the backbone feature extraction network. The Slow branch mainly captures spatial semantic information, and the Fast branch mainly captures temporal motion information.
Adaptive spatio-temporal fusion module: to perceive spatio-temporal motion features better, this embodiment adopts an adaptive spatio-temporal fusion module (ASF) to fuse the information of the spatial and temporal dimensions more efficiently. Specifically, the output features of the first abnormal frames and the second abnormal frames at each layer of the feature recognition network are K_fj and K_sj, where j is the index of the downsampling layer. First, K_fj is aligned to K_sj in the time and channel dimensions using a projection matrix; feature fusion is then performed through a 3D sparse matrix; finally a weight activation map is output through the normalized exponential function (softmax) and superimposed on K_sj, yielding more discriminative spatio-temporally aware features.
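A hedged PyTorch sketch of this fusion step (dense 3D convolutions stand in for the projection matrix and the sparse-matrix fusion; channel counts and the temporal stride are assumptions):

```python
import torch
import torch.nn as nn

class ASF(nn.Module):
    """Aligns fast-branch features K_fj to slow-branch features K_sj in the
    time and channel dimensions, then superimposes a softmax weight map."""
    def __init__(self, c_fast: int, c_slow: int, t_stride: int = 4):
        super().__init__()
        self.project = nn.Conv3d(c_fast, c_slow,
                                 kernel_size=(t_stride, 1, 1),
                                 stride=(t_stride, 1, 1))   # time/channel alignment
        self.fuse = nn.Conv3d(c_slow, c_slow, kernel_size=1)

    def forward(self, k_fast: torch.Tensor, k_slow: torch.Tensor) -> torch.Tensor:
        aligned = self.project(k_fast)                        # match K_sj shape
        weights = torch.softmax(self.fuse(aligned), dim=1)    # weight activation map
        return k_slow + weights * k_slow                      # superimpose on K_sj
```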
Prediction module: after the feature maps K_f5 and K_s5 output by the feature recognition network are obtained (5 downsampling layers, j = 5), the features are compressed with a 3D global average pooling layer, the two branch feature maps are concatenated, and finally a fully connected layer predicts the final behavior category.
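A matching sketch of this prediction head (channel counts are assumptions; the five classes follow the labels of Embodiment 3):

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """3D global average pooling, concatenation of the two branch feature
    maps, and a fully connected classifier, as described above."""
    def __init__(self, c_fast: int, c_slow: int, num_classes: int = 5):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(c_fast + c_slow, num_classes)

    def forward(self, k_f5: torch.Tensor, k_s5: torch.Tensor) -> torch.Tensor:
        f = self.pool(k_f5).flatten(1)    # (N, c_fast)
        s = self.pool(k_s5).flatten(1)    # (N, c_slow)
        return self.fc(torch.cat([f, s], dim=1))
```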
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (8)

1. A video-based examination room abnormal frame extraction method, characterized by comprising the following steps:
step 1: storing identity data and face data of a plurality of target objects corresponding to any examination room number;
step 2: before an examination starts, at least one first video frame of an examination room is obtained, and examination room background data is generated according to the first video frame;
step 3: capturing a plurality of desktops from the examination room background data, and determining an activity interval corresponding to any identity data and a forbidden interval corresponding to the examination room number according to desktop coordinates;
step 4: after the examination starts, extracting a plurality of second video frames from the video stream at intervals of time t, performing a first differential operation on the second video frames and the examination room background data to obtain differential images, and capturing the target objects and test papers in the differential images;
step 5: extracting upper limb data and head data of a target object and edge data of a test paper, extracting real-time face data from the head data, and entering a step 6 if the real-time face data is matched with the face data;
step 6: if the edge data of the test paper is located outside the desktop or the upper limb data of the target object is located outside the active interval, marking the second video frame as a first abnormal frame;
step 7: extracting a plurality of third video frames from the video stream at intervals of 4t, performing a second differential operation on any third video frame and its two adjacent third video frames to obtain a binary image, and marking the third video frame as a second abnormal frame if the forbidden interval in the binary image is not connected;
step 8: identifying abnormal behaviors in the first abnormal frame and the second abnormal frame, wherein

the pixel value F_{4(t-1)}(x, y) of pixel point (x, y) in the third video frame at time 4(t-1) and the pixel value F_{4t}(x, y) of pixel point (x, y) in the third video frame at time 4t yield, after a first differential operation and binarization, a first intermediate image R_1(x, y); the pixel value F_{4t}(x, y) at time 4t and the pixel value F_{4(t+1)}(x, y) at time 4(t+1) yield, after a first differential operation and binarization, a second intermediate image R_2(x, y); and the first intermediate image R_1(x, y) and the second intermediate image R_2(x, y) are combined to obtain the binary image

P_{4t}(x, y) = R_1(x, y) ∧ R_2(x, y).
2. The video-based examination room abnormal frame extraction method of claim 1, wherein in step 2, pixel points of the first video frames that conform to a Gaussian distribution are used as background pixels, and the background pixels of a plurality of first video frames are combined to generate the examination room background data.
3. The video-based examination room abnormal frame extraction method of claim 1, wherein in step 4, the pixel value of pixel point (x, y) in the second video frame at time t is F_t(x, y), and the pixel value of pixel point (x, y) in the differential image is D_t(x, y) = |F_t(x, y) - M(x, y)|, where M(x, y) is the examination room background data.
4. The video-based examination room abnormal frame extraction method of claim 1, wherein in step 5, if the real-time face data does not match the face data, an identity review notification is sent to the corresponding examination room.
5. The video-based examination room abnormal frame extraction method of claim 1, wherein

R_1(x, y) = 1 if |F_{4t}(x, y) - F_{4(t-1)}(x, y)| > T_2, and 0 otherwise;

R_2(x, y) = 1 if |F_{4(t+1)}(x, y) - F_{4t}(x, y)| > T_2, and 0 otherwise,

where T_2 is the binarization threshold of the second differential operation.
6. The video-based examination room abnormal frame extraction method of claim 1, wherein in step 8, a plurality of first abnormal frames and second abnormal frames are respectively fed into a feature recognition network, a parameter transfer channel of the feature extraction network from the first abnormal frames to the second abnormal frames is established, and abnormal behaviors in the first abnormal frames and the second abnormal frames are recognized based on the feature recognition network.
7. The video-based examination room abnormal frame extraction method of claim 1, wherein the length of t is 0.2 s, and the identity data is an identity card number.
8. An examination room abnormal frame extraction system according to the video-based examination room abnormal frame extraction method of claim 1, comprising: an object database, a camera shooting unit, a model database, a frame processing unit and a frame identification unit, wherein,
the object database is used for storing identity data and face data of a plurality of target objects;
the camera shooting unit is used for acquiring a video stream of the examination room;
the model database is used for storing desktop coordinates, an active interval and a forbidden interval in the video stream;
the frame processing unit is used for extracting a second video frame and a third video frame from the video stream and marking a first abnormal frame and a second abnormal frame;
the frame identification unit comprises an input layer, a characteristic identification network, a self-adaptive space-time fusion module and a prediction module, wherein the self-adaptive space-time fusion module is connected with the characteristic identification network of the first abnormal frame and the second abnormal frame.

Priority Applications (1)

CN202310110116.XA (priority and filing date 2023-02-14): Video-based examination room abnormal frame extraction method and system

Publications (2)

Publication Number / Publication Date
CN115830517A (en): 2023-03-21
CN115830517B (en): 2023-06-13

Family

ID=85521234

Family Applications (1)

CN202310110116.XA (priority and filing date 2023-02-14): Video-based examination room abnormal frame extraction method and system; status: Active, granted as CN115830517B (en)

Country Status (1)

Country Link
CN (1) CN115830517B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN114333070A (en) * 2022-03-10 2022-04-12 山东山大鸥玛软件股份有限公司 Examinee abnormal behavior detection method based on deep learning

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN106156713A (en) * 2015-04-23 2016-11-23 宁夏奥德电子科技有限公司 A kind of image processing method automatically monitored for examination hall behavior
CN110929573A (en) * 2019-10-18 2020-03-27 平安科技(深圳)有限公司 Examination question checking method based on image detection and related equipment
CN112036299B (en) * 2020-08-31 2022-12-13 山东科技大学 Examination cheating behavior detection method and system under standard examination room environment
CN113255606A (en) * 2021-06-30 2021-08-13 深圳市商汤科技有限公司 Behavior recognition method and device, computer equipment and storage medium
CN114038062A (en) * 2021-11-11 2022-02-11 中南大学 Examinee abnormal behavior analysis method and system based on joint key point representation

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN114333070A (en) * 2022-03-10 2022-04-12 山东山大鸥玛软件股份有限公司 Examinee abnormal behavior detection method based on deep learning

Also Published As

Publication number Publication date
CN115830517A (en) 2023-03-21


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant