CN116503805A - Examination room monitoring method, electronic equipment and storage medium - Google Patents

Examination room monitoring method, electronic equipment and storage medium

Info

Publication number
CN116503805A
CN116503805A CN202310480669.4A
Authority
CN
China
Prior art keywords
examination room
picture
detection result
scene
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310480669.4A
Other languages
Chinese (zh)
Inventor
Wen Caizhen
Zhu Longbai
Li Kai
Li Fuhai
Liu Yingbin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jitter Technology Shenzhen Co ltd
Shenzhen Instant Construction Technology Co ltd
Original Assignee
Jitter Technology Shenzhen Co ltd
Shenzhen Instant Construction Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jitter Technology Shenzhen Co ltd, Shenzhen Instant Construction Technology Co ltd filed Critical Jitter Technology Shenzhen Co ltd
Priority to CN202310480669.4A priority Critical patent/CN116503805A/en
Publication of CN116503805A publication Critical patent/CN116503805A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Television Signal Processing For Recording (AREA)
  • Alarm Systems (AREA)

Abstract

The application relates to the technical field of monitoring, and provides an examination room monitoring method, electronic equipment and a storage medium. The method comprises the following steps: acquiring a picture sequence of multiple viewing angles captured of the examination room environment; performing scene detection on each picture in the picture sequence to obtain a first detection result; performing environment detection on the examination room to obtain a second detection result; performing illegal-object detection on each picture to obtain a third detection result; calculating a normalization score of the examination room according to the first detection result and the second detection result; and performing normalization monitoring on the examination room based on the normalization score and the third detection result to obtain a monitoring result. By performing scene detection, environment detection and illegal-object detection on the examination room through the collected multi-view picture sequences, the method monitors the examination room across multiple dimensions and improves monitoring accuracy.

Description

Examination room monitoring method, electronic equipment and storage medium
Technical Field
The application relates to the technical field of monitoring, in particular to an examination room monitoring method, electronic equipment and a storage medium.
Background
Compared with a traditional offline examination, an online examination can track an examinee's performance in the examination room in real time using networked monitoring equipment such as cameras and mobile phones. Online examinations reduce the need for invigilators and thereby lower the operating cost of the examination.
However, in an actual examination scene, illegal objects can easily appear in the examination room, and monitoring the examinee's behavior through cameras alone cannot ensure that the examination room meets the normalization requirements, so examination room monitoring accuracy is low.
Disclosure of Invention
In view of the foregoing, it is necessary to provide an examination room monitoring method, an electronic device and a storage medium that can solve the technical problem of low monitoring accuracy caused by an examination room failing to meet the normalization requirements.
A first aspect of the present application provides an examination room monitoring method, the method comprising: acquiring a picture sequence of multiple viewing angles captured of the examination room environment; performing scene detection on each picture in the picture sequence to obtain a first detection result; performing environment detection on the examination room to obtain a second detection result; performing illegal-object detection on each picture to obtain a third detection result; calculating a normalization score of the examination room according to the first detection result and the second detection result; and performing normalization monitoring on the examination room based on the normalization score of the examination room and the third detection result to obtain a monitoring result.
In one embodiment, the performing scene detection on each picture in the picture sequence to obtain a first detection result includes: inputting each picture into a feature pyramid network to obtain a spliced feature map of each picture; inputting the spliced feature map of each picture into a classification network to obtain a plurality of classification feature maps of each picture; inputting each classification feature map into an attention network to obtain an attention feature map of each classification feature map; performing feature fusion on each classification feature map and the corresponding attention feature map to obtain a semantic feature map of the corresponding classification feature map; determining a semantic segmentation result of each picture according to the semantic feature maps corresponding to the classification feature maps of each picture; acquiring semantic information in the semantic segmentation result of each picture, matching the semantic information with the semantic information of each scene in a preset first database, and calculating a scene matching degree score of the corresponding scene; and taking the highest scene matching degree score as the first detection result of each picture.
In one embodiment, the performing scene detection on each picture in the sequence of pictures to obtain a first detection result includes: extracting a first characteristic value of each picture; inputting each picture into a feature pyramid network, and obtaining feature fusion graphs of different spatial scales of each picture; inputting the feature fusion graphs of different spatial scales of each picture into a first scene classification model, and determining a first scene corresponding to each picture; extracting a second characteristic value of a picture corresponding to the first scene from a preset second database; normalizing the first characteristic value to obtain a first normalized characteristic, and normalizing the second characteristic value to obtain a second normalized characteristic; and calculating cosine distances of the first normalized feature and the second normalized feature to obtain a first detection result of each picture.
In one embodiment, the performing scene detection on each picture in the sequence of pictures to obtain a first detection result includes: extracting a third characteristic value of each picture; acquiring RGB images and depth images of each picture; extracting local features of the RGB image and global features of the depth image; carrying out fusion processing on the local features and the global features to obtain final fusion features; inputting the final fusion characteristics into a second scene classification model to obtain a second scene corresponding to each picture; extracting a fourth characteristic value of a picture corresponding to the second scene from a preset third database; normalizing the third characteristic value to obtain a third normalized characteristic, and normalizing the fourth characteristic value to obtain a fourth normalized characteristic; and calculating cosine distances of the third normalized feature and the fourth normalized feature to obtain a first detection result of each picture.
In one embodiment, the performing environment detection on the examination room to obtain a second detection result includes: acquiring the environmental noise information, the environmental brightness information and the environmental network information of the examination room; inputting the environmental noise information, the environmental brightness information and the environmental network information into a pre-trained hash network model to obtain a hash code; calculating the Hamming distance between the hash code and the hash code of each environment in a preset fourth database to obtain the environment matching degree of the corresponding environment; and determining the minimum environment matching degree score as the second detection result of the examination room.
In one embodiment, the calculating the normalization score of the examination room according to the first detection result and the second detection result includes: and carrying out weighted summation on the first detection result and the second detection result to obtain the normalization score of the examination room.
In one embodiment, the performing normalization monitoring on the examination room based on the normalization score of the examination room and the third detection result to obtain the monitoring result includes: if the normalization score of the examination room does not meet the preset normalization threshold, or an illegal object exists in the third detection result, determining that the monitoring result is that the examination room fails examination room normalization detection.
In one embodiment, after determining that the monitoring result is that the examination room fails examination room normalization detection, the method further includes: if the monitoring result is that the normalization score of the examination room does not meet the preset normalization threshold, sending a prompt message for rectifying the examination room; and if the monitoring result is that an illegal object exists in the third detection result, sending a prompt message for removing the illegal object.
A second aspect of the present application provides an examination room monitoring apparatus, the apparatus comprising: an acquisition module for acquiring picture sequences of multiple viewing angles captured of the examination room environment; a scene detection module for performing scene detection on each picture in the picture sequence to obtain a first detection result; an environment detection module for performing environment detection on the examination room to obtain a second detection result; an illegal-object detection module for performing illegal-object detection on each picture to obtain a third detection result; a calculation module for calculating the normalization score of the examination room according to the first detection result and the second detection result; and a monitoring module for performing normalization monitoring on the examination room based on the normalization score of the examination room and the third detection result to obtain a monitoring result.
A third aspect of the present application provides an electronic device, where the electronic device includes a processor and a memory, where the processor is configured to implement the examination room monitoring method when executing a computer program stored in the memory.
A fourth aspect of the present application provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor implements the examination room monitoring method.
In summary, according to the examination room monitoring method, the electronic device and the storage medium, picture sequences of the examination room environment are obtained from multiple viewing angles, so that the examinees in the examination room and the examination room environment are captured from all directions; scene detection, environment detection and illegal-object detection are then performed on the examination room using the collected picture sequences. The examination room is thus monitored across multiple dimensions, ensuring that examinees behave normally and that the examination room meets the normalization requirements, which improves examination room monitoring accuracy.
Drawings
FIG. 1 is an application environment diagram of an examination room monitoring method according to an embodiment of the present application.
FIG. 2 is a flowchart of an examination room monitoring method according to an embodiment of the present application.
FIG. 3 is a flowchart of an examination room monitoring method according to an embodiment of the present application.
Fig. 4 is a first flowchart for obtaining a first detection result according to an embodiment of the present application.
Fig. 5 is a second flowchart for obtaining the first detection result according to the embodiment of the present application.
Fig. 6 is a third flowchart for obtaining the first detection result according to the embodiment of the present application.
Fig. 7 is a flowchart of obtaining a second detection result provided in an embodiment of the present application.
FIG. 8 is a block diagram of an examination room monitoring apparatus according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in detail below with reference to the accompanying drawings and specific embodiments.
It should be noted that "at least one" in this application means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The terms "first", "second", "third", "fourth" and the like in the description, claims and drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion. The following embodiments and features of the embodiments may be combined with each other without conflict.
FIG. 1 is a diagram of an application environment of a method for monitoring an examination room according to an embodiment of the present application.
The examination room monitoring method is applied to the electronic device 1. The electronic device 1 communicates with the photographing device 2, which captures various types of images from multiple viewing angles; for example, the photographing device 2 can capture images of an examinee and of the examination room environment in which the examinee is located. The electronic device 1 monitors the images captured by the photographing device 2.
The electronic device 1 provided in the embodiment of the present application may be any electronic product that can perform man-machine interaction with a user, for example, a computer, a monitoring device, a mobile phone, a tablet computer, etc.
The electronic device 1 may comprise a network device and/or a user device. Among them, network devices include, but are not limited to, a single network electronic device, an electronic device group composed of a plurality of network electronic devices, or a Cloud composed of a large number of hosts or network electronic devices based on Cloud Computing (Cloud Computing).
The network in which the electronic device 1 is located includes, but is not limited to: the internet, wide area networks, metropolitan area networks, local area networks, virtual private networks (Virtual Private Network, VPN), etc.
The photographing device 2 may be any device having a photographing function, for example a mobile phone or a computer. The photographing device 2 comprises an audio collector 21, an ambient light sensor 22 and a network speed measuring device 23: the audio collector 21 collects the ambient noise information of the examination room, the ambient light sensor 22 measures the ambient brightness information, and the network speed measuring device 23 measures the network uplink and downlink speeds of the examination room environment, i.e. the environment network information.
Fig. 2 is a schematic flow chart of an examination room monitoring method according to an embodiment of the present application.
In one embodiment, when an online examination is taken, a plurality of photographing devices can be deployed to collect videos or pictures of the examination room environment from multiple viewing angles. The collected pictures, or image frames from the videos, are used to detect the examination room scene and illegal objects in the examination room, while the audio collector 21, the ambient light sensor 22 and the network speed measuring device 23 in the photographing device are used to detect the examination room environment.
In one embodiment, for examination room scene detection, the electronic device may extract scene feature data from the pictures acquired by the photographing device and complete detection of the examination room scene based on analysis of that data. For detection of illegal objects in the examination room, the electronic device may extract object information for illegal objects such as computers, mobile phones and tablet computers from the pictures acquired by the photographing device, and complete the detection based on analysis of that object information. For examination room environment detection, the electronic device may acquire data such as environmental noise information, environmental brightness information and environmental network information from the photographing device, and complete detection of the examination room environment based on analysis of that data.
In one embodiment, based on the examination room environment detection results and the examination room scene detection results, the electronic device may determine whether the normalization score of the examination room meets a preset normalization threshold. If the normalization score of the examination room meets a preset normalization threshold value or no illegal object exists in the examination room, determining that the examination room passes examination room normalization detection; if the normalization score of the examination room does not meet the preset normalization threshold, or illegal objects exist in the examination room, determining that the examination room does not pass the examination room normalization detection.
FIG. 3 is a flowchart of an examination room monitoring method according to an embodiment of the present application. The examination room monitoring method is applied to an electronic device (such as the electronic device 1 in fig. 1), and specifically includes the following steps, the order of the steps in the flowchart may be changed according to different requirements, and some may be omitted.
Step 10: acquiring a picture sequence of multiple viewing angles captured of the examination room environment.
In one embodiment, the examination room can be photographed from multiple viewing angles by multiple photographing devices to obtain a picture sequence of the multiple viewing angles. The picture sequence includes the images obtained after the photographing devices photograph the examination room from the multiple viewing angles; in particular, the sequence may include multiple pictures for each viewing angle. In another embodiment, an examination room video shot by the photographing device may be acquired, image frames of multiple viewing angles may be extracted from the video, and multiple pictures of each viewing angle obtained from those image frames.
In one embodiment, based on the picture sequence, the electronic device can monitor the examinees and the examination room environment in the examination room in all directions, so that the comprehensiveness of the picture sequence adopted by the examination room monitoring is ensured, and the examination room monitoring accuracy is improved.
Step 11: performing scene detection on each picture in the picture sequence to obtain a first detection result.
In one embodiment, scene detection includes detecting the scene of the examination room; for example, examination room scenes may include a written-test examination room, a computer-test examination room, an interview examination room, etc.
In one embodiment, detection may be performed according to the objects (tables, computers, examination papers, etc.) present in different examination room scenes, the positions of those objects, and the labels corresponding to the objects (examinees, staff, seat numbers), so as to determine a scene matching degree detection result for the examination room, which is used as the first detection result.
Illustratively, each picture of each viewing angle in the picture sequence is obtained, and the tables in each picture, their positions and/or their corresponding labels are detected. Specifically, if one table is detected in the picture of a first viewing angle, a seat card with an examinee mark is detected on that table; if a plurality of tables is detected in the picture of a second viewing angle, a seat card with an examinee mark is detected on each table; and if a plurality of tables is detected in the picture of a third viewing angle, a seat card with a staff mark is detected on each table. Combining the detection results of the first, second and third viewing angles, the scene matching degree detection result of the examination room is judged to be an interview examination room.
In one embodiment, the specific method for the electronic device to perform scene detection on each picture in the sequence of pictures to obtain the first detection result may be referred to as the following detailed description of the flow shown in fig. 4 to 6.
Step 12: performing environment detection on the examination room to obtain a second detection result.
In one embodiment, environment detection includes detecting the environmental conditions in the examination room; for example, the environmental conditions may include the noise, ambient brightness and environment network information of the examination room. Noise detection may cover sounds such as stamping feet, knocking on a table, the invigilator's footsteps and the air conditioning in the examination room; ambient brightness may include detecting the lighting conditions in the examination room; and environment network information may include detecting basic information services (e-mail, file transfer, telnet, etc.) and network information retrieval services (interactive services, digital library services, etc.) in the examination room. The detection results for noise, ambient brightness and environment network information are used to determine the environment matching degree detection result of the examination room, which is used as the second detection result.
In one embodiment, for a specific method for performing environmental detection on an examination room by using an electronic device to obtain a second detection result, refer to the following detailed description of the flow shown in fig. 7.
Step 13: performing illegal-object detection on each picture to obtain a third detection result.
In one embodiment, detection of illegal objects in the examination room can be completed by monitoring them in real time; for example, illegal objects may include mobile phones, tablet computers, books and the like, and the third detection result indicates the detection result for illegal objects in the examination room.
In one embodiment, the illegal objects in each picture may be detected using a pre-trained target detection model, for example a YOLOv5 or PP-YOLOE model. Before detecting illegal objects, the target detection model is trained with historical pictures containing illegal objects, so that the trained model can detect the illegal objects in each picture.
Illustratively, the pre-trained target detection model detects the positions of the illegal objects in each picture and determines their types (such as mobile phone, tablet computer or book). For example, a target detection model based on the YOLO algorithm can obtain the positions of the illegal objects from each picture, mark those positions with rectangular boxes, and distinguish the types of the marked illegal objects with preset type labels, thereby obtaining the illegal-object detection result.
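As an illustrative sketch of this step (not the patent's own implementation), the following Python code loads a pretrained YOLOv5 model via torch.hub and keeps only detections whose class label falls in a hypothetical prohibited-item list; the label set and the confidence threshold are assumptions.

```python
import torch

# Load a pretrained YOLOv5 model from the ultralytics/yolov5 torch.hub repo.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Hypothetical prohibited-item labels; the patent names phones, tablets and books.
PROHIBITED = {"cell phone", "book", "laptop"}

def detect_illegal_objects(image_path: str, conf_threshold: float = 0.5):
    """Return (label, confidence, box) tuples for illegal objects in one picture."""
    results = model(image_path)
    found = []
    for *box, conf, cls in results.xyxy[0].tolist():  # one row per detected box
        label = model.names[int(cls)]
        if label in PROHIBITED and conf >= conf_threshold:
            found.append((label, conf, box))
    return found
```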
In one embodiment, if an illegal object is detected in a picture, it is determined that an illegal object exists in the third detection result; if no illegal object is detected in any picture, it is determined that no illegal object exists in the third detection result.
In one embodiment, when taking an examination, examinees sometimes bring illegal objects into the examination room. By having the electronic device detect illegal objects in each picture, items irrelevant to the examination are prevented from appearing in the examination room, the normalization of the examination room is ensured, and the monitoring accuracy of the examination room is further improved.
Step 14: calculating the normalization score of the examination room according to the first detection result and the second detection result.
In one embodiment, the normalization score represents the degree of normalization of the examination room and is used to determine whether the examination room meets the normalization requirements; it is obtained by the electronic device from the scene detection and environment detection results for the examination room.
In one embodiment, the electronic device calculating the normalization score of the examination room according to the first detection result and the second detection result includes: and the electronic equipment performs weighted summation on the first detection result and the second detection result to obtain the normalization score of the examination room.
Illustratively, the electronic device obtains a weight value for the first detection result and a weight value for the second detection result, calculates a first product of the first detection result and its weight value and a second product of the second detection result and its weight value, and sums the first product and the second product to obtain the normalization score of the examination room.
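A minimal sketch of this weighted summation; the two weight values are illustrative assumptions, not values from the patent.

```python
def normalization_score(first_result: float, second_result: float,
                        scene_weight: float = 0.6, env_weight: float = 0.4) -> float:
    """Weighted sum of the scene detection result (first) and the environment
    detection result (second); the weight values here are assumptions."""
    return scene_weight * first_result + env_weight * second_result
```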
In one embodiment, calculating the normalization score of the examination room helps ensure that conditions which could affect the examination or enable cheating do not appear in the examination room, improving the monitoring accuracy of the examination room.
Step 15: performing normalization monitoring on the examination room based on the normalization score of the examination room and the third detection result to obtain a monitoring result.
In one embodiment, the electronic device performs normalization monitoring on the examination room based on the normalization score of the examination room and the third detection result, and obtaining the monitoring result includes: if the normalization score does not meet the preset normalization threshold, or an illegal object exists in the third detection result, determining that the monitoring result is that the examination room fails examination room normalization detection; if the normalization score meets the preset normalization threshold and no illegal object exists in the third detection result, determining that the monitoring result is that the examination room passes examination room normalization detection and the examination proceeds normally. If the examination room fails the normalization detection, a prompt indicates that the examination room must be rectified before the examination can take place.
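The decision rule can be sketched as follows; the default threshold is an assumption that mirrors the 90% example given below.

```python
def monitor_examination_room(normalization_score: float,
                             illegal_objects: list,
                             threshold: float = 0.9) -> str:
    """Normalization monitoring rule described above; 0.9 mirrors the 90%
    threshold example given below and is configurable per examination room."""
    if normalization_score < threshold or illegal_objects:
        return "failed examination room normalization detection"
    return "passed examination room normalization detection"
```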
In one embodiment, the normalization threshold may be preset, for example, the normalization threshold may be set to 90% or 95%, or may be set according to a specific examination room scenario.
In one embodiment, if the monitoring result is that the normalization score of the examination room does not meet the preset normalization threshold, the electronic device may send a prompt message for rectifying the examination room; if the monitoring result is that an illegal object exists in the third detection result, the electronic device may send a prompt message for removing the illegal object to a preset terminal device, for example a device such as the invigilator's mobile phone, to prompt normalization of the examination room.
In one embodiment, the prompt message instructs the user to rectify the current examination room environment, switch to a different examination room environment, or remove the illegal objects in the examination room, where the user may be an examinee or an invigilator.
Further, after the examination room has been rectified in response to the prompt message, picture sequences of the rectified examination room environment are collected again from multiple viewing angles and the rectified examination room undergoes normalization detection again; once the examination room passes the normalization detection, a prompt indicates that the examination can begin normally.
According to the examination room monitoring method, picture sequences of the examination room environment are obtained from multiple viewing angles, the examinees in the examination room and the examination room environment are captured from all directions, and scene detection, environment detection and illegal-object detection are then performed on the examination room using the collected picture sequences. The examination room is monitored across multiple dimensions, ensuring that the examinees behave normally and that the examination room meets the normalization requirements, which improves examination room monitoring accuracy.
Fig. 4 is a first flowchart of acquiring a first detection result in an application embodiment, where a method for acquiring the first detection result is applied to an electronic device, and includes the following steps:
1110, the electronic device inputs each picture into the feature pyramid network to obtain a spliced feature map of each picture.
In one embodiment, the feature pyramid network constructs feature maps of different spatial scales through an image pyramid, obtains the context information of each picture through the image pyramid, and then splices that information to obtain the spliced feature map of each picture. Specifically, the bottom-up pathway outputs the high-level features of the feature map at each spatial scale, the top-down pathway propagates high semantic information to the low-level features at each spatial scale, the features of each spatial scale are fused, and the fused multi-scale feature maps are spliced to obtain the spliced feature map.
In one embodiment, because the objects in an examination room scene (tables, computers, examination papers, etc.) appear at multiple scales, the semantic features of small-scale objects are difficult to detect. In this embodiment, each picture is input into the feature pyramid network and constructed by the image pyramid into feature maps at four spatial scales A1, A2, A3 and A4. High semantic information corresponding to examination room scene objects at different spatial scales is extracted from different layers of the network through the bottom-up and top-down pathways, and the feature map output by the bottom-up pathway is connected with the top-down feature map of the same spatial scale to obtain a fused feature map at each spatial scale: the A1 spatial scale corresponds to fused feature map P1, A2 to P2, A3 to P3 and A4 to P4.
Further, the fused feature maps of the different scales are pooled to obtain the high semantic information of each object in the examination room scene and then spliced to obtain the spliced feature map, which contains the high semantic information of the objects in the examination room scene.
In one embodiment, the feature pyramid network connects the low-resolution, high-semantic high-level features with the high-resolution, low-semantic low-level features from top to bottom, so that the object features at all spatial scales in the examination room scene carry rich semantic information, improving the accuracy of object detection in the examination room scene.
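The top-down fusion described above can be sketched as follows; the channel counts and the use of 1x1 lateral and 3x3 smoothing convolutions are assumptions in the spirit of a standard feature pyramid network, since the patent does not fix these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down fusion over four bottom-up feature maps A1 (finest) .. A4 (coarsest),
    producing fused maps P1 .. P4. Channel sizes are assumptions."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions align the channel counts across scales.
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 convolutions smooth each fused map.
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):  # feats = [A1, A2, A3, A4]
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample the coarser map and add it to the finer one.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [sm(p) for sm, p in zip(self.smooth, laterals)]  # [P1, P2, P3, P4]
```

The spliced feature map can then be obtained by pooling P1 to P4 to a common spatial size and concatenating them along the channel dimension, as described above.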
1111, the electronic device inputs the spliced feature map of each picture to the classification network, so as to obtain multiple classification feature maps of each picture.
In one embodiment, a classification network may be trained in advance and used to classify the spliced feature map of each picture, obtaining the classification feature maps corresponding to each examination room scene category.
1112, the electronic device inputs each classification feature map to the attention network to obtain an attention feature map of each classification feature map.
In one embodiment, the attention network, which is also used in tasks such as image restoration, action recognition and object detection, is used here to strengthen the weight of objects related to the examination room scene category in the classification feature map while weakening the weight of unrelated objects.
In one embodiment, the pooling layer of the attention network performs average pooling on each classification feature map, the convolution layer of the attention network performs convolution on each classification feature map, and the feature map obtained by average pooling is multiplied by the feature map obtained by convolution to obtain the attention feature map of each classification feature map.
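A minimal sketch of this attention step; the channel count and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class SimpleAttention(nn.Module):
    """Average-pool the classification feature map, convolve it, and multiply the
    two results elementwise to obtain the attention feature map."""
    def __init__(self, channels=64):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, cls_map):
        pooled = self.pool(cls_map)      # average-pooling branch
        convolved = self.conv(cls_map)   # convolution branch
        return pooled * convolved        # attention feature map
```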
1113, the electronic device performs feature fusion on each classification feature map and the corresponding attention feature map to obtain a semantic feature map of the corresponding classification feature map.
In one embodiment, feature fusion adds the pixels of each classification feature map to those of the corresponding attention feature map. Specifically, the attention feature map contains the probabilities corresponding to a plurality of attention pixel points, and the classification feature map contains the probabilities corresponding to a plurality of classification pixel points, where the probability of each pixel in the classification feature map indicates the probability that the pixel belongs to the category corresponding to that classification feature map, and each pixel in the attention feature map is a fusion of the information in the corresponding classification layer.
In one embodiment, the attention pixel points in the attention feature map correspond one-to-one with the classification pixel points in the classification feature map; the value of each attention pixel point is added to the value of the corresponding classification pixel point to obtain the pixel value of each pixel point, and that pixel value is taken as the probability of the corresponding examination room scene category, yielding the semantic feature map of the corresponding classification feature map.
1114, the electronic device determines a semantic segmentation result of each picture according to the semantic feature map corresponding to each classification feature map of each picture.
In one embodiment, each picture corresponds to a plurality of classification feature maps and each classification feature map corresponds to a semantic feature map. For each pixel point, the maximum pixel value at the corresponding position across the semantic feature maps is selected, and the semantic segmentation result of each picture is determined according to the scene category identifier corresponding to that maximum pixel value.
For example, suppose each picture corresponds to 3 semantic feature maps B1, B2 and B3, matching 3 examination room scene categories: the written-test examination room with category identifier 1, the computer-test examination room with identifier 2, and the interview examination room with identifier 3. If the pixel value of pixel point Q is 0.2 in B1, 0.15 in B2 and 0.65 in B3, the maximum pixel value is determined to be 0.65 and the position of pixel point Q is marked 3; since category 3 corresponds to the interview examination room, all pixel points marked 3 are labelled as interview examination room.
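A minimal sketch of this per-pixel selection, reproducing the pixel Q example; the 0-based category indices are an implementation detail of the sketch.

```python
import torch

def segment(semantic_maps: torch.Tensor) -> torch.Tensor:
    """semantic_maps: (C, H, W), one semantic feature map per examination room
    scene category. Returns an (H, W) map of 0-based category indices."""
    return semantic_maps.argmax(dim=0)

# Pixel Q from the example above: 0.2 in B1, 0.15 in B2, 0.65 in B3.
maps = torch.tensor([[[0.20]], [[0.15]], [[0.65]]])
print(segment(maps))  # tensor([[2]]) -> third category, the interview examination room
```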
1115, the electronic device acquires semantic information in the semantic segmentation result of each picture, matches the semantic information with the semantic information of each scene in the preset first database, and calculates a scene matching degree score of the corresponding scene.
In one embodiment, the preset first database stores the semantic information of the scene corresponding to each preset examination room scene category, and the scene matching degree represents how well the scene of each picture matches a preset examination room scene. For example, the semantic information of the scene corresponding to each examination room scene category is obtained from the preset first database, and a scene matching degree score is calculated between the semantic information in the semantic segmentation result of each picture and the semantic information of the scene corresponding to each category.
1116, the electronic device takes the highest scene matching degree score as the first detection result for each picture.
In one embodiment, after obtaining the scene matching score, the scene corresponding to the highest scene matching score is determined as the scene of each picture.
In the embodiment of the application, because the feature pyramid network gives the features at all spatial scales rich semantic information, the subsequent scene detection takes the spliced feature map output by the feature pyramid network into account, improving the accuracy of the first detection result.
Fig. 5 is a second flowchart of acquiring a first detection result in the application embodiment, where a method for acquiring the first detection result is applied to an electronic device, and includes the following steps:
1121, the electronic device extracts a first feature value of each picture.
In one embodiment, the first feature value characterizes the scene features in each picture; specifically, a feature extraction network such as a VGG network or a ResNet network may be used to extract the first feature value from each picture. For example, each picture is mean-preprocessed with a VGG network, the preprocessed picture is convolved repeatedly with successive 3×3 convolution kernels, where the stride of the convolution is fixed at 1 and the picture edges are padded with 1 pixel, and the convolution feature map generated by the successive convolutions is passed to the pooling layer of the VGG network for feature extraction to obtain the first feature value, which comprises all the extracted pixel values of the picture.
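A minimal sketch of the convolution settings described above (successive 3x3 kernels, stride 1, padding 1, followed by pooling); the channel counts, depth and input size are assumptions rather than the full VGG architecture.

```python
import torch
import torch.nn as nn

# VGG-style block with the settings described above: 3x3 kernels, stride 1,
# edges padded by 1 pixel, followed by a pooling layer.
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),  # pooling layer yielding the feature values
)

picture = torch.randn(1, 3, 224, 224)    # a mean-preprocessed picture (size assumed)
first_feature_value = features(picture)  # shape (1, 64, 112, 112)
```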
1122, the electronic device inputs each picture into the feature pyramid network, and obtains feature fusion graphs of different spatial scales of each picture.
In one embodiment, a detailed description of step 1122 may refer to step 1110 in the embodiment provided in FIG. 4, and will not be repeated here.
1123, the electronic device inputs the feature fusion graphs of different spatial scales of each picture into the first scene classification model to determine a first scene corresponding to each picture.
In one embodiment, a set of historically captured picture sequences covering a plurality of examination room scene categories is obtained; the set comprises picture sequences of multiple viewing angles for each examination room scene category. Each picture in the set is input into the feature pyramid network to obtain feature fusion maps of different spatial scales, and a preset first scene classification model is trained on the feature fusion maps corresponding to the picture sequence set to obtain the first scene classification model.
Specifically, the feature fusion maps of different spatial scales of each picture are input into the fully connected layer of the first scene classification model; the feature set output by the fully connected layer is input into a convolution layer containing a plurality of neurons to obtain scores for the examination room scene categories of each picture; prediction probability values for those categories are obtained from a softmax layer; and the examination room scene category with the largest prediction probability value is selected as the first scene corresponding to each picture.
For example, if each picture corresponds to 3 examination room scene categories and the prediction probability values are 0.8 for the written-test examination room, 0.1 for the computer-test examination room and 0.1 for the interview examination room, the written-test examination room is determined as the first scene corresponding to that picture.
1124, the electronic device extracts a second feature value of the picture corresponding to the first scene from a preset second database.
In one embodiment, the preset second database stores a plurality of pictures of preset examination room scene categories, and the specific description of step 1124 may refer to step 1121 in the embodiment provided in fig. 5, which is not repeated here.
1125, the electronic device performs normalization processing on the first feature value to obtain a first normalized feature, and performs normalization processing on the second feature value to obtain a second normalized feature.
In one embodiment, after the first feature value and the second feature value are extracted, each is normalized. Normalization converts each pixel value contained in the first feature value and in the second feature value into the range [0, 1], using the following formula:
norm = (x_i - min(x)) / (max(x) - min(x))
where norm represents the normalized value of each pixel after normalization, x_i represents the pixel value of the i-th pixel, min(x) represents the minimum pixel value in each picture, and max(x) represents the maximum pixel value in each picture.
1126, the electronic device calculates cosine distances of the first normalized feature and the second normalized feature, and a first detection result of each picture is obtained.
In one embodiment, the cosine distance characterizes the scene matching degree between each picture and the picture corresponding to the first scene.
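A minimal sketch of the normalization and matching steps; note the patent speaks of cosine distance, while the sketch computes cosine similarity (distance would be 1 minus this value).

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Min-max normalization from the formula above: maps values into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened feature vectors; higher means a
    closer scene match between the picture and the database picture."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

first_normalized = min_max_normalize(np.random.rand(64))   # from the input picture
second_normalized = min_max_normalize(np.random.rand(64))  # from the database picture
print(cosine_similarity(first_normalized, second_normalized))
```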
In the embodiment of the application, because the feature pyramid network gives the features at all spatial scales rich semantic information, examination room scene prediction is completed by combining the feature pyramid network with the first scene classification model and outputting the scene matching degree score, improving the accuracy of the first detection result.
Fig. 6 is a third flowchart for obtaining the first detection result in the application embodiment, including the following steps:
1131, the electronic device extracts a third feature value of each picture.
In one embodiment, the third feature value is used to characterize the scene feature value in each picture, and the specific extraction process of the third feature value may refer to step 1121 in the embodiment provided in fig. 5, which is not repeated here.
1132, the electronic device acquires an RGB image and a depth image of each picture.
In one embodiment, the RGB image information and the corresponding depth image information of each picture are collected by using a feature extraction network to obtain an RGB image and a depth image of each picture, specifically, the RGB image includes colors of three channels of red, green and blue, and the value of each pixel point in the depth image is a depth value, which characterizes the distance between the object corresponding to the pixel point and the photographing device.
1133, the electronic device extracts the local features of the RGB image, and the global features of the depth image.
In one embodiment, the local features are features extracted from the RGB image local region, including edge regions, corner regions, and occlusion regions, and the global features refer to the overall properties of the depth image, including color features, texture features, shape features, and the like.
In one embodiment, a deep neural network may be employed to extract the local features of the RGB image and the global features of the depth image. The deep neural network comprises a local feature extraction layer, a global feature extraction layer and a fusion layer. The local feature extraction layer may be constructed on a feature pyramid network and extracts the local features of the RGB image, which contain the local feature information of the RGB image; the global feature extraction layer may be constructed on a MobileNetV1 lightweight network and performs global feature extraction on the depth image, yielding global features that contain the global feature information of the depth image.
Specifically, inputting the RGB image into a local feature extraction layer of the deep neural network, and extracting local features of the RGB image by the local feature extraction layer to obtain local features output by the local feature extraction layer; and inputting the depth image into a global feature extraction layer of the depth neural network, and performing global feature extraction on the depth image by the global feature extraction layer to obtain global features output by the global feature extraction layer.
1134, the electronic device performs fusion processing on the local feature and the global feature to obtain a final fusion feature.
In one embodiment, the local features and the global features are input into a fusion layer of the deep neural network, and the fusion layer performs feature fusion on the local features and the global features to obtain final fusion features output by the fusion layer.
In one embodiment, because global features do not handle occlusion well, features of occluded objects cannot be extracted from the depth image alone; fusing the local features extracted from the RGB image with the global features of the depth image ensures that the final fusion feature contains both the global feature information and the local feature information of each image, guaranteeing the completeness of the extracted features.
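A minimal sketch of the fusion layer, assuming simple concatenation followed by a linear projection; the feature dimensions are illustrative.

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """Fuse the local features of the RGB image with the global features of the
    depth image by concatenation plus a linear projection (dimensions assumed)."""
    def __init__(self, local_dim=256, global_dim=128, fused_dim=256):
        super().__init__()
        self.fuse = nn.Linear(local_dim + global_dim, fused_dim)

    def forward(self, local_feat, global_feat):
        return self.fuse(torch.cat([local_feat, global_feat], dim=-1))

local_features = torch.randn(1, 256)   # from the RGB branch (e.g. FPN-based layer)
global_features = torch.randn(1, 128)  # from the depth branch (e.g. MobileNet layer)
final_fusion_feature = FusionLayer()(local_features, global_features)
```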
1135, the electronic device inputs the final fusion features into the second scene classification model to obtain a second scene corresponding to each picture.
In one embodiment, after the picture sequence set is obtained, the local features of the RGB image and the global features of the depth image of each picture in the set are extracted, the local and global features of each picture are fused, and a preset second scene classification model is trained on the resulting final fusion features to obtain the second scene classification model.
In one embodiment, the process of obtaining the picture sequence set and the specific process of determining the second scene corresponding to each picture by using the second scene classification model may refer to step 1123 in the embodiment as provided in fig. 5.
1136, the electronic device extracts a fourth characteristic value of the picture corresponding to the second scene from a preset third database.
In one embodiment, the preset third database stores a plurality of pictures of preset examination room scene categories, and the specific description of step 1136 may refer to step 1121 in the embodiment provided in fig. 5, which is not repeated here.
1137, the electronic device performs normalization processing on the third feature value to obtain a third normalized feature, and performs normalization processing on the fourth feature value to obtain a fourth normalized feature.
In one embodiment, a specific description of step 1137 may refer to step 1125 in the embodiment provided in fig. 5, and the description will not be repeated here.
1138, the electronic device calculates cosine distances of the third normalized feature and the fourth normalized feature, and obtains a first detection result of each picture.
In one embodiment, a detailed description of step 1138 may refer to step 1126 in the embodiment provided in FIG. 5, and the description will not be repeated here.
In the embodiment of the application, the scene category classification of each picture takes the final fusion feature into account, which contains both the global feature information and the local feature information of each image. In predicting the second scene, this reduces algorithmic complexity while improving the accuracy of object recognition, ensuring the accuracy of the identified second scene and further improving the accuracy of the first detection result.
Fig. 7 is a flowchart of obtaining a second detection result according to an embodiment of the present application, including the following steps:
1210, the electronic device obtains environmental noise information, environmental brightness information, and environmental network information for the examination room.
In one embodiment, the electronic device obtains the environmental noise information of the examination room from the audio collector 21 of the photographing device 1, the environmental brightness information from the ambient light sensor 22, and the environmental network information from the network speed measuring device 23, and detects the examination room environment based on the obtained environmental noise information, environmental brightness information and environmental network information.
Illustratively, the environmental noise information is obtained by detecting sounds in the examination room such as stamping feet, knocking on a table, the footsteps of the invigilating teacher, and the air conditioner; the environmental brightness information is obtained by detecting the light level in the examination room; and the environmental network information is obtained by detecting activity in the examination room such as remote computer logins and the sending and receiving of emails.
1211, the electronic device inputs the environmental noise information, the environmental brightness information, and the environmental network information into a pre-trained hash network model, and obtains a hash code.
In one embodiment, a hash network model may be trained in advance; its hash encoding layer encodes modal information into a hash code. Specifically, after the environmental noise information, environmental brightness information and environmental network information of the examination room are obtained, the noise information is treated as audio data, the brightness information as image data and the network information as text data, and these three modalities are input into the hash network model for hash encoding. The multimodal data are thereby screened and used effectively, and the model outputs a hash code. Because the environment matching degree is computed from this hash code, its accuracy is improved.
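The following is a minimal sketch of such a multimodal hash network; the branch architectures, feature dimensions and the binarization step are assumptions made for illustration, since the application only specifies that the three modalities are hash-encoded by a pre-trained hash network model.

import torch
import torch.nn as nn

class MultimodalHashNet(nn.Module):
    """Hypothetical sketch: encode audio, image and text features into one binary hash code."""
    def __init__(self, audio_dim=40, image_dim=512, text_dim=300, code_bits=64):
        super().__init__()
        self.audio_enc = nn.Linear(audio_dim, 128)  # environmental noise as audio information
        self.image_enc = nn.Linear(image_dim, 128)  # environmental brightness as image information
        self.text_enc = nn.Linear(text_dim, 128)    # environmental network as text information
        # Hash encoding layer: joint projection to code_bits values.
        self.hash_layer = nn.Linear(3 * 128, code_bits)

    def forward(self, audio, image, text):
        joint = torch.cat([self.audio_enc(audio),
                           self.image_enc(image),
                           self.text_enc(text)], dim=-1)
        code = torch.tanh(self.hash_layer(joint))  # tanh keeps training differentiable
        return (code > 0).int()                    # threshold to {0, 1} bits at inference

net = MultimodalHashNet()
hash_code = net(torch.randn(1, 40), torch.randn(1, 512), torch.randn(1, 300))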
1212, the electronic device calculates the Hamming distance between the hash code and the hash code of each environment in the preset fourth database to obtain the environment matching degree of the corresponding environment.
1213, the electronic device determines the minimum environment matching degree score as the second detection result of the examination room.
In one embodiment, the preset fourth database stores a preset hash code for each environment, which characterizes the environmental noise information, environmental brightness information and environmental network information of that environment. The Hamming distance between two equal-length strings is the number of positions at which the corresponding characters differ. The Hamming distance between the hash code output by the hash network model and the hash code of each environment in the preset fourth database is calculated, and the corresponding environment is determined according to the Hamming distance.
For example, suppose the preset fourth database stores the hash codes of 3 environments: 1011101 for the first environment, 2043891 for the second environment, and 2153896 for the third environment, and the hash network model outputs the hash code 1001001 for the examination room environment. The Hamming distance between 1011101 and 1001001 is 2, the Hamming distance between 2043891 and 1001001 is 5, and the Hamming distance between 2153896 and 1001001 is 7, so the first environment, corresponding to the minimum Hamming distance of 2, is determined as the second detection result of the examination room.
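The following minimal sketch reproduces the worked example above: the Hamming distance between equal-length code strings is counted character by character, and the stored environment with the minimum distance is selected.

def hamming_distance(a: str, b: str) -> int:
    # Number of positions at which two equal-length strings differ.
    assert len(a) == len(b), "Hamming distance requires equal-length strings"
    return sum(x != y for x, y in zip(a, b))

stored = {"first": "1011101", "second": "2043891", "third": "2153896"}
query = "1001001"  # hash code output for the examination room environment
best = min(stored, key=lambda env: hamming_distance(stored[env], query))
print(best, hamming_distance(stored[best], query))  # -> first 2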
In the embodiment of the application, detecting the examination room environment through the hash network model takes multimodal data into account, including the environmental noise, environmental brightness and environmental network of the examination room; detecting the environment from multimodal data improves the accuracy of the second detection result.
FIG. 8 is a block diagram of an examination room monitoring apparatus according to an embodiment of the present application.
In some embodiments, the examination room monitoring apparatus 20 may comprise a plurality of functional modules consisting of program code segments. The program code of each segment in the examination room monitoring apparatus 20 may be stored in a memory of the electronic device and executed by at least one processor to perform the examination room monitoring functions described in detail with reference to figs. 2 to 7.
In this embodiment, the examination room monitoring apparatus 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: an acquisition module 201, a scene detection module 202, an environment detection module 203, an illegal item detection module 204, a calculation module 205 and a monitoring module 206. A module referred to herein is a series of computer readable instructions, stored in a memory, that can be executed by at least one processor to perform a fixed function. The functions of the respective modules are described in detail in the following embodiments.
An acquisition module 201, configured to acquire a sequence of pictures acquired from multiple views of an examination room environment; the scene detection module 202, configured to perform scene detection on each picture in the sequence of pictures to obtain a first detection result; the environment detection module 203, configured to perform environment detection on the examination room to obtain a second detection result; the illegal item detection module 204, configured to detect illegal items in each picture to obtain a third detection result; the calculation module 205, configured to calculate a normalization score of the examination room according to the first detection result and the second detection result; and the monitoring module 206, configured to perform normalization monitoring on the examination room based on the normalization score and the third detection result, so as to obtain a monitoring result.
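As a purely illustrative sketch, the following runnable skeleton shows how the outputs of these modules might be combined; the detector bodies are stub placeholders returning fixed values, and the threshold and weights are assumptions not taken from the application.

class ExamRoomMonitor:
    NORM_THRESHOLD = 0.8  # assumed preset normalization threshold

    def scene_detect(self, pictures):           # scene detection module 202
        return 0.9                              # stub first detection result
    def environment_detect(self, room):         # environment detection module 203
        return 0.85                             # stub second detection result
    def detect_illegal_items(self, pictures):   # illegal item detection module 204
        return []                               # stub third detection result (no illegal items)
    def normalization_score(self, first, second, w1=0.5, w2=0.5):
        return w1 * first + w2 * second         # calculation module 205: weighted summation
    def monitor(self, pictures, room):          # monitoring module 206
        first = self.scene_detect(pictures)
        second = self.environment_detect(room)
        third = self.detect_illegal_items(pictures)
        score = self.normalization_score(first, second)
        passed = score >= self.NORM_THRESHOLD and not third
        return "pass" if passed else "fail examination room normative detection"

print(ExamRoomMonitor().monitor(pictures=[], room=None))  # -> pass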
The above examination room monitoring apparatus acquires picture sequences of the examination room environment from multiple viewing angles, capturing omnidirectional pictures of the examination staff and the examination room environment. It then performs scene detection, environment detection and illegal item detection on the examination room using the acquired picture sequences, monitoring the examination room across multiple dimensions. This ensures that the behavior of the examination staff is normal and that the examination room meets the normative requirements, improving the accuracy of examination room monitoring.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 1 provided in the embodiment of the present application includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions stored in the memory 12 and executable on the processor 13, such as an image library construction program.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation on it; the electronic device 1 may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 1 may also include input/output devices, network access devices, buses, and the like.
The processor 13 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc., and the processor 13 is an operation core and a control center of the electronic device 1, connects various parts of the entire electronic device 1 using various interfaces and lines, and executes an operating system of the electronic device 1 and various applications, program codes, etc. installed.
For example, computer readable instructions may be partitioned into one or more modules/units, which are stored in memory 12 and executed by processor 13 to complete the present application. One or more of the modules/units may be a series of computer readable instructions capable of performing a particular function, the computer readable instructions describing a process of execution of the computer readable instructions in the electronic device 1. For example, the computer-readable instructions may be partitioned into an acquisition module 201, a scene detection module 202, an environment detection module 203, an offending item detection module 204, a calculation module 205, and a monitoring module 206.
The memory 12 may be used to store computer readable instructions and/or modules, and the processor 13 implements the various functions of the electronic device 1 by running or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a storage program area and a storage data area: the storage program area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and the storage data area may store data created according to the use of the electronic device. The memory 12 may include non-volatile and volatile memory, such as a hard disk, memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one disk storage device, a flash memory device, or another storage device.
The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a physical memory, such as a memory bank, a TF Card (Trans-flash Card), and the like.
The integrated modules/units of the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. Based on this understanding, the present application implements all or part of the flow of the method of the above-described embodiments, which may also be accomplished by computer readable instructions instructing the related hardware; the computer readable instructions may be stored in a computer readable storage medium, and when executed by a processor, implement the steps of the method embodiments described above.
The computer readable instructions include computer readable instruction code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
The memory 12 in the electronic device 1 stores computer readable instructions and the processor 13 can execute the computer readable instructions stored in the memory 12 to implement the examination room monitoring method as in the various embodiments above.
In particular, the specific implementation method of the processor 13 on the computer readable instructions may refer to descriptions of related steps in the corresponding embodiments of fig. 2 to 7, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of modules is merely a logical function division, and other manners of division may be implemented in practice.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Finally, it should be noted that the above embodiments are merely for illustrating the technical solution of the present application and not for limiting, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application.

Claims (10)

1. An examination room monitoring method, characterized in that the method comprises the following steps:
acquiring a picture sequence of a plurality of view angles acquired by an examination room environment;
performing scene detection on each picture in the picture sequence to obtain a first detection result;
performing environment detection on the examination room to obtain a second detection result;
detecting illegal objects in each picture to obtain a third detection result;
calculating a normalization score of the examination room according to the first detection result and the second detection result;
and carrying out normalization monitoring on the examination room based on the normalization score of the examination room and the third detection result to obtain a monitoring result.
2. The examination room monitoring method of claim 1, wherein the performing scene detection on each picture in the sequence of pictures to obtain a first detection result comprises:
inputting each picture into a feature pyramid network to obtain a spliced feature map of each picture;
inputting the spliced characteristic images of each picture into a classification network to obtain a plurality of classification characteristic images of each picture;
inputting each classification characteristic diagram into an attention network to obtain an attention characteristic diagram of each classification characteristic diagram;
feature fusion is carried out on each classification feature map and the corresponding attention feature map, so that a semantic feature map of the corresponding classification feature map is obtained;
determining a semantic segmentation result of each picture according to the semantic feature map corresponding to each classification feature map of each picture;
acquiring semantic information in the semantic segmentation result of each picture, matching the semantic information with semantic information of each scene in a preset first database, and calculating a scene matching degree score of the corresponding scene;
And taking the highest scene matching degree score as a first detection result of each picture.
3. The examination room monitoring method of claim 1, wherein the performing scene detection on each picture in the sequence of pictures to obtain a first detection result comprises:
extracting a first characteristic value of each picture;
inputting each picture into a feature pyramid network, and obtaining feature fusion graphs of different spatial scales of each picture;
inputting the feature fusion graphs of different spatial scales of each picture into a first scene classification model, and determining a first scene corresponding to each picture;
extracting a second characteristic value of a picture corresponding to the first scene from a preset second database;
normalizing the first characteristic value to obtain a first normalized characteristic, and normalizing the second characteristic value to obtain a second normalized characteristic;
and calculating cosine distances of the first normalized feature and the second normalized feature to obtain a first detection result of each picture.
4. The examination room monitoring method of claim 1, wherein the performing scene detection on each picture in the sequence of pictures to obtain a first detection result comprises:
extracting a third characteristic value of each picture;
acquiring RGB images and depth images of each picture;
extracting local features of the RGB image and global features of the depth image;
carrying out fusion processing on the local features and the global features to obtain final fusion features;
inputting the final fusion characteristics into a second scene classification model to obtain a second scene corresponding to each picture;
extracting a fourth characteristic value of a picture corresponding to the second scene from a preset third database;
normalizing the third characteristic value to obtain a third normalized characteristic, and normalizing the fourth characteristic value to obtain a fourth normalized characteristic;
and calculating cosine distances of the third normalized feature and the fourth normalized feature to obtain a first detection result of each picture.
5. The examination room monitoring method of claim 1, wherein the performing environmental detection on the examination room to obtain a second detection result comprises:
acquiring the environmental noise information, the environmental brightness information and the environmental network information of the examination room;
inputting the environmental noise information, the environmental brightness information and the environmental network information into a pre-trained hash network model to obtain a hash code;
calculating the Hamming distance between the hash code and the hash code of each environment in a preset fourth database to obtain the environment matching degree of the corresponding environment;
and determining the minimum environment matching degree score as a second detection result of the examination room.
6. The examination room monitoring method of claim 1, wherein calculating the normalization score of the examination room based on the first detection result and the second detection result comprises:
and carrying out weighted summation on the first detection result and the second detection result to obtain the normalization score of the examination room.
7. The examination room monitoring method of claim 1, wherein the performing normalization monitoring on the examination room based on the normalization score of the examination room and the third detection result, and obtaining a monitoring result comprises:
if the normalization score of the examination room does not meet the preset normalization threshold, or illegal objects exist in the third detection result, determining that the monitoring result is that the examination room fails examination room normalization detection.
8. The examination room monitoring method of claim 7, wherein after the determining that the monitoring result is that the examination room fails examination room normative detection, the method further comprises:
if the monitoring result is that the normalization score of the examination room does not meet the preset normalization threshold, sending prompt information for rectifying the examination room;
and if the monitoring result is that the illegal object exists in the third detection result, sending prompt information for removing the illegal object.
9. An electronic device comprising a processor and a memory, wherein the processor is configured to implement the examination room monitoring method of any one of claims 1 to 8 when executing a computer program stored in the memory.
10. A computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the examination room monitoring method of any one of claims 1 to 8.
CN202310480669.4A 2023-04-27 2023-04-27 Examination room monitoring method, electronic equipment and storage medium Pending CN116503805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310480669.4A CN116503805A (en) 2023-04-27 2023-04-27 Examination room monitoring method, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116503805A true CN116503805A (en) 2023-07-28

Family

ID=87322686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310480669.4A Pending CN116503805A (en) 2023-04-27 2023-04-27 Examination room monitoring method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116503805A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination