CN116977916A - Security scene recognition method, device, equipment and readable storage medium - Google Patents

Security scene recognition method, device, equipment and readable storage medium

Info

Publication number
CN116977916A
CN116977916A (application CN202310722182.2A)
Authority
CN
China
Prior art keywords
scene
security
network model
target
scene image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310722182.2A
Other languages
Chinese (zh)
Inventor
李耀
梁秉豪
唐政
袁明明
王凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Communication Information System Co Ltd
Original Assignee
Inspur Communication Information System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Communication Information System Co Ltd filed Critical Inspur Communication Information System Co Ltd
Priority to CN202310722182.2A
Publication of CN116977916A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/34 - Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of artificial intelligence security and provides a security scene recognition method, device, equipment and readable storage medium. The method comprises the following steps: screening security scene images from historical scene images and training a target detection model based on the security scene images; generating a first scene image through the trained target detection model, wherein the first scene image is used for training a network model incorporating skeleton point detection; and recognizing the acquired target scene image based on the trained network model to obtain a security scene recognition result. Because the images used to train the skeleton-point-detection network model are generated by the trained target detection model, the deployment and maintenance cost of the network model is reduced; and because security scene recognition is performed by the trained network model incorporating skeleton point detection, the accuracy of security scene recognition is improved.

Description

Security scene recognition method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence security, and in particular, to a security scene recognition method, apparatus, device, and readable storage medium.
Background
With the continuous development of deep learning networks and advances in computer hardware, applications of artificial intelligence have penetrated many areas of production and daily life. In everyday security scenarios in particular, visual security technology is gradually replacing manual monitoring as the mainstream approach: manual monitoring of places and areas is limited in time and costly, whereas computer vision can provide more comprehensive and wider-ranging monitoring.
In a campus security scenario, cameras are mainly deployed in areas such as playgrounds, school gates and teaching buildings. Existing security scene recognition is mainly performed by a cloud or local server based on conventional image recognition technology; its deployment and maintenance costs are relatively high and its recognition accuracy is low.
Disclosure of Invention
The application provides a security scene recognition method, device, equipment and readable storage medium, which are used to solve the technical problems of relatively high deployment and maintenance costs and low recognition accuracy in existing security scene recognition schemes.
The application provides a security scene recognition method, which comprises the following steps:
screening security scene images from historical scene images, and training a target detection model based on the security scene images;
generating a first scene image through the trained target detection model, wherein the first scene image is used for training a network model incorporating skeleton point detection;
and recognizing the acquired target scene image based on the trained network model to obtain a security scene recognition result.
According to the security scene recognition method provided by the application, training the target detection model based on the security scene images comprises the following steps:
labeling the security scene images, and inputting the security scene images labeled with classification labels into the target detection model;
adjusting parameters of the target detection model according to the loss value output by the target detection model;
and determining that training of the target detection model is complete under the condition that the number of parameter adjustments of the target detection model reaches a first preset threshold or the loss value is smaller than a second preset threshold.
According to the security scene recognition method provided by the application, the generating of the first scene image through the trained target detection model comprises the following steps:
inputting the acquired video stream into the trained target detection model, and intercepting and detecting the video stream through the trained target detection model to obtain a first scene image.
According to the security scene recognition method provided by the application, training the network model incorporating skeleton point detection based on the first scene image comprises the following steps:
inputting a first scene image with a scene tag into the network model with skeleton point detection, and determining posture features of the first scene image through the skeleton point detection;
determining a scene probability result output by the network model according to the posture features;
based on the scene probability result and the scene tag, adjusting parameters of the network model;
and determining whether training of the network model is complete according to the adjustment result of the parameters of the network model.
According to the security scene recognition method provided by the application, recognizing the collected target scene image based on the trained network model to obtain a security scene recognition result comprises the following steps:
inputting the acquired target scene image into the trained network model, and extracting posture information of the target scene image through the trained network model;
determining whether the target scene image belongs to a target security scene based on the posture information;
and under the condition that the target scene image belongs to the target security scene, determining the target security scene as the security scene recognition result of the target scene image.
According to the security scene recognition method provided by the application, after the acquired target scene image is recognized based on the trained network model and the security scene recognition result is obtained, the method comprises the following steps:
and generating a quantized network model according to an application programming interface of the framework matched with the edge computing end.
According to the security scene recognition method provided by the application, generating the quantized network model according to the application programming interface of the framework matched with the edge computing end comprises the following steps:
and adjusting parameters of the quantized network model in the weight file to enable the quantized network model to run at the edge computing end.
The application also provides a security scene recognition device, which comprises:
the target detection model training module is used for screening security scene images in the historical scene images and training a target detection model based on the security scene images;
the first scene image generation module is used for generating a first scene image through the trained target detection model, and the first scene image is used for training a network model incorporating skeleton point detection;
and the security scene recognition module is used for recognizing the acquired target scene image based on the trained network model to obtain a security scene recognition result.
The application also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the security scene recognition method according to any one of the above when executing the program.
The application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a security scene recognition method as described in any of the above.
According to the security scene recognition method, device, equipment and readable storage medium provided by the application, security scene images are screened from historical scene images and used to train a target detection model; the trained target detection model then generates first scene images, which are used to train a network model incorporating skeleton point detection; finally, the acquired target scene image is recognized based on the trained network model to obtain its security scene recognition result. Because the images used to train the skeleton-point-detection network model are generated by the trained target detection model, the deployment and maintenance cost of the network model is reduced; and because security scene recognition is performed by the trained network model incorporating skeleton point detection, the accuracy of security scene recognition is improved.
Drawings
In order to more clearly illustrate the technical solutions of the application or of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. It is apparent that the drawings described below show some embodiments of the application, and that a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a security scene recognition method provided by the application;
FIG. 2 is a second flow chart of the security scene recognition method according to the present application;
fig. 3 is a schematic structural diagram of a security scene recognition device provided by the application;
fig. 4 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, the present application provides a security scene recognition method, which includes:
step 100, screening security scene images in historical scene images, and training a target detection model based on the security scene images;
specifically, by means of manual or automatic computer recognition, an image containing a specific scene is screened from the scene images collected in the history, namely, the security scene image in the embodiment, and taking security scene recognition in a campus as an example, the image containing the specific scene can be an image containing scenes such as personnel falling, personnel pursuing and alarming, personnel frame beating and the like. Screening security scene images containing specific scenes, and labeling the screened security scene images based on the scene categories (personnel fall, personnel chase alarm, personnel frame alarm and the like) to obtain security scene images with specific scene labels.
The security scene images with specific scene labels are input into the target detection model to train it. During training, the relevant parameters of the target detection model are adjusted according to the loss value it outputs, and the loss value is continuously reduced through these parameter adjustments until it is minimized, at which point training of the target detection model is complete.
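The patent does not specify how the labeled images are organised; the sketch below shows one common way to do it, assuming a folder-per-category layout and PyTorch/torchvision tooling. The folder names, image size and transform are illustrative assumptions, not part of the patent.

```python
from pathlib import Path

from torchvision import datasets, transforms

# Hypothetical layout: screened security scene images sorted into one folder per
# scene category, which attaches the classification label to each image.
#
#   security_scenes/
#       falling/   img_0001.jpg ...
#       fighting/  img_0002.jpg ...
#       chasing/   img_0003.jpg ...

def load_labeled_security_scenes(root: str = "security_scenes"):
    """Turn the screened, labeled security scene images into a training dataset."""
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),   # assumed input size
        transforms.ToTensor(),
    ])
    dataset = datasets.ImageFolder(Path(root), transform=preprocess)
    return dataset, dataset.classes      # classes lists the scene label names
```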
Step 200, generating a first scene image through a trained target detection model, wherein the first scene image is used for training a network model added into skeleton point detection;
specifically, the trained object detection model is used for generating an image including a specific scene, that is, the first scene image in this embodiment, it can be understood that the above method for screening security scene images from historical scene images has high time cost and limited acquisition range, and can only screen from the historical scene images. The method has the advantages that the images containing the specific scenes are generated through the trained target detection model, so that the time cost for acquiring the security scene images is reduced, the problem of limited acquisition range is solved, the first scene image is generated through the trained target detection model, and the primary screening of a required scene image data set is facilitated.
The deep learning target detection network is then modified by adding skeleton point detection to its backbone network. The first scene images generated by the target detection model are used to train this network model incorporating skeleton point detection, which is responsible for the parts of the (campus) security scene that require posture information.
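The patent does not give the structure of this modified network; the PyTorch-style sketch below only illustrates the idea of attaching a skeleton point (keypoint) head to a shared detection backbone. All class names, layer sizes and the keypoint count are illustrative assumptions, not the patent's actual network.

```python
import torch
import torch.nn as nn

class DetectionWithKeypoints(nn.Module):
    """Illustrative sketch: a detection backbone extended with a skeleton point
    head so posture information is produced alongside detection scores."""

    def __init__(self, num_classes: int = 3, num_keypoints: int = 17):
        super().__init__()
        self.num_keypoints = num_keypoints
        # Shared backbone (stand-in for the deep learning target detection
        # network's backbone) producing a pooled feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        # Detection head, simplified here to per-image scene class scores.
        self.det_head = nn.Linear(64 * 8 * 8, num_classes)
        # Added skeleton point head: an (x, y) pair for each keypoint.
        self.kpt_head = nn.Linear(64 * 8 * 8, num_keypoints * 2)

    def forward(self, x: torch.Tensor):
        feat = self.backbone(x).flatten(1)
        scores = self.det_head(feat)
        keypoints = self.kpt_head(feat).view(x.size(0), self.num_keypoints, 2)
        return scores, keypoints
```

In the patent the base is a full deep learning target detection network; it is reduced to a toy classifier here only so the sketch stays self-contained, but the division of labour is the same: a shared backbone plus an extra head that supplies the posture information used later.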
And 300, identifying the acquired target scene image based on the trained network model to obtain a security scene identification result.
Specifically, while the network model with skeleton point detection is trained, a target detection model for the required scene is also trained. In a real detection scene, skeleton points are used to detect the posture of pedestrians in the scene image and to make a preliminary judgment on postures such as fighting, falling or chasing; scene images showing such a tendency are screened out, and target detection is then performed to determine whether the above behaviours are actually present in the acquired target scene image, which yields the security scene recognition result of the target scene image.
In this embodiment, security scene images are screened from historical scene images and used to train a target detection model; the trained target detection model then generates first scene images, which are used to train a network model incorporating skeleton point detection; finally, the acquired target scene image is recognized with the trained network model to obtain its security scene recognition result. Because the images used to train the skeleton-point-detection network model are generated by the trained target detection model, the deployment and maintenance cost of the network model is reduced; and because security scene recognition is performed with the trained network model incorporating skeleton point detection, the accuracy of security scene recognition is improved.
In one embodiment, the security scene recognition method provided by the embodiment of the application further comprises the following steps:
step 110, labeling the security scene image, and inputting the security scene image labeled with the classification label into a target detection model;
step 120, according to the loss value output by the target detection model, adjusting the parameters of the target detection model;
and step 130, determining that training of the target detection model is complete when the number of parameter adjustments of the target detection model reaches a first preset threshold or the loss value is smaller than a second preset threshold.
Specifically, before the target detection model is trained, the security scene images screened from the historical scene images are labeled to obtain a set of security scene images with classification labels, the classification labels covering the scene categories described above (person falling, people chasing, people fighting, and so on). The security scene images with classification labels are then input into the target detection model, which computes the predicted probability that each security scene image belongs to each class; for example, the predicted probability for the fourth class represents the probability that, given the current weights, the security scene image belongs to the fourth class. The loss value output by the target detection model is determined from these predicted probabilities, and based on that loss value the parameters of the model (such as the above-mentioned weights) are adjusted in the direction that increases the predicted probability of the correct class for each security scene image.
The parameters of the target detection model also include the number of iterations, that is, the number of parameter adjustments; when the number of parameter adjustments reaches a first preset threshold or the loss value is smaller than a second preset threshold, training of the target detection model is determined to be complete.
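As a concrete illustration of this training loop and its two stopping conditions, here is a minimal PyTorch-style sketch; the loss function, optimizer and threshold values are assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_target_detector(model: nn.Module, dataset,
                          max_updates: int = 10_000,      # first preset threshold (assumed)
                          loss_threshold: float = 0.05):  # second preset threshold (assumed)
    """Adjust the model parameters from the loss value and stop when either the
    number of parameter adjustments reaches the first threshold or the loss
    falls below the second threshold."""
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    criterion = nn.CrossEntropyLoss()   # loss over the classification labels
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    updates = 0
    model.train()
    while updates < max_updates:
        for images, labels in loader:
            logits = model(images)               # predicted class scores
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                     # one parameter adjustment
            updates += 1
            if loss.item() < loss_threshold or updates >= max_updates:
                return model                     # training considered complete
    return model
```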
In this embodiment, the target detection model is trained to facilitate the subsequent generation of images containing the specific scenes, which reduces the time cost of acquiring security scene images and removes the limitation on the acquisition range of specific-scene images.
In one embodiment, the security scene recognition method provided by the embodiment of the application further comprises the following steps:
step 210, inputting the collected video stream into the trained target detection model, and intercepting and detecting the video stream through the trained target detection model to obtain a first scene image.
Specifically, data of campus security scenes are first collected with high-frame-rate, high-image-quality cameras, and data of scenes such as a person falling, people fighting or people chasing are selected from the historical scene images. The selected scene data are labeled manually or by automatic computer labeling, a target detection model is preliminarily trained on these simple scenes, the collected video stream is then input into the trained target detection model, and the video stream is screened preliminarily so that the resulting scene data can be retained. Because the amount of model training data at this stage is small, a large number of false-positive pictures are captured while the video stream is processed; these false-positive pictures provide ample negative samples for the subsequent formal training and thereby improve the generalization ability of the model.
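A minimal sketch of this preliminary screening of the video stream is shown below, assuming OpenCV for frame capture and a classifier-style detection model; the threshold, output directory and the `transform` that converts a BGR frame into a model input are all assumptions.

```python
import os

import cv2
import torch

def collect_first_scene_images(video_source, model, transform,
                               score_threshold: float = 0.5,
                               out_dir: str = "first_scene_images") -> int:
    """Screen a video stream with the trained target detection model and keep
    frames that look like security scenes (the retained scene data)."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_source)        # camera index, file path or RTSP URL
    saved = 0
    model.eval()
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        with torch.no_grad():
            scores = torch.softmax(model(transform(frame).unsqueeze(0)), dim=1)
        conf, cls = scores.max(dim=1)
        if conf.item() >= score_threshold:      # frame likely contains a target scene
            cv2.imwrite(f"{out_dir}/scene_{saved:06d}_cls{cls.item()}.jpg", frame)
            saved += 1
    cap.release()
    return saved
```

Frames saved this way still need review: as the text notes, many will be false positives, which then serve as negative samples in the formal training.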
In this embodiment, additional scene data are produced by the trained target detection model, which reduces the time cost of acquiring security scene images and removes the limitation on the acquisition range of specific-scene images.
In one embodiment, the security scene recognition method provided by the embodiment of the application further comprises the following steps:
step 220, inputting a first scene image with a scene tag into a network model with skeleton point detection, and determining posture features of the first scene image through the skeleton point detection;
step 230, determining a scene probability result output by the network model according to the posture features;
step 240, adjusting parameters of the network model based on the scene probability result and the scene tag;
step 250, determining whether the training of the network model is completed according to the adjustment result of the parameters of the network model.
Specifically, the training process of the network model with skeleton point detection is similar to that of the target detection model. A first scene image carrying a scene tag is input into the network model with skeleton point detection, the posture features of the first scene image are extracted through skeleton point detection, and the scene probability result output by the network model, i.e. the probability of each scene, is determined from the extracted posture features; the parameters of the network model are then adjusted based on the scene probability result output by the network model and the true scene tag. Finally, whether training of the network model is complete is determined from the result of these parameter adjustments.
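A minimal sketch of this scene head is given below: skeleton points are flattened into posture features, mapped to a scene probability result, and trained against the scene tag. The feature construction, layer sizes and loss details are assumptions, not the patent's actual design.

```python
import torch
import torch.nn as nn

class PoseSceneClassifier(nn.Module):
    """Map detected skeleton points (posture features) to a scene probability result."""

    def __init__(self, num_keypoints: int = 17, num_scenes: int = 3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_keypoints * 2, 128), nn.ReLU(),
            nn.Linear(128, num_scenes),
        )

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, num_keypoints, 2) normalised (x, y) coordinates
        posture_features = keypoints.flatten(1)
        return torch.softmax(self.mlp(posture_features), dim=1)  # scene probabilities

def train_step(classifier, keypoints, scene_tags, optimizer) -> float:
    """One parameter adjustment from the scene probability result and the true scene tag."""
    probs = classifier(keypoints)
    loss = nn.functional.nll_loss(torch.log(probs + 1e-8), scene_tags)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```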
According to the embodiment, the accuracy of security scene recognition is improved through training of the network model with skeleton point detection.
Referring to fig. 2, in an embodiment, the security scene recognition method provided by the embodiment of the present application may further include:
step 310, inputting the acquired target scene image into the trained network model, and extracting posture information of the target scene image through the trained network model;
step 320, determining whether the target scene image belongs to a target security scene based on the posture information;
step 330, determining the target security scene as a security scene recognition result of the target scene image under the condition that the target scene image belongs to the target security scene.
Specifically, the deep learning target detection network is modified by adding a skeleton point detection part to its backbone network; this part mainly captures the portions of the specific security scene (such as a campus) that require posture information. Compared with traditional methods that directly analyse frame-by-frame images or optical flow, the skeleton-posture approach matches human body actions more closely and better reflects their main characteristics. After the trained skeleton-point network model is obtained, a target detection model for the scene is trained at the same time. In the real detection scene, skeleton points are used to detect the posture of pedestrians in the image and to make a preliminary judgment on postures such as fighting, falling or chasing; images showing such a tendency are screened out, and the next stage of target detection determines whether those behaviours are actually present. In other words, whether the target scene image belongs to the target security scene is determined from the posture information, and if it does, the target security scene is determined to be the security scene recognition result of the target scene image.
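The sketch below illustrates this two-stage decision: skeleton points give a posture-based preliminary judgement, and only frames with a clear tendency go on to target detection for confirmation. The thresholds, scene indices and model interfaces (a pose model returning keypoints, a pose classifier as in the earlier sketch, and a detector returning class scores) are assumptions.

```python
import torch

FALL, FIGHT, CHASE = 0, 1, 2   # illustrative scene indices

def recognize_security_scene(frame_tensor, pose_model, pose_classifier, detector,
                             pose_threshold: float = 0.6,
                             det_threshold: float = 0.7):
    """Return the recognized scene index, or None if the frame is not a target scene."""
    with torch.no_grad():
        # Stage 1: posture-based preliminary judgement from skeleton points.
        _, keypoints = pose_model(frame_tensor)
        pose_probs = pose_classifier(keypoints)
        pose_conf, pose_cls = pose_probs.max(dim=1)
        if pose_conf.item() < pose_threshold:
            return None                          # no tendency toward a target scene

        # Stage 2: target detection confirms whether the behaviour is present.
        det_probs = torch.softmax(detector(frame_tensor), dim=1)
        det_conf, det_cls = det_probs.max(dim=1)
        if det_conf.item() >= det_threshold and det_cls.item() == pose_cls.item():
            return det_cls.item()                # security scene recognition result
    return None
```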
According to the embodiment, the security scene is identified through the network model with skeleton point detection, and the accuracy of security scene identification is improved.
In one embodiment, the security scene recognition method provided by the embodiment of the application further comprises the following steps:
step 400, generating a quantized network model according to an application programming interface of a framework matched with an edge computing end;
and 500, adjusting parameters of the quantized network model in the weight file to enable the quantized network model to run at the edge computing end.
Specifically, because the recognition is to run at the edge computing side, a quantized model is generated according to the API (Application Programming Interface) of the framework matched with the edge computing module, reusing the computation structures of the modified operators involved in the first code-writing step. Meanwhile, decoding with a GPU (graphics processing unit) speeds up frame extraction from the video stream, which increases the number of recognized frames and the maximum number of video channels supported. The model is quantized and compressed to int8 or float16 precision according to the requirements of the edge computing box, where int is the data type used to define integer variables and float stores single-precision or double-precision floating-point numbers; the float32 weight parameters in the weight file are converted into int8 weight parameters. If generating the int8 model introduces computation deviations caused by insufficient precision, a certain amount of correction is needed, and a small set of training-set pictures is fed through the network for fine-tuning. After the int8 weight model is generated, the relevant encoding/decoding code and recognition code are written against the recognition interface of the edge computing module's framework, recognition is performed, and performance is tested.
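The patent only says that the quantized model is generated through the API of whichever framework the edge computing box uses, without naming it; the sketch below uses PyTorch's built-in post-training quantization as one plausible stand-in for that step, with int8 and float16 variants mirroring the text.

```python
import torch
import torch.nn as nn

def quantize_for_edge(trained_model: nn.Module, use_int8: bool = True) -> nn.Module:
    """Produce a reduced-precision copy of the trained model for edge deployment.
    This is a sketch, not the edge framework's actual API."""
    trained_model.eval()
    if use_int8:
        # Post-training dynamic quantization: float32 weights of Linear layers
        # are stored as int8 and dequantized on the fly during inference.
        return torch.quantization.quantize_dynamic(
            trained_model, {nn.Linear}, dtype=torch.qint8)
    # float16 alternative: halve the weight precision instead of quantizing.
    return trained_model.half()
```

If the int8 model shows accuracy loss from the reduced precision, the text suggests correcting it by fine-tuning on a small subset of training-set pictures before writing the codec and recognition code against the edge framework's interface.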
Through this model optimization and adaptation process for the edge module, the method and device ensure fast and accurate scene recognition at the edge side.
The security scene recognition device provided by the application is described below, and the security scene recognition device described below and the security scene recognition method described above can be referred to correspondingly.
Referring to fig. 3, the present application further provides a security scene recognition device, including:
the target detection model training module 301 is configured to screen security scene images in the historical scene images, and train a target detection model based on the security scene images;
a first scene image generation module 302, configured to generate a first scene image through the trained target detection model, where the first scene image is used to train a network model incorporating skeleton point detection;
the security scene recognition module 303 is configured to recognize the collected target scene image based on the trained network model, and obtain a security scene recognition result.
Optionally, the target detection model training module includes:
the image labeling unit is used for labeling the security scene images and inputting the security scene images labeled with the classification labels into the target detection model;
the first parameter adjusting unit is used for adjusting the parameters of the target detection model according to the loss value output by the target detection model;
the target detection model training unit is used for determining that the target detection model training is completed when the parameter adjustment times of the target detection model reach a first preset threshold value or the loss value is smaller than a second preset threshold value.
Optionally, the first scene image generation module includes:
the first scene image generation unit is used for inputting the acquired video stream into the trained target detection model, and intercepting and detecting the video stream through the trained target detection model to obtain a first scene image.
Optionally, the first scene image generation module further includes:
the posture feature determining unit is used for inputting a first scene image with a scene tag into a network model with skeleton point detection, and determining posture features of the first scene image through the skeleton point detection;
the scene probability result determining unit is used for determining a scene probability result output by the network model according to the posture features;
the second parameter adjusting unit is used for adjusting parameters of the network model based on the scene probability result and the scene label;
and the network model training unit is used for determining whether the network model is trained according to the adjustment result of the parameters of the network model.
Optionally, the security scene recognition module includes:
the posture information extraction unit is used for inputting the acquired target scene image into the trained network model, and extracting posture information of the target scene image through the trained network model;
the target security scene determining unit is used for determining whether the target scene image belongs to a target security scene based on the posture information;
and the security scene identification unit is used for determining the target security scene as a security scene identification result of the target scene image under the condition that the target scene image belongs to the target security scene.
Optionally, the security scene recognition device further includes:
and the quantized network model generation module is used for generating a quantized network model according to an application programming interface of the framework matched with the edge computing end.
Optionally, the security scene recognition device further includes:
and the weight file parameter adjustment module is used for adjusting parameters of the quantized network model in the weight file so that the quantized network model runs at the edge computing end.
Fig. 4 illustrates a physical schematic diagram of an electronic device, as shown in fig. 4, which may include: processor 410, communication interface (Communications Interface) 420, memory 430 and communication bus 440, wherein processor 410, communication interface 420 and memory 430 communicate with each other via communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform the security scene recognition method.
Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In yet another aspect, the present application further provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the security scene recognition method provided by the above methods.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present application without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A security scene recognition method, characterized by comprising the following steps:
screening security scene images in the historical scene images, and training a target detection model based on the security scene images;
generating a first scene image through the trained target detection model, wherein the first scene image is used for training a network model incorporating skeleton point detection;
and recognizing the acquired target scene image based on the trained network model to obtain a security scene recognition result.
2. The security scene recognition method of claim 1, wherein the training a target detection model based on the security scene image comprises:
labeling the security scene image, and inputting the security scene image labeled with the classification label into a target detection model;
according to the loss value output by the target detection model, adjusting the parameters of the target detection model;
and determining that training of the target detection model is complete under the condition that the number of parameter adjustments of the target detection model reaches a first preset threshold or the loss value is smaller than a second preset threshold.
3. The security scene recognition method of claim 1, wherein the generating a first scene image from the trained object detection model comprises:
inputting the acquired video stream into the trained target detection model, and intercepting and detecting the video stream through the trained target detection model to obtain a first scene image.
4. The security scene recognition method of claim 1, wherein training the network model incorporating skeleton point detection based on the first scene image comprises:
inputting a first scene image with a scene tag into a network model with skeleton point detection, and determining posture features of the first scene image through the skeleton point detection;
determining a scene probability result output by the network model according to the posture features;
based on the scene probability result and the scene tag, adjusting parameters of the network model;
and determining whether training of the network model is complete according to the adjustment result of the parameters of the network model.
5. The security scene recognition method according to claim 1, wherein the step of recognizing the collected target scene image based on the trained network model to obtain a security scene recognition result comprises:
inputting the acquired target scene image into the trained network model, and extracting posture information of the target scene image through the trained network model;
determining whether the target scene image belongs to a target security scene based on the posture information;
and under the condition that the target scene image belongs to the target security scene, determining the target security scene as a security scene identification result of the target scene image.
6. The security scene recognition method according to claim 1, wherein after the collected target scene image is recognized based on the trained network model and the security scene recognition result is obtained, the method comprises the following steps:
and generating a quantized network model according to an application programming interface of the framework matched with the edge computing end.
7. The security scene recognition method according to claim 6, wherein the generating the quantized network model according to the application programming interface of the framework matched with the edge computing end comprises:
and adjusting parameters of the quantized network model in the weight file to enable the quantized network model to run at the edge computing end.
8. A security scene recognition device, comprising:
the target detection model training module is used for screening security scene images in the historical scene images and training a target detection model based on the security scene images;
the first scene image generation module is used for generating a first scene image through the trained target detection model, and the first scene image is used for training a network model incorporating skeleton point detection;
and the security scene recognition module is used for recognizing the acquired target scene image based on the trained network model to obtain a security scene recognition result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the security scene recognition method of any of claims 1 to 7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the security scene recognition method according to any of claims 1 to 7.
CN202310722182.2A 2023-06-16 2023-06-16 Security scene recognition method, device, equipment and readable storage medium Pending CN116977916A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310722182.2A CN116977916A (en) 2023-06-16 2023-06-16 Security scene recognition method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310722182.2A CN116977916A (en) 2023-06-16 2023-06-16 Security scene recognition method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN116977916A true CN116977916A (en) 2023-10-31

Family

ID=88470309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310722182.2A Pending CN116977916A (en) 2023-06-16 2023-06-16 Security scene recognition method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116977916A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination