CN114140882A - Image processing method and processing apparatus

Image processing method and processing apparatus

Info

Publication number
CN114140882A
CN114140882A
Authority
CN
China
Prior art keywords
target object
target
predicted value
shooting
target image
Prior art date
Legal status
Pending
Application number
CN202111489322.3A
Other languages
Chinese (zh)
Inventor
王晔
杨凯
许成楠
付钰
Current Assignee
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202111489322.3A
Publication of CN114140882A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Alarm Systems (AREA)

Abstract

The application provides an image processing method and a processing device. The method includes the following steps: acquiring a target image that includes a target object, an electronic device and a monitored object; extracting skeletal key point information of the target object from the target image, and obtaining a first predicted value with a behavior recognition model, where the first predicted value represents the predicted probability that the target object has a shooting behavior; determining, according to the target image, the spatial relationship of at least two of the target object, the electronic device and the monitored object to obtain a set of predicted values, where each predicted value in the set represents the predicted probability that the target object has a shooting behavior directed at the monitored object; and determining whether the target object has the shooting behavior according to the first predicted value and the set of predicted values. The accuracy of recognizing shooting behavior can thereby be improved.

Description

Image processing method and processing apparatus
Technical Field
The present application relates to information processing technologies, and in particular, to an image processing method and an image processing apparatus.
Background
To prevent enterprise data, products, customer information and the like from being leaked through photographed records, people are generally prohibited from carrying electronic devices with a shooting function into the relevant premises.
As security protection becomes more intelligent, shooting behavior can be identified by applying intelligent algorithms to the monitoring data collected by surveillance equipment. Existing approaches judge whether shooting behavior exists mainly by recognizing human body posture; in practice the probability of false detection is high, and the accuracy of recognizing shooting behavior is low.
Disclosure of Invention
The application provides an image processing method and a processing device to solve the problem of low accuracy in recognizing shooting behavior.
In a first aspect, an image processing method is provided, including:
acquiring a target image, wherein the target image comprises a target object, electronic equipment and a monitoring object;
extracting skeletal key point information of the target object from the target image;
obtaining a first predicted value by using a behavior recognition model according to the skeletal key point information, wherein the first predicted value is used for representing the predicted probability that the target object has a shooting behavior;
determining a spatial relationship of at least two of the target object, the electronic device and the monitoring object according to the target image;
obtaining a predicted value set according to the spatial relationship, wherein the predicted value set comprises at least one predicted value, and each predicted value is used for representing the predicted probability that the target object has shooting behavior for shooting the monitored object;
and determining whether the shooting behavior exists in the target object according to the first predicted value and the predicted value set.
According to this scheme, the skeletal key point information of the target object is extracted from the target image and used to obtain the first predicted value of the target object's shooting behavior, that is, a prediction of shooting behavior determined from the target object's posture. A set of predicted values for shooting the monitored object is obtained from the spatial relationship of at least two of the target object, the electronic device and the monitored object. The processing device judges whether the target object has a shooting behavior directed at the monitored target based on multiple dimensions, such as the posture of the target object and the spatial relationships among the key targets, which improves the accuracy of shooting-behavior recognition and reduces the probability of false recognition. In this way, prompts and alarms can be raised in a timely and accurate manner, reducing the probability that protected information is leaked.
With reference to the first aspect, in certain embodiments of the first aspect, if it is determined that the shooting behavior exists for the target object, a prompt message is output, where the prompt message is used to prompt that the shooting behavior exists for the target object.
With reference to the first aspect, in certain embodiments of the first aspect, the set of predicted values includes a second predicted value, and the determining a spatial relationship of at least two of the target object, the electronic device and the monitored object according to the target image includes: determining, according to the target image, a first included angle between a first plane in which the electronic device is located and a second plane in which the monitored object is located; and the obtaining a set of predicted values according to the spatial relationship includes: obtaining the second predicted value according to the first included angle and a first preset value.
According to the above scheme, the spatial relationship determined by the processing device may include the angular relationship between the plane in which the electronic device is located and the plane in which the monitored object is located, and the processing device obtains a predicted value that the electronic device is shooting the monitored object based on this angular relationship and an empirical value (i.e., the first preset value). Judging whether a shooting behavior exists based on such multi-dimensional detection improves the accuracy of shooting-behavior recognition.
With reference to the first aspect, in certain implementations of the first aspect, the set of predicted values further includes a third predicted value, and the determining, from the target image, a spatial relationship of at least two of the target object, the electronic device and the monitored object includes: determining, according to the target image, a second included angle between a first connecting line and a second connecting line, wherein the first connecting line connects a target point of the target object and the electronic device, the target point being a shoulder center point or a head point of the target object, and the second connecting line connects the electronic device and the monitored object; and the obtaining a set of predicted values according to the spatial relationship includes: obtaining the third predicted value according to the second included angle and a second preset value.
according to the above scheme, the spatial relationship determined by the processing device may include a spatial relationship among the target object, the electronic device, and the monitoring device, that is, an angular relationship between the first connection line and the second connection line, and the processing device obtains a predicted value of the monitoring object shot by the electronic device based on the angular relationship and an empirical value (that is, a second preset value). Whether shooting behaviors exist or not is judged based on multi-dimensional detection, and therefore accuracy of shooting behavior identification is improved.
With reference to the first aspect, in certain implementations of the first aspect, the set of predicted values further includes a fourth predicted value, and the determining, from the target image, a spatial relationship of at least two of the target object, the electronic device, and the monitored object includes: determining the distance between the hand of the target object and the electronic equipment according to the target image; and obtaining a set of predicted values according to the spatial relationship, including: and obtaining the fourth predicted value according to the distance and the third preset value.
According to this scheme, the spatial relationship determined by the processing device can include the distance relationship between the hand of the target object and the electronic device, so that situations in which the electronic device is not held by the target object (for example, placed on a desktop) can be excluded. The processing device obtains a predicted value that the target object is holding the electronic device based on this distance relationship and a distance threshold (i.e., the third preset value). Judging whether a shooting behavior exists based on such multi-dimensional detection improves the accuracy of shooting-behavior recognition.
With reference to the first aspect, in some implementations of the first aspect, the determining a spatial relationship of at least two of the target object, the electronic device, and the monitored object according to the target image includes: determining the distance between the hand of the target object and the electronic equipment according to the target image; and, the method further comprises: and determining whether the target object holds the electronic equipment or not according to the fact that the distance between the hand of the target object and the electronic equipment is smaller than a third preset value.
According to the above scheme, the processing device may first assess the likelihood that the target object is holding the electronic device by comparing the distance between the hand of the target object and the electronic device with a threshold (i.e., the third preset value). Only when the likelihood is high, that is, the distance is smaller than the threshold, does it continue to judge whether the target object is shooting the monitored object, for example by obtaining the first included angle to determine the second predicted value and/or obtaining the second included angle to determine the third predicted value. This can reduce the power consumption of the processing device.
With reference to the first aspect, in certain embodiments of the first aspect, the determining whether the shooting behavior of the target object exists according to the first predicted value and the set of predicted values includes: calculating to obtain a fifth predicted value according to the first predicted value, the weight coefficient of the first predicted value, each predicted value in the predicted value set and the weight coefficient of each predicted value; if the fifth predicted value is greater than or equal to a fourth preset value, determining that the shooting behavior exists in the target object; and if the fifth predicted value is smaller than the fourth preset value, determining that the shooting behavior does not exist in the target object.
According to the scheme, the weight coefficients are configured for the predicted values obtained by different reference dimensions, the final predicted value is obtained after the weighted average is calculated, and the accuracy of shooting behavior identification can be improved.
With reference to the first aspect, in certain embodiments of the first aspect, the acquiring the target image includes: receiving the target image from the acquisition device; or shooting to obtain the target image.
With reference to the first aspect, in certain embodiments of the first aspect, the method further comprises: and identifying the target image, and determining that the target image comprises the target object and the electronic equipment.
In a second aspect, a processing apparatus is provided, comprising:
the system comprises an acquisition module, a monitoring module and a display module, wherein the acquisition module is used for acquiring a target image, and the target image comprises a target object, electronic equipment and a monitoring object;
the processing module is used for extracting skeletal key point information of the target object from the target image;
the processing module is further used for obtaining a first predicted value by using a behavior recognition model according to the skeletal key point information, wherein the first predicted value is used for representing the predicted probability that the target object has a shooting behavior;
the processing module is further used for determining the spatial relationship of at least two of the target object, the electronic equipment and the monitoring object according to the target image;
the processing module is further configured to obtain a prediction value set according to the spatial relationship, where the prediction value set includes at least one prediction value, and each prediction value is used to represent a predicted probability that the target object has a shooting behavior for shooting the monitored object;
the processing module is further used for determining whether the shooting behavior exists in the target object according to the first predicted value and the predicted value set.
With reference to the second aspect, in some embodiments of the second aspect, the set of predicted values includes a second predicted value, and the processing module is specifically configured to:
determining a first included angle between a first plane where the electronic equipment is located and a second plane where the monitoring object is located according to the target image;
and obtaining the second predicted value according to the first included angle and a first preset value.
With reference to the second aspect, in some embodiments of the second aspect, the transceiver module is further configured to output prompt information if it is determined that the target object has the shooting behavior, where the prompt information is used to prompt that the target object has the shooting behavior.
With reference to the second aspect, in some embodiments of the second aspect, the set of predicted values further includes a third predicted value, and the processing module is specifically configured to:
determining a second included angle between a first connecting line and a second connecting line according to the target image, wherein the first connecting line is a connecting line between a target point of the target object and the electronic equipment, the target point is a shoulder central point or a head point of the target object, and the second connecting line is a connecting line between the electronic equipment and the monitored object;
obtaining the third predicted value according to the second included angle and a second preset value.
with reference to the second aspect, in some embodiments of the second aspect, the set of predicted values further includes a fourth predicted value, and the processing module is specifically configured to:
determining the distance between the hand of the target object and the electronic equipment according to the target image;
and obtaining the fourth predicted value according to the distance and the third preset value.
With reference to the second aspect, in some embodiments of the second aspect, the processing module is specifically configured to determine, according to the target image, a distance between the hand of the target object and the electronic device;
and the processing module is also used for determining whether the target object holds the electronic equipment according to the fact that the distance between the hand of the target object and the electronic equipment is smaller than a third preset value.
With reference to the second aspect, in some embodiments of the second aspect, the processing module is specifically configured to:
calculating to obtain a fifth predicted value according to the first predicted value, the weight coefficient of the first predicted value, each predicted value in the predicted value set and the weight coefficient of each predicted value;
if the fifth predicted value is greater than or equal to a fourth preset value, determining that the shooting behavior exists in the target object;
and if the fifth predicted value is smaller than the fourth preset value, determining that the shooting behavior does not exist in the target object.
With reference to the second aspect, in some embodiments of the second aspect, the acquisition module is specifically configured to receive the target image from an acquisition device; or shooting to obtain the target image.
With reference to the second aspect, in some embodiments of the second aspect, the processing module is further configured to identify the target image, and determine that the target image includes the target object and the electronic device.
In a third aspect, a processing apparatus is provided, including: the processing device comprises a logic circuit and a communication interface, wherein the communication interface is used for acquiring data to be processed and/or outputting processed data, and the logic circuit is used for obtaining processed data from the data to be processed, so that the processing device executes the method in any one of the possible implementation manners of the first aspect and the first aspect.
In one possible design, the communication interface includes an input interface and an output interface.
In a fourth aspect, a processing apparatus is provided that includes a processor and a memory. The processor is configured to read instructions stored in the memory and may receive signals and transmit signals via the transceiver to perform the method of the first aspect and any possible implementation of the first aspect.
In a fifth aspect, there is provided a computer program product comprising: a computer program (also referred to as code, or instructions), which when executed, causes a computer to perform the method of any of the possible implementations of the first aspect and the first aspect described above.
In a sixth aspect, a computer-readable medium is provided, which stores a computer program (also referred to as code or instructions) that, when run on a computer, causes the computer to perform the method in any one of the possible implementations of the first aspect.
According to the image processing method of the present application, the target image is analyzed to obtain multi-dimensional information about the target object, the electronic device and the monitored object, predicted values of shooting behavior corresponding to that multi-dimensional information are obtained, and the multiple predicted values are fused to determine whether the target object has a shooting behavior directed at the monitored object, thereby improving the accuracy of recognizing shooting behavior.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
FIG. 1 is a schematic diagram of an application scenario suitable for the image processing method provided herein;
FIG. 2 is a schematic flow chart of an image processing method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a region of interest provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of two-dimensional frame position information of a key target provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a spatial relationship A provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a spatial relationship B provided by an embodiment of the present application;
fig. 7 is a schematic diagram of a shooting behavior identification module of a processing device according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a processing apparatus according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of a processing device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
Fig. 1 is a schematic diagram of an application scenario 100 of the image processing method provided in the present application.
The application scenario 100 is a scenario in which it is identified whether a display screen is being photographed or recorded, so as to prompt and raise alarms about enterprise information and personal information that may be leaked. For example, the application scenario 100 may be a bank business hall, and the image processing method provided by the present application determines whether a bank worker is photographing a workstation display screen to record enterprise information and customer information. It should be understood that the present application is not limited thereto; the image processing method provided by the present application may also be applied to identifying whether shooting behavior by people exists in scenarios such as exhibition halls and factories.
The application scene includes at least one monitored object and at least one image capturing device. As shown in fig. 1, the monitored object may be a display screen 101, and the image capturing device may be a monitoring device 102 or a monitoring device 103. The image capturing device can capture images of the monitored object and its surroundings, which are used to identify whether a person is shooting the monitored object. The image processing method provided by the present application may be executed by a processing device. The processing device may be the image capturing device itself, or may be the device 106 shown in fig. 1: the image capturing device sends captured image data to the device 106, and after obtaining the image data, the device 106 executes the image processing method provided by the present application. The device 106 may be an edge device in a communication network or a server. For example, when the person 104 uses the electronic device 105 to photograph the monitored object as shown in fig. 1, the processing device can identify the photographing behavior of the person 104 by the image processing method provided by the present application, so that measures such as stopping, warning or reporting can be taken in time. Optionally, the image capturing device may include an Internet Protocol Camera (IPC) and a Network Video Recorder (NVR).
To address the low accuracy of shooting-behavior recognition, the present application fuses the features of the multiple spatial perspectives and multiple subjects that a shooting behavior produces, thereby improving the accuracy of recognizing shooting behavior.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Example one
Fig. 2 is a schematic flowchart of an image processing method provided in an embodiment of the present application. The image processing method may be performed by a processing device or an apparatus configured to the processing device. The method is described below as being performed by a processing device as an example.
S201, the processing device acquires a target image, wherein the target image comprises a target object, the electronic device and a monitoring object.
In one embodiment, the processing device may be an image capturing device, such as a monitoring device, and the processing device may obtain the target image by shooting.
In another embodiment, the processing device is a network edge device, the image capturing device may capture an image of the surroundings of the monitored object, and the processing device receives the target image from the image capturing device.
For example, the processing device is a network edge device, the image capturing device is a monitoring device, the monitoring device sends captured monitoring video stream data to the processing device, and the processing device decodes the video stream data to obtain a sequence of image frames. For example, the video stream data is encoded data encoded based on a Real Time Streaming Protocol (RTSP), and the processing device decodes the video stream based on the RTSP to obtain an image frame sequence. The target image is one or more frames of images in the image frame sequence.
Optionally, the processing device may sample the decoded image frame sequence based on a preconfigured frame sampling rate, and reduce the frame rate to meet the processing capability of the processing device.
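As a concrete illustration of the decoding and sampling steps above, the following is a minimal sketch, not taken from the patent, of pulling an RTSP stream with OpenCV and keeping one frame in every N as a candidate target image; the stream URL and the sampling step are illustrative assumptions.

```python
# Sketch: decode an RTSP video stream and downsample it to a preconfigured
# frame sampling rate. The URL and step are hypothetical values.
import cv2

RTSP_URL = "rtsp://192.168.1.10:554/stream"  # hypothetical camera address
SAMPLE_EVERY_N = 5                           # keep 1 of every 5 decoded frames

def sampled_frames(url: str, step: int):
    cap = cv2.VideoCapture(url)  # OpenCV handles the RTSP decoding internally
    index = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield frame  # candidate target image for key-target detection
        index += 1
    cap.release()
```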
Optionally, after the processing device acquires the target image, the processing device identifies the target image, and determines that the target image includes the target object and the electronic device. Optionally, the processing device further identifies the monitored object in the target image.
For example, after acquiring a target image, the processing device identifies whether the target image includes a target object (e.g., the target object is a person) and an electronic device, and if the target image includes the target object and the electronic device, the processing device continues to process the target image to determine whether a behavior of shooting the monitoring object exists; if the target image does not include the target object and/or the electronic device, the processing device does not perform an operation of recognizing whether the target object has the shooting monitoring object.
Optionally, the processing device acquires a region of interest (ROI) in the target image, and identifies that the target object and the electronic device are included in the region of interest.
The processing device may obtain the ROI in the target image according to ROI configuration parameters. For example, as shown in fig. 3, after obtaining the target image 301, the processing device determines the ROI 302 around the monitored object 303 according to the ROI configuration parameters. The processing device identifies whether the target object and the electronic device exist within the ROI 302, and does not perform identification in the region of the target image 301 outside the ROI 302. The ROI can be an area near the monitored object; configuring an ROI avoids false recognition caused by a target object or electronic device that is too far away from the monitored object to shoot it.
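As a rough sketch of how the ROI restriction might look in code, assuming the ROI configuration parameters are a rectangle in pixel coordinates (the values below are made up):

```python
# Sketch: crop the target image to a configured region of interest (ROI)
# around the monitored object; detection then runs only on this crop.
import numpy as np

ROI_CONFIG = {"x": 400, "y": 150, "w": 800, "h": 600}  # hypothetical parameters

def crop_roi(target_image: np.ndarray, roi: dict = ROI_CONFIG) -> np.ndarray:
    x, y, w, h = roi["x"], roi["y"], roi["w"], roi["h"]
    return target_image[y:y + h, x:x + w]
```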
Alternatively, the processing device may detect the target object, the electronic device, and the monitoring object in the target image by using the key target detection model according to the target image.
The key target detection model is an artificial intelligence model obtained by learning images of a large number of target objects, electronic equipment and monitoring objects. The key target detection model is configured in the processing device, after the processing device acquires a target image, the target image is input into the key target detection model, and the key target detection model outputs whether a key target exists or not, namely one or more of a target object, the electronic device and a monitoring object.
For example, the monitored object may be a display screen on an enterprise workstation, the target object may be a person, and the electronic device may be any electronic device with a shooting function. In the early development stage, display screen images of various types and angles can be used to train the key target detection model, so that the model can recognize the display screen in a target image through inference. Images of electronic devices with shooting functions (such as mobile phones and cameras) in various appearances and angles can likewise be used for training, so that the model can recognize the electronic device through inference. In addition, the key target detection model may be trained on images of people in multiple poses, so that it can recognize the target object through inference.
When the key target detection model detects a key target, it can output the type information of the key target, which indicates whether the key target is a target object, an electronic device or a monitored object. Optionally, when a key target (e.g., one or more of the target object, the electronic device and the monitored object) is present, the key target detection model may also output the position information of the detected key target.
For example, when a target image as shown in fig. 4 is input into the key target detection model, the model detects the monitored object, the target object and the electronic device, and outputs their type information together with the corresponding two-dimensional frame position information. As shown in the figure, the model outputs the position information of the two-dimensional frame 401 in which the monitored object is located, the two-dimensional frame 402 in which the target object is located, and the two-dimensional frame 403 in which the electronic device is located, which allows the processing device to determine the position of each key target in the target image. Specifically, the position information of a two-dimensional frame may be the coordinate information of its four corner vertices, but the application is not limited thereto.
It should be noted that the target image may include one or more target objects, one or more electronic devices, and one or more monitored objects. The processing device may group the target objects, electronic devices and monitored objects according to the distances between them, so that each group contains one target object, one electronic device and one monitored object (a grouping sketch is given after this paragraph). The processing device then detects, according to the target image, whether the target object in each group has a shooting behavior directed at the monitored object in that group. The embodiments of the present application are described by taking one group of a target object, an electronic device and a monitored object as an example.
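The grouping step could look like the following sketch, which pairs each detected person with the nearest electronic device and monitored object by the distance between two-dimensional frame centers; the (x1, y1, x2, y2) box format is an assumption about the detector's output.

```python
# Sketch: group each target object (person) with its nearest electronic
# device and monitored object, using two-dimensional frame centers.
import math

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def nearest(box, candidates):
    cx, cy = center(box)
    return min(candidates, key=lambda b: math.dist((cx, cy), center(b)))

def group_key_targets(person_boxes, device_boxes, monitor_boxes):
    """Yield one (person, device, monitor) group per detected person."""
    for person in person_boxes:
        yield person, nearest(person, device_boxes), nearest(person, monitor_boxes)
```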
Optionally, the processing device identifies a hand position of the target object according to the target image, and determines whether the target object holds the electronic device according to a distance between the hand of the target object and the electronic device. Optionally, the processing device may compare the distance between the hand of the target object and the electronic device with a third preset value to determine whether the target object holds the electronic device.
For example, the processing device may identify a hand position of the target object in the two-dimensional frame image data of the target object, calculate a euclidean distance between the hand of the target object and the electronic device, determine that the electronic device is not an electronic device held by the target object if the distance is greater than or equal to the third preset value, and the processing device may not continue to perform the operation of identifying whether the target object has a behavior of photographing the monitoring object using the electronic device. If the distance is smaller than a third preset value, the processing device determines that the target object holds the electronic device, and then the processing device identifies whether shooting behaviors exist in the target object according to the target image.
S202, the processing device extracts skeletal keypoint information of the target object from the target image.
For example, a skeletal key point detection model may be used to identify the skeletal key points of the target object and obtain the skeletal key point information. The processing device may input the two-dimensional frame image data of the target object into the skeletal key point detection model (e.g., an HRNet model), and the model outputs a coordinate sequence of the skeletal key points. But the application is not limited thereto.
Alternatively, the processing device may identify the hand position of the target object by determining the coordinates of the wrist joint point of the target object according to the bone key point information. The processing device may calculate a euclidean distance between the hand of the target object and the electronic device according to the acquired coordinates of the wrist joint point and the coordinates of the center point of the electronic device, compare the euclidean distance with a third preset value, and determine whether the target object holds the electronic device.
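A minimal sketch of this handheld check follows, assuming named wrist keypoints and treating the third preset value as a pixel-distance threshold (the value is illustrative):

```python
# Sketch: decide whether the target object holds the electronic device by
# comparing the wrist-to-device Euclidean distance with the third preset value.
import math

THIRD_PRESET_VALUE = 60.0  # hypothetical pixel threshold

def is_handheld(keypoints: dict, device_center: tuple,
                threshold: float = THIRD_PRESET_VALUE) -> bool:
    """keypoints maps joint names to (x, y), e.g. from an HRNet-style model."""
    for joint in ("left_wrist", "right_wrist"):
        if joint in keypoints and math.dist(keypoints[joint], device_center) < threshold:
            return True
    return False
```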
And S203, the processing equipment obtains a first predicted value by using a behavior recognition model according to the bone key point information, wherein the first predicted value is used for representing the predicted probability that the target object has shooting behavior.
The behavior recognition model may recognize one or more behaviors of the target object, including the target object's shooting behavior, based on the skeletal key point information. After the skeletal key point information of the target object is input into the behavior recognition model, the model performs prediction (or inference) and outputs the first predicted value of the target object's shooting behavior.
The behavior recognition model is an artificial intelligence model obtained by performing model training through a large amount of skeleton key point information of various human body postures with shooting behaviors. Through model training, the behavior recognition model can deduce whether shooting behaviors exist in the target object or not based on the input bone key point information.
In this application, a predicted value may be a probability value or a probability level, predicted by the processing device based on a model or an algorithm, that the target object has a shooting behavior directed at the monitored object. For example, the probability that the target object is shooting the monitored object may be divided into a plurality of levels represented by positive integer values, where a higher level may indicate a higher probability, or a lower level may indicate a higher probability; the application does not limit this.
And S204, the processing equipment determines the spatial relationship of at least two of the target object, the electronic equipment and the monitoring object according to the target image.
And S205, the processing equipment obtains a predicted value set according to the spatial relationship, wherein the predicted value set comprises at least one predicted value, and each predicted value is used for representing the predicted probability that the target object has shooting behavior for shooting the monitoring object.
The processing device can determine the spatial relationship of at least two of the target object, the electronic device and the monitored object according to the target image, and from that spatial relationship determine the probability that the target object is shooting the monitored object. In S206, the first predicted value obtained from the skeletal key point information is fused with the probabilities obtained from the spatial relationships, so that whether the target object is shooting the monitored object is determined from multi-dimensional information, which improves the accuracy of shooting-behavior recognition.
In the embodiment of the present application, the determining the spatial relationship of at least two of the target object, the electronic device, and the monitoring object in the target image may include, but is not limited to, one or more of the following spatial relationship a, spatial relationship B, and spatial relationship C.
The spatial relationship a is an angular relationship between a first plane and a second plane, where the first plane is a plane where the electronic device is located, and the second plane is a plane where the monitoring object is located.
In S204, the processing device may determine the angle between the first plane and the second plane, i.e., the first included angle, according to the target image.
For example, as shown in fig. 5, the processing device performs image analysis on the target image. Specifically, it may identify a plurality of edges of the electronic device from the two-dimensional frame image data of the electronic device 501, fit the spatial shape of the electronic device based on those edges to determine the spatial plane 502 in which the device lies (an example of the first plane), determine the spatial plane 504 in which the monitored object 503 lies (an example of the second plane) in a similar manner, and then calculate the included angle between the spatial plane 502 and the spatial plane 504 to obtain the first included angle.
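Once the two planes have been fitted, the first included angle reduces to the angle between their normal vectors, as in the following sketch (plane fitting itself is out of scope here; the normals are assumed inputs):

```python
# Sketch: first included angle as the angle between the normals of the plane
# of the electronic device and the plane of the monitored object.
import numpy as np

def plane_angle_deg(normal_a: np.ndarray, normal_b: np.ndarray) -> float:
    """Angle between two planes, folded into [0, 90] degrees."""
    cos = abs(np.dot(normal_a, normal_b)) / (np.linalg.norm(normal_a) * np.linalg.norm(normal_b))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

first_angle = plane_angle_deg(np.array([0.0, 0.0, 1.0]),
                              np.array([0.0, 1.0, 1.0]))  # 45.0 degrees
```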
Alternatively, the processing device may connect a plurality of image capturing devices disposed at different locations, such as the image capturing device 102 and the image capturing device 103 disposed at different locations shown in fig. 1, or may further include image capturing devices at other locations. The target image includes a plurality of images from the plurality of image acquisition devices, and the processing device may determine a spatial location, such as a spatial coordinate and/or a spatial plane, of one or more of the target object, the electronic device, and the monitoring object based on the plurality of images.
In S205, the processing device may obtain a second predicted value according to the first included angle and the first preset value, where the set of predicted values includes the second predicted value.
The processing device may be pre-stored with the first preset value, which is an empirical value of the angle between the first plane and the second plane determined from a number of experimental shooting-behavior cases. The processing device may compare the similarity between the first included angle and the first preset value to obtain the second predicted value. The higher the similarity, the higher the probability that the target object is shooting the monitored object; the lower the similarity, the lower that probability.
For example, the processing device determines the difference between the first included angle and the first preset value; specifically, the difference equals the larger of the two minus the smaller of the two. The processing device may then determine the second predicted value according to the difference and a first correspondence, where the first correspondence maps a plurality of difference intervals to a plurality of predicted values. For example, the first correspondence may be as shown in Table 1: the interval in which the difference I satisfies I ≥ I1 corresponds to the predicted value P1, the interval I1 > I ≥ I2 corresponds to the predicted value P2, and the interval I2 > I ≥ I3 corresponds to the predicted value P3. The processing device determines the second predicted value as the predicted value corresponding to the interval to which the difference between the first included angle and the first preset value belongs. But the application is not limited thereto.
TABLE 1
    Difference interval    Predicted value
    I ≥ I1                 P1
    I1 > I ≥ I2            P2
    I2 > I ≥ I3            P3
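The Table 1 lookup can be sketched as follows; the interval bounds I1..I3 and the predicted values P1..P3 are placeholders, since the patent does not publish numeric values:

```python
# Sketch: map the difference between the first included angle and the first
# preset value to the second predicted value via difference intervals.
INTERVALS = [          # (lower bound of difference interval, predicted value)
    (30.0, 0.2),       # I >= I1       -> P1
    (15.0, 0.6),       # I1 > I >= I2  -> P2
    (0.0, 0.9),        # I2 > I >= I3  -> P3 (smaller difference, higher probability)
]

def second_predicted_value(first_angle: float, first_preset: float) -> float:
    difference = max(first_angle, first_preset) - min(first_angle, first_preset)
    for lower_bound, predicted in INTERVALS:
        if difference >= lower_bound:
            return predicted
    return INTERVALS[-1][1]
```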
And the spatial relationship B is the angular relationship between a first connecting line and a second connecting line, wherein the first connecting line is a connecting line between a target point of the target object and the electronic equipment, and the second connecting line is a connecting line between the electronic equipment and the monitoring object. Alternatively, the target point of the target object may be a shoulder center point or a point of the head of the target object.
The processing device may determine an angle between the first line and the second line (i.e., a second angle) according to the target image in S204.
For example, as shown in fig. 6, the target point is the shoulder center point of the target object. The processing device may determine the shoulder bone key points of the target object 601 from the skeletal key point information, and from them determine the shoulder center point. The processing device may obtain the line 603 (i.e., the first connecting line) connecting the shoulder center point of the target object and the electronic device 602, for example a line from the center point of the electronic device to the shoulder center point. The processing device may also obtain the line 605 (i.e., the second connecting line) between the electronic device 602 and the monitored object 604, for example a line between the center point of the electronic device 602 and the center point of the monitored object 604. The processing device then calculates the angle 606 (i.e., the second included angle) between line 603 and line 605.
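In vector form, the second included angle is simply the angle between the two connecting lines, as in this sketch (the sample coordinates are made up):

```python
# Sketch: second included angle between the shoulder-to-device line and the
# device-to-monitor line, computed from detected center points.
import numpy as np

def line_angle_deg(shoulder_center, device_center, monitor_center) -> float:
    v1 = np.asarray(device_center, float) - np.asarray(shoulder_center, float)  # first connecting line
    v2 = np.asarray(monitor_center, float) - np.asarray(device_center, float)   # second connecting line
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

second_angle = line_angle_deg((0, 0), (1, 0), (2, 1))  # 45.0 degrees
```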
In S205, the processing device may obtain a third predicted value according to the second included angle and a second preset value, where the set of predicted values includes the third predicted value.
The processing device may be pre-stored with the second preset value, which is an empirical value of the included angle between the first connecting line and the second connecting line determined from a number of experimental shooting-behavior cases. The processing device may compare the similarity between the second included angle and the second preset value to obtain the third predicted value. This is done in the same way as comparing the first included angle with the first preset value; for brevity, the details are not repeated here.
Spatial relationship C, distance relationship between the hand of the target object and the electronic device.
In S204, the processing device may determine the hand position of the target object according to the target image.
For example, the processing device may determine a wrist joint point of the target object according to the bone key point information, and calculate an euclidean distance between the wrist joint point of the target object and a central point of the electronic device. But the application is not limited thereto.
In S205, the processing device may obtain a fourth predicted value according to a distance between the hand of the target object and the electronic device and a third preset value, where the set of predicted values includes the fourth predicted value.
The processing device may be pre-stored with the third preset value, which is an empirical value of the distance between the hand of the target object and the electronic device determined from a number of experimental shooting-behavior cases. The processing device may compare the calculated distance between the hand of the target object and the electronic device with the third preset value to obtain the fourth predicted value. This is done in the same way as comparing the first included angle with the first preset value; for brevity, the details are not repeated here.
In one embodiment, the processing device may determine each of spatial relationships of at least two of the target object, the electronic device, and the monitoring object, and determine a predicted value corresponding to each of the spatial relationships based on a spatial parameter (e.g., an angle or a distance) obtained from each of the spatial relationships.
In another embodiment, the processing device may predict the predicted value corresponding to each spatial relationship based on the target image using the shooting behavior prediction model.
For example, the shooting behavior prediction model is obtained by training on target images of the target object shooting in a plurality of postures. Each posture corresponds to a set of parameters including one or more of the following: the skeletal key point information of the target object, the included angle between the plane in which the electronic device lies and the plane in which the monitored object lies, the included angle between the line connecting the target point and the electronic device and the line connecting the electronic device and the monitored object, and the distance between the hand of the target object and the electronic device. The shooting behavior prediction model may determine the predicted value corresponding to each spatial relationship based on the relationship between each posture and the one or more spatial parameters, and may output the set of predicted values.
And S206, the processing equipment determines whether the target object has shooting behavior for shooting the monitoring object according to the first predicted value and the predicted value set.
The processing device can fuse predicted values of shooting behaviors obtained based on the multiple spatial relationships to determine whether the target object has the shooting behavior for shooting the monitoring object.
Optionally, the processing device may calculate a fifth predicted value according to the first predicted value, the weight coefficient of the first predicted value, each predicted value in the set of predicted values, and the weight coefficient of each predicted value.
The processing device may be pre-stored with the fourth preset value, where the fourth preset value is a threshold value of a shooting behavior prediction value. If the fifth predicted value is greater than or equal to a fourth preset value, determining that the shooting behavior exists in the target object; and if the fifth predicted value is smaller than the fourth preset value, determining that the shooting behavior does not exist in the target object.
In one embodiment, the weight coefficients of the predicted values may all be 1, and the processing device determines whether the target object has a shooting behavior for shooting the monitoring object based on a mean value of the predicted values.
For example, the set of predicted values includes the second predicted value corresponding to the spatial relationship A, denoted P_A. The processing device may average the first predicted value, denoted P_x, with P_A to obtain the fifth predicted value, denoted P_y. Then P_y = (P_A + P_x) / 2.
As another example, the set of predicted values includes the second predicted value P_A corresponding to the spatial relationship A, the third predicted value P_B corresponding to the spatial relationship B, and the fourth predicted value P_C corresponding to the spatial relationship C. The processing device may average the first predicted value P_x with the set of predicted values to obtain the fifth predicted value P_y. Then P_y = (P_A + P_B + P_C + P_x) / 4.
In another embodiment, the corresponding weight coefficient may be configured according to the importance degree of the predicted value, and the processing device may determine whether the target object has a shooting behavior for shooting the monitoring object based on a weighted average of the predicted values.
For example, if whether the target object is in a shooting posture is a relatively important condition for judging shooting behavior, the weight coefficient of the first predicted value may be set to 1. As another example, the set of predicted values includes the second predicted value, which corresponds to the angular relationship between the first plane and the second plane; consulting this predicted value can avoid misjudging as shooting behavior a case in which the target object is merely using the electronic device without shooting, so the weight coefficient of the second predicted value may be set greater than 0.5 and less than or equal to 1. But the application is not limited thereto.
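Putting S206 together, the weighted fusion and thresholding can be sketched as follows; every weight and the fourth preset value here are illustrative assumptions:

```python
# Sketch: fuse the first predicted value with the set of predicted values by
# weighted average, then compare against the fourth preset value.
def fuse(first_value: float, value_set: list, first_weight: float = 1.0,
         set_weights: list = None, fourth_preset: float = 0.5):
    values = [first_value] + list(value_set)
    weights = [first_weight] + list(set_weights or [1.0] * len(value_set))
    fifth_value = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    return fifth_value, fifth_value >= fourth_preset  # (P_y, shooting behavior?)

# Equal weights reproduce the mean P_y = (P_A + P_B + P_C + P_x) / 4 above.
p_y, shooting = fuse(0.8, [0.7, 0.9, 0.6])
```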
Optionally, the method may further include S207, if it is determined that the target object has a shooting behavior of shooting the monitoring object, outputting a prompt message, where the prompt message is used to prompt the target object that the shooting behavior of shooting the monitoring object exists.
If the processing device determines that the target object has the shooting behavior for shooting the monitoring object by fusing the predicted values obtained by the plurality of dimensions in S206, the processing device may output prompt information to prompt the target object to have the shooting behavior for shooting the monitoring object.
For example, the processing device may send the prompt information to an alarm device disposed near the monitored object, and the alarm device, after receiving the prompt information, sends an alarm prompt tone to prompt that there is a shooting behavior for shooting the monitored object, so as to warn the target object to stop the shooting behavior after hearing the alarm prompt tone.
For another example, the processing device may send the prompt information to a server, and the server may forward the prompt information to the terminal device of a supervisor, notifying the supervisor that there is a shooting behavior directed at the monitored object so that the supervisor can take measures, such as stopping the shooting behavior.
Optionally, the prompt information includes image data, which is the data of the target image; or, when the target image includes a plurality of images, the image data may be the data of one or more of the plurality of images.
For example, the processing device sends the image data to the server, the server forwards the image data to the terminal device of the supervisor, and the terminal device of the supervisor can display the target image, so that the supervisor can determine the target object based on the target image and timely stop the shooting behavior of the target object.
According to the scheme provided by the present application, the processing device fuses the predicted values obtained from multiple dimensions to determine whether the target object has a shooting behavior directed at the monitored object. This improves the accuracy of recognizing shooting behavior and achieves the purpose of accurately protecting information from being leaked.
Example two
The processing device may include a shooting behavior recognition module as shown in fig. 7, and the image processing method provided by the present application is executed by this module. As shown in fig. 7, the shooting behavior recognition module includes a data access unit, a preprocessing unit, a behavior inference unit, a post-processing unit and a data reporting unit. The functions of the units are explained below.
The data access unit may obtain video stream data and decode it to obtain a sequence of image frames.
The preprocessing unit may sample the image frame sequence decoded by the data access unit. The preprocessing unit may sample the sequence according to a configured frame sampling rate, reducing the frame rate to match the processing capability of the processing device. One or more frames of the sampled image frames may be used as the target image.
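As a minimal sketch, sampling at a configured rate could keep every n-th decoded frame; the sampling strategy and the names below are assumptions for illustration only.

    # stand-in for the image frame sequence produced by the data access unit
    decoded_frames = [f"frame_{i}" for i in range(100)]

    def sample_frames(frame_sequence, sample_interval):
        # keep one frame out of every `sample_interval` frames, reducing the
        # frame rate to match the processing capability of the device
        return frame_sequence[::sample_interval]

    # e.g. reduce a 25 fps stream to 5 fps by keeping every 5th frame
    target_candidates = sample_frames(decoded_frames, 5)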
The behavior inference unit can implement functions such as multi-target detection, human body tracking, and behavior recognition based on the acquired target image. First, the behavior inference unit may include the aforementioned key target detection model to realize multi-target detection: after the target image is input into the key target detection model, the model performs inference and outputs the type information and corresponding position information of each detected key target. One or more target objects among the key targets are determined based on the type information, the position of each target object is determined based on the position information, and the image data at each target object's position is acquired for feature extraction and recognition to obtain skeletal key point information. For example, the behavior inference unit may include the skeletal key point detection model HRNet; after the image data at the position corresponding to the target object is input into the model, the skeletal key point information of the target object is output. The behavior inference unit then performs inference with the behavior recognition model on the skeletal key point information of the target object to obtain the first predicted value of the shooting behavior.
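The flow of the behavior inference unit might be sketched as follows; `detector`, `keypoint_model` (e.g. an HRNet-based detector), and `behavior_model` are placeholder callables, and the detection field names are assumptions of this sketch, not a specific library API.

    def infer_first_values(target_image, detector, keypoint_model, behavior_model):
        # multi-target detection: each detection carries type information
        # and position information (a bounding box)
        detections = detector(target_image)
        first_values = {}
        for det in detections:
            if det["type"] != "person":  # keep only candidate target objects
                continue
            x1, y1, x2, y2 = det["box"]
            crop = target_image[y1:y2, x1:x2]    # image data at the target's position
            keypoints = keypoint_model(crop)     # skeletal key point information
            first_values[det["id"]] = behavior_model(keypoints)  # first predicted value
        return first_values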
The post-processing unit mainly performs joint analysis combining the skeletal key point positions of the target object with the spatial relationships among the target object, the electronic device, and the monitored object. It rules out cases such as the electronic device not being in the hand (for example, lying on a desktop), or the target object holding the electronic device without any shooting action, thereby reducing the probability of false recognition. The post-processing unit can obtain the predicted value set according to the spatial relationship of at least two of the target object, the electronic device, and the monitored object (such as one or more of the spatial relationship A, the spatial relationship B, and the spatial relationship C). The post-processing unit can also fuse the first predicted value with the predicted values in the predicted value set to determine whether the target object has a shooting behavior of shooting the monitored object.
Optionally, the post-processing unit may obtain the key targets within a region of interest (ROI) in the target image and perform the spatial relationship analysis there, so as to obtain the corresponding predicted values.
For example, the post-processing unit may obtain an ROI around the monitored object in the target image according to ROI configuration parameters, analyze the spatial relationship of at least two of the target object, the electronic device, and the monitored object within the ROI to obtain the predicted value set, and fuse the predicted value set with the first predicted value to determine whether the target object has a shooting behavior of shooting the monitored object.
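A minimal sketch of the ROI step, assuming the configuration parameters are an (x, y, width, height) tuple in pixels and the target image is an array-like image; restricting the spatial relationship analysis to this sub-image avoids evaluating targets far from the monitored object.

    def crop_roi(target_image, roi_config):
        # return the region of interest around the monitored object; the
        # spatial relationship analysis is then restricted to this sub-image
        x, y, w, h = roi_config
        return target_image[y:y + h, x:x + w]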
The data reporting unit can generate the prompt information when the post-processing unit determines that the target object has a shooting behavior of shooting the monitored object.
For example, the data reporting unit may generate prompt information and send it to an alarm device to trigger an alarm. Alternatively, the data reporting unit may perform format encapsulation (e.g., encoding and compressing) on one or more pictures in the target image to obtain the prompt information; the processing device may then send the prompt information to the server, which may forward it to the terminal device of the supervisor.
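As an illustration of the format encapsulation, the sketch below compresses the pictures to JPEG and packages them in JSON; the message layout, the field names, and the use of OpenCV and base64 are assumptions of this example, not the application's defined format.

    import base64
    import json

    import cv2  # OpenCV, assumed available for JPEG encoding

    def build_prompt_message(target_images, camera_id):
        encoded = []
        for img in target_images:
            ok, buf = cv2.imencode(".jpg", img)  # encode and compress the picture
            if ok:
                encoded.append(base64.b64encode(buf.tobytes()).decode("ascii"))
        return json.dumps({"camera": camera_id,
                           "event": "shooting_behavior",
                           "images": encoded})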
Fig. 8 is a schematic structural diagram of a processing apparatus according to an embodiment of the present application. The processing apparatus may be implemented as the processing device itself, or configured in the processing device by software, hardware, or a combination of both. The processing apparatus comprises an acquisition module 801, a processing module 802, and a transceiver module 803.
An obtaining module 801, configured to obtain a target image, where the target image includes the target object, the electronic device, and the monitoring object;
a processing module 802, configured to extract skeletal key point information of the target object from the target image;
the processing module 802 is further configured to obtain a first predicted value by using a behavior recognition model according to the skeletal key point information, where the first predicted value is used to represent a predicted probability that the target object has a shooting behavior;
the processing module 802 is further configured to determine a spatial relationship between at least two of the target object, the electronic device, and the monitoring object according to the target image;
the processing module 802 is further configured to obtain a prediction value set according to the spatial relationship, where the prediction value set includes at least one prediction value, and each prediction value is used to represent a predicted probability that the target object has a shooting behavior for shooting the monitored object;
the processing module 802 is further configured to determine whether the shooting behavior exists in the target object according to the first predicted value and the set of predicted values;
the transceiver module 803 is configured to, if it is determined that the target object has the shooting behavior, output prompt information, where the prompt information is used to indicate that the target object has the shooting behavior.
Optionally, the predicted value set includes a second predicted value, and the processing module 802 is specifically configured to:
determining a first included angle between a first plane where the electronic equipment is located and a second plane where the monitoring object is located according to the target image;
and obtaining a second predicted value according to the first included angle and a first preset value, wherein the first included angle is an included angle between a first plane where the electronic equipment is located and a second plane where the monitored object is located.
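A minimal sketch of this computation, assuming the normal vectors of the two planes have already been estimated from the target image; the mapping from the first included angle to the second predicted value and the preset value are illustrative only.

    import numpy as np

    def second_predicted_value(device_normal, monitor_normal, first_preset_deg=30.0):
        # first included angle between the plane of the electronic device and
        # the plane of the monitored object, via the angle between their normals
        cos_angle = abs(np.dot(device_normal, monitor_normal)) / (
            np.linalg.norm(device_normal) * np.linalg.norm(monitor_normal))
        angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
        # a small angle suggests the device faces the monitored object
        return 1.0 if angle <= first_preset_deg else 0.0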
Optionally, the predicted value set further includes a third predicted value, and the processing module 802 is specifically configured to:
determining a second included angle between a first connecting line and a second connecting line according to the target image, wherein the first connecting line is a connecting line between a target point of the target object and the electronic equipment, the target point is a shoulder central point or a head point of the target object, and the second connecting line is a connecting line between the electronic equipment and the monitored object;
obtaining the third predicted value according to the second included angle and a second preset value, wherein the second included angle is the included angle between the first connecting line and the second connecting line.
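This angle test might be sketched as follows with 2D image coordinates; the threshold and the binary mapping to the third predicted value are assumptions of the example.

    import numpy as np

    def third_predicted_value(target_point, device_point, monitor_point,
                              second_preset_deg=60.0):
        v1 = np.asarray(device_point) - np.asarray(target_point)   # first connecting line
        v2 = np.asarray(monitor_point) - np.asarray(device_point)  # second connecting line
        cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
        # a small second included angle suggests the device is raised
        # toward the monitored object
        return 1.0 if angle <= second_preset_deg else 0.0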
optionally, the prediction value set further includes a fourth prediction value, and the processing module 802 is specifically configured to:
determining the distance between the hand of the target object and the electronic equipment according to the target image;
and obtaining the fourth predicted value according to the distance and the third preset value.
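A sketch of the distance test, using Euclidean distance in pixel coordinates; the preset value and the binary mapping to the fourth predicted value are assumptions of the example.

    import math

    def fourth_predicted_value(hand_point, device_point, third_preset=50.0):
        distance = math.dist(hand_point, device_point)  # pixel-space distance
        # a distance below the third preset value suggests the target object
        # is holding the electronic device
        return 1.0 if distance < third_preset else 0.0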
Optionally, the processing module 802 is specifically configured to determine, according to the target image, a distance between the hand of the target object and the electronic device;
and the processing module 802 is further configured to determine whether the target object holds the electronic device according to whether the distance between the hand of the target object and the electronic device is smaller than a third preset value.
Optionally, the processing module 802 is specifically configured to:
calculating to obtain a fifth predicted value according to the first predicted value, the weight coefficient of the first predicted value, each predicted value in the predicted value set and the weight coefficient of each predicted value;
if the fifth predicted value is greater than or equal to a fourth preset value, determining that the shooting behavior exists in the target object;
and if the fifth predicted value is smaller than the fourth preset value, determining that the shooting behavior does not exist in the target object.
Optionally, the obtaining module 801 is specifically configured to receive the target image from an acquisition device; or shooting to obtain the target image.
Optionally, the processing module 802 is further configured to identify the target image, and determine that the target image includes the target object and the electronic device.
The embodiment of the application provides a processing apparatus. The processing apparatus includes a logic circuit and a communication interface, wherein the communication interface is used for acquiring data to be processed and/or outputting processed data, and the logic circuit is used for processing the data to be processed to obtain the processed data, so that the processing apparatus executes the image processing method in the method embodiments. In one possible design, the communication interface includes an input interface and an output interface.
It should be understood that the processing device provided in the embodiments of the present application may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Fig. 9 is a schematic structural diagram of a processing device 900 according to an embodiment of the present application. The processing device 900 may be applied in a system as shown in fig. 1, and performs the functions of the processing device in the above-described method embodiments. As shown, the processing device 900 includes a transceiver 901 and a processor 902. Optionally, the processing device 900 further comprises a memory 903. The processor 902, the transceiver 901 and the memory 903 may communicate with each other via internal connection paths to transmit control and/or data signals. The memory 903 is used for storing a computer program, and the processor 902 is used for executing the computer program in the memory to control the transceiver 901 to transmit and receive signals. In particular implementations, the memory may be integrated within the processor or may be separate from the processor.
The present embodiment also provides a readable storage medium in which execution instructions are stored. When the execution instructions are executed by at least one processor of the electronic device, the image processing method in the above embodiments is implemented.
The present embodiments also provide a computer program product comprising execution instructions stored in a readable storage medium. At least one processor of the electronic device may read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the electronic device to implement the image processing method provided by the various embodiments described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is merely a logical division, and other divisions may be used in practice; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices, or modules, and may be electrical, mechanical, or in another form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the various embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (21)

1. An image processing method, comprising:
acquiring a target image, wherein the target image comprises a target object, electronic equipment and a monitoring object;
extracting skeletal key point information of the target object from the target image;
obtaining a first predicted value by utilizing a behavior recognition model according to the skeletal key point information, wherein the first predicted value is used for representing the predicted probability that the target object has shooting behaviors;
determining a spatial relationship of at least two of the target object, the electronic device and the monitoring object according to the target image;
obtaining a predicted value set according to the spatial relationship, wherein the predicted value set comprises at least one predicted value, and each predicted value is used for representing the predicted probability that the target object has shooting behavior for shooting the monitoring object;
and determining whether the shooting behavior exists in the target object according to the first predicted value and the predicted value set.
2. The method of claim 1, wherein the set of predictor values includes a second predictor,
and determining a spatial relationship of at least two of the target object, the electronic device, and the monitoring object according to the target image, including:
determining a first included angle between a first plane where the electronic equipment is located and a second plane where the monitoring object is located according to the target image;
and obtaining a set of predicted values according to the spatial relationship, including:
and obtaining a second predicted value according to a first included angle and a first preset value, wherein the first included angle is an included angle between a first plane where the electronic equipment is located and a second plane where the monitored object is located.
3. The method according to claim 1 or 2, wherein the set of predictor values further comprises a third predictor,
and determining a spatial relationship of at least two of the target object, the electronic device, and the monitoring object according to the target image, including:
determining a second included angle between a first connection line and a second connection line according to the target image, wherein the first connection line is a connection line between a target point of the target object and the electronic equipment, the target point is a shoulder central point or a head point of the target object, and the second connection line is a connection line between the electronic equipment and the monitored object;
and obtaining a set of predicted values according to the spatial relationship, including:
and obtaining the third predicted value according to the second included angle and a second preset value, wherein the second included angle is a second included angle between the first connecting line and the second connecting line.
4. The method of claim 3, wherein the set of predictor values further includes a fourth predictor,
and determining a spatial relationship of at least two of the target object, the electronic device, and the monitoring object according to the target image, including:
determining a distance between a hand of the target object and the electronic device according to the target image;
and obtaining a set of predicted values according to the spatial relationship, including:
and obtaining the fourth predicted value according to the distance and a third preset value.
5. The method of claim 1 or 2, wherein the determining the spatial relationship of at least two of the target object, the electronic device, and the monitoring object from the target image comprises:
determining a distance between a hand of the target object and the electronic device according to the target image;
and, the method further comprises:
and determining whether the target object holds the electronic equipment or not according to the fact that the distance between the hand of the target object and the electronic equipment is smaller than a third preset value.
6. The method according to claim 1 or 2, wherein the determining whether the shooting behavior exists in the target object according to the first predicted value and the set of predicted values comprises:
calculating to obtain a fifth predicted value according to the first predicted value, the weight coefficient of the first predicted value, each predicted value in the predicted value set and the weight coefficient of each predicted value;
if the fifth predicted value is larger than or equal to a fourth preset value, determining that the shooting behavior exists in the target object;
and if the fifth predicted value is smaller than the fourth preset value, determining that the shooting behavior does not exist in the target object.
7. The method of claim 1 or 2, wherein the acquiring a target image comprises:
receiving the target image from a collection device; or,
and shooting to obtain the target image.
8. The method of claim 7, further comprising:
and identifying the target image, and determining that the target image comprises the target object and the electronic equipment.
9. The method of claim 1, further comprising:
and if it is determined that the target object has the shooting behavior, outputting prompt information, wherein the prompt information is used for prompting that the target object has the shooting behavior.
10. An image processing apparatus characterized by comprising:
the system comprises an acquisition module, a monitoring module and a display module, wherein the acquisition module is used for acquiring a target image, and the target image comprises a target object, electronic equipment and a monitoring object;
the processing module is used for extracting the bone key point information of the target object from the target image;
the processing module is further used for obtaining a first predicted value by utilizing a behavior recognition model according to the bone key point information, wherein the first predicted value is used for representing the predicted probability that the target object has shooting behaviors;
the processing module is further configured to determine a spatial relationship of at least two of the target object, the electronic device, and the monitoring object according to the target image;
the processing module is further configured to obtain a prediction value set according to the spatial relationship, where the prediction value set includes at least one prediction value, and each prediction value is used to represent a predicted probability that the target object has a shooting behavior for shooting the monitoring object;
the processing module is further used for determining whether the shooting behavior exists in the target object according to the first predicted value and the predicted value set.
11. The apparatus of claim 10, wherein the set of predictor values includes a second predictor, and wherein the processing module is specifically configured to:
determining a first included angle between a first plane where the electronic equipment is located and a second plane where the monitoring object is located according to the target image;
and obtaining a second predicted value according to a first included angle and a first preset value, wherein the first included angle is an included angle between a first plane where the electronic equipment is located and a second plane where the monitored object is located.
12. The apparatus according to claim 10 or 11, wherein the set of predictor values further comprises a third predictor, the processing module being specifically configured to:
determining a second included angle between a first connection line and a second connection line according to the target image, wherein the first connection line is a connection line between a target point of the target object and the electronic equipment, the target point is a shoulder central point or a head point of the target object, and the second connection line is a connection line between the electronic equipment and the monitored object;
and obtaining the third predicted value according to the second included angle and a second preset value, wherein the second included angle is a second included angle between the first connecting line and the second connecting line.
13. The apparatus of claim 12, wherein the set of predictor values further includes a fourth predictor, and wherein the processing module is specifically configured to:
determining a distance between a hand of the target object and the electronic device according to the target image;
and obtaining the fourth predicted value according to the distance and a third preset value.
14. The apparatus according to claim 10 or 11, wherein the processing module is specifically configured to:
determining a distance between a hand of the target object and the electronic device according to the target image;
and determining whether the target object holds the electronic equipment or not according to the fact that the distance between the hand of the target object and the electronic equipment is smaller than a third preset value.
15. The apparatus according to claim 10 or 11, wherein the processing module is specifically configured to:
calculating to obtain a fifth predicted value according to the first predicted value, the weight coefficient of the first predicted value, each predicted value in the predicted value set and the weight coefficient of each predicted value;
if the fifth predicted value is larger than or equal to a fourth preset value, determining that the shooting behavior exists in the target object;
and if the fifth predicted value is smaller than the fourth preset value, determining that the shooting behavior does not exist in the target object.
16. The apparatus according to claim 10 or 11, wherein the obtaining module is specifically configured to:
receiving the target image from a collection device; or,
and shooting to obtain the target image.
17. The apparatus of claim 16, wherein the processing module is further configured to identify the target image and determine that the target image comprises the target object and the electronic device.
18. The apparatus of claim 10,
the transceiver module is further configured to output a prompt message when it is determined that the shooting behavior exists in the target object, where the prompt message is used to prompt the target object that the shooting behavior exists.
19. A processing device, comprising: a processor and a memory;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, causing the processing device to perform the method of any one of claims 1 to 9.
20. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the method of any one of claims 1 to 9.
21. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 9.
CN202111489322.3A 2021-12-07 2021-12-07 Image processing method and processing apparatus Pending CN114140882A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111489322.3A CN114140882A (en) 2021-12-07 2021-12-07 Image processing method and processing apparatus


Publications (1)

Publication Number Publication Date
CN114140882A (en) 2022-03-04

Family

ID=80384722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111489322.3A Pending CN114140882A (en) 2021-12-07 2021-12-07 Image processing method and processing apparatus

Country Status (1)

Country Link
CN (1) CN114140882A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination