Disclosure of Invention
In view of this, the present invention provides a communication behavior detection method, apparatus, device and storage medium, to overcome the problems in the prior art that, when a person is far away from a monitoring device, the definition of the picture obtained by the monitoring device is reduced and the accuracy of identifying the behavior from the picture is low.
According to a first aspect of embodiments of the present application, there is provided a communication behavior detection method, including:
acquiring behavior picture information of monitored personnel;
inputting the picture information into a posture estimation module to generate key point information; wherein the key points comprise at least one of wrist key points, palm key points, elbow key points and eye key points;
the wrist key points are key points corresponding to the wrist parts; the palm key points are key points corresponding to the palm parts; the elbow key points are the key points corresponding to the elbow parts; the eye key points are key points corresponding to the eye parts;
and judging whether the behavior picture information contains communication behaviors or not according to the key point information.
Optionally, the posture estimation module adopts an OpenPose posture recognition technology based on a skeleton model.
Optionally, the generating the key point information includes:
generating palm key point position information, face key point position information, left eye key point position information and right eye key point position information;
the determining whether the behavior picture information contains a communication behavior according to the key point information includes:
calculating the distance between the palm key point and the face key point;
judging whether the distance between the palm key point and the face key point is smaller than a preset distance or not;
if so, judging whether, in the horizontal direction, the palm key point is located between the straight line through the left eye key point extending directly in front of the monitored person's face and the straight line through the right eye key point extending directly in front of the face;
if not, determining that the behavior picture information contains a communication behavior.
Optionally, the method for generating the position information of the palm key point includes:
generating position information of the elbow key points and position information of the wrist key points;
and determining the position information of the palm key point according to the position information of the elbow key point and the position information of the wrist key point.
Optionally, determining the position information of the palm key point according to the position information of the elbow key point and the position information of the wrist key point includes:
determining straight lines where the elbow key points and the wrist key points are located according to the position information of the elbow key points and the position information of the wrist key points;
when a first distance corresponding to a target point on the straight line is a first preset number of times a second distance, determining the position information of the target point as the position information of the palm key point;
the target point is any point on the straight line outside the line segment formed by the elbow key point and the wrist key point; the first distance is the distance between the target point and the elbow key point, and the second distance is the distance between the target point and the wrist key point.
Optionally, the value range of the first preset number is 3-5.
Optionally, determining the position information of the palm key point according to the position information of the elbow key point and the position information of the wrist key point includes:
determining a line segment taking the elbow key point and the wrist key point as end points according to the position information of the elbow key point and the position information of the wrist key point;
determining an auxiliary point on the line segment, on the side close to the wrist key point, at one over a second preset number of the length of the line segment;
determining the symmetrical point of the auxiliary point with respect to the wrist key point;
and the position information of the symmetrical point is the position information of the palm key point.
Optionally, the value range of the second preset number is 3-5.
Optionally, the generating the key point information includes:
generating elbow key point position information and frame number information corresponding to different frame numbers in continuous time;
generating wrist key point position information and frame number information corresponding to different frame numbers in continuous time;
generating, for different frame numbers in continuous time, information on whether the palm key point indicates telephone use, together with the corresponding frame number information;
the determining whether the behavior picture information contains a communication behavior according to the key point information includes:
judging whether a hand-raising behavior occurs according to the position information of the elbow key points and the position information of the wrist key points;
if so, judging whether the duration of the hand-raising behavior exceeds a first preset time according to the elbow key point frame number information and the wrist key point frame number information;
if yes, judging whether the duration of telephone use within the duration of the hand-raising behavior exceeds a second preset time according to the information on whether the palm key point indicates telephone use and the frame number information;
if yes, determining that the behavior picture information contains a communication behavior.
According to a second aspect of embodiments of the present application, there is provided a communication behavior detection apparatus including:
an acquisition module, configured to acquire behavior picture information of monitored personnel;
a generating module, configured to input the picture information into a posture estimation module to generate key point information; wherein the key points comprise at least one of wrist key points, palm key points, elbow key points and eye key points;
the wrist key points are key points corresponding to the wrist parts; the palm key points are key points corresponding to the palm parts; the elbow key points are the key points corresponding to the elbow parts; the eye key points are key points corresponding to the eye parts;
and a judging module, configured to judge whether the behavior picture information contains a communication behavior according to the key point information.
Optionally, the posture estimation module adopts an OpenPose posture recognition technology based on a skeleton model.
Optionally, the generating module is specifically configured to: input the picture information into the posture estimation module to generate palm key point position information, face key point position information, left eye key point position information and right eye key point position information;
the judgment module is specifically configured to:
calculating the distance between the palm key point and the face key point;
judging whether the distance between the palm key point and the face key point is smaller than a preset distance or not;
if so, judging whether, in the horizontal direction, the palm key point is located between the straight line through the left eye key point extending directly in front of the monitored person's face and the straight line through the right eye key point extending directly in front of the face;
if not, determining that the behavior picture information contains a communication behavior.
Optionally, the generating module is specifically configured to:
generating elbow key point position information and frame number information corresponding to different frame numbers in continuous time;
generating wrist key point position information and frame number information corresponding to different frame numbers in continuous time;
generating, for different frame numbers in continuous time, information on whether the palm key point indicates telephone use, together with the corresponding frame number information;
the judgment module is specifically configured to:
judging whether a hand-raising behavior occurs according to the position information of the elbow key points and the position information of the wrist key points;
if so, judging whether the duration of the hand-raising behavior exceeds a first preset time according to the elbow key point frame number information and the wrist key point frame number information;
if yes, judging whether the duration of telephone use within the duration of the hand-raising behavior exceeds a second preset time according to the information on whether the palm key point indicates telephone use and the frame number information;
if yes, determining that the behavior picture information contains a communication behavior.
According to a third aspect of embodiments of the present application, there is provided a communication behavior detection apparatus including:
a processor, and a memory coupled to the processor;
the memory is configured to store a computer program configured to perform at least the communication behavior detection method according to the first aspect of the present application;
the processor is configured to call and execute the computer program in the memory.
According to a fourth aspect of the embodiments of the present application, there is provided a storage medium storing a computer program, which when executed by a processor, implements the steps of the communication behavior detection method according to the first aspect of the present application.
By adopting the above technical scheme, behavior picture information can be acquired; the picture information is then input into a posture estimation module to generate key point position information; and whether the behavior picture information contains a communication behavior is judged according to the key point information. The key points include at least one of points at the wrist, the elbow and the eyes. Since the body parts where these key points are located are larger than the fine details of the hand on which gesture recognition relies, judging communication behavior from the wrist, elbow and eye key points places a lower definition requirement on the picture than gesture recognition does. The behavior can therefore still be identified accurately when a person is far away from the monitoring device and the definition of the picture acquired by the monitoring device is reduced.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be made by those skilled in the art without any inventive work based on the embodiments of the present invention, are within the scope of the present invention.
Fig. 1 is a flowchart illustrating a communication behavior detection method according to an embodiment of the present invention.
As shown in fig. 1, the method of the present embodiment includes:
S101, acquiring behavior picture information of monitored personnel;
the behavior picture information is obtained by shooting through the monitoring equipment.
Specifically, the monitoring device may be any of various types of cameras that monitor persons in scenarios where communication using mobile communication devices is prohibited and capture picture information of their behaviors in real time. The monitored personnel are persons in such scenarios.
S102, inputting the picture information into a posture estimation module to generate key point information; wherein the key points comprise at least one of wrist key points, palm key points, elbow key points and eye key points;
the wrist key points are key points corresponding to the wrist parts; the palm key points are key points corresponding to the palm parts; the elbow key points are the key points corresponding to the elbow parts; the eye key points are key points corresponding to the eye parts;
specifically, the input posture estimation module adopts an openpos skeleton model-based posture identification technology. The gesture recognition technology of OpenPose based on a skeleton model is a technology for processing human body gestures as a whole. The technology can be used for tracking and detecting the feature points of multiple persons in real time, and can simultaneously locate 18 key feature points on the human body.
And S103, judging whether the behavior picture information contains communication behaviors or not according to the key point information.
By adopting the above technical scheme, the behavior picture information can be acquired through the monitoring device; the picture information is then input into a posture estimation module to generate key point position information; and whether the behavior is a communication behavior is judged according to the key point information. The key points include at least one of points at the wrist, the elbow and the eyes. Since the body parts where these key points are located are larger than the fine details of the hand on which gesture recognition relies, judging communication behavior from the wrist, elbow and eye key points places a lower definition requirement on the picture than gesture recognition does, so accuracy is maintained even when the person is far away from the monitoring device and the definition of the captured picture is reduced.
In practical applications, there are the following two methods for detecting communication behaviors by the above technical solutions.
Referring to fig. 2, the steps of one method are as follows:
S201, generating palm key point position information, face key point position information, left eye key point position information and right eye key point position information;
Specifically, S201 is a further explanation of S102: the picture information is input into the posture estimation module, and palm key point position information, face key point position information, left eye key point position information and right eye key point position information are generated.
Further, the method for generating the position information of the palm key point comprises the following steps: generating position information of the elbow key points and position information of the wrist key points; and determining the position information of the palm key point according to the position information of the elbow key point and the position information of the wrist key point.
Generating the position information of the elbow key points and the position information of the wrist key points is as follows: and inputting the picture information into a posture estimation module to generate the position information of the elbow key points and the position information of the wrist key points.
According to the position information of the elbow key point and the position information of the wrist key point, the position information of the palm key point can be determined as follows: the straight line on which the elbow key point and the wrist key point are located is determined according to their position information; when the first distance corresponding to a target point on the straight line is the first preset number of times the second distance, the position information of the target point is determined as the position information of the palm key point. The target point is any point on the straight line outside the line segment formed by the elbow key point and the wrist key point; the first distance is the distance between the target point and the elbow key point, and the second distance is the distance between the target point and the wrist key point. The value range of the first preset number is 4-6; specifically, the first preset number may be 5.
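The extrapolation described above can be sketched as follows (an illustrative sketch, not the claimed implementation; NumPy and the function name are assumptions). Writing the target point as P = elbow + t·(wrist − elbow), the condition that the first distance |P − elbow| equals k times the second distance |P − wrist|, with P lying beyond the wrist, gives t = k/(k − 1):

```python
import numpy as np

def palm_from_ratio(elbow, wrist, k=5):
    """Palm key point P on the elbow-wrist line, beyond the wrist,
    such that |P - elbow| = k * |P - wrist| (k = first preset number)."""
    elbow = np.asarray(elbow, dtype=float)
    wrist = np.asarray(wrist, dtype=float)
    t = k / (k - 1)                      # e.g. k = 5 gives t = 1.25
    return elbow + t * (wrist - elbow)
```

For example, with the elbow at (0, 0), the wrist at (4, 0) and k = 5, the palm key point is (5, 0): its distance to the elbow is 5 and its distance to the wrist is 1.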
Alternatively, according to the position information of the elbow key point and the position information of the wrist key point, the position information of the palm key point can be determined as follows:
determining a line segment taking the elbow key point and the wrist key point as end points according to the position information of the elbow key point and the position information of the wrist key point;
determining an auxiliary point on the line segment, on the side close to the wrist key point, at one over the second preset number of the length of the line segment;
determining the symmetrical point of the auxiliary point with respect to the wrist key point;
the position information of the symmetrical point is the position information of the palm key point.
Wherein the value range of the second preset number is 3-5. Specifically, the second preset number may be 4.
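This alternative can be sketched in the same way (again an illustrative sketch; the function name and the reading of the auxiliary-point position as one over the second preset number of the segment length are assumptions):

```python
import numpy as np

def palm_from_symmetry(elbow, wrist, n=4):
    """Auxiliary point at 1/n of the segment length from the wrist, toward
    the elbow (n = second preset number), reflected across the wrist."""
    elbow = np.asarray(elbow, dtype=float)
    wrist = np.asarray(wrist, dtype=float)
    aux = wrist + (elbow - wrist) / n    # on the segment, close to the wrist
    return 2 * wrist - aux               # symmetric point about the wrist
```

With the elbow at (0, 0), the wrist at (4, 0) and n = 4, the auxiliary point is (3, 0) and the palm key point is (5, 0), which coincides with the result of the ratio method above when the first preset number is 5.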
S202, calculating the distance between a palm key point and a face key point;
Specifically, a three-dimensional model may be established, and the distance between the palm key point and the face key point may be determined according to the position information of the palm key point and the position information of the face key point.
S203, judging whether the distance between the palm key point and the face key point is smaller than a preset distance or not;
the value range of the preset distance is 0-60 pixel distance.
S204, if yes, judging whether, in the horizontal direction, the palm key point is located between the straight line through the left eye key point extending directly in front of the monitored person's face and the straight line through the right eye key point extending directly in front of the face;
and S205, if the judgment result is negative, determining that the behavior of the monitored person is a calling behavior.
If the monitored person is using other functions of the communication device, for example playing games or reading electronic documents, the device needs to be placed in front of the eyes, that is, near the perpendicular bisector of the line connecting the left eye key point and the right eye key point. Meanwhile, since the communication device is generally held in the palm, the position of the palm key point can be taken approximately as the position of the communication device. Therefore, when the palm holding the communication device is close to the face but not in front of the eyes, the person is considered to be communicating with the communication device.
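Under the simplifying assumption of a roughly frontal camera view, so that the two lines extending directly in front of the face project to vertical lines at the eyes' x-coordinates, the judgment of S202 to S205 can be sketched as follows (the function name, the default pixel threshold and the frontal-view reduction are assumptions, not the claimed implementation):

```python
import numpy as np

def is_calling(palm, face, left_eye, right_eye, max_dist=60):
    """True when the palm key point is close to the face key point (S203)
    but not horizontally between the two eyes' forward lines (S204)."""
    palm = np.asarray(palm, dtype=float)
    close = np.linalg.norm(palm - np.asarray(face, dtype=float)) < max_dist
    lo, hi = sorted([left_eye[0], right_eye[0]])
    between_eyes = lo <= palm[0] <= hi   # frontal-view simplification
    return close and not between_eyes
```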
Referring to fig. 3, another method includes the following steps:
S301, generating elbow key point position information and frame number information corresponding to different frame numbers in continuous time;
S302, generating wrist key point position information and frame number information corresponding to different frame numbers in continuous time;
S303, generating, for different frame numbers in continuous time, information on whether the palm key point indicates telephone use, together with the corresponding frame number information;
Specifically, S301, S302 and S303 are further explanations of S102: the picture information is input into the posture estimation module to generate the elbow key point position information and frame number information for different frame numbers in continuous time, the wrist key point position information and frame number information for different frame numbers in continuous time, and the information on whether the palm key point indicates telephone use together with the frame number information for different frame numbers in continuous time.
S304, judging whether a hand-raising behavior occurs according to the position information of the elbow key points and the position information of the wrist key points;
Specifically, the position information of the elbow key point and the position information of the wrist key point can be input into the posture estimation module, which adopts the OpenPose posture recognition technology based on a skeleton model, so that whether a hand-raising behavior occurs can be judged.
S305, if yes, judging whether the duration of the hand-raising behavior exceeds a first preset time according to the elbow key point frame number information and the wrist key point frame number information;
Specifically, the first preset time ranges from 3 seconds to 5 seconds; further, the first preset time may be 5 seconds.
S306, if yes, judging whether the duration of telephone use within the duration of the hand-raising behavior exceeds a second preset time according to the information on whether the palm key point indicates telephone use and the frame number information;
Specifically, the second preset time ranges from 3 seconds to 5 seconds; further, the second preset time may be 3 seconds.
And S307, if yes, determining that the behavior of the monitored person is a calling behavior.
The basis of the above judging mode is as follows: when the monitored person holds a handset and keeps the hand raised for more than a certain period of time, it is highly likely that communication is being performed using the communication device. Accordingly, S304 determines whether a hand-raising behavior occurs; S305 judges the duration of the hand-raising behavior; and S306 judges whether a handset is being held. Step by step, these determine whether the monitored person meets the above conditions and is therefore communicating.
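The frame-counting logic of S304 to S307 can be sketched as follows (the per-frame flags, the frame rate and the function name are illustrative assumptions; the upstream key point analysis is assumed to have already produced the two booleans for each frame):

```python
def detect_call(frames, fps=25, t_raise=5.0, t_phone=3.0):
    """frames: per-frame dicts with booleans 'hand_raised' (from the elbow
    and wrist key points) and 'using_phone' (from the palm key point).
    Flags a calling behavior when a continuous hand-raise lasts longer than
    t_raise seconds and phone use within it exceeds t_phone seconds."""
    raise_frames = phone_frames = 0
    for f in frames:
        if f["hand_raised"]:
            raise_frames += 1
            if f["using_phone"]:
                phone_frames += 1
            if (raise_frames > t_raise * fps
                    and phone_frames > t_phone * fps):
                return True
        else:
            raise_frames = phone_frames = 0   # the raise was interrupted
    return False
```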
Fig. 4 is a schematic structural diagram of a communication behavior detection apparatus according to another embodiment of the present application. Referring to fig. 4, the communication behavior detection apparatus provided by the present application includes:
an acquiring module 401, configured to acquire behavior picture information of a monitored person;
the behavior picture information is obtained by shooting through the monitoring equipment.
a generating module 402, configured to input the picture information into the posture estimation module to generate the key point information; wherein the key points comprise at least one of wrist key points, palm key points, elbow key points and eye key points;
a determining module 403, configured to determine whether the behavior picture information includes a communication behavior according to the key point information.
By adopting the above technical scheme, the behavior picture information can be acquired through the monitoring device; the picture information is then input into a posture estimation module to generate key point position information; and whether the behavior is a communication behavior is judged according to the key point information. The key points include at least one of points at the wrist, the elbow and the eyes. Since the body parts where these key points are located are larger than the fine details of the hand on which gesture recognition relies, judging communication behavior from the wrist, elbow and eye key points places a lower definition requirement on the picture than gesture recognition does, so accuracy is maintained even when the person is far away from the monitoring device and the definition of the captured picture is reduced. Compared with the scheme in the background art, identifying and judging the behavior in this way can improve the judgment accuracy.
Optionally, the posture estimation module adopts an OpenPose posture recognition technology based on a skeleton model.
Optionally, the generating module 402 is specifically configured to:
input the picture information into the posture estimation module, and generate palm key point position information, face key point position information, left eye key point position information and right eye key point position information;
the determining module 403 is specifically configured to:
calculating the distance between the palm key point and the face key point;
judging whether the distance between the palm key point and the face key point is smaller than a preset distance or not;
if so, judging whether, in the horizontal direction, the palm key point is located between the straight line through the left eye key point extending directly in front of the monitored person's face and the straight line through the right eye key point extending directly in front of the face;
if not, determining that the behavior picture information contains a communication behavior.
Optionally, the generating module 402 is specifically configured to:
generating elbow key point position information and frame number information corresponding to different frame numbers in continuous time;
generating wrist key point position information and frame number information corresponding to different frame numbers in continuous time;
generating, for different frame numbers in continuous time, information on whether the palm key point indicates telephone use, together with the corresponding frame number information;
the determining module 403 is specifically configured to:
judging whether a hand-raising behavior occurs according to the position information of the elbow key points and the position information of the wrist key points;
if so, judging whether the duration of the hand-raising behavior exceeds a first preset time according to the elbow key point frame number information and the wrist key point frame number information;
if yes, judging whether the duration of telephone use within the duration of the hand-raising behavior exceeds a second preset time according to the information on whether the palm key point indicates telephone use and the frame number information;
and if so, determining that the behavior picture information contains a communication behavior.
Fig. 5 is a schematic structural diagram of a communication behavior detection device according to another embodiment of the present application. Referring to fig. 5, the communication behavior detection apparatus provided by the present application includes: a processor 502, and a memory 501 connected to the processor 502;
the memory 501 is used for storing a computer program, which is at least used for executing the communication behavior detection method provided by the present application;
the processor 502 is used to invoke and execute computer programs in the memory 501.
By adopting the above technical scheme, the behavior picture information can be acquired through the monitoring device; the picture information is then input into a posture estimation module to generate key point position information; and whether the behavior is a communication behavior is judged according to the key point information. The key points may include at least one of points at the wrist, the elbow and the eyes. Since the body parts where these key points are located are larger than the fine details of the hand on which gesture recognition relies, judging communication behavior from the wrist, elbow and eye key points places a lower definition requirement on the picture than gesture recognition does, so accuracy is maintained even when the person is far away from the monitoring device and the definition of the captured picture is reduced. Compared with the scheme in the background art, identifying and judging the behavior in this way can improve the judgment accuracy.
The application also provides a storage medium, wherein the storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the communication behavior detection method provided by the application are realized.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.