CN113591661A - Call-making behavior prediction method and system - Google Patents

Call-making behavior prediction method and system

Info

Publication number
CN113591661A
CN113591661A
Authority
CN
China
Prior art keywords
behavior
key
coordinates
key point
video image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110840176.8A
Other languages
Chinese (zh)
Inventor
宋梦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Teamway Electric Co ltd
Original Assignee
Shenzhen Teamway Electric Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Teamway Electric Co ltd filed Critical Shenzhen Teamway Electric Co ltd
Priority to CN202110840176.8A priority Critical patent/CN113591661A/en
Publication of CN113591661A publication Critical patent/CN113591661A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The application discloses a method and a system for predicting a call-making behavior, which relate to the technical field of image recognition and comprise the following steps: defining a calling behavior type, the calling behavior type comprising a hand-at-left-ear class, a hand-at-right-ear class, and a hand-at-nose class; acquiring a sample data set of the calling behavior based on the calling behavior type; analyzing the human skeleton coordinates in the sample data set through a training model, and defining behavior key points, wherein the behavior key points comprise trunk key points and hand key points; respectively acquiring a key point coordinate list of each behavior key point based on the sample data set; calculating and acquiring a key point reference coordinate of the calling behavior based on the key point coordinate list; and acquiring a video image to be tested, and predicting a calling behavior in the video image to be tested based on the key point reference coordinates and the training model. The method and the system have the advantage that the call-making behavior can be predicted or judged.

Description

Call-making behavior prediction method and system
Technical Field
The application relates to the technical field of image recognition, in particular to a method and a system for predicting a call-making behavior.
Background
In recent years, with the development of smart technology, smart phones are used more and more frequently in various occasions, while making calls is prohibited in some public places and working situations, such as at gas stations or while driving a car. A system for detecting call-making behavior is therefore of great significance for personnel safety and for preventing call-making events in specific occasions. In the related art, multi-order model fusion is adopted, where the multi-order model includes a vehicle window detection model, a face detection model, a driver positioning detection module, a driver hand detection module, and the like, and the calling behavior of a driver in the fixed scene of a cab is detected.
With respect to the above related art, the inventors consider that the following drawbacks exist: because specific detection models are adopted, the approach is difficult to apply to other occasions where calling behavior is prohibited, and it can only detect a calling behavior rather than predict one.
Disclosure of Invention
In order to overcome the defect that a calling behavior detection system is difficult to predict calling behaviors, the application provides a calling behavior prediction method and a calling behavior prediction system.
In a first aspect, the present application provides a method for predicting a call-making behavior, which specifically includes the following steps:
defining a calling behavior type, the calling behavior type comprising a hand-at-left-ear class, a hand-at-right-ear class, and a hand-at-nose class;
obtaining a sample data set based on the calling behavior type, wherein the sample data set comprises a plurality of sampling video images containing calling behaviors;
analyzing the human skeleton coordinates in the sample data set through a training model, and defining behavior key points, wherein the behavior key points comprise trunk key points and hand key points;
respectively acquiring a key point coordinate list of each behavior key point based on the sample data set;
calculating and acquiring a key point reference coordinate of the calling behavior based on the key point coordinate list;
and acquiring a video image to be tested, and judging or predicting a calling behavior in the video image to be tested based on the key point reference coordinates and the training model.
By adopting the technical scheme, the calling behavior type is defined according to the behavior characteristics of the public in the big data when the public calls, a plurality of sampling video images are collected according to the calling behavior type and integrated into the sample data set, the sample data set is analyzed through the training model to obtain the key point reference coordinates of the calling behavior, the key point reference coordinates can embody the behavior characteristics in the calling behavior, and then the calling behavior in the video image to be detected can be judged or predicted based on the key point reference coordinates and the training model.
Optionally, the step of respectively obtaining a key point coordinate list of each behavior key point based on the sample data set includes the following steps:
respectively acquiring trunk key points of all sampled video images in the sample data set, wherein the trunk key points comprise a left ear node, a right ear node and a nose node;
respectively acquiring a left ear node coordinate, a right ear node coordinate and a nose node coordinate based on the human body skeleton coordinate;
respectively calculating the trunk key point coordinates of each sampling video image according to the shielding condition of the trunk key points;
and integrating the trunk key point coordinates of all the sampling video images in the sample data set to obtain a trunk key point coordinate list.
By adopting the technical scheme, since the calling behavior types comprise the hand-at-left-ear class, the hand-at-right-ear class and the hand-at-nose class, a left ear node, a right ear node and a nose node are obtained as the trunk key points of each sampled video image, and the node coordinates of the three nodes are acquired. Because a trunk key point node in a sampled video image may be shielded by an obstruction, the trunk key point coordinates of each sampled video image are calculated with the shielding condition taken into account, and the trunk key point coordinates of all the sampled video images in the sample data set are then integrated to obtain the trunk key point coordinate list.
Optionally, the step of respectively obtaining a key point coordinate list of each behavior key point based on the sample data set further includes the following steps:
respectively acquiring hand key points of all sampled video images in the sample data set, wherein the hand key points comprise a plurality of key nodes;
respectively acquiring key node coordinates of each key node based on the human skeleton coordinates;
respectively calculating the coordinates of the hand key points of each sampling video image according to the shielding condition of the hand key points;
and integrating the hand key point coordinates of all the sampling video images in the sample data set to obtain a hand key point coordinate list.
By adopting the technical scheme, since the public usually makes a call by holding the calling device in a hand, the hand key points in the sampled video images need to be acquired; each finger of each hand can be designated as a key node, and the key node coordinates are acquired.
Optionally, respectively calculating the coordinates of the trunk key points of each of the sampled video images according to the occlusion condition of the trunk key points includes the following steps:
judging the shielding conditions of the left ear node, the right ear node and the nose node;
if the three nodes are not shielded, obtaining the coordinates of the key points of the trunk by calculating the coordinate mean value of the three nodes;
if any node is shielded, obtaining the coordinates of the key points of the trunk by calculating the coordinate mean value of the other two nodes;
if any two nodes are blocked, the node coordinate of the other node is the coordinate of the key point of the trunk.
By adopting the technical scheme, the shielded nodes in the trunk key points are screened out, the coordinate mean value of the unshielded nodes is calculated, and the coordinate mean value is used as the coordinates of the trunk key points.
Optionally, respectively calculating the coordinates of the hand key points of each of the sampled video images according to the occlusion condition of the hand key points includes the following steps:
judging the shielding conditions of all key nodes;
if all the key nodes are not shielded, obtaining the coordinates of the key points of the hand by calculating the coordinate mean value of all the key nodes;
if the plurality of key nodes are shielded and the number of the key nodes which are not shielded is not less than 2, obtaining the coordinates of the key points of the hand by calculating the coordinate mean value of the key nodes which are not shielded;
if one key node is not shielded and all other key nodes are shielded, the key node coordinates of the key nodes which are not shielded are the hand key point coordinates.
By adopting the technical scheme, the blocked key nodes in the key points of the hand are screened out, the coordinate mean value of the key nodes which are not blocked is calculated, and the coordinate mean value is used as the coordinates of the key points of the hand.
Optionally, the step of calculating and acquiring the key point reference coordinate of the call-making behavior based on the key point coordinate list includes the following steps:
obtaining a trunk key point reference coordinate by calculating a coordinate mean value of all trunk key point coordinates in the trunk key point coordinate list;
and calculating the coordinate mean value of all the hand key point coordinates in the hand key point coordinate list to obtain the hand key point reference coordinates.
By adopting the technical scheme, the trunk key point coordinate list and the hand key point coordinate list are subjected to mean value calculation, so that trunk key point coordinates and hand key point coordinates with large partial deviation can be screened out, and finally obtained two coordinate mean values can be used as trunk key point reference coordinates and hand key point reference coordinates.
Optionally, the step of judging or predicting the call-making behavior in the video image to be tested based on the key point reference coordinates and the training model includes the following steps:
calculating and acquiring Euclidean distance between the trunk key point reference coordinate and the hand key point reference coordinate;
obtaining a prediction range according to the Euclidean distance;
analyzing and acquiring the actual human skeleton coordinates of the video image to be tested through the training model;
acquiring coordinates of all actual behavior key points based on the actual human skeleton coordinates, and calculating and acquiring an actual Euclidean distance between an actual trunk key point and an actual hand key point in the video image to be detected according to the coordinates of the actual behavior key points;
and judging or predicting the call-making behavior in the video image to be detected according to the actual Euclidean distance and the prediction range.
By adopting the technical scheme, the Euclidean distance between the hand and the trunk in the public telephone calling behavior can be obtained by calculating the reference coordinates of the key points of the trunk and the hand in the space, so that the prediction range is obtained according to the calculated Euclidean distance, the actual Euclidean distance in the video image to be detected is calculated, and the telephone calling behavior in the video image to be detected is judged or predicted according to whether the actual Euclidean distance is in the prediction range or not.
Optionally, the step of judging or predicting the call-making behavior in the video image to be tested according to the actual euclidean distance and the prediction range includes the following steps:
defining a behavior occurrence range based on the prediction range, wherein the range size of the behavior occurrence range is smaller than the range size of the prediction range;
judging whether the actual Euclidean distance is in the behavior occurrence range;
if the actual Euclidean distance is in the behavior occurrence range, judging whether the time when the actual Euclidean distance is in the behavior occurrence range exceeds a preset first judgment time;
if the preset first judgment time is exceeded, judging that the call making behavior appears in the video image to be detected;
and if the actual Euclidean distance is not in the behavior occurrence range or the time that the actual Euclidean distance is in the behavior occurrence range does not exceed a preset first judgment time, judging that the call making behavior does not appear in the video image to be detected.
By adopting the technical scheme, since the call-making behavior may already have occurred in the video image to be tested, the behavior occurrence range is defined according to the prediction range, and it is judged whether the actual Euclidean distance lies within the behavior occurrence range; if so, it is further judged whether the time spent within that range exceeds the first judgment time. Making a call is a sustained action, whereas a person in the video image to be tested may perform transient actions such as touching an ear or brushing back hair, which place the actual Euclidean distance within the behavior occurrence range only briefly; setting the first judgment time therefore screens out misjudgments caused by such transient actions.
Optionally, the step of judging or predicting the call-making behavior in the video image to be tested according to the actual euclidean distance and the prediction range further includes the following steps:
judging whether the actual Euclidean distance is in the prediction range;
if the actual Euclidean distance is in the prediction range, judging whether the time when the actual Euclidean distance is in the prediction range exceeds a preset second judgment time;
if the preset second determination time is exceeded, predicting that the call making behavior will appear in the video image to be detected;
and if the actual Euclidean distance is not in the prediction range or the time that the actual Euclidean distance is in the prediction range does not exceed the preset second determination time, predicting that the call-making behavior does not appear in the video image to be detected.
By adopting the technical scheme, the occurrence condition of the call-making behavior in the video image to be tested is predicted by judging whether the actual Euclidean distance is in the prediction range or not, if the actual Euclidean distance is in the prediction range and exceeds the second judgment time, the call-making behavior in the video image to be tested is predicted, and the influence of transient action on the prediction result can be screened out by setting the second judgment time.
In a second aspect, the present application further provides a system for predicting a call-making behavior, which employs the method for predicting a call-making behavior according to the first aspect and specifically includes:
the video image acquisition module is used for acquiring the sampling video image and the video image to be detected;
and the analysis and prediction module is connected with the video image acquisition module to receive the sampling video image and the video image to be tested, analyzes and acquires the behavior characteristics of the call-making behavior according to the sampling video image, and judges or predicts the call-making behavior in the video image to be tested based on the behavior characteristics.
By adopting the technical scheme, the calling behavior type is defined according to the behavior characteristics of the public in the big data when the public calls, a plurality of sampling video images are collected through the video image collection module according to the calling behavior type and are integrated into the sample data set, the sample data set is analyzed through the training model in the analysis and prediction module to obtain the key point reference coordinate of the calling behavior, the key point reference coordinate can embody the behavior characteristics in the calling behavior, the video image to be detected is collected through the video image collection module, and the calling behavior in the video image to be detected can be judged or predicted based on the key point reference coordinate and the training model.
In summary, the present application includes at least one of the following beneficial technical effects:
1. A sampled video image of the public's call-making behavior is acquired based on big data collection, the behavior characteristics of the call-making behavior are identified and acquired, and the call-making behavior in the video image to be tested is judged or predicted according to these behavior characteristics.
2. The training model is trained through the sample data set, and the call-making behavior is judged or predicted according to the training model; since the collection of the sampled video images in the sample data set is not limited to particular occasions, the training model is applicable to various occasions.
Drawings
Fig. 1 is a flowchart illustrating a method for predicting a call-making behavior according to an embodiment of the present disclosure.
Fig. 2 is a first flowchart illustrating an embodiment of the present application of respectively obtaining a key point coordinate list of each behavior key point.
Fig. 3 is a schematic flowchart of a second embodiment of the present application, where the key point coordinate list of each behavior key point is obtained separately.
Fig. 4 is a schematic flowchart of a process of separately calculating the coordinates of the torso key points of each of the sampled video images according to an embodiment of the present application.
Fig. 5 is a schematic flowchart of a process of separately calculating coordinates of a hand key point of each of the sample video images according to an embodiment of the present application.
Fig. 6 is a schematic flowchart of acquiring the reference coordinates of the key points of the call-making behavior according to an embodiment of the present application.
Fig. 7 is a first flowchart illustrating a process of determining or predicting a call-making behavior in a video image to be tested according to an embodiment of the present disclosure.
Fig. 8 is a schematic flowchart illustrating a second process of determining or predicting a call-making behavior in a video image to be tested according to an embodiment of the present disclosure.
Fig. 9 is a third flowchart illustrating a process of determining or predicting a call-making behavior in a video image to be tested according to an embodiment of the present disclosure.
Detailed Description
The present application is described in further detail below with reference to figures 1-9.
The embodiment of the application discloses a calling behavior prediction system, which comprises a video image acquisition module and an analysis prediction module, wherein the video image acquisition module can be a camera, and the video image acquisition module is used for acquiring a sampling video image and a video image to be detected. The analysis and prediction module can be an image recognition device, is connected with the video image acquisition module, receives the sampled video image and the video image to be detected, analyzes and acquires the behavior characteristics of the call-making behavior according to the sampled video image, and judges or predicts the call-making behavior in the video image to be detected based on the behavior characteristics.
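As an illustration of how the two modules could be wired together, the following minimal Python sketch uses OpenCV for the video image acquisition module and hands every frame to an analysis callback; the function names, the frame source and the callback interface are assumptions for illustration and are not part of the disclosure.

import cv2

def acquire_frames(source=0):
    # Video image acquisition module: yield frames from a camera index or a video file.
    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()

def run_system(analyze_frame, source=0):
    # Pass every acquired frame to the analysis and prediction module.
    for frame in acquire_frames(source):
        analyze_frame(frame)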
The embodiment of the application also discloses a method for predicting the call-making behavior.
Referring to fig. 1, the method for predicting a call-making behavior specifically includes the steps of:
101, defining a call behavior type.
The calling behavior type is defined according to the typical behavior characteristics exhibited by the public when making calls, as reflected in big data; the behavior types comprise the hand-at-left-ear class, the hand-at-right-ear class and the hand-at-nose class.
And 102, acquiring a sample data set based on the calling behavior type.
The video image acquisition module is used for acquiring a plurality of sampling video images, the sampling video images contain behaviors of people during telephone calling, and the sampling video images are integrated into a sample data set.
And 103, analyzing the human skeleton coordinates in the sample data set through a training model, and defining behavior key points.
The training model can be an OpenPose model. The human skeleton coordinates in each sampled video image in the sample data set can be analyzed through the training model; the human skeleton coordinates comprise the coordinates of 25 joint points of the human trunk and 22 joint points of each of the left hand and the right hand, and several joint points of the trunk and the hands are defined as behavior key points according to the behavior characteristics during calling.
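A minimal sketch of extracting the skeleton coordinates with the OpenPose Python bindings is given below; the parameter names and the emplaceAndPop call follow recent pyopenpose releases and may differ between OpenPose builds, and the model folder path is an assumption.

import cv2
from openpose import pyopenpose as op  # requires a local OpenPose build with Python bindings

params = {"model_folder": "models/", "hand": True}  # enable hand keypoints in addition to body keypoints
wrapper = op.WrapperPython()
wrapper.configure(params)
wrapper.start()

def skeleton_coordinates(image):
    # Returns (body_keypoints, left_hand_keypoints, right_hand_keypoints) for one video image.
    datum = op.Datum()
    datum.cvInputData = image
    wrapper.emplaceAndPop(op.VectorDatum([datum]))
    return datum.poseKeypoints, datum.handKeypoints[0], datum.handKeypoints[1]

# Example: body, left_hand, right_hand = skeleton_coordinates(cv2.imread("sample.jpg"))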
And 104, respectively acquiring a key point coordinate list of each behavior key point based on the sample data set.
And 105, calculating and acquiring the key point reference coordinate of the call-making behavior based on the key point coordinate list.
And 106, acquiring a video image to be tested, and judging or predicting a calling behavior in the video image to be tested based on the key point reference coordinates and the training model.
The video image acquisition module acquires a video image to be detected.
The implementation principle of the embodiment is as follows:
according to behavior characteristics of the public in the big data when the public makes a call, defining a call making behavior type, collecting a plurality of sampling video images according to the call making behavior type and integrating the sampling video images into a sample data set, analyzing the sample data set through a training model to obtain a key point reference coordinate of the call making behavior, wherein the key point reference coordinate can embody the behavior characteristics of the call making behavior, and then the call making behavior in the video image to be detected can be judged or predicted based on the key point reference coordinate and the training model.
In step 104 of the embodiment shown in fig. 1, coordinates of all the trunk key points of each sampled video image in the sample data set are obtained, and then the coordinates of all the trunk key points are integrated into a trunk key point coordinate list, which is specifically described in detail with the embodiment shown in fig. 2.
Referring to fig. 2, the method for obtaining the coordinate list of the trunk key points includes the following steps:
and 201, respectively acquiring trunk key points of all the sampling video images in the sample data set.
In calling behavior, the handheld mobile phone is usually placed at either ear or near the nose, so the trunk key points include a left ear node, a right ear node and a nose node; in the OpenPose model, the serial number of the left ear node is 17, the serial number of the right ear node is 18, and the serial number of the nose node is 0. In special scenarios, the trunk key points may also include other nodes on the torso.
And 202, respectively acquiring a left ear node coordinate, a right ear node coordinate and a nose node coordinate based on the human skeleton coordinate.
Wherein, the coordinates of the left ear node are (x_17, y_17), the coordinates of the right ear node are (x_18, y_18), and the coordinates of the nose node are (x_0, y_0).
And 203, respectively calculating the coordinate of the trunk key point of each sampling video image according to the shielding condition of the trunk key point.
Wherein, assuming that there are n sampled video images, where n ≥ 2, the trunk key point coordinate calculated for the i-th sampled video image is (x_ti, y_ti).
And 204, integrating the trunk key point coordinates of all the sampling video images in the sample data set to obtain a trunk key point coordinate list.
Wherein, the trunk key point coordinates of all the sampled video images are integrated to obtain the trunk key point coordinate list, which is as follows:
[(x_t1, y_t1), (x_t2, y_t2), ..., (x_tn, y_tn)]
the implementation principle of the embodiment is as follows:
The calling behavior types comprise the hand-at-left-ear class, the hand-at-right-ear class and the hand-at-nose class, so a left ear node, a right ear node and a nose node are obtained as the trunk key points of each sampled video image, and the node coordinates of the three nodes are acquired. Because a trunk key point node in a sampled video image may be occluded by an obstruction, the trunk key point coordinates of each sampled video image are calculated with the occlusion condition taken into account and then integrated to obtain the trunk key point coordinate list.
In step 104 of the embodiment shown in fig. 1, coordinates of all the hand key points of each sample video image in the sample data set are obtained, and then the coordinates of all the hand key points are integrated into a hand key point coordinate list, which is specifically described in detail with reference to the embodiment shown in fig. 3.
Referring to fig. 3, acquiring a hand key point coordinate list of hand key points specifically includes the following steps:
301, respectively acquiring the hand key points of all the sampled video images in the sample data set.
The hand key points comprise five key nodes of the left hand or five key nodes of the right hand; in the OpenPose hand model, the serial numbers of these five key nodes are [4, 8, 12, 16, 20] for both the left hand and the right hand.
And 302, respectively acquiring the key node coordinates of each key node based on the human skeleton coordinates.
Wherein the key node coordinates of the left hand five finger key nodes and the right hand five finger key nodes are defined with reference to step 202.
303, respectively calculating the coordinates of the hand key points of each sampling video image according to the shielding condition of the hand key points.
Wherein, assuming that there are n sampled video images, where n ≥ 2, the left-hand key point coordinate calculated for the i-th sampled video image is (x_li, y_li), and the right-hand key point coordinate is (x_ri, y_ri).
And 304, integrating the hand key point coordinates of all the sampling video images in the sample data set to obtain a hand key point coordinate list.
The left-hand key point coordinate list is obtained by integrating all the left-hand key point coordinates, and the list is as follows:
[(x_l1, y_l1), (x_l2, y_l2), ..., (x_ln, y_ln)]
All the right-hand key point coordinates are integrated to obtain the right-hand key point coordinate list, which is as follows:
[(x_r1, y_r1), (x_r2, y_r2), ..., (x_rn, y_rn)]
the implementation principle of the embodiment is as follows:
Because the public usually makes a call by holding the calling device in a hand, the hand key points in the sampled video images need to be acquired; each finger of each hand can be designated as a key node, and the key node coordinates are acquired. Since hand key nodes in a sampled video image may also be occluded, the hand key point coordinates of each sampled video image are calculated with the occlusion condition taken into account and then integrated to obtain the hand key point coordinate list.
In step 203 of the embodiment shown in fig. 2, the coordinates of the torso key point in each sampled video image are calculated, where if the node is occluded, the coordinates are 0, and if the node is not occluded, the coordinates of the node are obtained for calculation, which is specifically described in detail with the embodiment shown in fig. 4.
Referring to fig. 4, the method for calculating the torso key point coordinates of each sampled video image includes the following steps:
401, judging the occlusion conditions of the left ear node, the right ear node and the nose node, and if none of the three nodes are occluded, executing step 402; if any node is occluded, go to step 403; if any two nodes are occluded, step 404 is performed.
And 402, obtaining the coordinates of the key points of the trunk by calculating the coordinate mean of the three nodes.
Wherein, since none of the three nodes is occluded, the trunk key point coordinate is obtained by averaging the coordinates of the three nodes, and the specific calculation formula is as follows:
x_ti = (x_0 + x_17 + x_18) / 3
y_ti = (y_0 + y_17 + y_18) / 3
where x_ti is the abscissa and y_ti the ordinate of the trunk key point, and i denotes the i-th sampled video image.
And 403, obtaining the coordinates of the key points of the trunk by calculating the coordinate mean of the other two nodes.
Wherein, assuming the nose node is occluded:
x_0 = 0, y_0 = 0
x_ti = (x_17 + x_18) / 2, y_ti = (y_17 + y_18) / 2
Assuming the right ear node is occluded:
x_18 = 0, y_18 = 0
x_ti = (x_0 + x_17) / 2, y_ti = (y_0 + y_17) / 2
Assuming the left ear node is occluded:
x_17 = 0, y_17 = 0
x_ti = (x_0 + x_18) / 2, y_ti = (y_0 + y_18) / 2
where i denotes the i-th sampled video image.
404, the node coordinates of the other node are torso key point coordinates.
The implementation principle of the embodiment is as follows:
and screening out shielded nodes in the trunk key points, calculating the coordinate mean value of the nodes which are not shielded, and taking the coordinate mean value as the coordinates of the trunk key points.
In step 303 of the embodiment shown in fig. 3, coordinates of the hand key point in each sample video image are calculated, where if the node is occluded, the coordinates are 0, and if the node is not occluded, the coordinates of the node are obtained for calculation, which is specifically described in detail with the embodiment shown in fig. 5.
Referring to fig. 5, the method for calculating the coordinates of the key points of the hand of each sampled video image includes the following steps:
501, judging the shielding condition of all key nodes, and if all the key nodes are not shielded, executing step 502; if the plurality of key nodes are occluded and the number of the unoccluded key nodes is not less than 2, executing step 503; if one of the key nodes is not occluded and all other key nodes are occluded, step 504 is performed.
And 502, obtaining the coordinates of the key points of the hand by calculating the coordinate mean value of all the key nodes.
Taking the left hand as an example, the hand key point coordinate is calculated through the following formula:
x_li = (x_4 + x_8 + x_12 + x_16 + x_20) / 5
y_li = (y_4 + y_8 + y_12 + y_16 + y_20) / 5
where x_li is the abscissa and y_li the ordinate of the left-hand key point, and i denotes the i-th sampled video image.
And 503, obtaining the coordinates of the key points of the hand by calculating the coordinate mean of the key nodes which are not shielded.
The specific calculation steps refer to the detailed descriptions of step 403 and step 502.
And 504, the key node coordinates of the key nodes which are not shielded are the hand key point coordinates.
The implementation principle of the embodiment is as follows:
and screening out the occluded key nodes in the hand key points, calculating the coordinate mean value of the unoccluded key nodes, and taking the coordinate mean value as the hand key point coordinate.
In step 105 of the embodiment shown in fig. 1, the key point coordinate list is obtained, and the average value is calculated to obtain the key point reference coordinate of the call-making behavior, which is specifically described in detail with the embodiment shown in fig. 6.
Referring to fig. 6, acquiring a reference coordinate of a key point of a call-making behavior specifically includes the following steps:
601, obtaining the reference coordinates of the trunk key points by calculating the coordinate mean of all the trunk key point coordinates in the trunk key point coordinate list.
If all coordinates in the trunk key point coordinate list are non-zero, the coordinate mean is calculated through the following formula:
X_t = (x_t1 + x_t2 + ... + x_tn) / n
Y_t = (y_t1 + y_t2 + ... + y_tn) / n
The trunk key point reference coordinate is therefore (X_t, Y_t).
And 602, calculating a coordinate mean value of all the coordinates of the hand key points in the hand key point coordinate list to obtain a reference coordinate of the hand key point.
If all coordinates in the left-hand key point coordinate list are non-zero, the coordinate mean is calculated through the following formula:
X_l = (x_l1 + x_l2 + ... + x_ln) / n
Y_l = (y_l1 + y_l2 + ... + y_ln) / n
If all coordinates in the right-hand key point coordinate list are non-zero, the coordinate mean is calculated through the following formula:
X_r = (x_r1 + x_r2 + ... + x_rn) / n
Y_r = (y_r1 + y_r2 + ... + y_rn) / n
Therefore the left-hand key point reference coordinate is (X_l, Y_l), and the right-hand key point reference coordinate is (X_r, Y_r).
The implementation principle of the embodiment is as follows:
by means of mean value calculation of the trunk key point coordinate list and the hand key point coordinate list, trunk key point coordinates and hand key point coordinates with large partial deviation can be screened out, and finally obtained two coordinate mean values can be used as trunk key point reference coordinates and hand key point reference coordinates.
In step 601 and step 602 of the embodiment shown in fig. 6, after the key point reference coordinates are acquired, the call behavior in the video image to be tested is judged or predicted by the training model, which is specifically described in detail by the embodiment shown in fig. 7.
Referring to fig. 7, the method for judging or predicting the call-making behavior in the video image to be tested specifically includes the following steps:
701, calculating and acquiring an Euclidean distance between the trunk key point reference coordinate and the hand key point reference coordinate.
The Euclidean distance D_l between the trunk key point reference coordinate and the left-hand key point reference coordinate is calculated through the following formula:
D_l = sqrt((X_t - X_l)^2 + (Y_t - Y_l)^2)
The Euclidean distance D_r between the trunk key point reference coordinate and the right-hand key point reference coordinate is calculated through the following formula:
D_r = sqrt((X_t - X_r)^2 + (Y_t - Y_r)^2)
And 702, acquiring a prediction range according to the Euclidean distance.
Wherein, the prediction ranges may be set to [D_l, 2D_l] and [D_r, 2D_r].
703, analyzing and acquiring the actual human skeleton coordinates of the video image to be detected through the training model.
And 704, acquiring coordinates of all actual behavior key points based on the actual human skeleton coordinates, and calculating and acquiring the actual Euclidean distance between the actual trunk key point and the actual hand key point in the video image to be detected according to the actual behavior key point coordinates.
With reference to the embodiments shown in fig. 4, fig. 5 and fig. 6, the actual behavior key point coordinates in the video image to be tested are derived and calculated, including the actual trunk key point coordinate (X_t', Y_t'), the actual left-hand key point coordinate (X_l', Y_l') and the actual right-hand key point coordinate (X_r', Y_r');
The actual Euclidean distance D_l' between the actual trunk key point and the actual left-hand key point is calculated through the following formula:
D_l' = sqrt((X_t' - X_l')^2 + (Y_t' - Y_l')^2)
The actual Euclidean distance D_r' between the actual trunk key point and the actual right-hand key point is calculated through the following formula:
D_r' = sqrt((X_t' - X_r')^2 + (Y_t' - Y_r')^2)
705, judging or predicting the call-making behavior in the video image to be tested according to the actual Euclidean distance and the prediction range.
The implementation principle of the embodiment is as follows:
the Euclidean distance between the hand and the trunk in the public telephone calling behavior can be obtained by calculating the Euclidean distance between the trunk key point reference coordinate and the left hand/right hand key point reference coordinate in the space, so that the prediction range is obtained according to the calculated Euclidean distance, the actual Euclidean distance in the video image to be detected is calculated, and the telephone calling behavior in the video image to be detected is judged or predicted according to whether the actual Euclidean distance is in the prediction range.
In step 705 of the embodiment shown in fig. 7, after the actual euclidean distance is calculated, the call behavior in the video image to be measured is determined according to the prediction range, which is specifically described in detail with the embodiment shown in fig. 8.
Referring to fig. 8, the method for judging the call-making behavior in the video image to be tested specifically includes the following steps:
and 801, defining a behavior occurrence range based on the prediction range, wherein the range size of the behavior occurrence range is smaller than that of the prediction range.
Wherein, if the prediction ranges are set to [D_l, 2D_l] and [D_r, 2D_r], then the behavior occurrence ranges may be set to [0, D_l] and [0, D_r].
802, determining whether the actual euclidean distance is within the behavior occurrence range, if yes, executing step 803; if not, go to step 805.
803, judging whether the time of the actual Euclidean distance in the behavior occurrence range exceeds a preset first judgment time, if so, executing step 804; if not, go to step 805.
Wherein, the first judgment time may be preset to 20 s; if either condition D_l' ≤ D_l or D_r' ≤ D_r (i.e., the actual Euclidean distance lies within the behavior occurrence range) holds continuously for more than 20 s, step 804 is performed.
And 804, judging that the call making behavior appears in the video image to be detected.
805, determining that the call-making behavior does not occur in the video image to be tested.
The implementation principle of the embodiment is as follows:
because the call-making behavior in the video image to be detected may have occurred, the behavior occurrence range is defined according to the prediction range, and then whether the actual Euclidean distance is in the behavior occurrence range is judged, if the actual Euclidean distance is in the behavior occurrence range, whether the time in the range exceeds the first judgment time is judged, the call-making behavior is a continuous action, and a person in the video image to be detected may have transient actions such as touching ears or lifting hair, so that the actual Euclidean distance is in the behavior occurrence range in a short time, and therefore the misjudgment caused by the transient actions can be screened out by setting the first judgment time.
In step 705 of the embodiment shown in fig. 7, after the actual euclidean distance is calculated, the call-making behavior in the video image to be measured is predicted according to the prediction range, which is specifically described in detail with the embodiment shown in fig. 9.
Referring to fig. 9, predicting a call-making behavior in a video image to be tested specifically includes the following steps:
901, judging whether the actual Euclidean distance is in the prediction range, if yes, executing a step 902; if not, go to step 904.
902, judging whether the time of the actual euclidean distance in the prediction range exceeds a preset second judgment time, if yes, executing step 903; if not, go to step 904.
Wherein, assuming the preset second judgment time is 10 s, if either condition D_l ≤ D_l' ≤ 2D_l or D_r ≤ D_r' ≤ 2D_r (i.e., the actual Euclidean distance lies within the prediction range) holds continuously for more than 10 s, step 903 is executed.
And 903, predicting the calling behavior to appear in the video image to be tested.
And 904, predicting that the call-making behavior does not appear in the video image to be tested.
The implementation principle of the embodiment is as follows:
and predicting the occurrence condition of the call-making behavior in the video image to be detected by judging whether the actual Euclidean distance is in the prediction range or not, predicting the call-making behavior in the video image to be detected if the actual Euclidean distance is in the prediction range and exceeds a second determination time, and screening the influence of transient actions on the prediction result through the setting of the second determination time.
The above embodiments are preferred embodiments of the present application, and the protection scope of the present application is not limited by the above embodiments, so: all equivalent changes made according to the structure, shape and principle of the present application shall be covered by the protection scope of the present application.

Claims (10)

1. A method for predicting a call-making behavior, comprising the steps of:
defining a calling behavior type, the calling behavior type comprising a hand-at-left-ear class, a hand-at-right-ear class, and a hand-at-nose class;
obtaining a sample data set based on the calling behavior type, wherein the sample data set comprises a plurality of sampling video images containing calling behaviors;
analyzing the human skeleton coordinates in the sample data set through a training model, and defining behavior key points, wherein the behavior key points comprise trunk key points and hand key points;
respectively acquiring a key point coordinate list of each behavior key point based on the sample data set;
calculating and acquiring a key point reference coordinate of the calling behavior based on the key point coordinate list;
and acquiring a video image to be tested, and judging or predicting a calling behavior in the video image to be tested based on the key point reference coordinates and the training model.
2. The method of claim 1, wherein the step of obtaining a key point coordinate list of each behavior key point based on the sample data set comprises the steps of:
respectively acquiring trunk key points of all sampled video images in the sample data set, wherein the trunk key points comprise a left ear node, a right ear node and a nose node;
respectively acquiring a left ear node coordinate, a right ear node coordinate and a nose node coordinate based on the human body skeleton coordinate;
respectively calculating the trunk key point coordinates of each sampling video image according to the shielding condition of the trunk key points;
and integrating the trunk key point coordinates of all the sampling video images in the sample data set to obtain a trunk key point coordinate list.
3. The method of claim 2, wherein the step of obtaining the keypoint coordinate list of each behavior keypoint based on the sample data set further comprises the steps of:
respectively acquiring hand key points of all sampled video images in the sample data set, wherein the hand key points comprise a plurality of key nodes;
respectively acquiring key node coordinates of each key node based on the human skeleton coordinates;
respectively calculating the coordinates of the hand key points of each sampling video image according to the shielding condition of the hand key points;
and integrating the hand key point coordinates of all the sampling video images in the sample data set to obtain a hand key point coordinate list.
4. The method according to claim 2, wherein the step of calculating the torso key point coordinates of each of the sampled video images according to the occlusion condition of the torso key point comprises the following steps:
judging the shielding conditions of the left ear node, the right ear node and the nose node;
if the three nodes are not shielded, obtaining the coordinates of the key points of the trunk by calculating the coordinate mean value of the three nodes;
if any node is shielded, obtaining the coordinates of the key points of the trunk by calculating the coordinate mean value of the other two nodes;
if any two nodes are blocked, the node coordinate of the other node is the coordinate of the key point of the trunk.
5. The method according to claim 3, wherein the step of calculating the coordinates of the key points of the hand of each of the sampled video images according to the occlusion condition of the key points of the hand comprises the following steps:
judging the shielding conditions of all key nodes;
if all the key nodes are not shielded, obtaining the coordinates of the key points of the hand by calculating the coordinate mean value of all the key nodes;
if the plurality of key nodes are shielded and the number of the key nodes which are not shielded is not less than 2, obtaining the coordinates of the key points of the hand by calculating the coordinate mean value of the key nodes which are not shielded;
if one key node is not shielded and all other key nodes are shielded, the key node coordinates of the key nodes which are not shielded are the hand key point coordinates.
6. The method according to claim 4 or 5, wherein the step of calculating and acquiring the key point reference coordinates of the call-making behavior based on the key point coordinate list comprises the steps of:
obtaining a trunk key point reference coordinate by calculating a coordinate mean value of all trunk key point coordinates in the trunk key point coordinate list;
and calculating the coordinate mean value of all the hand key point coordinates in the hand key point coordinate list to obtain the hand key point reference coordinates.
7. The method according to claim 6, wherein the step of determining or predicting the call-making behavior in the video image to be tested based on the key point reference coordinates and the training model comprises the steps of:
calculating and acquiring Euclidean distance between the trunk key point reference coordinate and the hand key point reference coordinate;
obtaining a prediction range according to the Euclidean distance;
analyzing and acquiring the actual human skeleton coordinates of the video image to be tested through the training model;
acquiring coordinates of all actual behavior key points based on the actual human skeleton coordinates, and calculating and acquiring an actual Euclidean distance between an actual trunk key point and an actual hand key point in the video image to be detected according to the coordinates of the actual behavior key points;
and judging or predicting the call-making behavior in the video image to be detected according to the actual Euclidean distance and the prediction range.
8. The method according to claim 7, wherein the step of determining or predicting the call-making behavior in the video image to be tested according to the actual Euclidean distance and the prediction range comprises the following steps:
defining a behavior occurrence range based on the prediction range, wherein the range size of the behavior occurrence range is smaller than the range size of the prediction range;
judging whether the actual Euclidean distance is in the behavior occurrence range;
if the actual Euclidean distance is in the behavior occurrence range, judging whether the time when the actual Euclidean distance is in the behavior occurrence range exceeds a preset first judgment time;
if the preset first judgment time is exceeded, judging that the call making behavior appears in the video image to be detected;
and if the actual Euclidean distance is not in the behavior occurrence range or the time that the actual Euclidean distance is in the behavior occurrence range does not exceed a preset first judgment time, judging that the call making behavior does not appear in the video image to be detected.
9. The method of claim 8, wherein the step of determining or predicting the call-making behavior in the video image to be tested according to the actual Euclidean distance and the prediction range further comprises the steps of:
judging whether the actual Euclidean distance is in the prediction range;
if the actual Euclidean distance is in the prediction range, judging whether the time when the actual Euclidean distance is in the prediction range exceeds a preset second judgment time;
if the preset second determination time is exceeded, predicting that the call making behavior will appear in the video image to be detected;
and if the actual Euclidean distance is not in the prediction range or the time that the actual Euclidean distance is in the prediction range does not exceed the preset second determination time, predicting that the call-making behavior does not appear in the video image to be detected.
10. A call behavior prediction system that employs a call behavior prediction method according to any one of claims 1 to 9, characterized by comprising:
the video image acquisition module is used for acquiring the sampling video image and the video image to be detected;
and the analysis and prediction module is connected with the video image acquisition module to receive the sampling video image and the video image to be tested, analyzes and acquires the behavior characteristics of the call-making behavior according to the sampling video image, and judges or predicts the call-making behavior in the video image to be tested based on the behavior characteristics.
CN202110840176.8A 2021-07-24 2021-07-24 Call-making behavior prediction method and system Pending CN113591661A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110840176.8A CN113591661A (en) 2021-07-24 2021-07-24 Call-making behavior prediction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110840176.8A CN113591661A (en) 2021-07-24 2021-07-24 Call-making behavior prediction method and system

Publications (1)

Publication Number Publication Date
CN113591661A true CN113591661A (en) 2021-11-02

Family

ID=78249388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110840176.8A Pending CN113591661A (en) 2021-07-24 2021-07-24 Call-making behavior prediction method and system

Country Status (1)

Country Link
CN (1) CN113591661A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469073A (en) * 2015-12-16 2016-04-06 安徽创世科技有限公司 Kinect-based call making and answering monitoring method of driver
CN108446678A (en) * 2018-05-07 2018-08-24 同济大学 A kind of dangerous driving behavior recognition methods based on skeleton character
CN110188701A (en) * 2019-05-31 2019-08-30 上海媒智科技有限公司 Dress ornament recognition methods, system and terminal based on the prediction of human body key node

Similar Documents

Publication Publication Date Title
WO2018168042A1 (en) Image analysis device, image analysis method, and image analysis program
WO2015126031A1 (en) Person counting method and device for same
CN106027931A (en) Video recording method and server
CN112115904A (en) License plate detection and identification method and device and computer readable storage medium
CN111523416A (en) Vehicle early warning method and device based on highway ETC portal
CN106412422A (en) Focusing method, focusing device and terminal
CN111476160A (en) Loss function optimization method, model training method, target detection method, and medium
CN106384348A (en) Monitor image anomaly detection method and device
CN110348345A (en) A kind of Weakly supervised timing operating position fixing method based on continuity of movement
CN111062319B (en) Driver call detection method based on active infrared image
CN111597985A (en) Dynamic identification method and device for equipment wearing and electronic equipment
CN116758493B (en) Tunnel construction monitoring method and device based on image processing and readable storage medium
CN113591661A (en) Call-making behavior prediction method and system
CN116403162B (en) Airport scene target behavior recognition method and system and electronic equipment
CN110390313B (en) Violent action detection method and system
CN112183287A (en) People counting method of mobile robot under complex background
CN110086987B (en) Camera visual angle cutting method and device and storage medium
CN109543610A (en) Vehicle detecting and tracking method, device, equipment and storage medium
WO2022012573A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN115578664A (en) Video monitoring-based emergency event judgment method and device
CN114332695A (en) Method and device for identifying opening and closing of elevator door and storage medium
CN112749577B (en) Parking space detection method and device
CN114360055A (en) Behavior detection method, device and storage medium based on artificial intelligence
CN109684991B (en) Image processing method, image processing device, electronic equipment and storage medium
CN114463776A (en) Fall identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination