CN109948450A - Image-based user behavior detection method, apparatus and storage medium - Google Patents

Image-based user behavior detection method, apparatus and storage medium

Info

Publication number
CN109948450A
Authority
CN
China
Prior art keywords
image
face frame
key points
detection zone
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910131928.6A
Other languages
Chinese (zh)
Inventor
陈海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Deep Blue Technology Shanghai Co Ltd
Original Assignee
Deep Blue Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deep Blue Technology Shanghai Co Ltd
Priority to CN201910131928.6A
Publication of CN109948450A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image-based user behavior detection method, apparatus and storage medium, intended to improve the accuracy of detecting a driver's phone use while driving. The image-based user behavior detection method comprises: using a face detection and recognition model trained in advance, identifying the face image in a real-time image and determining predicted position information of a face frame and multiple key points in the real-time image; determining a detection region according to the predicted position information of the face frame; identifying the palm region and lip region in the detection region; determining the minimum distance between the center point of the palm region and each key point; and counting the number of lip changes within a preset duration. If the minimum distance between the palm region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold, it is determined that phone use while driving is detected.

Description

Image-based user behavior detection method, apparatus and storage medium
Technical field
The present invention relates to the technical field of image detection, and in particular to an image-based user behavior detection method, apparatus and storage medium.
Background art
Using a phone while driving poses a safety risk. With the development of image processing, computer vision and deep learning technology, analyzing video images acquired while the driver operates the vehicle to determine whether the driver is using a phone has become one of the research hotspots in intelligent transportation technology.
An existing approach analyzes video images to determine whether a driver is using a phone while driving, based on the observation that the skin pixel values of different ethnic groups form distinct clusters in different color spaces. The image is converted from the RGB color space to the YCbCr color space, regions matching the skin pixel distribution are segmented out by thresholding, and each segmented skin-tone block is classified to obtain the palm region. Whenever a palm and a face are detected within a certain range of each other, the driver is deemed to be using a phone while driving.
Because this scheme relies on a skin-color model, it places high demands on the camera's application scenario and is prone to false detections on objects with a similar skin tone. Moreover, when the face is turned at a large angle the face cannot be detected, and actions such as resting the face on the palm are easily misjudged as phone use, reducing the accuracy of detecting a driver's phone use while driving.
Summary of the invention
Embodiments of the present invention provide an image-based user behavior detection method, apparatus and storage medium, so as to improve the accuracy of detecting a driver's phone use while driving.
In a first aspect, an image-based user behavior detection method is provided, comprising:
using a face detection and recognition model trained in advance, identifying the face image in a real-time image and determining predicted position information of a face frame and multiple key points in the real-time image;
determining a detection region according to the predicted position information of the face frame;
identifying the palm region and lip region in the detection region;
determining the minimum distance between the center point of the palm region and each key point; and
counting the number of lip changes within a preset duration;
if the minimum distance between the palm region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold, determining that phone use while driving is detected.
Optionally, the face detection and recognition model is obtained by training a three-layer network on sample images containing different face poses, the sample images being annotated with the true position information of the face frame and each key point.
Optionally, the key points include two eye key points; and
before determining the detection region according to the predicted position information of the face frame, the method further includes:
determining the horizontal angle between the two eyes according to the position information of the two eye key points; and
determining the corrected image corresponding to the face frame according to the horizontal angle between the two eyes.
Optionally, determining the corrected image corresponding to the face frame according to the horizontal angle between the two eyes comprises:
rotating the real-time image by the angle corresponding to the horizontal angle between the two eyes to obtain an intermediate image;
determining a transformation matrix according to the real-time image and the intermediate image;
determining the corrected image corresponding to the face frame using the transformation matrix.
Optionally, determining the transformation matrix according to the real-time image and the intermediate image specifically comprises:
selecting 3 key points from the real-time image;
determining the transformation matrix according to first position information of the 3 selected key points in the real-time image and second position information of the 3 key points in the intermediate image.
Optionally, determining the detection region according to the predicted position information of the face frame specifically comprises:
taking the center point of the face frame as the reference point and N times the distance between the center point and the face frame as the size, expanding the face frame to obtain the detection region, where N is a number greater than 1.
Optionally, the key points include two lip-corner key points; and
the lip region in the detection region is identified as follows:
using the transformation matrix, converting third position information of the two lip-corner key points in the real-time image into fourth position information in the corrected image;
determining the lip region in the detection region according to the fourth position information corresponding to the two lip-corner key points.
Optionally, counting the number of lip changes within the preset duration specifically comprises:
counting the number of histogram changes of the lip region within the preset duration;
taking the histogram change count as the lip change count.
In a second aspect, an image-based user behavior detection apparatus is provided, comprising:
a first recognition unit, configured to identify the face image in a real-time image using a face detection and recognition model trained in advance, and to determine predicted position information of a face frame and multiple key points in the real-time image;
a first determination unit, configured to determine a detection region according to the predicted position information of the face frame;
a second recognition unit, configured to identify the palm region and lip region in the detection region;
a second determination unit, configured to determine the minimum distance between the center point of the palm region and each key point;
a statistics unit, configured to count the number of lip changes within a preset duration;
a third determination unit, configured to determine that phone use while driving is detected if the minimum distance between the palm region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold.
Optionally, the face detection and recognition model is obtained by training a three-layer network on sample images containing different face poses, the sample images being annotated with the true position information of the face frame and each key point.
Optionally, the key points include two eye key points; and
the apparatus further comprises:
a fourth determination unit, configured to: before the first determination unit determines the detection region according to the predicted position information of the face frame, determine the horizontal angle between the two eyes according to the position information of the two eye key points, and determine the corrected image corresponding to the face frame according to the horizontal angle between the two eyes.
Optionally, the fourth determination unit is specifically configured to rotate the real-time image by the angle corresponding to the horizontal angle between the two eyes to obtain an intermediate image; determine a transformation matrix according to the real-time image and the intermediate image; and determine the corrected image corresponding to the face frame using the transformation matrix.
Optionally, the fourth determination unit is specifically configured to select 3 key points from the real-time image, and to determine the transformation matrix according to first position information of the 3 selected key points in the real-time image and second position information of the 3 key points in the intermediate image.
Optionally, the first determination unit is specifically configured to take the center point of the face frame as the reference point and N times the distance between the center point and the face frame as the size, and to expand the face frame to obtain the detection region, where N is a number greater than 1.
Optionally, the key points include two lip-corner key points; and
the second recognition unit is specifically configured to convert, using the transformation matrix, third position information of the two lip-corner key points in the real-time image into fourth position information in the corrected image, and to determine the lip region in the detection region according to the fourth position information corresponding to the two lip-corner key points.
Optionally, the statistics unit is specifically configured to count the number of histogram changes of the lip region within the preset duration, and to take the histogram change count as the lip change count.
In a third aspect, a computing device is provided, comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform any step of the above image-based user behavior detection method.
In a fourth aspect, a computer-readable medium is provided, storing a computer program executable by a computing device; when the program runs on the computing device, the computing device is caused to perform any step of the above image-based user behavior detection method.
In the image-based user behavior detection method, apparatus and storage medium provided by embodiments of the present invention, a face detection and recognition model trained in advance identifies the face image in a real-time image and determines predicted position information of the face frame and each key point; a detection region is then determined according to the face frame, the palm region and lip region in the detection region are identified, the minimum distance between the palm region center point and each key point is calculated, and the number of lip changes within a preset duration is counted. If the minimum distance between the palm region center point and each key point is less than a preset distance threshold and the lip change count within the preset duration is greater than a preset count threshold, it is determined that phone use while driving is detected. In the above process, the decision combines the palm-to-key-point minimum distance with the lip change count, improving the accuracy of the detection result.
Other features and advantages of the present invention will be set forth in the following description, will in part become apparent from the description, or will be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, claims and accompanying drawings.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and constitute a part of it. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1a is a schematic diagram of the shallow-layer network output for a face image in an embodiment of the present invention;
Fig. 1b is a schematic diagram of the middle-layer network output for a face image in an embodiment of the present invention;
Fig. 1c is a schematic diagram of the deep-layer network output for a face image in an embodiment of the present invention;
Fig. 1d is a schematic diagram of the shallow-layer network output for a 70-degree side face in an embodiment of the present invention;
Fig. 1e is a schematic diagram of the middle-layer network output for a 70-degree side face in an embodiment of the present invention;
Fig. 1f is a schematic diagram of the deep-layer network output for a 70-degree side face in an embodiment of the present invention;
Fig. 2 is a schematic diagram of a group of images of different sizes obtained with different down-sampling scale parameters in an embodiment of the present invention;
Fig. 3 is a flowchart of the image-based user behavior detection method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of expanding the face frame into the detection region in an embodiment of the present invention;
Fig. 5 is a schematic diagram of cropping the detection region from the real-time image in an embodiment of the present invention;
Fig. 6a is a schematic diagram of the identified palm region in an embodiment of the present invention;
Fig. 6b is a schematic diagram of determining the minimum distance between the palm region center point and the five key points in an embodiment of the present invention;
Fig. 7 is a schematic diagram of lip region identification in an embodiment of the present invention;
Fig. 8a is a schematic diagram of the lip region in a first state in an embodiment of the present invention;
Fig. 8b is a schematic diagram of the lip region in a second state in an embodiment of the present invention;
Fig. 9a is the lip region histogram in the first state in an embodiment of the present invention;
Fig. 9b is the lip region histogram in the second state in an embodiment of the present invention;
Fig. 10 is a schematic diagram of an image with a certain angular offset in an embodiment of the present invention;
Fig. 11 is a schematic diagram of the rotated intermediate image in an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of the image-based user behavior detection apparatus in an embodiment of the present invention;
Fig. 13 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed description of embodiments
To improve the recall rate and the accuracy of detecting a driver's phone use while driving, embodiments of the present invention provide an image-based user behavior detection method, apparatus and storage medium.
The terms "first", "second", and the like in the specification and claims of the embodiments of the present invention and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein.
As used herein, "multiple" or "several" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein are only used to illustrate and explain the present invention and are not intended to limit it, and the embodiments of the present invention and the features in the embodiments can be combined with each other where no conflict arises.
In an embodiment of the present invention, to improve the recall rate of detecting a driver's phone use while driving, a cascaded three-layer network is designed following a coarse-to-fine approach, and the three-layer network is trained on sample images containing different face poses to obtain the face detection and recognition model, the sample images being annotated with the true position information of the face frame and each key point.
In the coarse-to-fine design, the shallow-layer network is fully convolutional (supporting input of arbitrary image size) and mainly filters out candidate regions; Fig. 1a is a schematic diagram of its output. The middle-layer network filters out false detections from the shallow-layer network and preliminarily refines the position of the detection box; Fig. 1b is a schematic diagram of its output. The deep-layer network then refines the face frame and key point positions obtained by the middle-layer network; Fig. 1c is a schematic diagram of its output. In specific implementations, the face frame is a rectangle whose position can be marked by the coordinates of its upper-left and lower-right vertices, and the face key points may include two eye key points, one nose key point and two mouth-corner key points, the position of each key point being represented by its coordinates.
In an embodiment of the present invention, so that the trained network can properly detect sample images at every scale and to overcome the lack of diversity in the designed convolution kernel sizes, before the image is processed a group of down-sampling scale parameters is computed dynamically from the input image size and the minimum face size to be detected, and an image pyramid operation is performed. In this way, for the same sample image, a group of images of different sizes is obtained according to the different down-sampling scale parameters, enabling the training network to better adapt to different image sizes. Fig. 2 is a schematic diagram of a group of images of different sizes obtained with different down-sampling scale parameters.
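The dynamic scale computation described above can be sketched as follows. The shallow network input size of 12 and the inter-level factor of 0.709 are illustrative assumptions (common in cascaded face detectors) and are not values stated in the patent:

```python
def pyramid_scales(img_size, min_face=20, net_input=12, factor=0.709):
    """Compute a group of down-sampling scale factors for an image pyramid.

    img_size:  shorter side of the input image, in pixels
    min_face:  smallest face size (pixels) the detector should find
    net_input: input size of the shallow-layer network (assumed value)
    factor:    ratio between consecutive pyramid levels (assumed value)
    """
    scales = []
    # The initial scale maps a min_face-sized face onto the network input size.
    scale = net_input / min_face
    side = img_size * scale
    while side >= net_input:
        scales.append(scale)
        scale *= factor
        side = img_size * scale
    return scales

# Each scale yields one pyramid level; resizing the image by every scale
# gives the group of differently sized images shown in Fig. 2.
scales = pyramid_scales(640, min_face=40)
```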
In specific implementations, the LOSS function of the shallow-layer network (which determines whether a sample image contains a face, i.e. the classification network) is the cross-entropy:
L_i = -[y_i·log(p_i) + (1 - y_i)·log(1 - p_i)]
where x_i is the input of the training network, i.e. the sample image, and y_i is the true label annotated in the sample image. For the position branch, the label is the true position information of the two vertices of the face frame and the 5 key points; since each position is represented by a pair of abscissa and ordinate, y_i can be expressed as a 14-dimensional feature vector (composed of the coordinates of the 5 key points and the two vertices of the face frame). p_i is the output of the training network, i.e. the predicted position information of the face frame and each key point, and can likewise be expressed as a 14-dimensional feature vector.
The loss function for the face frame and key point position information can use the following formula: L_i = ‖y_i - p_i‖.
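As a minimal NumPy sketch, the two loss terms above can be written as follows; the function names are assumptions, and the split into a scalar classification label and a 14-dimensional position vector follows the description in the preceding paragraphs:

```python
import numpy as np

def classification_loss(y, p, eps=1e-12):
    """Binary cross-entropy: L_i = -[y*log(p) + (1-y)*log(1-p)].

    y is the true face/no-face label (0 or 1), p the predicted probability.
    """
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def position_loss(y_vec, p_vec):
    """L2 loss L_i = ||y_i - p_i|| over the 14-dimensional vector
    (5 key point coordinates plus the 2 face-frame vertex coordinates)."""
    return np.linalg.norm(np.asarray(y_vec, dtype=float)
                          - np.asarray(p_vec, dtype=float))
```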
Traditional detection methods use a skin-color model and require the video source to provide visible-light images. The face detection and recognition model trained in the embodiment of the present invention places lower requirements on the video source and is applicable to both infrared and visible-light images, thereby improving the recall rate of phone-use detection. Moreover, traditional detection methods cannot detect the face when its angular offset reaches 45 degrees or more; because the face cannot be detected, the position of the palm relative to the face key points cannot be assessed, so detection and warning fail precisely when the driver makes a phone call with the face turned sideways, leaving a considerable safety risk. The algorithm used in the embodiment of the present invention is robust and remains able to detect faces with angular offsets of up to 75 degrees, which significantly improves the recall rate for side-face phone calls. Figs. 1d, 1e and 1f show detection results for a 70-degree side face: Fig. 1d is the output of the shallow-layer network, Fig. 1e the output of the middle-layer network, and Fig. 1f the output of the deep-layer network.
To detect the palm region in an image, an embodiment of the present invention further provides a palm detection model training method. In specific implementations, because the palm region is small and easily occluded by the face, the requirements on the detection network are high; a detection network based on SSD (Single Shot MultiBox Detector, a single-stage object detection algorithm) can realize palm region detection well. The main idea of SSD is to sample densely and uniformly at different locations on different layers of the network, using different scales and aspect ratios, then extract features with a CNN (convolutional neural network) and perform classification and regression directly; the whole process takes a single pass, so compared with the traditional YCbCr skin-color approach to palm detection, its accuracy is higher. In specific implementations, the trained detection network can be used to detect the palm region in an image.
In specific implementations, the training samples of the palm detection model are sample images containing palms, annotated with the position information of the palm region.
Based on the face detection and recognition model and the palm detection model obtained through training, an embodiment of the present invention provides an image-based user behavior detection method which, as shown in Fig. 3, may comprise the following steps:
S31: using a face detection and recognition model trained in advance, identify the face image in a real-time image and determine predicted position information of a face frame and multiple key points in the real-time image.
In specific implementations, the face frame is a rectangle. The face detection and recognition model trained in advance is first used to determine whether the real-time image contains a face image; if so, the position information of the face frame and of the 5 key points within it is further determined. The position of the face frame can be expressed by the coordinates of its upper-left and lower-right vertices, and the position of each key point by the key point's coordinates. The output of the face detection and recognition model is shown in Fig. 1c.
In an embodiment of the present invention, the face frame contains 5 key points: two eye key points, one nose key point and two lip-corner key points. In specific implementations, each key point can be identified from its ordinate. For example, taking the lower-left vertex of the real-time image as the origin, the ordinates of the eye key points, the nose key point and the lip-corner key points decrease in that order. Thus, when the 5 identified key points are sorted by ordinate, the first and second are the eye key points (whose ordinates may or may not be equal), the third is the nose key point, and the fourth and fifth are the two lip-corner key points.
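The ordinate-based identification of the 5 key points can be sketched as follows, assuming (as described above) an origin at the lower-left of the image so that the eyes have the largest ordinates:

```python
def label_keypoints(points):
    """Identify the 5 face key points by ordinate.

    With the origin at the lower-left of the image, y decreases from the
    eyes to the nose to the lip corners, so sorting by descending y
    separates the three groups.

    points: list of 5 (x, y) tuples in arbitrary order.
    Returns a dict with 'eyes' (2 points), 'nose' (1), 'mouth' (2).
    """
    by_y = sorted(points, key=lambda p: p[1], reverse=True)
    return {
        "eyes": by_y[0:2],   # the two highest points
        "nose": by_y[2],
        "mouth": by_y[3:5],  # the two lip-corner points
    }
```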
S32: determine a detection region according to the predicted position information of the face frame.
In specific implementations, to improve the recall rate of phone-use detection, the face frame can be expanded into the palm detection region, i.e. enlarged from the face frame to the head-and-shoulder region. Specifically, taking the center point of the face frame as the reference point and N times the distance between the center point and the face frame as the size, the face frame is expanded to obtain the detection region, where N is a number greater than 1. The value of N can be set as needed and is not limited by the embodiment of the present invention; for example, N can be set to 1.5 or to 2. Taking N = 2 as an example, Fig. 4 is a schematic diagram of expanding the face frame into the detection region. Determining a detection region allows regions of no interest to be filtered out without detection and increases the effective receptive field of the palm detection model. Taking a model input fixed at 300x300 as an example, if the palm occupies 1/10 of the full image, its proportion within the detection region will be greater than 1/10, which improves the palm recall rate. Fig. 5 is a schematic diagram of cropping the detection region from the real-time image.
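Step S32 can be sketched as follows, under the reading that the expanded region keeps the face-frame center and scales the frame's extent by N; the optional clipping to the image bounds is an added assumption, not stated in the patent:

```python
def expand_face_box(x1, y1, x2, y2, n=2.0, img_w=None, img_h=None):
    """Expand a face frame (x1, y1, x2, y2) into a detection region.

    Keeps the face-frame center fixed and scales the half-width and
    half-height by n (n > 1), then optionally clips to the image bounds.
    """
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = n * (x2 - x1) / 2, n * (y2 - y1) / 2
    ex1, ey1, ex2, ey2 = cx - half_w, cy - half_h, cx + half_w, cy + half_h
    if img_w is not None:
        ex1, ex2 = max(0, ex1), min(img_w, ex2)
    if img_h is not None:
        ey1, ey2 = max(0, ey1), min(img_h, ey2)
    return ex1, ey1, ex2, ey2
```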
S33: identify the palm region and lip region in the detection region.
In specific implementations, the palm detection model trained in advance is used to detect and identify the palm region in the detection region. Fig. 6a is a schematic diagram of the identified palm region.
For the lip region, in an embodiment of the present invention the lip region in the detection region can be identified as follows: determine the lip region in the detection region according to the position information corresponding to the two lip-corner key points.
Specifically, according to the position coordinates of the two lip-corner key points, a region centered on their midpoint is cropped, with the distance between the two lip-corner key points as the width and M times the width as the height, where M is a number greater than 0 and less than 1. In specific implementations the value of M can be set as needed and is not limited by the embodiment of the present invention; for example, M can be set to 1/3. Fig. 7 is a schematic diagram of lip region identification.
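Under that reading (an assumption, since the original text is ambiguous about the crop geometry), the lip crop can be sketched as:

```python
def lip_region(corner_l, corner_r, m=1 / 3):
    """Compute the lip crop rectangle from the two lip-corner key points.

    Assumed geometry: the crop width equals the distance between the two
    lip corners, the crop is centered on their midpoint, and the height
    is m times the width (0 < m < 1).
    Returns (x_min, y_min, x_max, y_max).
    """
    (x1, y1), (x2, y2) = corner_l, corner_r
    width = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    height = m * width
    return (cx - width / 2, cy - height / 2,
            cx + width / 2, cy + height / 2)
```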
S34: determine the minimum distance between the center point of the palm region and each key point.
In specific implementations, after the palm region in the detection region is identified, the minimum distance between the palm region center point and the key points can be determined according to the following formula:
distance = min_i[(p_center.x - p_i.x)² + (p_center.y - p_i.y)²]
where p_center.x and p_center.y are the abscissa and ordinate of the palm region center point, and p_i.x and p_i.y are the abscissa and ordinate of key point i.
In specific implementations, the above formula is used to compute the distance from the palm region center point to each key point separately, as shown in Fig. 6b, a schematic diagram of determining the minimum distance between the palm region center point and the five key points; the smallest of these distances is taken as the minimum distance between the palm region center point and the key points.
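Step S34 reduces to a few lines; note the formula above uses the squared distance (no square root), which the sketch below follows:

```python
def min_sq_distance(palm_center, key_points):
    """Minimum squared distance between the palm-region center point and
    the face key points, matching the formula
    distance = min_i[(pc.x - pi.x)^2 + (pc.y - pi.y)^2].

    Leaving out the square root does not change which key point is
    closest, since squaring is monotone for non-negative distances.
    """
    px, py = palm_center
    return min((px - x) ** 2 + (py - y) ** 2 for x, y in key_points)
```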
S35: count the number of lip changes within a preset duration.
In specific implementations, lip state analysis mainly counts the histogram distribution of the lip region over a period of time to judge whether the lips changed during that period, i.e. whether the driver spoke during that time. Figs. 8a and 8b show the lip region in two states, and Figs. 9a and 9b the corresponding lip region histograms; when the lips are speaking, the histogram differs considerably from when the mouth is closed.
Based on this, in specific implementations step S35 can be implemented as follows: count the number of histogram changes of the lip region within the preset duration, and take the histogram change count as the lip change count.
It should be noted that, in specific implementations, steps S34 and S35 have no fixed execution order: step S35 may be executed before step S34, or the two steps may be executed simultaneously.
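A minimal sketch of the histogram-change count over a frame sequence is given below; the bin count, the L1 comparison between consecutive histograms, and the change threshold are all illustrative assumptions, since the patent does not specify how "a histogram change" is measured:

```python
import numpy as np

def count_lip_changes(lip_frames, bins=32, threshold=0.25):
    """Count histogram changes of the lip region over a sequence of frames.

    lip_frames: iterable of grayscale lip crops (2-D uint8 arrays).
    A 'change' is recorded when the L1 distance between normalized
    histograms of consecutive frames exceeds `threshold`.
    """
    changes = 0
    prev = None
    for frame in lip_frames:
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / max(hist.sum(), 1)  # normalize to a distribution
        if prev is not None and np.abs(hist - prev).sum() > threshold:
            changes += 1
        prev = hist
    return changes
```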
S36: if the minimum distance between the palm region center point and each key point is less than the preset distance threshold and the number of lip changes within the preset duration is greater than the preset count threshold, determine that phone use while driving is detected.
In specific implementations, after phone use while driving is detected, a voice prompt can be given to the driver.
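The final decision in step S36 combines the two measurements; the threshold values below are illustrative assumptions, not values from the patent:

```python
def phone_use_detected(min_distance, lip_changes,
                       dist_threshold=50.0, count_threshold=3):
    """Phone use while driving is reported only when the palm is close
    enough to the face key points AND the lips changed often enough
    within the preset duration (both thresholds are assumed values)."""
    return min_distance < dist_threshold and lip_changes > count_threshold
```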
When it is implemented, there are certain angles for facial image in the realtime graphic of acquisition there is likely to be following problems Offset, as shown in Figure 10.It in this case,, can be right first in the embodiment of the present invention in order to improve the accuracy of testing result Realtime graphic is corrected, and the direct picture of face, i.e. angular distortion in removal realtime graphic has been obtained, to realize scene angle The normalization of degree, so as to more accurately extract lip-region and palm area.
In a specific implementation, the corrected image of the detection zone can be obtained as follows: determine the horizontal angle between the two eyes according to the position information of the two eye key points; and determine the corrected image corresponding to the face frame according to that horizontal angle.

Specifically, the real-time image is rotated by the angle corresponding to the horizontal angle between the two eyes to obtain an intermediate image, the rotated intermediate image being as shown in Fig. 11; a transformation matrix is determined from the real-time image and the intermediate image; and the corrected image corresponding to the face frame is determined using the transformation matrix. After rotating the real-time image to obtain the intermediate image, the vertex coordinates of the identified face frame and the coordinates of each identified key point can be correspondingly mapped to their positions in the rotated intermediate image.
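A minimal sketch of the roll-angle computation from the two eye key points (assuming image coordinates with y increasing downward; the de-rotation would then apply the negated angle about the image center):

```python
import math

def eye_roll_angle_degrees(left_eye, right_eye):
    """Horizontal angle between the two eye key points, in degrees.

    A perfectly level face returns 0; a positive value means the face
    is rotated clockwise in image coordinates (y grows downward).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))
```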
In the embodiment of the present invention, the transformation matrix is determined mainly from the correspondence mapping between three pairs of coordinate points. Specifically, 3 key points can be selected from the real-time image, and the transformation matrix determined from the first position information of the 3 selected key points in the real-time image and the second position information of the same 3 key points in the intermediate image. In a specific implementation, the transformation matrix can be determined according to the following formula:
(x'_i, y'_i, 1)^T = map_matrix · (x_i, y_i, 1)^T
where (x_i, y_i) are the key-point coordinates in the real-time image and (x'_i, y'_i) are the key-point coordinates in the intermediate image. With the obtained transformation matrix, the image can be corrected and the key-point positions after the transformation computed. The lip region obtained after correction with the transformation matrix is essentially horizontal, which provides a unified reference for the lip region. On this basis, extracting the lip region from a face frame offset by a certain angle can be implemented as follows: using the transformation matrix, convert the third position information of the two lip-corner key points in the real-time image into fourth position information in the corrected image; and determine the lip region in the detection zone from the fourth position information corresponding to the two lip-corner key points.
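Since three non-collinear point correspondences determine an affine map exactly, the map_matrix of the formula above can be recovered with a small linear solve. This sketch (plain NumPy, equivalent in spirit to OpenCV's getAffineTransform, with the homogeneous row dropped so M is 2x3) also shows mapping a lip-corner key point into the corrected image:

```python
import numpy as np

def affine_from_three_points(src_pts, dst_pts):
    """Solve the 2x3 map_matrix M with (x', y') = M @ (x, y, 1)^T from
    three point correspondences (key points in the real-time image and
    their positions in the intermediate image)."""
    src = np.asarray(src_pts, dtype=np.float64)   # 3 x 2
    dst = np.asarray(dst_pts, dtype=np.float64)   # 3 x 2
    A = np.hstack([src, np.ones((3, 1))])         # rows (x_i, y_i, 1)
    # A @ M.T = dst is solvable exactly when the 3 points are not collinear.
    return np.linalg.solve(A, dst).T              # 2 x 3

def map_key_point(M, pt):
    """Transform one key point, e.g. a lip-corner point."""
    return tuple(M @ np.array([pt[0], pt[1], 1.0]))
```

In the patent's flow, src_pts would be the 3 selected key points in the real-time image and dst_pts their positions in the intermediate image; map_key_point then carries the lip-corner points into the corrected image.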
The image-based user behavior detection method provided by the embodiment of the present invention identifies the face image in a real-time image through a pre-trained face detection and recognition model and determines the predicted position information of the face frame and each key point; it further determines a detection zone from the face frame, identifies the palm region and lip region in the detection zone, computes the minimum distance between the palm-region center point and each key point, and counts the number of lip changes within a preset duration. If the minimum distance between the palm-region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold, it determines that a phone-call-while-driving behavior is detected. In the above process, combining the minimum palm-to-key-point distance with the lip change count to decide whether a phone-call-while-driving behavior exists improves the accuracy of the detection result.

In addition, the image-based user behavior detection method provided by the embodiment of the present invention achieves good phone-call behavior detection under both infrared and visible-light conditions; while maintaining a very high detection rate, it suppresses false alarms in which the hand is near the head but no call is actually being made.
Based on the same inventive concept, an embodiment of the present invention further provides an image-based user behavior detection apparatus. Since the principle by which the apparatus solves the problem is similar to that of the image-based user behavior detection method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.

As shown in Fig. 12, which is a schematic structural diagram of the image-based user behavior detection apparatus provided by an embodiment of the present invention, the apparatus comprises:
a first recognition unit 121, configured to identify the face image in a real-time image using a pre-trained face detection and recognition model and determine the predicted position information of the face frame and multiple key points in the real-time image;

a first determination unit 122, configured to determine a detection zone according to the predicted position information of the face frame;

a second recognition unit 123, configured to identify the palm region and lip region in the detection zone;

a second determination unit 124, configured to determine the minimum distance between the palm-region center point and each key point;

a statistics unit 125, configured to count the number of lip changes within a preset duration; and

a third determination unit 126, configured to determine that a phone-call-while-driving behavior is detected if the minimum distance between the palm-region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold.
Optionally, the face detection and recognition model is obtained by training a three-layer network on sample images containing different face poses, the sample images being annotated with the actual position information of the face frame and each key point.
Optionally, the key points include two eye key points; and

the apparatus further comprises:

a fourth determination unit, configured to, before the first determination unit determines the detection zone according to the predicted position information of the face frame, determine the horizontal angle between the two eyes according to the position information of the two eye key points, and determine the corrected image corresponding to the face frame according to the horizontal angle between the two eyes.
Optionally, the fourth determination unit is specifically configured to rotate the real-time image by the angle corresponding to the horizontal angle between the two eyes to obtain an intermediate image; determine a transformation matrix from the real-time image and the intermediate image; and determine the corrected image corresponding to the face frame using the transformation matrix.

Optionally, the fourth determination unit is specifically configured to select 3 key points from the real-time image; and

determine the transformation matrix from the first position information of the 3 selected key points in the real-time image and the second position information of the 3 key points in the intermediate image.
Optionally, the first determination unit is specifically configured to take the face-frame center point as the reference point and N times the distance between the face-frame center point and the face frame as the size, and expand the face frame to obtain the detection zone, where N is a value greater than 1.
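The first determination unit's expansion rule can be sketched as follows (boxes given as (x1, y1, x2, y2); N = 1.5 is only an illustrative value, the patent requires merely N > 1):

```python
def expand_face_frame(box, n=1.5):
    """Expand the face frame about its center point by factor n to get
    the detection zone, so that a palm held near the head stays inside."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * n / 2.0, (y2 - y1) * n / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```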
Optionally, the key points include two lip-corner key points; and

the second recognition unit is specifically configured to convert, using the transformation matrix, the third position information of the two lip-corner key points in the real-time image into fourth position information in the corrected image, and determine the lip region in the detection zone from the fourth position information corresponding to the two lip-corner key points.

Optionally, the statistics unit is specifically configured to count the number of histogram changes of the lip region within the preset duration, and take the number of histogram changes as the number of lip changes.
For convenience of description, the above parts are divided by function into modules (or units) and described separately. Of course, when implementing the present invention, the functions of the modules (or units) may be implemented in one or more pieces of software or hardware.
The image-based user behavior detection method, apparatus, and storage medium provided by the embodiments of the present invention identify the face image in a real-time image through a pre-trained face detection and recognition model and determine the predicted position information of the face frame and each key point; they further determine a detection zone from the face frame, identify the palm region and lip region in the detection zone, compute the minimum distance between the palm-region center point and each key point, and count the number of lip changes within a preset duration. If the minimum distance between the palm-region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold, a phone-call-while-driving behavior is determined to be detected. In the above process, combining the minimum palm-to-key-point distance with the lip change count to decide whether a phone-call-while-driving behavior exists improves the accuracy of the detection result.

Having described the image-based user behavior detection method and apparatus of the exemplary embodiments of the present invention, a computing device according to another exemplary embodiment of the present invention is introduced next.
Those of ordinary skill in the art will understand that aspects of the present invention may be implemented as a system, a method, or a program product. Therefore, aspects of the present invention may take the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit", "module", or "system".

In some possible embodiments, a computing device according to the present invention may include at least one processor and at least one memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the image-based user behavior detection method according to the various exemplary embodiments of the present invention described above in this specification. For example, the processor may perform step S31 shown in Fig. 3: identifying the face image in a real-time image using a pre-trained face detection and recognition model and determining the predicted position information of the face frame and multiple key points in the real-time image; step S32: determining a detection zone according to the predicted position information of the face frame; step S33: identifying the palm region and lip region in the detection zone; step S34: determining the minimum distance between the palm-region center point and each key point; step S35: counting the number of lip changes within a preset duration; and step S36: if the minimum distance between the palm-region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold, determining that a phone-call-while-driving behavior is detected.
A computing device 130 according to this embodiment of the present invention is described below with reference to Fig. 13. The computing device 130 shown in Fig. 13 is only an example and should not impose any restriction on the functions or scope of use of the embodiments of the present invention.

As shown in Fig. 13, the computing device 130 takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processor 131 mentioned above, the at least one memory 132 mentioned above, and a bus 133 connecting the different system components (including the memory 132 and the processor 131).
The bus 133 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus structures.

The memory 132 may include readable media in the form of volatile memory, such as random access memory (RAM) 1321 and/or cache memory 1322, and may further include read-only memory (ROM) 1323.

The memory 132 may also include a program/utility 1325 having a set of (at least one) program modules 1324. Such program modules 1324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The computing device 130 may also communicate with one or more external devices 134 (such as a keyboard, a pointing device, etc.), with one or more devices that enable a user to interact with the computing device 130, and/or with any device (such as a router, a modem, etc.) that enables the computing device 130 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interfaces 135. Moreover, the computing device 130 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 136. As shown, the network adapter 136 communicates with the other modules of the computing device 130 through the bus 133. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computing device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
In some possible embodiments, aspects of the image-based user behavior detection method provided by the present invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a computing device, causes the computing device to perform the steps of the image-based user behavior detection method according to the various exemplary embodiments of the present invention described above in this specification. For example, the computing device may perform step S31 shown in Fig. 3: identifying the face image in a real-time image using a pre-trained face detection and recognition model and determining the predicted position information of the face frame and multiple key points in the real-time image; step S32: determining a detection zone according to the predicted position information of the face frame; step S33: identifying the palm region and lip region in the detection zone; step S34: determining the minimum distance between the palm-region center point and each key point; step S35: counting the number of lip changes within a preset duration; and step S36: if the minimum distance between the palm-region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold, determining that a phone-call-while-driving behavior is detected.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

The program product for image-based user behavior detection according to embodiments of the present invention may employ a portable compact disc read-only memory (CD-ROM) including program code and may be run on a computing device. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.

The program code contained on a readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.

Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In scenarios involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more units described above may be embodied in one unit; conversely, the features and functions of one unit described above may be further divided and embodied in multiple units.

In addition, although the operations of the method of the present invention are described in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one step, and/or one step may be decomposed into multiple steps.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (18)

1. An image-based user behavior detection method, characterized by comprising:

identifying the face image in a real-time image using a pre-trained face detection and recognition model, and determining the predicted position information of a face frame and multiple key points in the real-time image;

determining a detection zone according to the predicted position information of the face frame;

identifying the palm region and lip region in the detection zone;

determining the minimum distance between the palm-region center point and each key point; and

counting the number of lip changes within a preset duration;

if the minimum distance between the palm-region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold, determining that a phone-call-while-driving behavior is detected.
2. The method according to claim 1, characterized in that the face detection and recognition model is obtained by training a three-layer network on sample images containing different face poses, the sample images being annotated with the actual position information of the face frame and each key point.

3. The method according to claim 1 or 2, characterized in that the key points include two eye key points; and

before determining the detection zone according to the predicted position information of the face frame, the method further comprises:

determining the horizontal angle between the two eyes according to the position information of the two eye key points; and

determining the corrected image corresponding to the face frame according to the horizontal angle between the two eyes.
4. The method according to claim 3, characterized in that determining the corrected image corresponding to the face frame according to the horizontal angle between the two eyes comprises:

rotating the real-time image by the angle corresponding to the horizontal angle between the two eyes to obtain an intermediate image;

determining a transformation matrix from the real-time image and the intermediate image; and

determining the corrected image corresponding to the face frame using the transformation matrix.

5. The method according to claim 4, characterized in that determining the transformation matrix from the real-time image and the intermediate image specifically comprises:

selecting 3 key points from the real-time image; and

determining the transformation matrix from the first position information of the 3 selected key points in the real-time image and the second position information of the 3 key points in the intermediate image.
6. The method according to claim 4, characterized in that determining the detection zone according to the predicted position information of the face frame specifically comprises:

taking the face-frame center point as the reference point and N times the distance between the face-frame center point and the face frame as the size, expanding the face frame to obtain the detection zone, where N is a value greater than 1.

7. The method according to claim 6, characterized in that the key points include two lip-corner key points; and

the lip region in the detection zone is identified as follows:

using the transformation matrix, converting the third position information of the two lip-corner key points in the real-time image into fourth position information in the corrected image; and

determining the lip region in the detection zone from the fourth position information corresponding to the two lip-corner key points.
8. The method according to claim 1, characterized in that counting the number of lip changes within the preset duration specifically comprises:

counting the number of histogram changes of the lip region within the preset duration; and

taking the number of histogram changes as the number of lip changes.
9. An image-based user behavior detection apparatus, characterized by comprising:

a first recognition unit, configured to identify the face image in a real-time image using a pre-trained face detection and recognition model and determine the predicted position information of a face frame and multiple key points in the real-time image;

a first determination unit, configured to determine a detection zone according to the predicted position information of the face frame;

a second recognition unit, configured to identify the palm region and lip region in the detection zone;

a second determination unit, configured to determine the minimum distance between the palm-region center point and each key point;

a statistics unit, configured to count the number of lip changes within a preset duration; and

a third determination unit, configured to determine that a phone-call-while-driving behavior is detected if the minimum distance between the palm-region center point and each key point is less than a preset distance threshold and the number of lip changes within the preset duration is greater than a preset count threshold.
10. The apparatus according to claim 9, characterized in that the face detection and recognition model is obtained by training a three-layer network on sample images containing different face poses, the sample images being annotated with the actual position information of the face frame and each key point.

11. The apparatus according to claim 9 or 10, characterized in that the key points include two eye key points; and

the apparatus further comprises:

a fourth determination unit, configured to, before the first determination unit determines the detection zone according to the predicted position information of the face frame, determine the horizontal angle between the two eyes according to the position information of the two eye key points, and determine the corrected image corresponding to the face frame according to the horizontal angle between the two eyes.
12. The apparatus according to claim 11, characterized in that

the fourth determination unit is specifically configured to rotate the real-time image by the angle corresponding to the horizontal angle between the two eyes to obtain an intermediate image; determine a transformation matrix from the real-time image and the intermediate image; and determine the corrected image corresponding to the face frame using the transformation matrix.

13. The apparatus according to claim 12, characterized in that

the fourth determination unit is specifically configured to select 3 key points from the real-time image, and

determine the transformation matrix from the first position information of the 3 selected key points in the real-time image and the second position information of the 3 key points in the intermediate image.

14. The apparatus according to claim 12, characterized in that

the first determination unit is specifically configured to take the face-frame center point as the reference point and N times the distance between the face-frame center point and the face frame as the size, and expand the face frame to obtain the detection zone, where N is a value greater than 1.
15. The apparatus according to claim 14, characterized in that the key points include two lip-corner key points; and

the second recognition unit is specifically configured to convert, using the transformation matrix, the third position information of the two lip-corner key points in the real-time image into fourth position information in the corrected image, and determine the lip region in the detection zone from the fourth position information corresponding to the two lip-corner key points.

16. The apparatus according to claim 9, characterized in that

the statistics unit is specifically configured to count the number of histogram changes of the lip region within the preset duration, and take the number of histogram changes as the number of lip changes.
17. A computing device, characterized by comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 8.

18. A computer-readable medium, characterized in that it stores a computer program executable by a computing device, the program, when run on the computing device, causing the computing device to perform the steps of the method of any one of claims 1 to 8.
CN201910131928.6A 2019-02-22 2019-02-22 A kind of user behavior detection method, device and storage medium based on image Pending CN109948450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910131928.6A CN109948450A (en) 2019-02-22 2019-02-22 A kind of user behavior detection method, device and storage medium based on image

Publications (1)

Publication Number Publication Date
CN109948450A true CN109948450A (en) 2019-06-28

Family

ID=67007993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910131928.6A Pending CN109948450A (en) 2019-02-22 2019-02-22 A kind of user behavior detection method, device and storage medium based on image

Country Status (1)

Country Link
CN (1) CN109948450A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592143A (en) * 2012-01-09 2012-07-18 Tsinghua University Method for detecting phone holding violation of driver in driving
US20150360696A1 (en) * 2014-06-14 2015-12-17 Ge Yi Vehicle Onboard Safety System
CN105279473A (en) * 2014-07-02 2016-01-27 Shenzhen TCL New Technology Co., Ltd. Face image correction method and device and face recognition method and system
CN105989329A (en) * 2014-12-11 2016-10-05 Utechzone Co., Ltd. Method and device for detecting use of handheld device by person
CN104573724A (en) * 2015-01-09 2015-04-29 安徽清新互联信息科技有限公司 Method for monitoring call making and receiving behaviors of driver
CN105469073A (en) * 2015-12-16 2016-04-06 安徽创世科技有限公司 Kinect-based call making and answering monitoring method for drivers
CN106682601A (en) * 2016-12-16 2017-05-17 South China University of Technology Driver violation conversation detection method based on multidimensional information feature fusion
CN107358207A (en) * 2017-07-14 2017-11-17 Chongqing University Method for correcting facial image
CN108288280A (en) * 2017-12-28 2018-07-17 杭州宇泛智能科技有限公司 Dynamic face recognition method and device based on video stream

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Weihuang et al.: "Driver Fatigue Detection Algorithm Based on Multi-Facial-Feature Fusion", Computer Systems & Applications *
Chen Huiyan et al.: "Intelligent Vehicle Theory and Application", 31 July 2018, Beijing Institute of Technology Press *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503002A (en) * 2019-07-26 2019-11-26 Fuzhou Rockchip Electronics Co., Ltd. Face detection method and storage medium
CN110503002B (en) * 2019-07-26 2021-11-26 Rockchip Electronics Co., Ltd. Face detection method and storage medium
CN110728234A (en) * 2019-10-12 2020-01-24 Aiways Automobile Co., Ltd. Driver face recognition method, system, device and medium
CN112825116A (en) * 2019-11-20 2021-05-21 北京眼神智能科技有限公司 Method, device, medium and equipment for detecting and tracking faces in surveillance video images
CN112825116B (en) * 2019-11-20 2024-04-09 北京眼神智能科技有限公司 Method, device, medium and equipment for detecting and tracking faces in surveillance video images
CN111598091A (en) * 2020-05-20 2020-08-28 Beijing ByteDance Network Technology Co., Ltd. Image recognition method and device, electronic equipment and computer readable storage medium
WO2022001091A1 (en) * 2020-06-29 2022-01-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Dangerous driving behavior recognition method and apparatus, electronic device and storage medium
CN112784712A (en) * 2021-01-08 2021-05-11 重庆创通联智物联网有限公司 Missing child early warning implementation method and device based on real-time monitoring
CN112784712B (en) * 2021-01-08 2023-08-18 重庆创通联智物联网有限公司 Missing child early warning implementation method and device based on real-time monitoring
CN113762181A (en) * 2021-09-13 2021-12-07 Lenovo (Beijing) Co., Ltd. Image processing method and electronic equipment

Similar Documents

Publication Publication Date Title
CN109948450A (en) A kind of user behavior detection method, device and storage medium based on image
US10372226B2 (en) Visual language for human computer interfaces
EP3338217B1 (en) Feature detection and masking in images based on color distributions
Ahmed et al. Vision based hand gesture recognition using dynamic time warping for Indian sign language
JP5887775B2 (en) Human computer interaction system, hand-to-hand pointing point positioning method, and finger gesture determination method
WO2018028546A1 (en) Key point positioning method, terminal, and computer storage medium
CN103098076B (en) Gesture recognition system for TV control
WO2021174819A1 (en) Face occlusion detection method and system
CN109255324A (en) Gesture processing method, interaction control method and equipment
CN105512627A (en) Key point positioning method and terminal
US20160092726A1 (en) Using gestures to train hand detection in ego-centric video
CN109508687A (en) Man-machine interaction control method, device, storage medium and smart machine
CN109800153A (en) Mobile application test method and device, electronic equipment, storage medium
CN106446862A (en) Face detection method and system
CN105335719A (en) Living body detection method and device
CN103105924B (en) Man-machine interaction method and device
CN112307886A (en) Pedestrian re-identification method and device
She et al. A real-time hand gesture recognition approach based on motion features of feature points
US20210312163A1 (en) Face recognition method, device and electronic equipment, and computer non-volatile readable storage medium
CN106200971A (en) Man-machine interactive system device based on gesture identification and operational approach
Störring et al. Computer vision-based gesture recognition for an augmented reality interface
CN109410138B (en) Method, device and system for modifying double chin
TWI776176B (en) Device and method for scoring hand work motion and storage medium
CN114549557A (en) Portrait segmentation network training method, device, equipment and medium
KR102440198B1 (en) VIDEO SEARCH METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190628