CN114783043B - Child behavior track positioning method and system - Google Patents

Child behavior track positioning method and system

Info

Publication number
CN114783043B
CN114783043B
Authority
CN
China
Prior art keywords
face
human body
matching
library
detection frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210721671.1A
Other languages
Chinese (zh)
Other versions
CN114783043A
Inventor
郝永富
赵钺
戴卓学
王潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Angor Intelligent Technology Co ltd
Original Assignee
Hangzhou Angor Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Angor Intelligent Technology Co ltd filed Critical Hangzhou Angor Intelligent Technology Co ltd
Priority to CN202210721671.1A
Publication of CN114783043A
Application granted
Publication of CN114783043B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a child behavior track positioning method and system. The method comprises: constructing a model library and extracting features from its images to obtain a first feature library; performing face detection and human body detection on the collected video to generate face detection frames and human body detection frames; performing a first matching between the face detection frames and the first feature library, extracting features from the human body detection frames based on the first matching result to obtain a second feature library, performing face tracking matching on the objects that passed the first matching, and performing a second matching against the second feature library based on the face tracking matching result; and recording the coordinate position in the image sequence at each successful first matching, face tracking matching or second matching, thereby generating a track. By recognizing with combined face and body information, the method handles occluded faces and bodies turned away from the camera, and accurately tracks children's behavior tracks in the classroom.

Description

Child behavior track positioning method and system
Technical Field
The invention relates to the technical field of intelligent educational information, in particular to a method and system for positioning child behavior tracks.
Background
Preschool institutions for children aged 1-6 mainly comprise nursery classes and kindergartens. Children at this stage attend varied, unfixed classes and move over a large area of the classroom, so occlusion and non-frontal faces are common; moreover, current general-purpose face recognition algorithms are trained mainly on adults, so their recognition rate on children is low. Accurately tracking and positioning the activity distribution of each child in the classroom lets teachers analyze how long a child stays in different corners of the classroom, infer the child's points of interest, and complete continuous observation and analysis of the child accurately and objectively, so that parents and teachers understand the child better and can teach accordingly; accurate tracking and positioning of each child's activity distribution in the classroom is therefore needed. Existing analysis techniques for fixed classes rely mainly on face recognition alone, whose accuracy falls short of the requirement, and they do not exploit the fact that the children in a class scene are fixed, which makes long-term, per-child tracking and positioning possible. Likewise, techniques built for scenes with high personnel turnover, such as shopping malls, are not suited to tracking and positioning children in a classroom.
Disclosure of Invention
To overcome the shortcomings of the above technologies, the invention provides a method and system for positioning child behavior tracks. On one hand, recognizing with combined face and body information solves the problems of occluded faces and bodies turned away from the camera; on the other hand, a dynamically updated recognition training algorithm adds the data of the current class of children to the training data set, so the algorithm adapts to the current class and its accuracy grows as time accumulates; and a training-sample generation strategy makes the model insensitive to face angle, age change, body posture and clothing, so that children's behavior tracks in the classroom can be tracked accurately.
The technical solution adopted by the invention to overcome the above technical problems is as follows. In a first aspect, the invention provides a child behavior track positioning method, comprising: constructing a model library corresponding to the objects to be recognized, and performing feature extraction on each object to be recognized in the model library through a first dynamic convolutional neural network to obtain a first feature library; performing face detection and human body detection on the collected video respectively, so as to obtain face detection frames and human body detection frames that correspond one to one in the image sequence of the video; performing a first matching between the face detection frames in the image sequence and the first feature library, normalizing the human body detection frames based on the first matching result, and performing feature extraction through a second dynamic convolutional neural network to obtain a second feature library; performing face tracking matching on the objects to be recognized that passed the first matching, and performing a second matching against the second feature library based on the face tracking matching result; recording in real time the coordinate position, in the image sequence, of each object to be recognized at each successful first matching, face tracking matching or second matching, so as to generate the track of that object; updating the training sample library of the first or second dynamic convolutional neural network with the images of objects whose first matching, face tracking matching or second matching succeeded; and adding training samples based on the samples in the training sample library and the face or body attributes, so as to update the first and second dynamic convolutional neural networks.
Further, constructing a model library corresponding to the objects to be recognized and performing feature extraction on each object in the model library through a first dynamic convolutional neural network to obtain a first feature library specifically comprises: constructing a face model library based on the identity information Y of an object to be recognized and a plurality of photos P corresponding to the identity information Y; processing all photos in the face model library to obtain image matrices I; and inputting each image matrix I into the first dynamic convolutional neural network for convolution processing, the resulting feature vector representing the face image corresponding to the identity information Y, the set of these face features serving as the first feature library.
Further, performing face detection and human body detection on the collected images respectively, so as to obtain face detection frames and human body detection frames that correspond one to one in the image sequence of the video, specifically comprises: the collected video record at least comprises a two-dimensional image sequence X = {X1, X2, ..., Xt-1, Xt, ...}, where Xt is the two-dimensional image at any time t; performing face detection and human body detection on the two-dimensional image sequence to obtain m face detection frames FaceBoxset = (F1, F2, ..., Fm) and n human body detection frames BodyBoxset = (B1, B2, ..., Bn), where m ≥ 0 and n ≥ 0; calculating the overlap ratio of any face detection frame Fi in FaceBoxset with any human body detection frame Bj in BodyBoxset,

U(Fi, Bj) = cnt(Fi ∩ Bj) / cnt(Fi),

where i ≤ m and j ≤ n; and obtaining, based on a preset overlap threshold and the overlap ratio U, k one-to-one corresponding face and human body detection frame pairs Boxset = {(F1, B1), (F2, B2), ..., (Fk, Bk)}, where k is less than or equal to the minimum of m and n.
Owing to differences between the face detection and human body detection algorithms, and to variations in children's postures, the m face detection frames and n human body detection frames obtained do not correspond one to one in every case.
Further, the first matching between the face detection frames in the image and the first feature library specifically comprises: performing the first matching between the first face detection frame of the two-dimensional image corresponding to the first time point and the first feature library; if the first matching succeeds, confirming the object to be recognized corresponding to that face detection frame; if it fails, repeating the first matching with the first feature library on the two-dimensional images corresponding to subsequent time points.

Once a face is successfully matched against the face feature library, the child's identity is confirmed, and subsequent face tracking is performed.
Further, performing face tracking matching on the object to be recognized whose first matching succeeded specifically comprises: judging the similarity between the i-th face detection frame Fi(t) in the image at time point t and the j-th face detection frame Fj(t+1) in the image at time point t+1, the similarity being

S = λ · Simg + (1 - λ) · Straj,

where Simg and Straj respectively denote the image similarity of the two detection frames and the motion-trajectory-prediction similarity, and λ is a weight coefficient; if the similarity S is greater than or equal to the face similarity threshold, the face tracking matching succeeds.

Face tracking by similarity judgment requires far less computation than face tracking by face recognition, so judging similarity improves tracking efficiency.
Further, normalizing the human body detection frame based on the first matching result and then performing feature extraction through a second dynamic convolutional neural network to obtain a second feature library specifically comprises: if the first matching succeeds, cropping the human body detection frame Bi corresponding to the first face detection frame to obtain a human body image; performing posture recognition on the human body image to obtain a human body key point set Kdet; obtaining the transformation matrix T of the key point set Kdet from the reference point set Kref through Kref = T · Kdet, where Kref is the reference point set in a standard upright posture; normalizing the human body image based on the transformation matrix T; and inputting the normalized human body image into the second dynamic convolutional neural network, the resulting feature vector serving as the human body feature, the set of human body features serving as the second feature library.

Given how much children move, posture normalization of the human body is needed.
Further, if the face tracking matching fails, feature extraction is performed on the first human body detection frame to obtain a first human body feature, the first human body feature is compared for similarity with the human body features of the second feature library, and if the similarity comparison result is greater than or equal to the human body similarity threshold, the second matching succeeds.
Further, if the first matching is successful, intercepting the face image corresponding to the face detection frame, extracting features of the face image to obtain new face features, and updating the corresponding face feature library.
Further, the motion-trajectory-prediction similarity is Straj = Sim(di', dj), where di and dj denote the state information of the i-th and j-th detection frames respectively, the state information at least comprising (u, v, w, h, du, dv, dw, dh), in which (u, v) are the coordinates of the center point of the detection frame, (w, h) are its width and height, and du, dv, dw, dh are the corresponding velocities in the image sequence; the state information of a detection frame at time t+1 can be obtained from its state information at time t as the predicted state di'(t+1) = (u + du, v + dv, w + dw, h + dh).
Further, adding training samples based on the samples in the training sample library and the face or body attributes specifically comprises: classifying the images in the face sample library and the human body sample library of the training sample library with a face and body attribute classifier, so as to obtain several face attributes and several body attributes for the corresponding images; and fusing images corresponding to different body attributes and adding them to the human body sample library.

This increases the richness of the samples, making the dynamic convolutional neural network model insensitive to face angle, age change, body posture and clothing, so that recognition accuracy is unaffected even when face angle, age, posture and the like change.
Further, the first dynamic convolutional neural network and the second dynamic convolutional neural network are preset as ResNet neural networks, the network weights of the last two layers are updated in the training process, and the weights of other layers are fixed.
In a second aspect, the invention further provides a child behavior track positioning system, comprising a data acquisition module, a data storage module, a data processing module, a result presentation module and an identity information entry module; the data acquisition, data storage, data processing and result presentation modules are coupled to one another in sequence, and the identity information entry module is coupled to the data processing module. The data storage module is at least used for storing video data; the identity information entry module is used for storing the identity information of the objects to be recognized;
the data processing module is used for performing identity recognition and video data processing on the objects to be recognized based on the entered identity information and the collected video data; and the result presentation module is used for displaying the children's behavior tracks based on the processing results of the data processing module.
The beneficial effects of the invention are:
1. face and body information are used jointly for recognition, solving the problems of occluded faces and bodies turned away from the camera and achieving high-precision recognition;
2. the face tracking and the face recognition are combined, so that the recognition accuracy rate of the time-span period is improved;
3. the feature library is dynamically updated according to the matching result, so that multi-angle and multi-azimuth face recognition is realized, and the recognition precision is improved;
4. the track of every successful match is recorded in real time, instead of associating face and body tracks only after tracking, which reduces the chance that a fast but less accurate tracker mistakenly strings a track onto another person;
5. a dynamically updated recognition training algorithm is designed: data of the current class of children are added to the training data set, so the algorithm adapts to the current class and the accuracy of face and body recognition grows as time accumulates;
6. a training-sample generation strategy is provided, making the model insensitive to face angle, age change, body posture and clothing, so recognition accuracy is unaffected even when these factors change;
7. in view of children's behavior in the classroom, posture normalization is applied to the human body feature library to ensure stable recognition.
Drawings
FIG. 1 is a flow chart of a child behavior trajectory positioning method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a method for positioning a child behavior trajectory according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an updating method of a dynamic convolutional neural network of a child behavior trajectory positioning system according to an embodiment of the present invention;
fig. 4 is a schematic block diagram of a child behavior trajectory positioning system according to an embodiment of the present invention.
Detailed Description
For further understanding of the invention, some of the terms mentioned in the present application will first be explained:
face detection: for any given image, searching the image by adopting a certain method strategy to determine whether the image contains human faces, returning the position and size of each human face, and generally outputting the image in a rectangular frame form.
Face recognition: and after the face is detected in the image, the identity of the face is identified by comparing the characteristic information of the face.
Face tracking: aiming at an image sequence in a video, on the premise that a face is detected in a current image, the position and the size of the corresponding face are continuously acquired in a subsequent image sequence.
In order to facilitate a better understanding of the invention for those skilled in the art, the invention will be described in further detail with reference to the accompanying drawings and specific examples, which are given by way of illustration only and do not limit the scope of the invention.
The child behavior track positioning method of the invention comprises: constructing a model library corresponding to the objects to be recognized, and performing feature extraction on each object in the model library through a first dynamic convolutional neural network to obtain a first feature library; performing face detection and human body detection on the collected video respectively, so as to obtain face detection frames and human body detection frames that correspond one to one in the image sequence of the video; performing a first matching between the face detection frames in the image sequence and the first feature library, normalizing the human body detection frames based on the first matching result, and performing feature extraction through a second dynamic convolutional neural network to obtain a second feature library; performing face tracking matching on the objects that passed the first matching, and performing a second matching against the second feature library based on the face tracking matching result; and recording in real time the coordinate position, in the image sequence, of each object at each successful first matching, face tracking matching or second matching, so as to generate that object's track. The images of objects whose first matching, face tracking matching or second matching succeeded are used to update the training sample library of the first or second dynamic convolutional neural network, and training samples are added based on the samples in the training sample library and the face or body attributes, so that the first and second dynamic convolutional neural networks are updated.
Fig. 1 shows the flow chart of the child behavior track positioning method of this embodiment; the method of the invention is described below taking preschool education, with children in a classroom, as the application scene. Fig. 2 shows a schematic diagram of the child behavior track positioning method of this embodiment.
S1, constructing a model library corresponding to the objects to be recognized, and extracting the features of each object to be recognized in the model library to obtain a first feature library. The specific steps are as follows.
S11, constructing a face model library based on the identity information Y of the object to be recognized and a number of photos P corresponding to the identity information Y.
In the preschool application scene, a teacher enters the identity information of the children in the class through a computer or mobile phone, creating an identity file, denoted Y, containing identity information such as each child's name and ID number; at the same time, p photos containing the face are uploaded for each child, building the model library of all children in the class.
S12, processing all the photos in the face model library to obtain image matrices I.
A Crop () operation is performed on each photo, thereby generating an image matrix I of fixed length and width.
S13, performing convolution processing on each image matrix I and representing the face image corresponding to the identity information Y with the feature vector obtained after convolution, the set of these face features serving as the first feature library.
Each image matrix I is fed as input through the first dynamic convolutional neural network, and the fixed-length features of the last layer represent the corresponding face image, yielding a set of face features that characterizes the child's identity: the face feature library Fset = {FV1, FV2, ..., FVp}, which serves as the first feature library, where FVp denotes the face feature of the p-th photo.
In some embodiments, a pre-trained general-purpose network such as ResNet can be chosen as the first dynamic convolutional neural network in the initial state; to improve the generalization of the network, only the network weights of the last two layers are updated during training, the weights of the other layers being fixed.
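As a minimal sketch of this setup (assuming PyTorch and a torchvision ResNet-18 backbone; the exact layer split and embedding size are illustrative choices, not specified by the patent):

```python
import torch
import torchvision.models as models

def build_dynamic_feature_net(embed_dim=128):
    # Pre-trained general-purpose ResNet backbone in the initial state.
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    # Replace the classifier with a fixed-length embedding layer.
    net.fc = torch.nn.Linear(net.fc.in_features, embed_dim)
    # Freeze everything except the final stage and the new head, so only
    # the last layers' weights are updated during dynamic training.
    for name, param in net.named_parameters():
        param.requires_grad = name.startswith(("layer4", "fc"))
    return net

face_net = build_dynamic_feature_net()  # first dynamic convolutional neural network
body_net = build_dynamic_feature_net()  # second dynamic convolutional neural network
```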
S2, performing face detection and human body detection on the collected video respectively, so as to obtain face detection frames and human body detection frames that correspond one to one in the image sequence of the video. The specific steps are as follows.
S21, the captured video record comprises at least a two-dimensional image sequence X = {X1, X2, ..., Xt-1, Xt, ...}, where Xt is the two-dimensional image at any time t.

A video recording Vid, consisting of the two-dimensional image sequence X = {X1, X2, ..., Xt-1, Xt, ...}, is continuously generated by a camera deployed in the classroom, where Xt denotes the two-dimensional image obtained at time t.
S22, performing face detection and human body detection on the two-dimensional image sequence, so as to obtain m face detection frames FaceBoxset = (F1, F2, ..., Fm) and n human body detection frames BodyBoxset = (B1, B2, ..., Bn), where m ≥ 0 and n ≥ 0.
In one embodiment of the invention, the images are processed in sequence with the face detection and human body detection modules of the OpenCV algorithm library, obtaining m face detection frames FaceBoxset = (F1, F2, ..., Fm) and n human body detection frames BodyBoxset = (B1, B2, ..., Bn). Because a face in the image belongs to part of a human body, to obtain the correspondence between the faces in FaceBoxset and the bodies in BodyBoxset, the overlap ratio of any face detection frame Fi in FaceBoxset with any human body detection frame Bj in BodyBoxset is calculated as

U(Fi, Bj) = cnt(Fi ∩ Bj) / cnt(Fi),

where the cnt function counts pixels: cnt(Fi ∩ Bj) is the number of pixels in which Fi coincides with Bj, and cnt(Fi) is the number of pixels of Fi itself. The value of U lies in the range 0 to 1, and a larger U indicates greater overlap.
S23, based on a preset overlap threshold and the overlap ratio U, k one-to-one corresponding face and human body detection frame pairs Boxset = {(F1, B1), (F2, B2), ..., (Fk, Bk)} are obtained, where k is less than or equal to the minimum of m and n.

In one embodiment of the invention, the overlap threshold is set to 0.95; when the overlap ratio U is greater than or equal to this threshold, the face detection frame and the human body detection frame are considered to correspond to each other, yielding the k one-to-one face and human body detection frame pairs Boxset = {(F1, B1), (F2, B2), ..., (Fk, Bk)}.
It should be noted that, although in theory the number of face detection frames should equal the number of human body detection frames, the two are produced by different processing pipelines and algorithms, so m and n may differ; the number k of face and body detection frame pairs obtained is therefore at most the minimum of m and n.
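A sketch of this pairing step, assuming boxes are given as (x, y, w, h) pixel rectangles and using the 0.95 threshold from the embodiment above (the helper names are ours):

```python
def overlap_ratio(face, body):
    """U(Fi, Bj) = pixels of Fi lying inside Bj, divided by pixels of Fi."""
    fx, fy, fw, fh = face
    bx, by, bw, bh = body
    iw = max(0, min(fx + fw, bx + bw) - max(fx, bx))
    ih = max(0, min(fy + fh, by + bh) - max(fy, by))
    return (iw * ih) / (fw * fh)

def pair_faces_to_bodies(face_boxes, body_boxes, threshold=0.95):
    pairs, used = [], set()
    for f in face_boxes:
        scored = [(overlap_ratio(f, b), j) for j, b in enumerate(body_boxes)
                  if j not in used]
        if not scored:
            break
        u, j = max(scored)
        if u >= threshold:            # accept only well-overlapping pairs
            pairs.append((f, body_boxes[j]))
            used.add(j)
    return pairs                      # k pairs, with k <= min(m, n)
```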
S3, matching the face detection frames in the image sequence with the first feature library for the first time, and extracting features from the human body detection frames based on the first matching result to obtain a second feature library. The specific steps are as follows.
The first face detection frame of the two-dimensional image corresponding to the first time point is matched with the first feature library. If this first matching succeeds, the object to be recognized corresponding to that face detection frame is confirmed; if it fails, the first matching against the first feature library is repeated on the two-dimensional images of subsequent time points.
This step applies to the scenario in which no matched samples exist yet in the initial stage. Taking child Q as an example: child Q first appears in the two-dimensional image at time t, so the child's identity is determined by face recognition. The face image corresponding to the face detection frame obtained in step S22 is cropped from the corresponding image; after the crop operation it is fed through the first dynamic convolutional neural network to obtain the corresponding face feature value, e.g. the face feature corresponding to face detection frame Fi is FBi. The face similarity between FBi and each child's face feature library Fset is then computed as Sim(FBi, Fset) = max({Sim(FBi, FVi), i = 1, 2, ..., p}), where the face similarity function Sim(A, B) can be taken as the cosine similarity of face features A and B, whose value lies in the range 0-1, larger values indicating greater similarity.

In one example of the invention, the face similarity threshold is set to 0.8; a face similarity greater than or equal to 0.8 indicates that the first matching succeeded. The identity corresponding to the face detection frame is confirmed, and face tracking is subsequently performed on it. At the same time, feature extraction is performed on the corresponding human body detection frame to obtain the second feature library.
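A sketch of this first matching, with `embed_face` standing in for the first dynamic convolutional neural network (an assumed helper) and the 0.8 threshold from the example:

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def first_match(face_crop, face_library, threshold=0.8):
    """face_library: {child_id: [FV1, ..., FVp]} built in step S1."""
    fb = embed_face(face_crop)  # assumed helper wrapping the first dynamic CNN
    best_id, best_sim = None, 0.0
    for child_id, fset in face_library.items():
        sim = max(cosine_sim(fb, fv) for fv in fset)  # Sim(FBi, Fset)
        if sim > best_sim:
            best_id, best_sim = child_id, sim
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)
```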
If the face image corresponding to child Q passes the first matching, a dynamic human body feature library is established based on the human body detection frame corresponding to child Q's face detection frame, denoted Bset = {BV1, BV2, ..., BVn}.
In some embodiments, the human body feature library, i.e. the second feature library, is not created at initialization but dynamically as cases arise, for example in daily units, considering that a child's clothes may change on different days.
In addition, considering that the change range of the limbs of the children is large in one day, such as different actions of standing, sitting, squatting and the like, in order to realize stable recognition, the invention designs a characteristic library extraction mode based on posture normalization for the establishment of the human body characteristic library, and the specific steps are as follows.
The human body image corresponding to the human body detection frame Bi of child Q, matched in the first matching, is cropped and resized to a uniform size, and then the OpenPose algorithm is used to perform posture recognition on the human body image IB, obtaining the human body key point set Kdet. In one embodiment of the invention, the key point set comprises 19 two-dimensional coordinate points covering the key limb and skeletal nodes of the human body. A point-set-based nonlinear registration algorithm is then used to compute the nonlinear transformation matrix T from the detected point set Kdet to the reference point set Kref, i.e. T is obtained from Kref = T · Kdet, where Kref is the reference point set in a standard upright posture. The human body image IB is registered with the obtained transformation matrix T to give the normalized image Inorm, and finally the normalized image Inorm is fed into the second dynamic convolutional neural network, whose fixed-length last-layer features represent the corresponding human body feature.
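A simplified sketch of this normalization. The key points are assumed to come from a pose estimator such as OpenPose, and we substitute an affine fit for the patent's point-set nonlinear registration to keep the example short:

```python
import cv2
import numpy as np

def normalize_pose(body_img, k_det, k_ref):
    """Warp body_img so its key points k_det align with the standard upright
    reference set k_ref (both (19, 2) float32 arrays), approximating the
    transform T in Kref = T * Kdet."""
    T, _ = cv2.estimateAffinePartial2D(k_det, k_ref)
    h, w = body_img.shape[:2]
    return cv2.warpAffine(body_img, T, (w, h))

# normalized = normalize_pose(crop, k_det, k_ref)
# body_feature = body_net(preprocess(normalized))  # second dynamic CNN
```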
In some embodiments, if the obtained face similarity is less than 0.8, the first matching fails, i.e. the face recognition of child Q fails, and the first matching is repeated on the image sequence at subsequent times until it succeeds.
It should be noted that, in some embodiments, child Q may appear in the historical images and yet, in a new image, show a strongly deflected face or a large change of expression, so that face tracking cannot proceed; in such cases combining with face recognition again improves trajectory positioning accuracy.
S4, performing face tracking matching on the objects to be recognized that passed the first matching, and performing a second matching against the second feature library based on the face tracking matching result.
This step judges the similarity between detection frames that passed the first matching, i.e. frames whose identity has been determined, and the detection frames in the current frame, thereby building up a tracking sequence step by step.
The similarity between the i-th face detection frame Fi(t) in the image at time point t and the j-th face detection frame Fj(t+1) in the image at time point t+1 is judged as

S = λ · Simg + (1 - λ) · Straj,

where Simg and Straj respectively denote the image similarity of the two detection frames and the motion-trajectory-prediction similarity, and λ is a weight coefficient. If the similarity S is greater than or equal to the face similarity threshold, the face tracking matching succeeds.
In some embodiments, the method further includes, if the first matching is successful, intercepting the face image corresponding to the face detection frame, performing feature extraction on the face image to obtain new face features, and updating the corresponding face feature library.
In some embodiments, Simg measures the image similarity between the two detection frames: a convolutional neural network can extract a feature vector from each detection frame, and the cosine distance between the two feature vectors then expresses the similarity. Alternatively, in consideration of computation speed, the feature vector may be obtained by directly flattening the image matrix into a one-dimensional vector.
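A sketch of the fast path just described, flattening the resized crops instead of running a CNN (the 32x32 size is an illustrative choice):

```python
import cv2
import numpy as np

def image_similarity(crop_a, crop_b, size=(32, 32)):
    # Resize to a common size and flatten into one-dimensional vectors.
    va = cv2.resize(crop_a, size).astype(np.float32).ravel()
    vb = cv2.resize(crop_b, size).astype(np.float32).ravel()
    # Cosine similarity of the flattened vectors stands in for Simg.
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-8))
```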
To match using motion trajectory prediction, the state of a face detection frame Fi is represented as (u, v, w, h, du, dv, dw, dh), where (u, v) are the coordinates of the center point of the detection frame, (w, h) are its width and height, and du, dv, dw, dh are the corresponding velocities in the image sequence. With this linear model, the detection frame at time t can be deduced from the detection frame at time t-1: the predicted state is d(t) = (u + du, v + dv, w + dw, h + dh). Measuring the predicted detection frame di' from the previous frame against a detection frame dj of the current frame, the similarity can be expressed as Straj = Sim(di', dj), where di' carries the state information (u, v, w, h) of the i-th detection frame.
Thus, with this tracking algorithm, the detection frames whose identity was determined in historical frames and the detection frames in the current frame can be matched step by step; this is far more computationally efficient than tracking faces through face recognition alone.
In one embodiment of the invention, the similarity threshold is set to 0.8; if the similarity S > 0.8, the face tracking matching succeeds, and the coordinates of the specific child thus tracked and matched are put into the corresponding track library.
In a children's classroom, where the children move freely, side faces and occlusions are frequent, and a face tracking algorithm alone cannot maintain long-term trajectory tracking of a specific child. However, because the identities of the children to be recognized in the classroom scene are known in advance, and the face library was established in advance in step S1, trajectory recognition and tracking across time periods can be achieved with face recognition algorithms based on that face library.
Therefore, for samples whose face tracking matching failed, a second matching is performed with the second feature library. The human body image corresponding to the human body detection frame Bi whose face tracking failed is cropped; after the crop operation, the same posture-normalization procedure is applied, and the normalized human body image is fed as input through the pre-trained convolutional neural network to obtain the corresponding human body feature value BBi. The human body similarity between BBi and each child's human body feature library Bset is then computed as Sim(BBi, Bset) = max({Sim(BBi, BVi), i = 1, 2, ..., n}). In one implementation of the invention, if the human body similarity is greater than 0.8, the second matching succeeds.
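A sketch of the second matching, mirroring `first_match` above (`embed_body` is an assumed helper; `normalize_pose` and `cosine_sim` come from the earlier sketches):

```python
def second_match(body_crop, k_det, k_ref, body_library, threshold=0.8):
    """body_library: {child_id: [BV1, ..., BVn]}, the per-day Bset."""
    bb = embed_body(normalize_pose(body_crop, k_det, k_ref))  # assumed helper
    scored = {cid: max(cosine_sim(bb, bv) for bv in bset)     # Sim(BBi, Bset)
              for cid, bset in body_library.items()}
    cid = max(scored, key=scored.get)
    return (cid, scored[cid]) if scored[cid] > threshold else (None, scored[cid])
```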
S5, recording in real time the coordinate position, in the image sequence, of each object to be recognized at each successful first matching, face tracking matching or second matching, so as to generate that object's track.
Prior-art trackers run fast but with low accuracy; in this application, the position of each successfully matched object to be recognized is recorded in real time, preventing a track from being strung onto another person and improving recognition accuracy.
Steps S2-S5 are executed repeatedly; through this loop, the coordinates of each child detected in the two-dimensional image sequence are calibrated. The coordinate trajectory of a given child can be written Tk = {(x1, y1), (x2, y2), ..., (xt, yt), ...}, where (xt, yt) is the child's coordinate position in the image at time t. By connecting these coordinate points on the two-dimensional image, the behavior track map of the corresponding child is obtained.
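A sketch of the track recording and rendering (OpenCV polyline; the color and thickness are arbitrary choices):

```python
import cv2
import numpy as np
from collections import defaultdict

tracks = defaultdict(list)  # child_id -> [(x1, y1), (x2, y2), ...] = Tk

def record_position(child_id, box):
    u, v = box[0], box[1]   # center point of the successfully matched frame
    tracks[child_id].append((int(u), int(v)))

def draw_track(frame, child_id):
    pts = np.array(tracks[child_id], dtype=np.int32)
    # Connect the recorded coordinate points into the behavior track map.
    cv2.polylines(frame, [pts], isClosed=False, color=(0, 255, 0), thickness=2)
    return frame
```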
S6, updating the training sample library of the first or second dynamic convolutional neural network with the images of the objects to be recognized whose first matching, face tracking matching or second matching succeeded.
In the track positioning process, convolutional neural networks are used for face and body matching. These networks can be trained dynamically: the faces and bodies successfully recognized in the process serve as a training library to update the networks, so that the features of the children being recognized are learned progressively and recognition accuracy improves. The specific steps, shown in fig. 3, are as follows.
S600, update the face/body training sample library. The samples come from the data of the children in the class currently being positioned and tracked; a face sample library and a body sample library can be established for each child, and successfully recognized faces and bodies are added to the corresponding sample library at a preset period.
S601, generate training samples. Because recognition is affected by factors such as face angle, expression, body posture and clothing, training samples are added based on the samples in the training sample library and the face or body attributes.

The images in the face/body sample libraries are analyzed with the face/body attribute classifier to obtain the corresponding attributes. For example, face angle and expression yield attributes including calm, happy, sad, afraid, surprised and crying; body posture attributes include standing, squatting and jumping, and clothing attributes include clothing color and whether a backpack is worn. Then, for each child, two pictures with different face angles and expressions (or different body postures and clothing) are chosen at random, and finally the two pictures are fused to generate a new picture.
In some embodiments, the two pictures are fused into a new picture either by direct pixel-wise averaging, or by first applying operations such as rotation and cropping to the two pictures and then adding them with weights.
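A sketch of the fusion (alpha = 0.5 reduces to the plain average; the optional rotation shows the weighted variant):

```python
import cv2

def fuse_samples(img_a, img_b, size=(224, 224), alpha=0.5, angle=0.0):
    """Blend two samples of the same child into one new training image."""
    a = cv2.resize(img_a, size)
    b = cv2.resize(img_b, size)
    if angle:  # optionally rotate one sample before blending
        m = cv2.getRotationMatrix2D((size[0] / 2, size[1] / 2), angle, 1.0)
        b = cv2.warpAffine(b, m, size)
    # Weighted addition; with alpha = 0.5 this is the direct average.
    return cv2.addWeighted(a, alpha, b, 1.0 - alpha, 0.0)
```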
S602, select the first and second dynamic convolutional neural networks. In the initial state a pre-trained general-purpose network such as ResNet can be chosen for both; to improve the generalization of the networks, only the network weights of the last two layers are updated during training, the weights of the other layers being fixed.
And S603, optimizing the convolutional neural network through multi-task learning, wherein the multi-task comprises the following sub-tasks.
a face/body classification and recognition task, which outputs multi-dimensional vectors in which each dimension corresponds to a single individual;
a face/body similarity recognition task, used to judge whether two images belong to the same individual;
and a face/body attribute recognition task, mainly used to recognize face angle and expression, body posture and clothing. Multi-task learning improves the generalization of the trained model.
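A sketch of the three heads over a shared trunk (PyTorch; the backbone is assumed to output a feat_dim feature vector, and the dimensions and head layout are our assumptions):

```python
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared backbone with identity, similarity-embedding and attribute heads."""
    def __init__(self, backbone, feat_dim, n_children, n_attrs):
        super().__init__()
        self.backbone = backbone                         # e.g. ResNet trunk
        self.identity = nn.Linear(feat_dim, n_children)  # classification task
        self.embed = nn.Linear(feat_dim, 128)            # similarity task
        self.attrs = nn.Linear(feat_dim, n_attrs)        # attribute task

    def forward(self, x):
        f = self.backbone(x)                             # shared features
        return self.identity(f), self.embed(f), self.attrs(f)
```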
Cycling through steps S601-S603 yields an updated neural network model. Because the model learns the information of the students in this application and also learns the recognition-irrelevant factors in faces and bodies, using the new model in tracking and positioning markedly improves recognition accuracy, and it is better suited to recognizing a relatively fixed set of identities.
The invention further provides a child behavior track positioning system based on the child behavior track positioning method; its functional block diagram is shown in fig. 4 and comprises a data acquisition module, a data storage module, an identity information entry module, a data processing module and a result presentation module. The data acquisition module comprises the video acquisition terminals in the classroom; the data storage module stores the collected video data; the identity information entry module stores the identity data of the analyzed objects, chiefly a photo library used for face recognition plus other related identity information such as name, age and gender; the data processing module performs identity recognition on the objects using the video data; and the result presentation module displays the recognized children's behavior tracks.
In some embodiments, the data acquisition module mainly refers to a camera video acquisition terminal placed at the front end or the back end of a classroom. The video acquisition terminal can be in various forms, can be a special analog or digital camera, and can also be a camera arranged on a robot or a notebook computer. The acquisition terminal generally further includes a network access device for transmitting video data in a wireless or wired manner, a bus, an input/output device, and the like.
The identity information entry module is used for storing the identity data of the analyzed objects, chiefly a photo library used for face recognition plus other related identity information such as name, age and gender. The entry device can be a mobile phone, tablet computer or other equipment that can connect to the data storage, with input performed manually via touch screen, keyboard, mouse and the like. The module also includes an application program that guides the form of information entry.
The data storage module is generally a device deployed on the premises for recording the image sequences collected by the data acquisition module; it can be a separate memory or part of a storage server. The storage server mainly comprises a data storage area and a program storage area. The data storage area can hold the video data as well as the user identities, intermediate processing results and other information required by the system. The program storage area can hold the operating system and the application program implementing the child behavior track positioning system for the kindergarten classroom environment.
And the data processing module is used for completing a series of processing units of identity matching algorithm operation and acquiring the behavior track of each observed object by comprehensively utilizing the human face and the human body. The program corresponding to the data processing module is stored in the data module of the server, and the processor connected with the data processing module executes the program to realize the functions. The processor here may be a general central processing unit, and also include special processing units such as GPUs and the like dedicated to data processing acceleration.
The result presentation module can be a display device such as a mobile phone, a tablet personal computer or a garden display, and is mainly used for presenting the track tracking result of each child.
The child behavior track positioning method and system make comprehensive use of face and body information, solving the recognition failures caused by occlusion and non-frontal faces in the classroom and guaranteeing high-precision recognition, so that each child's behavior track in the classroom is tracked accurately; this helps teachers observe and record each child's daily behavior patterns and formulate education strategies accordingly.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The system and the system embodiments described above are merely illustrative, and some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims (9)

1. A children behavior track positioning method is characterized by comprising the following steps:
constructing a model base corresponding to the object to be recognized, and performing feature extraction on each object to be recognized in the model base through a first dynamic convolutional neural network to obtain a first feature base;
respectively carrying out face detection and human body detection on the acquired video so as to obtain face detection frames and human body detection frames which correspond to the video in an image sequence one by one;
matching the face detection frames in the image sequence with the first feature library for the first time, and, after normalizing the human body detection frames based on the first matching result, performing feature extraction through a second dynamic convolutional neural network to obtain a second feature library, specifically comprising:
if the first matching succeeds, cropping the human body detection frame Bi corresponding to the first face detection frame to obtain a human body image; performing posture recognition on the human body image to obtain a human body key point set Kdet; obtaining the transformation matrix T of the human body key point set Kdet from the reference point set Kref through Kref = T · Kdet, where Kref is the reference point set in a standard upright posture; normalizing the human body image based on the transformation matrix T; and inputting the normalized human body image into the second dynamic convolutional neural network, the obtained feature vector serving as a human body feature, the set of human body features serving as the second feature library;
carrying out face tracking matching on the object to be recognized which is successfully matched for the first time, and carrying out secondary matching based on a second feature library based on the face tracking matching result;
recording the coordinate position of the object to be recognized, which is successfully matched for the first time or successfully matched for face tracking or successfully matched for the second time, in the image sequence in real time, so as to correspondingly generate the track of the object to be recognized, and correspondingly updating the training sample library of the first dynamic convolution neural network or the second dynamic convolution neural network based on the image of the object to be recognized, which is successfully matched for the first time or successfully matched for face tracking or successfully matched for the second time;
and adding the training samples based on the training samples in the training sample library and the attributes of the human faces or the human bodies, so as to update the first dynamic convolution neural network and the second dynamic convolution neural network.
2. The method for positioning a child behavior trajectory according to claim 1, wherein the constructing a model library corresponding to the objects to be recognized, and performing feature extraction on each object to be recognized in the model library through a first dynamic convolutional neural network to obtain a first feature library specifically comprises:
constructing a face model library based on identity information Y of an object to be recognized and a plurality of photos P corresponding to the identity information Y;
processing all photos in the face model library to obtain an image matrix I;
and inputting the image matrix I into a first dynamic convolution neural network for convolution processing, and representing the facial image corresponding to the identity information Y by using the feature vector obtained after the convolution processing, wherein the set of the facial image is used as a first feature library.
3. The method for positioning a child behavior trajectory according to claim 1, wherein the method for performing face detection and body detection on the collected video respectively to obtain a face detection frame and a body detection frame corresponding to each other in an image sequence of the video, specifically comprises:
the collected video record at least comprises a two-dimensional image sequence X = {X1, X2, ..., Xt-1, Xt, ...}, wherein Xt is the two-dimensional image at any time t;
performing face detection and human body recognition on the two-dimensional image sequence to obtain m human face detection frames FaceBoxset = (F1, F2.,. Fm) and n human body detection frames BodyBoxset = (B1, B2.. Bn), wherein m is greater than or equal to 0, and n is greater than or equal to 0;
calculating the contact ratio of any face detection frame Fi in the face detection frame faceBoxset and the human body detection frame Bj in the human body detection frame BodyBoxset
Figure DEST_PATH_IMAGE003
Wherein i is less than or equal to m, and j is less than or equal to n;
k one-to-one corresponding human face and human body detection frame pairs, Boxset = { (F1, B1), (F2, B2),. and (Fk, Bk) }, are obtained based on a preset coincidence degree threshold and a coincidence degree U, wherein k is less than or equal to the minimum value of m and n.
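A sketch of claim 3's pairing step. The exact definition of U sits behind a garbled formula in the source, so intersection area over face-box area is one plausible reading, and the 0.8 threshold is an assumption:

```python
def coincidence(face_box, body_box):
    """Coincidence degree U, read here as intersection area over face-box area.
    Boxes are (x1, y1, x2, y2)."""
    fx1, fy1, fx2, fy2 = face_box
    bx1, by1, bx2, by2 = body_box
    iw = max(0.0, min(fx2, bx2) - max(fx1, bx1))
    ih = max(0.0, min(fy2, by2) - max(fy1, by1))
    face_area = max(1e-6, (fx2 - fx1) * (fy2 - fy1))
    return iw * ih / face_area

def pair_boxes(faces, bodies, threshold=0.8):
    """Greedy one-to-one pairing (Fi, Bj), kept only when U clears the threshold."""
    pairs, used = [], set()
    for f in faces:
        candidates = [(coincidence(f, b), j) for j, b in enumerate(bodies) if j not in used]
        if not candidates:
            break
        u, j = max(candidates)
        if u >= threshold:
            pairs.append((f, bodies[j]))
            used.add(j)
    return pairs
```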
4. The method for positioning a child behavior trajectory according to claim 3, wherein the first matching of the face detection frame in the image sequence with the first feature library specifically comprises:
carrying out the first matching between the first face detection frame of the two-dimensional image corresponding to the first time point and the first feature library;
if the first matching is successful, confirming the object to be recognized corresponding to the first face detection frame;
and if the first matching is unsuccessful, repeating the first matching with the first feature library on the two-dimensional images corresponding to subsequent time points.
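Claim 4's first matching, sketched as a nearest-neighbour search over the first feature library; cosine similarity over L2-normalised vectors and the 0.6 threshold are assumptions, as the claim does not fix either:

```python
import torch

def first_match(face_vec, first_feature_library, threshold=0.6):
    """Return the identity whose library feature is closest to face_vec under
    cosine similarity, or None when nothing reaches the threshold."""
    best_id, best_sim = None, threshold
    for identity, lib_vec in first_feature_library.items():
        sim = torch.dot(face_vec, lib_vec).item()  # both vectors L2-normalised
        if sim >= best_sim:
            best_id, best_sim = identity, sim
    return best_id
```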
5. The child behavior trajectory positioning method according to claim 4, wherein the performing face tracking matching on the object to be recognized, which is successfully matched for the first time, specifically comprises:
carrying out similarity judgment between the i-th face detection frame Fi(t) in the image at time point t and the j-th face detection frame Fj(t+1) in the image at time point t+1, wherein the similarity is Sim = α·S_img + (1−α)·S_mot, S_img and S_mot respectively represent the image similarity and the motion trajectory prediction similarity of the two detection frames, and α is a weight coefficient;
and if the similarity Sim is greater than or equal to the face similarity threshold, the face tracking matching is successful.
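A sketch of claim 5's fused similarity Sim = α·S_img + (1−α)·S_mot. The convex combination follows the claim's single weight coefficient; the constant-velocity motion term is an assumption, since the claim names the motion trajectory prediction similarity without defining it:

```python
import numpy as np

def motion_similarity(prev_centers, new_box):
    """Constant-velocity prediction of the next face-box center from the last
    two observed centers, scored by how close the new box lands."""
    (x0, y0), (x1, y1) = prev_centers[-2], prev_centers[-1]
    predicted = np.array([2 * x1 - x0, 2 * y1 - y0])
    observed = np.array([(new_box[0] + new_box[2]) / 2, (new_box[1] + new_box[3]) / 2])
    diag = np.hypot(new_box[2] - new_box[0], new_box[3] - new_box[1])
    return max(0.0, 1.0 - np.linalg.norm(predicted - observed) / max(diag, 1e-6))

def tracking_similarity(s_img, s_mot, alpha=0.5):
    """Sim = alpha * S_img + (1 - alpha) * S_mot; alpha is the weight coefficient."""
    return alpha * s_img + (1.0 - alpha) * s_mot
```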
6. The child behavior trajectory positioning method according to claim 5, further comprising: if the face tracking matching is unsuccessful, performing feature extraction based on the first human body detection frame to obtain a first human body feature, and comparing the first human body feature with the human body features in the second feature library; if the similarity comparison result is greater than or equal to the human body similarity threshold, the second matching is successful.
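Claim 6's fallback, sketched in the same style as the first matching but over the second feature library; the 0.75 body threshold is an assumed value:

```python
def second_match(body_vec, second_feature_library, body_threshold=0.75):
    """When face tracking fails, compare the first human body feature against
    the second feature library; the match succeeds when the best similarity
    reaches the human body similarity threshold."""
    best_id, best_sim = None, body_threshold
    for identity, lib_vec in second_feature_library.items():
        sim = float(body_vec @ lib_vec)  # cosine similarity for unit vectors
        if sim >= best_sim:
            best_id, best_sim = identity, sim
    return best_id
```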
7. The method for positioning child behavior trajectory according to claim 1, wherein the adding training samples based on samples in the training sample library and human face or body attributes specifically comprises:
classifying the images in the face sample library and the human body sample library of the training sample library based on a face attribute classifier and a human body attribute classifier respectively, so as to obtain a plurality of face attributes and a plurality of human body attributes for the corresponding images;
and fusing the images corresponding to different human body attributes and adding the fused images to the human body sample library.
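One naive reading of claim 7's fusing step, sketched as pixel-level blending with OpenCV; the blend weights and sample count are assumptions, and a face-side branch would mirror this body-side sketch:

```python
import random
import cv2

def fuse_across_attributes(samples_by_attribute, n_new=100):
    """Blend pairs of crops carrying different attributes and return the new
    samples to be added to the sample library."""
    attrs = list(samples_by_attribute)
    if len(attrs) < 2:
        return []
    fused = []
    for _ in range(n_new):
        a, b = random.sample(attrs, 2)
        img_a = random.choice(samples_by_attribute[a])
        img_b = random.choice(samples_by_attribute[b])
        img_b = cv2.resize(img_b, (img_a.shape[1], img_a.shape[0]))
        fused.append(cv2.addWeighted(img_a, 0.5, img_b, 0.5, 0))
    return fused
```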
8. The child behavior trajectory localization method according to claim 7, wherein the first and second dynamic convolutional neural networks are preset as ResNet neural networks; during training, only the network weights of the last two layers are updated, while the weights of the other layers remain fixed.
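Claim 8's partial fine-tuning, sketched with torchvision: freeze the backbone, then leave only the last residual stage and a fresh classifier head trainable, which is one reasonable reading of "the last two layers":

```python
import torch
from torchvision import models

def make_partially_trainable_resnet(num_classes):
    """ResNet with only the last residual stage and the new head trainable."""
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in net.parameters():
        p.requires_grad = False
    net.fc = torch.nn.Linear(net.fc.in_features, num_classes)  # trainable head
    for p in net.layer4.parameters():   # unfreeze the last residual stage
        p.requires_grad = True
    return net

# Only the trainable parameters are handed to the optimiser, e.g.:
# opt = torch.optim.SGD((p for p in net.parameters() if p.requires_grad), lr=1e-3)
```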
9. A child behavior trajectory localization system for executing the child behavior trajectory localization method according to any one of claims 1 to 8, comprising: a data acquisition module, a data storage module, an identity information input module, a data processing module and a result presentation module,
the data acquisition module is at least used for acquiring video data of children's behavior in a classroom, and outputs the acquired video data to the data storage module;
the data storage module is at least used for storing video data;
the identity information input module is used for storing the identity information of the object to be identified;
the data processing module is used for carrying out identity recognition and video data processing on the object to be recognized based on the input identity information and the input video data;
the result presentation module is used for displaying the behavior track of the child based on the processing result of the data processing module.
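A structural sketch of claim 9's five modules; the class and method names are hypothetical and shown only to make the data flow between modules concrete:

```python
from dataclasses import dataclass, field

@dataclass
class ChildTrajectorySystem:
    """Hypothetical wiring of the five modules named in claim 9."""
    identity_info: dict = field(default_factory=dict)  # identity information input module
    video_store: list = field(default_factory=list)    # data storage module
    trajectories: dict = field(default_factory=dict)   # filled by the data processing module

    def acquire(self, frame):
        """Data acquisition module: push classroom video frames into storage."""
        self.video_store.append(frame)

    def process(self):
        """Data processing module: the detection/matching/tracking pipeline of claims 1-8."""
        raise NotImplementedError

    def present(self):
        """Result presentation module: display each child's behavior track."""
        for child, points in self.trajectories.items():
            print(child, points)
```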
CN202210721671.1A 2022-06-24 2022-06-24 Child behavior track positioning method and system Active CN114783043B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210721671.1A CN114783043B (en) 2022-06-24 2022-06-24 Child behavior track positioning method and system

Publications (2)

Publication Number Publication Date
CN114783043A CN114783043A (en) 2022-07-22
CN114783043B true CN114783043B (en) 2022-09-20

Family

ID=82422510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210721671.1A Active CN114783043B (en) 2022-06-24 2022-06-24 Child behavior track positioning method and system

Country Status (1)

Country Link
CN (1) CN114783043B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098732B (en) * 2022-08-11 2022-11-11 腾讯科技(深圳)有限公司 Data processing method and related device
CN115440001A (en) * 2022-08-31 2022-12-06 东莞市本末科技有限公司 Child following nursing method and device, following robot and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845432B (en) * 2017-02-07 2019-09-17 深圳市深网视界科技有限公司 A kind of method and apparatus that face detects jointly with human body
CN109829436B (en) * 2019-02-02 2022-05-13 福州大学 Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network
CN109919977B (en) * 2019-02-26 2020-01-17 鹍骐科技(北京)股份有限公司 Video motion person tracking and identity recognition method based on time characteristics
CN110427905B (en) * 2019-08-08 2023-06-20 北京百度网讯科技有限公司 Pedestrian tracking method, device and terminal
CN111539991B (en) * 2020-04-28 2023-10-20 北京市商汤科技开发有限公司 Target tracking method and device and storage medium
CN111639616B (en) * 2020-06-05 2023-05-23 上海一由科技有限公司 Heavy identity recognition method based on deep learning
CN111709391B (en) * 2020-06-28 2022-12-02 重庆紫光华山智安科技有限公司 Human face and human body matching method, device and equipment
CN112215155B (en) * 2020-10-13 2022-10-14 北京中电兴发科技有限公司 Face tracking method and system based on multi-feature fusion


Similar Documents

Publication Publication Date Title
Jalal et al. Students’ behavior mining in e-learning environment using cognitive processes with information technologies
CN114783043B (en) Child behavior track positioning method and system
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
US9639746B2 (en) Systems and methods of detecting body movements using globally generated multi-dimensional gesture data
Raca et al. Translating head motion into attention-towards processing of student’s body-language
CN110826453B (en) Behavior identification method by extracting coordinates of human body joint points
US20170046568A1 (en) Systems and methods of identifying a gesture using gesture data compressed by principal joint variable analysis
CN108154075A (en) The population analysis method learnt via single
CN106650619A (en) Human action recognition method
CN111507592B (en) Evaluation method for active modification behaviors of prisoners
Indi et al. Detection of malpractice in e-exams by head pose and gaze estimation
CN111814587A (en) Human behavior detection method, teacher behavior detection method, and related system and device
CN111382655A (en) Hand-lifting behavior identification method and device and electronic equipment
CN113705510A (en) Target identification tracking method, device, equipment and storage medium
Gajbhiye et al. Ai human pose estimation: Yoga pose detection and correction
Deniz et al. Computer vision for attendance and emotion analysis in school settings
Villegas-Ch et al. Identification of emotions from facial gestures in a teaching environment with the use of machine learning techniques
Othman et al. Challenges and Limitations in Human Action Recognition on Unmanned Aerial Vehicles: A Comprehensive Survey.
Chen et al. Classroom attention estimation method based on mining facial landmarks of students
CN113239915B (en) Classroom behavior identification method, device, equipment and storage medium
CN115188051A (en) Object behavior-based online course recommendation method and system
Zhao et al. Implementation of online teaching behavior analysis system
Huang et al. Research on learning state based on students’ attitude and emotion in class learning
Abdulhamied et al. Real-time recognition of American sign language using long-short term memory neural network and hand detection
Kutálek et al. Detection of yoga poses in image and video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant