CN114999648A

CN114999648A - Early screening system, equipment and storage medium for cerebral palsy based on baby dynamic posture estimation

Info

Publication number: CN114999648A
Application number: CN202210622793.5A
Authority: CN
Inventors: 舒强; 李海峰; 王慧; 阮雯聪; 陈雯聪; 肖俊
Original assignee: Childrens Hospital of Zhejiang University School of Medicine
Current assignee: Childrens Hospital of Zhejiang University School of Medicine
Priority date: 2022-05-27
Filing date: 2022-06-02
Publication date: 2022-09-02
Anticipated expiration: 2042-06-02
Also published as: CN114999648B

Abstract

The invention provides a system, device and storage medium for early screening of cerebral palsy based on infant dynamic posture estimation. The system consists of a healthy infant feature space acquisition module, an infant video acquisition module, a static posture feature extraction module, a dynamic posture feature extraction module and an anomaly detection module. The static and dynamic posture feature extraction modules extract, from an infant motion video clip, an infant dynamic posture feature sequence that integrates temporal information. The anomaly detection module then performs anomaly detection on the dynamic posture feature sequence of the video clip to be detected within the feature space, and outputs whether the infant in the video is at risk of cerebral palsy. In practical application scenarios, the invention can assess an infant's risk of cerebral palsy from video clips by using a density-based anomaly detection method.

Description

Early screening system, equipment and storage medium for cerebral palsy based on baby dynamic posture estimation

Technical Field

The invention belongs to the field of image data processing, and particularly relates to an anomaly detection technology in human posture estimation and statistical learning in computer vision.

Background

With the development of promising tools such as the General Movements Assessment (GMA), early diagnosis of cerebral palsy has become an active research area. Cerebral palsy (CP) is generally defined as a group of permanent disorders of the development of movement and posture that cause activity limitation and are attributed to non-progressive disturbances in the developing fetal or infant brain. These disorders are often accompanied by sensory, cognitive, communication and behavioral disturbances. In severe cases, epilepsy and secondary musculoskeletal abnormalities may occur. Early diagnosis and rehabilitation training are therefore particularly important for affected children.

General movements (GMs) summarize the spontaneous motor behavior of the infant brain at different developmental stages and can be assessed by the variability and complexity of movement. Normal general movements involve continuous, irregular movement of the whole body. Abnormal movement and posture are the core characteristics of infants with cerebral palsy: infants with severe cerebral palsy often have abnormal postures, and their bodies may be very floppy or very stiff. Research shows that the absence of fidgety movements is highly predictive of cerebral palsy. Fidgety movements are small movements of the infant's neck, trunk and limbs in all directions with variable acceleration. The spontaneous movements of normal infants are markedly variable and complex, whereas infants lacking fidgety movements show more uniform movement patterns with more repetitive movements.

Human posture estimation aims to locate human body parts, typically the head, shoulders, elbows, knees and so on, in images or videos. Related techniques are widely applied in human-computer interaction, games, virtual reality, video surveillance, motion analysis and medical assistance. In recent years, with the development of deep learning and the appearance of large-scale human posture estimation data sets, a large number of neural-network-based human posture estimation models have emerged in academia and industry. These models are usually trained end to end on large amounts of labeled data with the back propagation algorithm, and achieve good recognition performance.

In the field of data analysis, anomaly detection aims to identify samples that deviate significantly from the majority of the data. In practice, the vast majority of samples are normal and only a small proportion are abnormal. This inherent class imbalance makes supervised classification poorly suited to anomaly detection problems. One practical class of anomaly detection methods is density-based. Such methods contain no parameters that need to be trained, and need only a small number of normal samples as input to estimate the distribution of normal samples. A new sample to be evaluated is considered abnormal if the density of the estimated normal-sample distribution at that sample is lower than a threshold. Anomaly detection has applications in many areas, including network security, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud detection.

Disclosure of Invention

The invention provides a cerebral palsy early screening system, equipment and a storage medium based on baby dynamic posture estimation. The screening system of the present invention can assess the risk of cerebral palsy of an infant by analyzing video segments of the infant's voluntary movements in the supine position.

In order to achieve the above purpose, the invention specifically adopts the following technical scheme:

in a first aspect, the present invention provides a system for early screening of cerebral palsy based on estimation of dynamic posture of infant, comprising:

the healthy infant feature space acquisition module is used for acquiring a feature space constructed by an infant dynamic posture feature sequence of a healthy infant without cerebral palsy;

the infant video acquisition module is used for acquiring a video clip to be detected, in which the infant to be screened lies unoccluded in the supine position and moves autonomously;

the static posture feature extraction module is used for extracting the infant posture information in the video clip to be detected frame by frame through a pre-trained infant supine posture estimation model, and encoding the posture information of each frame to obtain the corresponding infant static posture feature; the infant supine posture estimation model is obtained by fine-tuning a pre-trained human posture estimation model on an infant supine posture estimation data set, which is composed of key frames, annotated with joint points, taken from videos of infants moving in the supine position;

the dynamic posture feature extraction module is used for taking the infant static posture feature of each frame in the video clip to be detected as input and encoding the whole video clip with a pre-trained infant dynamic posture feature extraction network, to obtain an infant dynamic posture feature sequence that integrates the temporal information of the video clip;

and the anomaly detection module is used for performing anomaly detection on the infant dynamic posture feature sequence of the video clip to be detected within the feature space, and outputting a detection result indicating whether the infant in the video is at risk of cerebral palsy.
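The way the modules hand data to one another can be sketched as follows. This is a minimal illustration only: the class name ScreeningPipeline, its field names, and the toy stand-in functions in the usage lines are hypothetical and do not come from the source, and each callable stands in for a trained component that is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = object          # placeholder type for one decoded video frame
Vector = List[float]


@dataclass
class ScreeningPipeline:
    """Wires the modules together; every callable is a hypothetical stand-in."""
    estimate_pose: Callable[[Frame], Vector]          # fine-tuned supine posture model
    encode_static: Callable[[Vector], Vector]         # per-frame static posture feature
    encode_dynamic: Callable[[List[Vector]], Vector]  # temporal encoder over the clip
    density: Callable[[Vector], float]                # density under the healthy feature space
    epsilon: float                                    # decision threshold

    def screen(self, frames: Sequence[Frame]) -> bool:
        """Return True when the clip is flagged as at risk."""
        static = [self.encode_static(self.estimate_pose(f)) for f in frames]
        dynamic = self.encode_dynamic(static)
        return self.density(dynamic) < self.epsilon


# Toy stand-ins, only to show the call sequence.
pipeline = ScreeningPipeline(
    estimate_pose=lambda frame: list(frame),
    encode_static=lambda pose: pose,
    encode_dynamic=lambda seq: [sum(col) / len(seq) for col in zip(*seq)],
    density=lambda x: 1.0 if abs(x[0]) < 1.0 else 0.0,
    epsilon=0.5,
)
flagged = pipeline.screen([[0.1, 0.2], [0.3, 0.0]])
```

Keeping the threshold next to the density function makes the decision rule explicit: a clip is flagged only when its density under the healthy distribution falls below the threshold.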

In the above first aspect, in the healthy infant feature space acquisition module, the feature space acquisition method includes:

S1: acquiring a healthy infant video data set, wherein the infant in each video clip sample falls within a preset age interval, is a healthy infant without cerebral palsy, and lies unoccluded in the supine position and moves autonomously in the video;

S2: extracting frames from each video clip sample in the healthy infant video data set, randomly selecting some key frames, and labeling the infant joint points in each key frame to form labeled frames carrying infant posture information; constructing the infant supine posture estimation data set from the labeled frames of all video clip samples; and fine-tuning a pre-trained human posture estimation model on the infant supine posture estimation data set to obtain the infant supine posture estimation model;

S3: extracting the infant posture information frame by frame from each video clip sample in the healthy infant video data set by using the infant supine posture estimation model, and encoding the posture information of each frame to obtain the infant static posture feature of that frame;

S4: for each video clip sample in the healthy infant video data set, taking the infant static posture feature of each frame as input and encoding the whole video clip sample with the infant dynamic posture feature extraction network, to construct an infant dynamic posture feature sequence that integrates temporal information; and training the infant dynamic posture feature extraction network in a self-supervised manner through a mask reconstruction task to obtain the pre-trained infant dynamic posture feature extraction network;

S5: encoding all video clip samples in the healthy infant video data set one by one with the pre-trained infant dynamic posture feature extraction network to obtain the infant dynamic posture feature sequence corresponding to each video clip sample, thereby forming the feature space used for screening infants at risk of cerebral palsy through anomaly detection.

As a preferred aspect of the first aspect, the infant supine posture estimation model takes the human posture estimation model OpenPose, pre-trained on a human posture estimation data set, as its training starting point, and the OpenPose model parameters are fine-tuned with the back propagation algorithm on the manually labeled infant supine posture estimation data set, yielding a neural network model dedicated to infant supine posture estimation; for an image of an infant in the supine position, the infant supine posture estimation model extracts the position information of 8 types of body parts as posture information, namely both eyes, the neck, both shoulders, both elbows, both wrists, the hip, both knees and both ankles (14 joint points in total).

As a preferred aspect of the first aspect, the method for encoding the infant posture information of each frame is as follows:

first, establishing a two-dimensional coordinate plane with the midpoint between the infant's neck and hip as the origin, and normalizing the body part positions by the neck-to-hip distance to eliminate the influence of the infant's body size; subsequently, calculating the position offset vectors between body parts, including: neck to hip, neck to shoulder on both sides of the body, shoulder to elbow on both sides, elbow to wrist on both sides, neck to the midpoint of the two eyes, hip to knee on both sides, and knee to ankle on both sides; then, calculating the included angles between adjacent offset vectors, including: the angle at the shoulder between the vectors pointing to the neck and to the elbow, the angle at the elbow between the vectors pointing to the shoulder and to the wrist, the angle at the hip between the vectors pointing to the neck and to the knee, and the angle at the knee between the vectors pointing to the hip and to the ankle, each computed for both sides of the body; and finally, combining the body part positions, the position offset vectors and the included angles between adjacent offset vectors, and encoding them into a multi-dimensional vector that serves as the infant static posture feature.
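The per-frame encoding described above can be sketched in plain Python. This is a sketch under stated assumptions, not the patented encoder: the keypoint names (l_eye, r_shoulder and so on), the ordering of the output, and the choice to store each angle as a single value in radians are all illustrative. Under these assumptions the result has 28 + 24 + 8 = 60 values, whereas the preferred embodiment reports a 68-dimensional vector, so the exact encoding used there evidently differs in a way the text does not specify.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Offset vectors named in the text, as (start, end) keypoint names.
# "eye_center" is derived below as the midpoint of the two eyes.
OFFSETS = [("neck", "hip"), ("neck", "eye_center")]
ANGLES = []  # (vertex, end_a, end_b)
for side in ("l", "r"):
    OFFSETS += [("neck", f"{side}_shoulder"),
                (f"{side}_shoulder", f"{side}_elbow"),
                (f"{side}_elbow", f"{side}_wrist"),
                ("hip", f"{side}_knee"),
                (f"{side}_knee", f"{side}_ankle")]
    ANGLES += [(f"{side}_shoulder", "neck", f"{side}_elbow"),
               (f"{side}_elbow", f"{side}_shoulder", f"{side}_wrist"),
               ("hip", "neck", f"{side}_knee"),
               (f"{side}_knee", "hip", f"{side}_ankle")]


def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Angle in radians between the vectors vertex->a and vertex->b."""
    ux, uy = a[0] - vertex[0], a[1] - vertex[1]
    vx, vy = b[0] - vertex[0], b[1] - vertex[1]
    return abs(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))


def encode_static(kp: Dict[str, Point]) -> List[float]:
    """Encode one frame's 14 keypoints (pixel coordinates) as a flat vector."""
    neck, hip = kp["neck"], kp["hip"]
    origin = ((neck[0] + hip[0]) / 2, (neck[1] + hip[1]) / 2)
    scale = math.dist(neck, hip)  # neck-to-hip distance, assumed non-zero
    norm = {k: ((x - origin[0]) / scale, (y - origin[1]) / scale)
            for k, (x, y) in kp.items()}
    le, re = norm["l_eye"], norm["r_eye"]
    pts = dict(norm, eye_center=((le[0] + re[0]) / 2, (le[1] + re[1]) / 2))

    feats: List[float] = []
    for name in sorted(norm):            # 14 keypoints -> 28 values
        feats.extend(norm[name])
    for start, end in OFFSETS:           # 12 offset vectors -> 24 values
        feats.extend((pts[end][0] - pts[start][0],
                      pts[end][1] - pts[start][1]))
    for vertex, a, b in ANGLES:          # 8 joint angles -> 8 values
        feats.append(angle_at(pts[vertex], pts[a], pts[b]))
    return feats
```

Because the coordinates are re-centred and divided by the neck-to-hip distance before anything else is computed, the feature is unchanged when the same pose is translated or uniformly rescaled in the image.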

In the first aspect, the infant dynamic posture feature extraction network preferably uses a Transformer encoder as its network structure, and takes as input the infant static posture feature of each frame together with the time interval between that frame and the start of the video clip.
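How the time interval might accompany each static feature can be illustrated briefly. The source does not say how the interval is injected into the Transformer encoder, so simple concatenation is an assumption of this sketch, as is the helper name build_encoder_inputs; a learned or sinusoidal time embedding would be an equally plausible reading.

```python
from typing import List, Sequence


def build_encoder_inputs(static_feats: Sequence[Sequence[float]],
                         timestamps: Sequence[float]) -> List[List[float]]:
    """Append each frame's offset from the clip start (in seconds) to its feature."""
    t0 = timestamps[0]
    return [list(f) + [t - t0] for f, t in zip(static_feats, timestamps)]
```

Using elapsed time rather than a frame index keeps the input meaningful when clips are recorded at different frame rates or when frames are dropped.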

As a preferred aspect of the first aspect, when the infant dynamic posture feature extraction network is pre-trained in a self-supervised manner, mask reconstruction is used as a proxy task to generate the training signal, so as to strengthen the temporal information in the encoded features; the specific training steps are as follows:

S41: sampling training samples from the healthy infant video data set, randomly choosing one frame of a sampled video clip sample as the replaced position, replacing the infant static posture feature at the replaced position with a fixed random code, and keeping the infant static posture features of the remaining frames unchanged, to form a replaced input sequence;

S42: encoding the replaced input sequence with the infant dynamic posture feature extraction network to obtain the encoded dynamic posture feature corresponding to the replaced position;

S43: using a multilayer perceptron network, with the encoded dynamic posture feature at the replaced position as input, to predict the infant static posture feature that was replaced, thereby completing the feature reconstruction;

S44: evaluating the feature reconstruction quality with a two-norm loss and, taking it as the training objective, optimizing the parameters of the infant dynamic posture feature extraction network with the back propagation algorithm; after training to convergence, the pre-trained infant dynamic posture feature extraction network is obtained and used for the actual infant dynamic posture feature extraction task.
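The data flow of steps S41 to S44 can be sketched without a deep learning framework. The encoder and the prediction head are passed in as plain callables, so this shows only how the masked sequence and the loss are formed, not the gradient update; the function names are hypothetical, and taking the squared two-norm as the loss is an assumption, since the source says only "two-norm loss".

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mask_one_frame(seq: Sequence[Vector], mask_code: Vector,
                   rng: random.Random) -> Tuple[List[Vector], int]:
    """S41: replace one randomly chosen frame's feature with the fixed mask code."""
    pos = rng.randrange(len(seq))
    masked = [list(v) for v in seq]   # copy so the original sequence is untouched
    masked[pos] = list(mask_code)
    return masked, pos


def l2_loss(pred: Vector, target: Vector) -> float:
    """S44: squared two-norm of the reconstruction error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def reconstruction_loss(seq: Sequence[Vector], mask_code: Vector,
                        encoder: Callable[[List[Vector]], List[Vector]],
                        head: Callable[[Vector], Vector],
                        rng: random.Random) -> float:
    masked, pos = mask_one_frame(seq, mask_code, rng)
    encoded = encoder(masked)         # S42: one encoded vector per frame
    pred = head(encoded[pos])         # S43: predict the replaced static feature
    return l2_loss(pred, seq[pos])    # S44: compare with the original feature
```

Since the masked frame's own feature is hidden, the loss can be reduced only by using the surrounding frames, which is what pushes temporal information into the encoded features.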

Preferably, in the abnormality detection module, the specific steps of performing abnormality detection include:

first, a high-dimensional Gaussian distribution $p(x; \mu, \Sigma)$ is used to fit the distribution, in the feature space, of the infant dynamic posture feature sequences of all healthy infants without cerebral palsy:

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$

wherein $x \in \mathbb{R}^{n}$ is an infant dynamic posture feature sequence, $\mu \in \mathbb{R}^{n}$ is the mean vector of the n-dimensional Gaussian distribution, and $\Sigma \in \mathbb{R}^{n \times n}$ is the covariance matrix of the n-dimensional Gaussian distribution; the estimated values of the parameters $\mu$ and $\Sigma$ obtained by fitting are as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{T}$$

wherein $x^{(i)}$ is the infant dynamic posture feature sequence of the i-th healthy infant, and m is the total number of infant dynamic posture feature sequences in the feature space;

then, for the infant dynamic posture feature sequence $x^{*}$ of the video clip to be detected, its probability density under the healthy infant dynamic posture distribution is calculated:

$$p(x^{*}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x^{*}-\mu)^{T}\Sigma^{-1}(x^{*}-\mu)\right)$$

finally, a risk judgment is made according to the calculated probability density $p(x^{*})$: if $p(x^{*})$ is less than the threshold $\varepsilon$, the infant in the video clip to be detected is considered to be at risk of cerebral palsy; otherwise, the infant in the video clip to be detected is considered to be a normal infant without cerebral palsy.
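The density test can be sketched in NumPy as follows. This is a minimal sketch under several assumptions: each clip has already been reduced to a fixed-length vector of dimension n (the source does not say how variable-length sequences are handled), the covariance estimate is non-singular, and the 1/m maximum-likelihood normalisation is used. The function names fit_gaussian, log_density and at_risk are illustrative, and comparing in log space is a numerical-stability choice of this sketch rather than something the source states.

```python
import numpy as np


def fit_gaussian(X: np.ndarray):
    """Maximum-likelihood mean and covariance of the rows of X (shape m x n)."""
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]
    return mu, sigma


def log_density(x: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """log p(x; mu, sigma) for an n-dimensional Gaussian."""
    n = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)           # log |sigma|
    maha = diff @ np.linalg.solve(sigma, diff)     # (x-mu)^T sigma^{-1} (x-mu)
    return float(-0.5 * (n * np.log(2 * np.pi) + logdet + maha))


def at_risk(x: np.ndarray, mu: np.ndarray, sigma: np.ndarray,
            epsilon: float) -> bool:
    """True when p(x) < epsilon, with the comparison done in log space."""
    return bool(log_density(x, mu, sigma) < np.log(epsilon))
```

In high dimensions the raw density can underflow to zero, so working with the log-density and the log of the threshold gives the same decision while staying numerically well behaved. The threshold itself would have to be chosen on held-out data, which the source does not describe.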

In a second aspect, the invention provides a computer electronic device comprising a memory and a processor;

the memory for storing a computer program;

the processor, when executing the computer program, is configured to output the detection result by using the early screening system for cerebral palsy based on baby dynamic posture estimation according to any one of the aspects of the first aspect.

In a third aspect, the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program outputs the detection result by using the early screening system for cerebral palsy based on baby dynamic posture estimation according to any aspect of the first aspect.

In a fourth aspect, the present invention provides an early screening device for cerebral palsy, which comprises a video capturing device and a detecting device;

the video acquisition equipment is used for shooting a video clip in which the infant to be screened lies unoccluded in the supine position and moves autonomously, and storing the shot video clip for the detection equipment to read;

the detection device is used for reading the video segment shot by the video acquisition device as a video segment to be detected, detecting by using the early screening system for cerebral palsy based on baby dynamic posture estimation according to any one of the first aspect, and outputting the detection result.

Compared with the prior art, the invention has the following beneficial effects:

the invention provides a system, device and storage medium for early screening of cerebral palsy based on infant dynamic posture estimation, which can assess an infant's risk of cerebral palsy by analyzing a video clip of the infant moving autonomously in the supine position. By fine-tuning a human posture estimation model pre-trained on a large-scale data set, the invention recognizes infant supine postures more accurately. The invention uses a Transformer encoder to encode the frame-by-frame infant static posture information, which better captures the relationship between the static posture features and differential motion features such as the direction, frequency, speed and acceleration of movement. The invention trains the infant dynamic posture feature extraction network by self-supervised representation learning, using a mask reconstruction task to guide the network to mine temporal posture information. In addition, based on the collected healthy infant video data set, the invention fits the distribution of the dynamic posture features of healthy infants and, in practical application scenarios, assesses an infant's risk of cerebral palsy from video clips by using a density-based anomaly detection method.

Drawings

Fig. 1 is a schematic diagram of the components of an early screening system for cerebral palsy based on the estimation of the dynamic posture of an infant.

Fig. 2 is a schematic flow chart of the construction process of the early screening system for cerebral palsy based on the estimation of the dynamic posture of the infant.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. The technical characteristics in the embodiments of the present invention can be combined correspondingly without mutual conflict.

In a preferred embodiment of the present invention, as shown in fig. 1, there is provided an early screening system for cerebral palsy based on the estimation of the dynamic posture of an infant, which can evaluate the risk of cerebral palsy of an infant by inputting a video segment of the infant performing an autonomous movement in a supine position. The whole system consists of a healthy baby feature space acquisition module, a baby video acquisition module, a static posture feature extraction module, a dynamic posture feature extraction module and an abnormality detection module, and the specific functions and implementation forms of the modules are described in detail below.

The healthy infant feature space acquisition module is used for acquiring a feature space constructed by an infant dynamic posture feature sequence of a healthy infant without cerebral palsy.

The infant video acquisition module is used for acquiring a video clip to be detected, in which the infant to be screened lies unoccluded in the supine position and moves autonomously.

The static posture feature extraction module is used for extracting the infant posture information in the video clip to be detected frame by frame through a pre-trained infant supine posture estimation model, and encoding the posture information of each frame to obtain the corresponding infant static posture feature; the infant supine posture estimation model is obtained by fine-tuning a pre-trained human posture estimation model on an infant supine posture estimation data set, which is composed of key frames, annotated with joint points, taken from videos of infants moving in the supine position.

The dynamic posture feature extraction module is used for taking the infant static posture feature of each frame in the video clip to be detected as input and encoding the whole video clip with the pre-trained infant dynamic posture feature extraction network, to obtain an infant dynamic posture feature sequence that integrates the temporal information of the video clip.

It should be noted that, in the above infant video acquisition module, the video clip to be detected may be acquired online by calling a video acquisition device, or read through a data reading interface from a previously shot video that is uploaded or stored externally; this is not limited herein. The video clip to be detected should meet the basic detection requirements, such as a clear image, a length of more than 10 seconds, and no occlusion of the infant.

It should be noted that, in the above healthy infant feature space acquisition module, the feature space may be newly constructed by a dynamic construction method, or an already constructed feature space may be acquired directly; this is not limited herein. In addition, the infant supine posture estimation model used in the static posture feature extraction module and the infant dynamic posture feature extraction network used in the dynamic posture feature extraction module both need to be trained in advance on the relevant data sets, and can be put into practical use only after their performance meets the requirements. Therefore, the feature space, the infant supine posture estimation model and the infant dynamic posture feature extraction network can be constructed and trained in advance and then stored in the corresponding modules, so that they can be conveniently called in practical application. Of course, the feature space, the infant supine posture estimation model and the infant dynamic posture feature extraction network may also be continuously updated online to maintain optimal detection performance.

As a preferred aspect of the embodiment of the present invention, when the feature space, the infant supine posture estimation model and the infant dynamic posture feature extraction network are newly constructed or updated online, steps S1 to S5 may be followed, as shown in fig. 2; the specific process is as follows:

S1: acquiring a healthy infant video data set, wherein the infant in each video clip sample falls within a preset age interval, is a healthy infant without cerebral palsy, and lies unoccluded in the supine position and moves autonomously in the video.

It is noted that the samples in the healthy infant video data set may be obtained by collecting, from the internet, videos of healthy infants aged one to two months. The collected videos need to be screened to retain clips with a clear image, a length of more than 10 seconds, and no occlusion of the infant, in which the infant lies in the supine position and moves autonomously. Each video must be assessed by two medical experts, who identify the approximate age and health state of the infant in the video; only clips in which the infant's age meets the requirement and the assessment result is healthy are retained as video clip samples.
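The retention rule for candidate clips can be written down as a small filter. The record layout ClipReview and the function keep_clip are hypothetical, image clarity is left out because the source gives no measurable criterion for it, and requiring both experts to agree is an assumption of this sketch, since the source does not say how disagreements are resolved.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClipReview:
    clip_id: str
    duration_s: float
    occluded: bool
    # One (estimated_age_months, judged_healthy) pair per medical expert.
    expert_reviews: List[Tuple[float, bool]]


def keep_clip(c: ClipReview, min_age: float = 1.0, max_age: float = 2.0,
              min_duration_s: float = 10.0) -> bool:
    """Keep a clip only if it is long enough, unoccluded, and cleared by two experts."""
    if c.duration_s <= min_duration_s or c.occluded or len(c.expert_reviews) < 2:
        return False
    return all(min_age <= age <= max_age and healthy
               for age, healthy in c.expert_reviews)
```

A clip is dropped as soon as any single criterion fails, which errs on the side of a smaller but cleaner healthy reference set.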

S2: extracting frames from each video clip sample in the healthy infant video data set, randomly selecting some key frames, and labeling the infant joint points in each key frame to form labeled frames carrying infant posture information; constructing the infant supine posture estimation data set from the labeled frames of all video clip samples; and fine-tuning a pre-trained human posture estimation model on the infant supine posture estimation data set to obtain the infant supine posture estimation model.

It should be noted that the infant joint points in the key frames may be labeled manually or with the assistance of a labeling tool; this is not limited herein. The human posture estimation model may be any model capable of detecting human posture, such as OpenPose. Before fine-tuning, the human posture estimation model needs to be pre-trained on a relevant large-scale public human posture estimation data set; it is then fine-tuned on the small, manually labeled infant supine posture estimation data set to achieve good infant supine posture estimation performance.

S3: extracting the infant posture information frame by frame from each video clip sample in the healthy infant video data set by using the infant supine posture estimation model, and encoding the posture information of each frame to obtain the infant static posture feature of that frame.

S4: establishing an infant dynamic posture feature extraction network; for each video clip sample in the healthy infant video data set, taking the infant static posture feature of each frame as input and encoding the whole video clip sample with the infant dynamic posture feature extraction network, to construct an infant dynamic posture feature sequence that integrates temporal information; and training the infant dynamic posture feature extraction network in a self-supervised manner through a mask reconstruction task to obtain the pre-trained infant dynamic posture feature extraction network.

S5: encoding all video clip samples in the healthy infant video data set one by one with the pre-trained infant dynamic posture feature extraction network to obtain the infant dynamic posture feature sequence corresponding to each video clip sample, thereby forming the feature space used for screening infants at risk of cerebral palsy through anomaly detection.

In the application stage, the constructed feature space, the trained infant supine posture estimation model and the trained infant dynamic posture feature extraction network can each be embedded into the corresponding module to perform inference. During inference, a user submits a video of an infant in the supine position as the input of the infant video acquisition module. In the static posture feature extraction module, the infant supine posture estimation model obtained in S2 extracts the infant posture information frame by frame, and this information is encoded into the corresponding infant static posture features; the infant dynamic posture feature extraction network trained in S4 then extracts the infant dynamic posture feature sequence. The risk that the current infant's posture is abnormal is then evaluated against the distribution of the dynamic posture features of healthy infants in the feature space. If the risk value is larger than a certain threshold, the current infant is determined to be at risk of cerebral palsy.

As a preferable mode of the embodiment of the present invention, in the static posture feature extraction module and step S2, the infant supine posture estimation model may take the human posture estimation model OpenPose, pre-trained on a human posture estimation data set, as its training starting point, with the OpenPose model parameters fine-tuned by the back propagation algorithm on the manually labeled infant supine posture estimation data set, yielding a neural network model dedicated to infant supine posture estimation; for an image of an infant in the supine position, the infant supine posture estimation model extracts the position information of 8 types of body parts as posture information, namely both eyes, the neck, both shoulders, both elbows, both wrists, the hip, both knees and both ankles (14 joint points in total).

Further, as a preferred aspect of the embodiment of the present invention, the static posture feature extraction module and the method for encoding the baby posture information of each frame in step S3 are as follows:

first, establishing a two-dimensional coordinate plane with the midpoint between the infant's neck and hip as the origin, and normalizing the body part positions by the neck-to-hip distance to eliminate the influence of the infant's body size; subsequently, calculating the position offset vectors between body parts, which represent the orientation of different body parts and include: neck to hip, neck to shoulder on both sides of the body, shoulder to elbow on both sides, elbow to wrist on both sides, neck to the midpoint of the two eyes, hip to knee on both sides, and knee to ankle on both sides; then, calculating the included angles between adjacent offset vectors, including: the angle at the shoulder between the vectors pointing to the neck and to the elbow, the angle at the elbow between the vectors pointing to the shoulder and to the wrist, the angle at the hip between the vectors pointing to the neck and to the knee, and the angle at the knee between the vectors pointing to the hip and to the ankle, each computed for both sides of the body; and finally, combining the body part positions, the position offset vectors and the included angles between adjacent offset vectors, and encoding them into a multi-dimensional vector that serves as the infant static posture feature.

It should be noted that the specific dimension of the multi-dimensional vector depends on the encoding used; in a preferred embodiment, the body part positions, the position offset vectors and the included angles between adjacent offset vectors are encoded as a 68-dimensional vector.

Further, as a preferable aspect of the embodiment of the present invention, in the dynamic posture feature extraction module and step S4, the infant dynamic posture feature extraction network may use a Transformer encoder as its network structure, and takes as input the infant static posture feature of each frame together with the time interval between that frame and the start of the video clip.

Further, as a preferred mode of the embodiment of the present invention, when the infant dynamic posture feature extraction network is pre-trained in a self-supervised manner in step S4, mask reconstruction is used as a proxy task to generate the training signal, so as to enhance the temporal information in the encoded features; the specific training steps are as follows:

S41: sampling training samples from the healthy infant video data set, randomly selecting one frame in a sampled video clip sample as the replaced position, replacing the infant static posture feature at the replaced position with a fixed random code, and keeping the infant static posture features of the remaining frames of the video clip sample unchanged, so as to form a replaced input sequence;

Further, as a preferred mode of the embodiment of the present invention, in the abnormality detection module, a high-dimensional Gaussian distribution is used to fit the dynamic posture feature distribution of a small number of healthy infants. Video clips of these healthy infants can be collected from the internet, but their health status must be confirmed by medical professionals. The baby dynamic posture features of these video clips are extracted by the baby dynamic posture feature extraction network, and each video clip corresponds to one variable-length baby dynamic posture feature sequence. The specific steps performed by the abnormality detection module are as follows:

wherein $x \in \mathbb{R}^{n}$ is the baby dynamic posture feature sequence, $\mu \in \mathbb{R}^{n}$ is the mean vector of the n-dimensional Gaussian distribution, and $\Sigma \in \mathbb{R}^{n \times n}$ is the covariance matrix of the n-dimensional Gaussian distribution; the estimated values of the parameters $\mu$ and $\Sigma$ obtained by fitting are as follows:

wherein $x^{(i)}$ is the baby dynamic posture feature sequence of the i-th healthy baby, and m is the total number of baby dynamic posture feature sequences in the feature space;
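The formulas referred to in this passage are not reproduced in the present text. Assuming the standard n-dimensional Gaussian density and its maximum likelihood estimates, which match the symbols defined here, they would read as follows (a reconstruction, not the original typesetting):

```latex
p(x;\mu,\Sigma)=\frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
\exp\!\left(-\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)

\hat{\mu}=\frac{1}{m}\sum_{i=1}^{m}x^{(i)},\qquad
\hat{\Sigma}=\frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}-\hat{\mu}\right)\left(x^{(i)}-\hat{\mu}\right)^{\top}
```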

The present invention will be illustrated below by a specific embodiment to show the specific construction, training and practical application process of the early screening system for cerebral palsy based on baby dynamic posture estimation, so as to facilitate understanding.

Examples

First, construction of video data set of healthy baby

Videos of healthy infants aged one to two months are collected from the internet. Clips are screened manually: each retained clip must be clear, unoccluded and longer than 10 seconds, and the infant must be in a supine position and moving spontaneously. Two medical experts then assess the approximate age and health status of the infant in each video, and only clips in which the infant's age meets the requirement and the assessment result is healthy are kept, thereby constructing the healthy infant video database.

Second, construction of data set for estimating supine position of baby

Frames are extracted from the video clip samples retained in the healthy infant video database, and a subset of key frames is selected at random. The infant joint points in these key frames are annotated manually with human posture estimation labels, namely the image positions of the eyes on both sides, the neck, the shoulders on both sides, the elbows on both sides, the wrists on both sides, the hip, the knees on both sides and the ankles on both sides, thereby constructing a small infant supine position posture estimation data set.

Third, training of supine position posture estimation model of baby

An OpenPose human posture estimation model pre-trained on a large general-purpose human posture estimation data set is obtained and then fine-tuned on the infant supine position posture estimation data set, yielding a more accurate infant supine position posture estimation model. This transfer learning approach needs relatively few labeled samples, and at the same time largely mitigates the difference in data distribution between infant supine position posture estimation and general human posture estimation.

Fourth, construction of static posture characteristics of baby

The fine-tuned infant supine position posture estimation model is used to extract the infant posture information from all video clip samples in the healthy infant supine position video database. The infant posture information of each frame is then encoded to obtain the infant static posture feature of that frame.

In particular, the fine-tuned OpenPose model can accurately and efficiently identify the eyes, shoulders, neck, elbows, wrists, hip, knees and ankles of a supine infant in a video key frame. On this basis, the infant static posture feature vector of the current key frame can be constructed. The encoded infant posture information has 68 dimensions and comprises the following information (note that some items cover both sides of the body):

4.1) position information: eyes on both sides, neck, shoulders on both sides, elbows on both sides, wrists on both sides, hip, knees on both sides and ankles on both sides.

4.2) position offset vectors (i.e. orientation information) of different parts of the body: first, from neck to hip; second, from neck to shoulder (both sides); third, from shoulder to elbow (both sides); fourth, from elbow to wrist (both sides); fifth, from neck to the center of the two eyes; sixth, from hip to knee (both sides); seventh, from knee to ankle (both sides).

4.3) angle information: the angle at the shoulder between the vectors pointing to the neck and to the elbow (both sides), the angle at the elbow between the vectors pointing to the shoulder and to the wrist (both sides), the angle at the hip between the vectors pointing to the neck and to the knee (both sides), and the angle at the knee between the vectors pointing to the hip and to the ankle (both sides).

Fifth, network training for extracting dynamic posture features of infants

A baby dynamic posture feature extraction network is constructed, which encodes temporal information such as motion direction, frequency and speed on top of the frame-by-frame posture recognition. For each video clip sample in the database, the baby static posture features of all frames are taken as input, and the baby dynamic posture feature extraction network encodes the whole video clip sample into a baby dynamic posture feature sequence that integrates the temporal information.

Specifically, the baby dynamic posture feature extraction network is based on a single-layer Transformer encoder structure. To avoid overfitting during training, this encoder contains only one attention head, and the hidden state vector has dimension 128. The self-attention mechanism in the Transformer encoder can automatically capture the correlations between the infant static posture features at different times as well as differential motion features such as the direction and frequency of motion, speed and acceleration.
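To make the role of self-attention concrete, the following NumPy sketch applies single-head scaled dot-product self-attention, the core operation of a Transformer encoder layer, to a variable-length sequence of 69-dimensional inputs with a 128-dimensional hidden state. It is a simplified illustration under stated assumptions: the weights are random stand-ins for trained parameters, layer normalization and the feed-forward sub-layer are omitted, and the names `init_params` and `single_head_self_attention` are hypothetical.

```python
import numpy as np


def init_params(rng, d_in=69, d_model=128):
    """Random weights standing in for trained ones (illustrative only)."""
    def w(rows, cols):
        return rng.normal(0.0, 1.0 / np.sqrt(rows), size=(rows, cols))
    return {"w_in": w(d_in, d_model), "w_q": w(d_model, d_model),
            "w_k": w(d_model, d_model), "w_v": w(d_model, d_model)}


def single_head_self_attention(seq, params):
    """seq: (T, 69) array. Returns the (T, 128) encoding and the (T, T) attention weights."""
    h = seq @ params["w_in"]                              # project input to the hidden size
    q, k, v = h @ params["w_q"], h @ params["w_k"], h @ params["w_v"]
    scores = q @ k.T / np.sqrt(h.shape[1])                # scaled dot products between frames
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return h + weights @ v, weights                       # residual connection
```

Each row of the attention weights shows how strongly the encoding of one frame draws on every other frame of the clip, which is how information about changes over time can enter the per-frame output.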

In this embodiment, the baby dynamic posture feature extraction network is trained in a self-supervised manner through a mask reconstruction task, without any additional labeling. With this self-supervised representation learning method, sequence mask reconstruction serves as the proxy task and guides the dynamic posture feature extraction network to mine the correlations and differential motion features between static posture features at different times. The specific training process is as follows:

5.1) carrying out random initialization on the weights of the baby dynamic posture feature extraction network.

5.2) performing frame extraction on the screened video clip samples, and extracting infant posture information for each key frame with the fine-tuned infant supine position posture estimation model, the infant posture information being encoded into a 68-dimensional static posture feature vector. One further dimension is appended to the end of this vector to represent the time interval from the start of the clip to the current key frame. After this processing, each video clip corresponds to a variable-length sequence of 69-dimensional static features, which serves as the input of the baby dynamic posture feature extraction network.

5.3) randomly sampling the static feature sequence of one video clip sample, formed by the baby static posture features of its frames, and replacing the baby static posture feature at a random position in the sequence with a fixed random vector, i.e. erasing one element of the sequence with a random mask. The replaced static feature sequence is then input into the baby dynamic posture feature extraction network to obtain the encoded dynamic feature sequence.

5.4) extracting the feature vector of the dynamic feature sequence at the replacement position, inputting the feature vector into a simple multi-layer perceptron network, and predicting the static posture feature of the replaced baby in the step 5.3).

5.5) calculating the two-norm loss between the predicted baby static posture characteristic and the real baby static posture characteristic, and updating parameters in the baby dynamic posture characteristic extraction network by using a back propagation and gradient descent algorithm to gradually reduce a loss function.

5.6) repeating the steps from 5.3) to 5.5) until the optimization algorithm converges to obtain the trained baby dynamic posture feature extraction network.
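The forward part of steps 5.3) to 5.5) can be sketched in plain Python as follows. This is an illustration only: `encoder` and `head` are hypothetical callables standing in for the dynamic posture feature extraction network and the multi-layer perceptron, the gradient update is left out, and keeping the time offset of the masked frame untouched is an assumption that the text does not state.

```python
import random

POSE_DIM = 68  # posture part of each 69-dimensional input; the last entry is the time offset


def make_mask_vector(seed=0):
    """Fixed random code, drawn once and reused for every masked frame."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(POSE_DIM)]


def mask_one_frame(sequence, mask_vector, rng):
    """Copy the sequence and overwrite the posture part of one random frame."""
    idx = rng.randrange(len(sequence))
    masked = [list(frame) for frame in sequence]
    masked[idx][:POSE_DIM] = mask_vector  # time offset is kept (assumption)
    return masked, idx


def l2_loss(predicted, target):
    """Two-norm of the reconstruction error."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) ** 0.5


def masked_reconstruction_loss(sequence, encoder, head, mask_vector, rng):
    """One forward pass: mask a frame, encode, predict the erased posture, score it."""
    masked, idx = mask_one_frame(sequence, mask_vector, rng)
    encoded = encoder(masked)        # one encoded vector per frame
    predicted = head(encoded[idx])   # reconstruct the erased posture feature
    return l2_loss(predicted, sequence[idx][:POSE_DIM]), idx
```

In an actual training loop, this loss would be minimized with respect to the parameters of both the encoder and the prediction head by back propagation, as described in step 5.5).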

This training method exploits the continuity of infant posture changes in the video: the erased static posture information is reconstructed from the temporal context around the replaced moment, which strengthens the ability of the baby dynamic posture feature extraction network to mine temporal posture features.

Sixth, feature space construction

The trained baby dynamic posture feature extraction network is used to encode all video clips in the healthy infant supine position video database into dynamic posture feature sequences. A multidimensional Gaussian distribution is then fitted to the distribution of the dynamic posture features of healthy infants. In this feature space, infants at risk of cerebral palsy can be screened with a density-based anomaly detection method.

In this embodiment, the probability distribution used for fitting is a multidimensional Gaussian distribution. In order to capture the correlation between different dimensions of the dynamic posture feature, no independence assumption is made here; that is, the covariance matrix of the multidimensional Gaussian distribution is not necessarily a diagonal matrix. The specific form of the distribution is as follows:

In this embodiment, the parameters of the multidimensional Gaussian distribution fitted to the dynamic posture features of healthy babies are estimated by the maximum likelihood method, specifically as follows:

The parameters above are as defined earlier and are not described again.
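A minimal NumPy sketch of the maximum likelihood fit and the density evaluation is given below. It assumes that each clip has already been reduced to a fixed-length n-dimensional vector; how a variable-length sequence is mapped to such a vector is not specified in the text. The function names and the optional `reg` ridge term, which can help when few clips are available relative to n, are hypothetical additions.

```python
import numpy as np


def fit_gaussian(features, reg=0.0):
    """features: (m, n) array, one row per healthy-infant feature vector."""
    mu = features.mean(axis=0)
    centered = features - mu
    sigma = centered.T @ centered / features.shape[0]  # MLE divides by m, not m - 1
    sigma = sigma + reg * np.eye(features.shape[1])    # optional ridge (assumption)
    return mu, sigma


def log_density(x, mu, sigma):
    """Log of the n-dimensional Gaussian density with a full covariance matrix."""
    n = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)         # squared Mahalanobis distance
    return -0.5 * (n * np.log(2 * np.pi) + logdet + maha)
```

Working with the log density avoids numerical underflow in high dimensions; comparing a density with a threshold is equivalent to comparing the log density with the logarithm of that threshold.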

Seventh, practical application

In the application stage, based on the constructed feature space, the trained infant supine position posture estimation model and the trained dynamic posture feature extraction module, the early cerebral palsy screening system based on infant dynamic posture estimation can be established; it is composed of a healthy infant feature space acquisition module, an infant video acquisition module, a static posture feature extraction module, a dynamic posture feature extraction module and an abnormality detection module. In the system, a supine position video clip of any infant is input into the infant video acquisition module, the static posture feature extraction module extracts the infant static posture feature of each frame in the video, and the trained infant dynamic posture feature extraction network then encodes these features into a dynamic posture feature sequence. The risk of abnormal posture of the current infant is then evaluated against the distribution of the dynamic posture features of healthy infants. If the risk exceeds a certain threshold, that is, if the probability density of the sequence under the healthy-infant distribution falls below the density threshold, the current infant is determined to be at risk of cerebral palsy.

It should be noted that the threshold value of the probability density for determining whether the infant is at risk of cerebral palsy needs to be determined in practical applications by considering the recall rate and accuracy of the screening.
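One way to examine this trade-off is to sweep candidate thresholds over a labeled validation set, as in the sketch below. The existence of such a set with confirmed at-risk cases is an assumption, as is reading the screening "accuracy" as the precision of a positive result; the function names are hypothetical.

```python
def recall_precision(densities, at_risk, eps):
    """densities: p(x*) per clip; at_risk: ground-truth booleans; a clip is flagged if p < eps."""
    flagged = [d < eps for d in densities]
    tp = sum(f and y for f, y in zip(flagged, at_risk))
    fp = sum(f and not y for f, y in zip(flagged, at_risk))
    fn = sum((not f) and y for f, y in zip(flagged, at_risk))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def pick_threshold(densities, at_risk, min_recall):
    """Smallest candidate eps whose recall reaches min_recall, with its recall and precision."""
    for eps in sorted(set(densities)) + [float("inf")]:
        recall, precision = recall_precision(densities, at_risk, eps)
        if recall >= min_recall:
            return eps, recall, precision
    return None  # only reached when the validation set contains no at-risk clip
```

Raising the threshold flags more clips, so recall can only stay the same or increase; choosing the smallest threshold that meets the recall target keeps the number of additional false alarms as low as the candidate grid allows.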

Also, based on the same inventive concept, another preferred embodiment of the present invention further provides a computer electronic device corresponding to the early screening system for cerebral palsy based on baby dynamic posture estimation provided by the foregoing embodiments, which is characterized by comprising a memory and a processor;

the memory for storing a computer program;

the processor is configured to output the detection result when executing the computer program, by using the early screening system for cerebral palsy based on baby dynamic posture estimation as described in the foregoing embodiments.

Also, based on the same inventive concept, another preferred embodiment of the present invention further provides a computer-readable storage medium corresponding to the early screening system for cerebral palsy based on baby dynamic posture estimation provided by the above embodiments, wherein the storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the computer program can output the detection result by using the early screening system for cerebral palsy based on baby dynamic posture estimation as described in the previous embodiments.

The modules in the early screening system for cerebral palsy based on baby dynamic posture estimation are executed as program modules which are executed in sequence, so that the system is essentially a flow for executing data processing. Specifically, when being executed, the computer program in the above embodiment is equivalent to executing a method for screening early cerebral palsy based on baby dynamic posture estimation, and the process is as follows:

step 1, acquiring a feature space constructed by a baby dynamic posture feature sequence of a healthy baby without cerebral palsy;

step 2, acquiring a video segment to be detected, in which the baby to be screened is in an unoccluded supine position and moves spontaneously;

step 3, extracting baby posture information in the to-be-detected video clip frame by frame through a pre-trained baby supine position posture estimation model, and coding each frame of baby posture information to obtain corresponding baby static posture characteristics; the infant supine position posture estimation model is obtained by fine tuning a pre-trained human body posture estimation model on an infant supine position posture estimation data set, wherein the infant supine position posture estimation data set is composed of key frames of a supine position posture infant motion video marked by joint points;

step 4, taking the baby static posture feature of each frame in the video clip to be detected as input, and encoding the whole video clip to be detected with the pre-trained baby dynamic posture feature extraction network, to obtain a baby dynamic posture feature sequence that integrates the temporal information of the video clip to be detected;

step 5, carrying out anomaly detection on the baby dynamic posture feature sequence of the video segment to be detected in the feature space, and outputting a detection result indicating whether the baby in the video is at risk of cerebral palsy.
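The flow of steps 2 to 5 can be summarized as a small Python function. All four model components are passed in as hypothetical callables, so the sketch shows only how data moves between the modules, not how any of them is implemented; the feature space of step 1 is assumed to be captured inside the `density` callable.

```python
from typing import NamedTuple


class ScreeningResult(NamedTuple):
    density: float
    at_risk: bool


def screen_clip(frames, timestamps, estimate_pose, encode_static, encode_dynamic, density, eps):
    """Run one clip through the screening steps; every callable is a stand-in."""
    start = timestamps[0]
    static_sequence = []
    for frame, t in zip(frames, timestamps):
        feature = list(encode_static(estimate_pose(frame)))  # step 3: per-frame static feature
        feature.append(t - start)                            # time offset from clip start
        static_sequence.append(feature)
    dynamic = encode_dynamic(static_sequence)                # step 4: temporal encoding
    p = density(dynamic)                                     # step 5: density in feature space
    return ScreeningResult(density=p, at_risk=p < eps)
```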

Because the principle of solving the problems of the early screening method for cerebral palsy based on baby dynamic posture estimation is similar to that of the early screening system for cerebral palsy based on baby dynamic posture estimation in the embodiment of the present invention, the detailed implementation forms of the modules of the device in this embodiment may also be referred to the detailed implementation forms of the above system parts, and the repeated parts are not described again.

It is understood that the storage medium and the memory may be random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. The storage medium may also be any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disk.

It is understood that the processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.

It should be further noted that, as will be clearly understood by those skilled in the art, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again. In the embodiments provided in the present application, the division of the steps or modules in the apparatus and method is only one logical function division, and in actual implementation, there may be another division manner, for example, multiple modules or steps may be combined or may be integrated together, and one module or step may also be split.

Also, based on the same inventive concept, another preferred embodiment of the present invention further provides an early screening device for cerebral palsy, which corresponds to the early screening system for cerebral palsy based on baby dynamic posture estimation provided by the foregoing embodiment, and comprises a video capturing device and a detecting device;

the detection device is used for reading the video segment shot by the video acquisition device as a video segment to be detected, detecting by using the early screening system for cerebral palsy based on baby dynamic posture estimation in the embodiment, and outputting the detection result.

It should be noted that the video capture device may be a camera carried on a video camera, a mobile phone or another device, which captures and stores a video of the infant's activity after receiving a capture instruction. The aforementioned detection device may be a data processing device with a result prompting function, such as a computer that displays the detection result on a display, a computer that outputs a detection result report, or a web page, served by a cloud server, through which a video clip can be uploaded and the detection result displayed.

It should be noted that, for convenience and simplicity of description, a specific working process of the system described above may refer to a corresponding process in the foregoing method embodiment, and details are not described herein again. In the embodiments provided in the present application, the division of the steps or modules in the system and method is only one logical function division, and when the system and method are actually implemented, there may be another division manner, for example, multiple modules or steps may be combined or may be integrated together, and one module or step may also be split.

The above-described embodiments are merely preferred embodiments of the present invention, which should not be construed as limiting the invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, the technical scheme obtained by adopting the mode of equivalent replacement or equivalent transformation is within the protection scope of the invention.

Claims

1. An early screening system for cerebral palsy based on baby dynamic posture estimation, comprising:

2. The early screening system for cerebral palsy based on baby dynamic posture estimation as claimed in claim 1, wherein in the healthy baby feature space obtaining module, the feature space obtaining method is as follows:

s1: acquiring a video data set of a healthy baby, wherein the age of the baby in each section of video fragment sample in the data set meets a preset age interval, the baby belongs to the healthy baby without cerebral palsy, and the baby needs to be in a supine position without being shielded in a video and perform autonomous movement;

3. The early stage screening system for cerebral palsy based on baby dynamic pose estimation as claimed in claim 1 or 2, wherein the baby supine position pose estimation model takes a human pose estimation model OpenPose pre-trained on a human pose estimation data set as a training starting point, and a back propagation algorithm is used to fine-tune the OpenPose model parameters on the manually labeled baby supine position pose estimation data set, thereby obtaining a neural network model dedicated to baby supine position pose estimation; for an image of the infant in the supine position, the infant supine position posture estimation model extracts position information of 8 body parts as posture information, the extracted body parts including bilateral eyes, a neck, bilateral shoulders, bilateral elbows, bilateral wrists, a hip, bilateral knees and bilateral ankles.

4. The system as claimed in claim 3, wherein the method for encoding the infant posture information of each frame is as follows:

5. The system as claimed in claim 1, wherein the baby dynamic posture feature extraction network uses a Transformer encoder as a network structure, and its input is the baby static posture feature and the time interval between the time of the frame corresponding to the static posture feature and the start time of the video segment.

6. The system for early screening of cerebral palsy based on baby dynamic posture estimation as claimed in claim 2, wherein when the baby dynamic posture feature extraction network is trained in advance in a self-supervision manner, the mask reconstruction is used as an agent task to generate a training signal, so as to enhance the timing information in the encoded features, and the specific training steps are as follows:

7. The early stage screening system for cerebral palsy based on baby dynamic posture estimation as claimed in claim 1, wherein the abnormality detection module specifically performs the following steps:

then, for the baby dynamic posture feature sequence $x^{*}$ of the video segment to be detected, calculating its probability density value under the dynamic posture distribution of healthy babies:

finally, making a risk judgment according to the calculated probability density value $p(x^{*})$: if $p(x^{*})$ is less than the threshold $\epsilon$, the baby in the video segment to be detected is considered to be at risk of cerebral palsy; otherwise, the baby in the video segment to be detected is considered to be a normal baby without cerebral palsy.

8. A computer electronic device comprising a memory and a processor;

the memory for storing a computer program;

the processor, when executing the computer program, is capable of outputting the detection result by using the early screening system for cerebral palsy based on baby dynamic posture estimation as claimed in any one of claims 1 to 7.

9. A computer-readable storage medium, wherein the storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the computer program can output the detection result by using the early screening system for cerebral palsy based on baby dynamic posture estimation as claimed in any one of claims 1-7.

10. An early screening device for cerebral palsy, comprising a video acquisition device and a detection device;

the detection device is used for reading the video clip shot by the video acquisition device as a video clip to be detected, detecting the video clip by using the early screening system for the cerebral palsy based on the baby dynamic posture estimation as claimed in any one of claims 1 to 7, and outputting the detection result.