CN113762085B - Artificial intelligence-based infant incubator system and method - Google Patents


Info

Publication number: CN113762085B (application CN202110917475.7A)
Authority: CN (China)
Prior art keywords: baby, key point, sound, incubator
Legal status: Active (an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN113762085A
Inventors: 汤福南, 羊月祺, 耿向南, 管亚飞, 任义梅, 张晖, 汪缨, 张作恒
Current and original assignee: Jiangsu Province Hospital First Affiliated Hospital With Nanjing Medical University
Application filed by Jiangsu Province Hospital First Affiliated Hospital With Nanjing Medical University; priority to CN202110917475.7A; publication of CN113762085A; application granted; publication of CN113762085B

Classifications

    • A61G11/00 Baby-incubators; Couveuses (A: Human necessities; A61: Medical or veterinary science; hygiene; A61G: Transport, personal conveyances, or accommodation specially adapted for patients or disabled persons)
    • G06N3/045 Combinations of networks (G: Physics; G06: Computing; calculating or counting; G06N: Computing arrangements based on specific computational models; G06N3/00: based on biological models; G06N3/02: Neural networks; G06N3/04: Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods
    • A61G2203/30 General characteristics of devices characterised by sensor means
    • A61G2203/44 General characteristics of devices characterised by sensor means for weight
    • A61G2203/46 General characteristics of devices characterised by sensor means for temperature

Abstract

The invention relates to an artificial intelligence-based infant incubator system and method. The system's main controller comprises a baby limb-movement detection module, a baby face detection module and a sound detection module. The baby limb-movement detection module analyses the video stream and judges whether the baby in the incubator shows abnormal limb movement; if so, it sends the related alarm information to a computer. The baby face detection module analyses the video stream and detects the baby's facial state, including whether the eyes and mouth are open and, if open, to what degree; if the expression is judged to be crying, it sends the related alarm information to a computer. The sound detection module analyses the audio stream with a deep-learning method and judges whether the sound inside the incubator is normal, whether the baby is crying, or whether an incubator alarm sound is present.

Description

Artificial intelligence-based infant incubator system and method
Technical Field
The invention relates to the technical field of infant incubators, and in particular to a method for designing an infant incubator based on deep learning, computer vision and computer hearing.
Background
The infant incubator integrates advanced technologies from clinical medicine, mechanics, computer-based automatic control, sensing and other disciplines, and provides premature and sick infants with a clean-air environment of suitable temperature and humidity, similar to the mother's uterus. The temperature and humidity inside the incubator can be set according to the doctor's advice; air temperature, humidity and the infant's skin temperature are all displayed digitally, and values outside the normal range trigger audible and visual alarms, for example an over-temperature alarm or a fan alarm.
In a hospital neonatal ward many infant incubators are placed together, parents are not allowed in to visit, and nurses must enter the room at regular intervals to check each incubator at close range for abnormalities. If an infant shows abnormal behaviour such as crying or struggling between two rounds, a certain risk arises. However, prior-art infant incubators only monitor and alarm on parameters such as internal temperature, humidity and oxygen concentration, so a delay before medical staff can respond is unavoidable. Moreover, the prior art only senses the environment inside the incubator through conventional sensors and cannot intelligently detect the state of the infant itself.
Infants place high demands on their environment, with definite requirements on temperature, humidity, noise and other conditions. Yet even when these conditions are met, an infant may still express discomfort through body movement, whether caused by disease or by external factors. In addition, abnormal behaviour of an infant in an incubator is usually accompanied by characteristic facial states, such as closed eyes or an open, crying mouth. In neonatal clinical care, particular attention is paid to checking for convulsions, an important index of whether the infant's nervous system is normal. Convulsions are rapid, repetitive clonic or tonic involuntary movements that start at the face, especially around the eyes and mouth, and extend to the limbs and trunk; they manifest as blinking twitches in the eyes and as arm tremors and foot-kicking in the limbs. Convulsions are likely to occur between two rounds of visits by medical staff, and if not discovered and treated early they can have serious consequences.
Therefore, how to provide a system and method that can detect and judge the state of the infant in the incubator in time, judging whether the infant is in a normal state from its limb movement, facial features and sound, and detecting in real time whether abnormalities such as twitching occur, is a technical problem urgently awaiting a solution by those skilled in the art.
Disclosure of Invention
In view of the above problems, the present invention provides an artificial intelligence-based infant incubator system and method. By detecting the limb movement of the infant in the incubator, it judges whether the infant shows behaviour resembling twitching or struggling; by detecting the infant's facial features, it judges whether the infant is sleeping quietly; and by detecting the sound inside the incubator, it judges whether the baby is crying or an incubator alarm is sounding.
In order to solve the above problems, the present invention provides an artificial intelligence-based infant incubator system comprising an infant incubator main body, a main controller, a camera module, a microphone module, a conventional sensor module and a computer. The microphone module collects sound and can accurately identify whether the sound inside the incubator is normal, whether the baby is crying, or whether an alarm is sounding. The camera module collects the video stream, from which the infant's limb movements and facial expressions can be identified.
Even when the data from the conventional sensor module show no anomaly, the baby may still feel sick or otherwise unwell, often crying and making large limb movements. Therefore the invention takes an artificial-intelligence approach: the camera module captures the scene inside the infant incubator, and the infant's facial expression and limb movement are detected to judge whether its state is abnormal; meanwhile the microphone module collects the sound inside the incubator, the sound is classified by deep learning, and crying and incubator alarms are detected.
Wherein conventional sensor modules include, but are not limited to, air temperature sensors, skin temperature sensors, humidity sensors, oxygen sensors, load cells, noise sensors.
The camera module collects video pictures inside the infant incubator and transmits them to the main controller as a video stream; the microphone module collects sound inside the infant incubator and transmits it to the main controller. The computer simultaneously displays the video pictures of several infant incubators, which facilitates centralised management; it also receives the conventional-sensor data transmitted from each incubator's main controller and collectively displays and processes them.
The main controller comprises a baby limb-movement detection module, a baby face detection module, a sound detection module and a sensor data acquisition module. The baby limb-movement detection module analyses the video stream and judges whether the baby in the incubator shows abnormal limb movement; if so, it sends the related alarm information to the computer. The baby face detection module analyses the video stream and detects the baby's facial state, including whether the eyes and mouth are open and, if open, to what degree; if the expression is judged to be crying, it sends the related alarm information to the computer. The sound detection module analyses the audio stream with a deep-learning method and judges whether the sound inside the incubator is normal, whether the baby is crying, or whether an incubator alarm sound is present. The sensor data acquisition module collects the data of the incubator's conventional sensors.
The baby limb motion detection module:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) inputting each frame of image into the trained neural network model in sequence, and detecting the positions and confidence degrees of key points of the limbs of the baby, wherein the key points of the limbs comprise left and right wrists, left and right elbows, left and right ankles, left and right knees, left and right shoulders and left and right hips;
3) specific limb key points are connected into line segments, marking the left arm, the left elbow, the right arm, the right elbow, and the left and right thighs and calves; the number of limb key points is greater than or equal to 12. When the number of limb key points is 12: (x₁, y₁) are the key-point coordinates of the left wrist; (x₂, y₂) of the left elbow; (x₃, y₃) of the right wrist; (x₄, y₄) of the right elbow; (x₅, y₅) of the left ankle; (x₆, y₆) of the left knee; (x₇, y₇) of the right ankle; (x₈, y₈) of the right knee; (x₉, y₉) of the left shoulder; (x₁₀, y₁₀) of the right shoulder; (x₁₁, y₁₁) of the left hip; (x₁₂, y₁₂) of the right hip. When more than 12 limb key points are available, the above 12 may be selected from among them;
4) the angles formed between the important line segments are calculated and recorded, including the angle θ₁ between the left arm and left elbow, θ₂ between the right arm and right elbow, θ₃ between the left elbow and the body, θ₄ between the right elbow and the body, θ₅ between the left thigh and calf, θ₆ between the right thigh and calf, θ₇ between the left thigh and the segment joining the left and right hips, and θ₈ between the right thigh and the segment joining the left and right hips. θ₁~θ₈ are calculated as follows, taking θ₁ as an example: first compute the lengths a, b and c of the three segments joining the three key points related to θ₁, where a is the segment joining the left-elbow and left-shoulder key points, b the segment joining the left-elbow and left-wrist key points, and c the segment joining the left-shoulder and left-wrist key points; θ₁ then follows from the law of cosines (see the sketch after this list):

a = √((x₂ − x₉)² + (y₂ − y₉)²)

b = √((x₂ − x₁)² + (y₂ − y₁)²)

c = √((x₉ − x₁)² + (y₉ − y₁)²)

θ₁ = arccos((a² + b² − c²) / (2ab))
5) the key-point positions and θ₁~θ₈ in N consecutive frames are recorded, with the first frame taken as the initial position state for the subsequent calculations and comparisons. N can be set manually; its value depends on frame rate and time: at frame rate g, checking f seconds of footage gives N = g × f. For example, at frame rate 30, checking 5 consecutive seconds gives N = 30 × 5 = 150; at frame rate 60, checking 10 consecutive seconds gives N = 60 × 10 = 600. N is preferably 30 to 600.
6) within the N consecutive frames, a displacement measure ε is computed in real time for the key points of the left and right wrists and the left and right ankles in the n-th frame (1 < n ≤ N). Taking the left-wrist key point as an example, let its coordinates be (x₁⁽¹⁾, y₁⁽¹⁾) in the 1st frame and (x₁⁽ⁿ⁾, y₁⁽ⁿ⁾) in the n-th frame; the ε for that frame is the weighted displacement

ε = μ₁·|x₁⁽ⁿ⁾ − x₁⁽¹⁾| + μ₂·|y₁⁽ⁿ⁾ − y₁⁽¹⁾|

When ε ≥ ε₀, the key point is judged to have moved substantially; when ε < ε₁, it is judged to have moved only slightly. If within a preset interval (preferably 0.5 s) ε rises from ε₁ to ε₀ and then rapidly falls back to ε₁, the limb associated with this key point has made a large back-and-forth movement. ε₀ and ε₁ are preset values that may be chosen freely, ε₀ preferably 0.2 to 0.5 and ε₁ preferably 0.01 to 0.1. The weights μ₁ and μ₂ can be set manually, μ₁ preferably 0.5 to 0.8 and μ₂ preferably 0.2 to 0.5.
7) if the change Δθ of any angle θ among θ₁~θ₈ exceeds the fraction Kθ of that angle, i.e. Δθ > Kθ × θ (Kθ a decimal between 0 and 1), the body part associated with θ is considered to have made a large movement, such as swinging an arm or kicking a foot.
8) if within a period of t₁ seconds (t₁ preferably 3 to 10) the number of recorded large hand or foot movements exceeds N₁, i.e. the hands or feet are moving at high frequency, the infant is judged to show abnormal behaviour (E1 is automatically set to 1), possibly struggling or twitching under special circumstances, and the related alarm content must be sent; N₁ is a preset value that may be chosen freely, preferably 3 to 15.
9) if within a period of t₂ seconds (t₂ preferably 3 to 10) the number of recorded large movements of other limb parts exceeds N₂, i.e. the limb is swinging at high frequency, the infant is likewise judged to show abnormal behaviour (E1 is automatically set to 1) and the related alarm content must be sent; N₂ is a preset value that may be chosen freely, preferably 3 to 15.
10) If limb movement abnormality is not detected, E1 is set to 0 by default;
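As a concrete illustration of steps 4) to 7), the following is a minimal Python sketch of the angle and displacement computations. It assumes key points arrive as (x, y) pairs in normalized image coordinates; the function names, the clamping of the cosine, and the exact form of the ε formula follow the reconstruction above and are illustrative rather than taken verbatim from the patent.

```python
import math

def joint_angle(wrist, elbow, shoulder):
    """Angle at the elbow (the theta_1 analogue) via the law of cosines."""
    a = math.dist(elbow, shoulder)   # elbow-shoulder segment length
    b = math.dist(elbow, wrist)      # elbow-wrist segment length
    c = math.dist(shoulder, wrist)   # shoulder-wrist segment length
    cos_t = (a * a + b * b - c * c) / (2 * a * b)
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp for float safety

def displacement(p_first, p_now, mu1=0.6, mu2=0.4):
    """Weighted key-point displacement epsilon (one plausible reading)."""
    return mu1 * abs(p_now[0] - p_first[0]) + mu2 * abs(p_now[1] - p_first[1])

def large_angle_change(theta_ref, theta_now, k_theta=0.3):
    """True when the angle change exceeds the fraction K_theta of the angle."""
    return abs(theta_now - theta_ref) > k_theta * theta_ref
```

Counting how often `displacement` crosses ε₀ within t₁ seconds, or how often `large_angle_change` fires within t₂ seconds, then gives the E1 decision of steps 8) to 10).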
the infant face detection module:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) each frame image is input into a trained neural network model for face detection, and the positions and confidences of the key points of the baby's face are detected; S₁ key points are selected around each of the left and right eyes (S₁ even) and S₂ key points around the mouth (S₂ even).
3) the invention uses α to represent the degree to which an eye is open. When S₁ = n, with n ≥ 4, α is calculated by the formula below. P₁~Pₙ are the key points around the eye, where P₁ and Pₙ are the left and right end points of the eye; P₂, P₄, P₆, ..., Pₙ₋₂ are key points on the upper half of the eye contour and P₃, P₅, P₇, ..., Pₙ₋₁ on the lower half, with P₂ and P₃ a vertically symmetric pair, P₄ and P₅ a vertically symmetric pair, P₆ and P₇ a vertically symmetric pair, and so on, up to the pair Pₙ₋₂ and Pₙ₋₁:

α = (‖P₂ − P₃‖ + ‖P₄ − P₅‖ + … + ‖Pₙ₋₂ − Pₙ₋₁‖) / (((n − 2)/2) · ‖P₁ − Pₙ‖)
4) when α stays at some constant α₀ greater than 0, the baby's eyes are open; when α stays at some constant α₁ close to 0, the baby's eyes are closed; when α drops from α₀ to α₁ within a preset interval (preferably 0.5 s), the baby has blinked. α₀ and α₁ can each be preset manually according to the actual condition of the infant; α₀ ranges from 0.2 to 0.3 and α₁ from 0 to 0.1.
5) when the eye openness changes back and forth repeatedly between α₀ and α₁, the infant may be showing blinking twitches (E2 is automatically set to 1), and the related alarm content must be sent.
The baby mouth detection module comprises:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) each frame image is input into a trained neural network model for mouth detection, and the positions and confidences of the key points of the baby's mouth are detected; the S₂ key points selected around the mouth represent the degree to which the mouth is open, denoted β. When S₂ = n, with n ≥ 4 and even, β is calculated by the formula below. Q₁~Qₙ are the key points around the mouth, where Q₁ and Qₙ are the left and right end points of the mouth; Q₂, Q₄, Q₆, ..., Qₙ₋₂ are key points on the upper half of the mouth contour and Q₃, Q₅, Q₇, ..., Qₙ₋₁ on the lower half, with Q₂ and Q₃ a vertically symmetric pair, Q₄ and Q₅ a vertically symmetric pair, Q₆ and Q₇ a vertically symmetric pair, and so on, up to the pair Qₙ₋₂ and Qₙ₋₁:

β = (‖Q₂ − Q₃‖ + ‖Q₄ − Q₅‖ + … + ‖Qₙ₋₂ − Qₙ₋₁‖) / (((n − 2)/2) · ‖Q₁ − Qₙ‖)
3) likewise, when β stays at some constant β₀ greater than 0, the baby's mouth is open; when β stays at some constant β₁ close to 0, the baby's mouth is closed. β₀ and β₁ can each be preset manually according to the actual condition of the infant; β₀ ranges from 0.2 to 0.5 and β₁ from 0 to 0.1.
4) when the mouth openness changes back and forth repeatedly between β₀ and β₁, the state is abnormal (E2 is automatically set to 1), and the related alarm content must be sent.
5) if no blinking-twitch or mouth-region abnormality is detected, E2 defaults to 0 (a sketch of the α and β computation follows this list);
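The α and β measures of this module share one computation. Below is a minimal sketch under the reconstruction above: `points` is the ordered list [P₁, ..., Pₙ] (n even, n ≥ 4) with the two endpoints first and last and the vertically symmetric pairs in between; all names and thresholds are illustrative.

```python
import math

def openness(points):
    """Aspect-ratio openness for an eye (alpha) or mouth (beta) contour."""
    n = len(points)
    width = math.dist(points[0], points[-1])              # ||P1 - Pn||
    pairs = [(points[i], points[i + 1]) for i in range(1, n - 2, 2)]
    vertical = sum(math.dist(top, bot) for top, bot in pairs)
    return vertical / (len(pairs) * width)

def eye_state(alpha, alpha0=0.25, alpha1=0.05):
    """Classify the eye from alpha using the alpha0 / alpha1 thresholds."""
    if alpha >= alpha0:
        return "open"
    if alpha <= alpha1:
        return "closed"
    return "transitional"
```

Tracking `openness` per frame and flagging repeated swings between the two thresholds implements the blink-twitch and crying-mouth checks.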
the sound detection module:
1) acquiring an audio stream through a microphone in a baby incubator;
2) selecting sound segments with the time length within preset seconds from the audio stream;
3) inputting the sound segments into a trained sound classification model, and outputting sound classification and confidence;
4) when the output class is baby crying with confidence greater than X₁, the main controller sends the related alarm to the computer; X₁ is a preset value between 80% and 90%;
5) when the output class is an incubator alarm sound with confidence greater than X₂, the main controller sends the related alarm to the computer; X₂ is a preset value between 80% and 90%;
6) when the output class is normal environmental sound, the sound inside the incubator is considered normal;
7) further, the training process of the sound classification model specifically includes:
8) first an audio data set related to the infant incubator is collected; the data set contains three or more audio categories, with manually labelled sound clips of baby crying, incubator alarm sounds and other sounds;
9) a sound-classification neural network model is then built, comprising C₁ convolutional layers and C₂ fully connected layers; C₁ is preferably 4 to 12 and C₂ is preferably 2.
10) the training parameters of the sound-classification neural network model are set, including the number of training epochs (Epoch) and the learning rate (Learning_rate);
11) the audio data set is imported into the sound-classification neural network model, training is completed, and the trained model is saved (a minimal training sketch follows).
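A minimal PyTorch sketch of such a C₁-convolution / C₂-fully-connected classifier is given below, here with C₁ = 4 and C₂ = 2 and three output classes (crying, incubator alarm, normal). The log-mel input format, channel widths and optimiser are assumptions for illustration, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class SoundClassifier(nn.Module):
    """C1 = 4 conv blocks + C2 = 2 fully connected layers, 3 sound classes."""
    def __init__(self, n_classes=3):
        super().__init__()
        chans = [1, 16, 32, 64, 128]
        blocks = []
        for c_in, c_out in zip(chans, chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, 3, padding=1),
                       nn.BatchNorm2d(c_out), nn.ReLU(), nn.MaxPool2d(2)]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                                  nn.Linear(64, n_classes))

    def forward(self, x):                  # x: (B, 1, n_mels, time)
        h = self.features(x)
        h = h.mean(dim=(2, 3))             # global average pooling
        return self.head(h)

model = SoundClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# Training then loops for Epoch iterations over the labelled clips:
#   logits = model(mel_batch); loss = criterion(logits, labels)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
```

At inference, applying softmax to the logits gives the per-class confidence that is compared against X₁ and X₂.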
The invention also provides an artificial intelligence-based infant incubator method, which comprises
The method for detecting the limb movement of the infant specifically comprises the following steps:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) inputting each frame of image into the trained neural network model in sequence, and detecting the positions and confidence degrees of key points of the limbs of the baby, wherein the key points of the limbs comprise left and right wrists, left and right elbows, left and right ankles, left and right knees, left and right shoulders and left and right hips;
3) specific limb key points are connected into line segments, marking the left arm, the left elbow, the right arm, the right elbow, and the left and right thighs and calves; the number of limb key points is greater than or equal to 12. When the number of limb key points is 12: (x₁, y₁) are the key-point coordinates of the left wrist; (x₂, y₂) of the left elbow; (x₃, y₃) of the right wrist; (x₄, y₄) of the right elbow; (x₅, y₅) of the left ankle; (x₆, y₆) of the left knee; (x₇, y₇) of the right ankle; (x₈, y₈) of the right knee; (x₉, y₉) of the left shoulder; (x₁₀, y₁₀) of the right shoulder; (x₁₁, y₁₁) of the left hip; (x₁₂, y₁₂) of the right hip. When more than 12 limb key points are available, the above 12 may be selected from among them;
4) the angles formed between the important line segments are calculated and recorded, including the angle θ₁ between the left arm and left elbow, θ₂ between the right arm and right elbow, θ₃ between the left elbow and the body, θ₄ between the right elbow and the body, θ₅ between the left thigh and calf, θ₆ between the right thigh and calf, θ₇ between the left thigh and the segment joining the left and right hips, and θ₈ between the right thigh and the segment joining the left and right hips. θ₁~θ₈ are calculated as follows, taking θ₁ as an example: first compute the lengths a, b and c of the three segments joining the three key points related to θ₁, where a is the segment joining the left-elbow and left-shoulder key points, b the segment joining the left-elbow and left-wrist key points, and c the segment joining the left-shoulder and left-wrist key points; θ₁ then follows from the law of cosines:

a = √((x₂ − x₉)² + (y₂ − y₉)²)

b = √((x₂ − x₁)² + (y₂ − y₁)²)

c = √((x₉ − x₁)² + (y₉ − y₁)²)

θ₁ = arccos((a² + b² − c²) / (2ab))
5) the key-point positions and θ₁~θ₈ in N consecutive frames are recorded, with the first frame taken as the initial position state for the subsequent calculations and comparisons. N can be set manually; its value depends on frame rate and time: at frame rate g, checking f seconds of footage gives N = g × f. For example, at frame rate 30, checking 5 consecutive seconds gives N = 30 × 5 = 150; at frame rate 60, checking 10 consecutive seconds gives N = 60 × 10 = 600. N is preferably 30 to 600.
6) within the N consecutive frames, a displacement measure ε is computed in real time for the key points of the left and right wrists and the left and right ankles in the n-th frame (1 < n ≤ N). Taking the left-wrist key point as an example, let its coordinates be (x₁⁽¹⁾, y₁⁽¹⁾) in the 1st frame and (x₁⁽ⁿ⁾, y₁⁽ⁿ⁾) in the n-th frame; the ε for that frame is the weighted displacement

ε = μ₁·|x₁⁽ⁿ⁾ − x₁⁽¹⁾| + μ₂·|y₁⁽ⁿ⁾ − y₁⁽¹⁾|

When ε ≥ ε₀, the key point is judged to have moved substantially; when ε < ε₁, it is judged to have moved only slightly. If within a preset interval (preferably 0.5 s) ε rises from ε₁ to ε₀ and then rapidly falls back to ε₁, the limb associated with this key point has made a large back-and-forth movement. ε₀ and ε₁ are preset values that may be chosen freely, ε₀ preferably 0.2 to 0.5 and ε₁ preferably 0.01 to 0.1. The weights μ₁ and μ₂ can be set manually, μ₁ preferably 0.5 to 0.8 and μ₂ preferably 0.2 to 0.5.
7) if the change Δθ of any angle θ among θ₁~θ₈ exceeds the fraction Kθ of that angle, i.e. Δθ > Kθ × θ (Kθ a decimal between 0 and 1), the body part associated with θ is considered to have made a large movement, such as swinging an arm or kicking a foot.
8) if within a period of t₁ seconds (t₁ preferably 3 to 10) the number of recorded large hand or foot movements exceeds N₁, i.e. the hands or feet are moving at high frequency, the infant is judged to show abnormal behaviour, possibly struggling under special circumstances; N₁ is a preset value that may be chosen freely, preferably 3 to 15.
9) if within a period of t₂ seconds (t₂ preferably 3 to 10) the number of recorded large movements of other limb parts exceeds N₂, i.e. the limb is swinging at high frequency, the infant is likewise judged to show abnormal behaviour, possibly struggling under special circumstances; N₂ is a preset value that may be chosen freely, preferably 3 to 15.
Infant face detection comprising the steps of:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) each frame image is input into a trained neural network model for face detection, and the positions and confidences of the key points of the baby's face are detected; S₁ key points are selected around each of the left and right eyes (S₁ even) and S₂ key points around the mouth (S₂ even).
3) the invention uses α to represent the degree to which an eye is open. When S₁ = n, with n ≥ 4, α is calculated by the formula below. P₁~Pₙ are the key points around the eye, where P₁ and Pₙ are the left and right end points of the eye; P₂, P₄, P₆, ..., Pₙ₋₂ are key points on the upper half of the eye contour and P₃, P₅, P₇, ..., Pₙ₋₁ on the lower half, with P₂ and P₃ a vertically symmetric pair, P₄ and P₅ a vertically symmetric pair, P₆ and P₇ a vertically symmetric pair, and so on, up to the pair Pₙ₋₂ and Pₙ₋₁:

α = (‖P₂ − P₃‖ + ‖P₄ − P₅‖ + … + ‖Pₙ₋₂ − Pₙ₋₁‖) / (((n − 2)/2) · ‖P₁ − Pₙ‖)
4) when α stays at some constant α₀ greater than 0, the baby's eyes are open; when α stays at some constant α₁ close to 0, the baby's eyes are closed; when α drops from α₀ to α₁ within a preset interval (preferably 0.5 s), the baby has blinked. α₀ and α₁ can each be preset manually according to the actual condition of the infant; α₀ ranges from 0.2 to 0.3 and α₁ from 0 to 0.1.
Infant mouth detection comprising the steps of:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) each frame image is input into a trained neural network model for mouth detection, and the positions and confidences of the key points of the baby's mouth are detected; the S₂ key points selected around the mouth represent the degree to which the mouth is open, denoted β. When S₂ = n, with n ≥ 4 and even, β is calculated by the formula below. Q₁~Qₙ are the key points around the mouth, where Q₁ and Qₙ are the left and right end points of the mouth; Q₂, Q₄, Q₆, ..., Qₙ₋₂ are key points on the upper half of the mouth contour and Q₃, Q₅, Q₇, ..., Qₙ₋₁ on the lower half, with Q₂ and Q₃ a vertically symmetric pair, Q₄ and Q₅ a vertically symmetric pair, Q₆ and Q₇ a vertically symmetric pair, and so on, up to the pair Qₙ₋₂ and Qₙ₋₁:

β = (‖Q₂ − Q₃‖ + ‖Q₄ − Q₅‖ + … + ‖Qₙ₋₂ − Qₙ₋₁‖) / (((n − 2)/2) · ‖Q₁ − Qₙ‖)
3) likewise, when β stays at some constant β₀ greater than 0, the baby's mouth is open; when β stays at some constant β₁ close to 0, the baby's mouth is closed. β₀ and β₁ can each be preset manually according to the actual condition of the infant; β₀ ranges from 0.2 to 0.5 and β₁ from 0 to 0.1.
4) when the mouth openness changes back and forth repeatedly between β₀ and β₁, the state is abnormal.
Sound detection, comprising the steps of:
1) acquiring an audio stream through a microphone in a baby incubator;
2) selecting sound segments with the time length within preset seconds from the audio stream;
3) inputting the sound segments into a trained sound classification model, and outputting sound classification and confidence;
4) when the output class is baby crying with confidence greater than X₁, the main controller sends the related alarm to the computer; X₁ is a preset value between 80% and 90%;
5) when the output class is an incubator alarm sound with confidence greater than X₂, the main controller sends the related alarm to the computer; X₂ is a preset value between 80% and 90%;
6) when the output class is normal environmental sound, the sound inside the incubator is considered normal;
7) further, the training process of the sound classification model specifically includes:
8) first an audio data set related to the infant incubator is collected; the data set contains three or more audio categories, with manually labelled sound clips of baby crying, incubator alarm sounds and other sounds;
9) a sound-classification neural network model is then built, comprising C₁ convolutional layers and C₂ fully connected layers; C₁ is preferably 4 to 12 and C₂ is preferably 2.
10) the training parameters of the sound-classification neural network model are set, including the number of training epochs (Epoch) and the learning rate (Learning_rate);
11) the audio data set is imported into the sound-classification neural network model, training is completed, and the trained model is saved.
The software process of the artificial intelligent incubator system comprises the following steps:
reading the video stream from the camera module, and detecting the limb movement and the face area of the baby;
reading the audio stream from the microphone module, and carrying out sound classification detection in the infant incubator;
after the detection modules are called, if an abnormality is found, the corresponding alarm content is sent to the computer for display; if no abnormality is found, the above flow restarts from the beginning.
After receiving the corresponding alarm content, the computer scores the state of the infant in the incubator. The scoring formula is Score = E1 × w1 + E2 × w2 + E3 × w3, where E1, E2 and E3 are Boolean values: E1 indicates whether abnormal limb movement was detected (0 means not detected, 1 means detected); E2 indicates whether a facial-area abnormality was detected (0 means not detected, 1 means detected); E3 indicates whether abnormal sound was detected (0 means not detected, 1 means detected). w1, w2 and w3 are the weights of the three event types, with preferred value ranges (0, 0.3), (0, 0.3) and (0, 0.4) respectively. Score represents the degree of attention the infant currently requires: the higher the Score, the more attention the infant in the corresponding incubator needs.
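A minimal sketch of this score, with weights chosen from the preferred ranges as illustrative values:

```python
def attention_score(e1: bool, e2: bool, e3: bool,
                    w1: float = 0.25, w2: float = 0.25, w3: float = 0.35) -> float:
    """Score = E1*w1 + E2*w2 + E3*w3; higher means more attention needed."""
    return e1 * w1 + e2 * w2 + e3 * w3

# e.g. abnormal face and abnormal sound, normal limbs:
# attention_score(False, True, True) == 0.6
```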
In the embodiments provided in the embodiments of the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways.
The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should also be noted that some of the required steps, for example the labelling of limb key points, may be performed in the plane (2D) or in three-dimensional space (3D).
In addition, each functional module in the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part. For example, the software flow diagram of fig. 2 may be executed in the main controller or migrated to the computer for execution.
It should be emphasised that, for infant expression recognition, limb-movement recognition and sound recognition in the infant incubator, calling either a model trained and saved on one's own data or an existing model falls within the protection scope of the present invention.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.
Advantageous effects
The invention can accurately acquire the state of the baby in real time by adopting the baby limb action detection technology, the baby eye detection technology, the baby mouth detection technology and the sound detection technology.
The prior art relies mainly on changes in the data of various conventional sensors to monitor the infant in the incubator indirectly, for example judging through the air-temperature sensor whether the temperature suits the infant, or through the oxygen-concentration sensor whether the air suits the infant's breathing. Because each infant adapts to the environment differently, it is inappropriate to prescribe one uniform comfort range. The infant's own perception of the environment inside the incubator is expressed through body language, crying and facial expression. In particular, even when all conventional sensor readings are correct, the infant may still show discomfort caused by disease or other reasons through body language, crying and facial expression; this calls for real-time detection of the infant itself, so that the infant in the incubator is noticed in time and better attended by medical staff.
The invention focuses more on the state of the baby, can judge whether the baby is in a healthy and comfortable state under the condition that the conventional sensor cannot detect the abnormality, and has strong clinical practicability.
The system can reduce the patrol burden on clinical medical staff, and in particular can continuously detect the infant's own state and make intelligent judgements at night, when the number of staff on nursing and patrol duty is halved;
aiming at the current situation that parents of infants cannot enter the room to visit, the system can relay the camera's real-time picture to the parents through the computer, meeting the visiting need;
the invention discovers whether the baby has convulsion as soon as possible. Some infants are admitted because of convulsion caused by nervous system problems, and medical staff need to patrol the infants in the infant incubator for at least half an hour, but cannot perform one-to-one observation for 24 hours, and if the convulsion occurs in the period of two patrol, the infants are probably overlooked, thereby causing more serious consequences. The system can detect whether the baby twitches in real time, record corresponding pictures and send an alarm to inform medical staff to perform targeted treatment on the baby.
Drawings
FIG. 1 is a general block diagram of an artificial intelligence based infant incubator system and method according to the present invention.
FIG. 2 is a flow chart of an artificial intelligence based infant incubator system and method according to the present invention.
Fig. 3 is a schematic diagram of extraction of key points of limbs of an infant according to the present invention.
Fig. 4 is a diagram of the coordinates and the angle marks of the key points of the limbs of the baby according to the invention.
FIG. 5 is a schematic diagram of the calculation of key angles according to the present invention.
Fig. 6 is a schematic diagram of the eye key points and calculation according to the present invention.
Fig. 7 is a schematic diagram of key points and calculation of the mouth according to the present invention.
FIG. 8 is a flow chart of the training process of the infant incubator sound classification deep learning model according to the present invention.
Fig. 9 is a diagram of 68 keypoint identifiers of a human face according to an embodiment.
Detailed Description
The invention is described in further detail below:
the COCO dataset is an existing public dataset; dat is a public human face feature point library; PANNs are models of sound classification trained on Audioset data, and CNN14 is one of the pre-trained models.
The preferable embodiment of the infant incubator system and method based on artificial intelligence is as follows:
the method comprises the following steps in the aspect of detecting the limb movement of the baby:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) each frame image is input in turn into the trained neural network model PoseNet, a deep-learning real-time human-pose model that achieves real-time pose estimation using the authoritative public COCO dataset. The COCO dataset represents the human body by 17 key points: the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles. In this embodiment PoseNet detects the infant's limbs, and 12 key points are selected and their positions and confidences saved: the left and right wrists, left and right elbows, left and right shoulders, left and right ankles, left and right knees, and left and right hips (see the key-point selection sketch after these steps). Here (x₁, y₁) are the key-point coordinates of the left wrist; (x₂, y₂) of the left elbow; (x₃, y₃) of the right wrist; (x₄, y₄) of the right elbow; (x₅, y₅) of the left ankle; (x₆, y₆) of the left knee; (x₇, y₇) of the right ankle; (x₈, y₈) of the right knee; (x₉, y₉) of the left shoulder; (x₁₀, y₁₀) of the right shoulder; (x₁₁, y₁₁) of the left hip; (x₁₂, y₁₂) of the right hip;
3) marking the left arm, the left elbow, the right arm, the right elbow, the left thigh, the right thigh and the lower leg by connecting specific key points into line segments;
4) the angles formed between the important line segments are calculated and recorded, including the angle θ₁ between the left arm and left elbow, θ₂ between the right arm and right elbow, θ₃ between the left elbow and the body, θ₄ between the right elbow and the body, θ₅ between the left thigh and calf, θ₆ between the right thigh and calf, θ₇ between the left thigh and the segment joining the left and right hips, and θ₈ between the right thigh and the segment joining the left and right hips. θ₁~θ₈ are calculated as follows, taking θ₁ as an example: first compute the lengths a, b and c of the three segments joining the three key points related to θ₁, where a is the segment joining the left-elbow and left-shoulder key points, b the segment joining the left-elbow and left-wrist key points, and c the segment joining the left-shoulder and left-wrist key points; θ₁ then follows from the law of cosines:

a = √((x₂ − x₉)² + (y₂ − y₉)²)

b = √((x₂ − x₁)² + (y₂ − y₁)²)

c = √((x₉ − x₁)² + (y₉ − y₁)²)

θ₁ = arccos((a² + b² − c²) / (2ab))
5) Combining the key point position in the continuous N frames of images with theta1~θ8Recording, and carrying out correlation calculation and comparison by taking the first frame image as an initial position state; n may be set manually, and in this embodiment, N is set to 30.
6) within the N consecutive frames, a displacement measure ε is computed in real time for the key points of the left and right wrists and the left and right ankles in the n-th frame (1 < n ≤ N). Taking the left-wrist key point as an example, let its coordinates be (x₁⁽¹⁾, y₁⁽¹⁾) in the 1st frame and (x₁⁽ⁿ⁾, y₁⁽ⁿ⁾) in the n-th frame; the ε for that frame is the weighted displacement

ε = μ₁·|x₁⁽ⁿ⁾ − x₁⁽¹⁾| + μ₂·|y₁⁽ⁿ⁾ − y₁⁽¹⁾|

When ε ≥ ε₀, the key point is judged to have moved substantially; when ε < ε₁, it is judged to have moved only slightly. If within a preset interval (preferably 0.5 s) ε rises from ε₁ to ε₀ and then rapidly falls back to ε₁, the limb associated with this key point has made a large back-and-forth movement. In this example ε₀ is set to 0.3 and ε₁ to 0.1; the weights μ₁ and μ₂ can be set manually and are here set to 0.6 and 0.4.
7) if the change Δθ of any angle θ among θ₁~θ₈ exceeds the fraction Kθ of that angle (Kθ = 0.3 here), i.e. Δθ > θ × 30%, the related limb part is considered to have made a large swinging movement, such as swinging an arm or kicking a foot. For example, if θ₁ takes the values θ₁ and θ₁′ in the 1st and 3rd frame images, the corresponding angle change is Δθ₁ = |θ₁′ − θ₁|; when Δθ₁ > θ₁ × 30%, a swinging movement is considered to have occurred between the left wrist and the left arm;
8) if within a period of t₁ seconds (t₁ ranges 3 to 10) the number of recorded hand or foot movements exceeds N₁, i.e. the hands or feet are moving at high frequency, the infant is judged to show abnormal behaviour, possibly struggling under special circumstances; here t₁ = 3 and N₁ = 5.
9) if within a period of t₂ seconds (t₂ ranges 3 to 10) the number of recorded large swinging movements of the limbs exceeds N₂, i.e. the limbs are swinging at high frequency, the infant is judged to show abnormal behaviour, possibly struggling under special circumstances; here t₂ = 3 and N₂ = 5.
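As referenced in step 2), the following sketch shows how the 12 limb key points can be pulled out of a 17-point COCO-format pose result. The COCO index order is the standard one; the pose-model call itself is assumed and not shown.

```python
COCO_NAMES = ["nose", "left_eye", "right_eye", "left_ear", "right_ear",
              "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
              "left_wrist", "right_wrist", "left_hip", "right_hip",
              "left_knee", "right_knee", "left_ankle", "right_ankle"]

LIMB_KEYPOINTS = ["left_wrist", "right_wrist", "left_elbow", "right_elbow",
                  "left_shoulder", "right_shoulder", "left_hip", "right_hip",
                  "left_knee", "right_knee", "left_ankle", "right_ankle"]

def select_limb_keypoints(pose):
    """pose: 17 (x, y, confidence) triples in COCO order -> 12 limb points."""
    by_name = dict(zip(COCO_NAMES, pose))
    return {name: by_name[name] for name in LIMB_KEYPOINTS}
```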
For infant face detection, the method comprises the following steps:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) each frame image is input in turn into the trained facial-landmark predictor shape_predictor_68_face_landmarks.dat, a trained facial feature-point library that identifies a face by 68 key points. The predictor yields the positions and confidences of the key points of the baby's face; in this embodiment 6 key points are selected around each of the left and right eyes and 6 around the mouth.
3) this embodiment uses α to indicate the degree to which the eyes are open. Six key points are selected around each eye and α is calculated by the formula below, where P₁~P₆ are the key points around the eye: P₁ and P₆ are the left and right end points of the eye, P₂ and P₄ are the one-third and two-thirds points of the upper half of the eye contour, and P₃ and P₅ the one-third and two-thirds points of the lower half:

α = (‖P₂ − P₃‖ + ‖P₄ − P₅‖) / (2 · ‖P₁ − P₆‖)
4) the baby's mouth region is detected in the same way. As shown in FIG. 9, the mouth has 20 feature points, of which 6 are selected as representatives; the degree of mouth opening is denoted β and calculated by the formula below, where Q₁~Q₆ are the key points around the mouth: Q₁ and Q₆ are the left and right end points of the mouth, Q₂ and Q₄ the one-third and two-thirds points of the upper half of the mouth contour, and Q₃ and Q₅ the one-third and two-thirds points of the lower half:

β = (‖Q₂ − Q₃‖ + ‖Q₄ − Q₅‖) / (2 · ‖Q₁ − Q₆‖)
5) when α stays at some constant α₀ greater than 0, the baby's eyes are open; when α stays at some constant α₁ close to 0, the baby's eyes are closed; when α drops rapidly from α₀ to α₁, the baby has blinked. α₀ and α₁ can each be preset manually according to the actual condition of the infant; α₀ ranges from 0.2 to 0.3 and α₁ from 0 to 0.1.
6) similarly, when β stays at some constant β₀ greater than 0, the baby's mouth is open; when β stays at some constant β₁ close to 0, the baby's mouth is closed. β₀ and β₁ can each be preset manually according to the actual condition of the infant; β₀ ranges from 0.2 to 0.5 and β₁ from 0 to 0.1.
7) within a period of t₄ seconds (t₄ ranges 0 to 5; here t₄ = 3): if the infant is detected to be in a quiet state, for example eyes and mouth closed or only occasional blinking (2 to 5 blinks), its state is normal; if the eyes are closed while the mouth openness changes back and forth repeatedly between β₀ and β₁, the baby may be crying and needs timely attention from medical staff (a landmark-based sketch of α and β follows);
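The landmark-based sketch referenced above, using dlib's public 68-point model: the index ranges (36 to 47 for the eyes, 48 to 67 for the mouth) are dlib's standard numbering, while the choice of six mouth points and the averaging of the two eyes are illustrative assumptions.

```python
import math
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def aspect_ratio(p):
    """(||p2-p6|| + ||p3-p5||) / (2 ||p1-p4||) for a 6-point contour."""
    d = lambda u, v: math.dist((u.x, u.y), (v.x, v.y))
    return (d(p[1], p[5]) + d(p[2], p[4])) / (2.0 * d(p[0], p[3]))

def face_openness(gray_frame):
    """Yield (alpha, beta) for each detected face in a grayscale frame."""
    for face in detector(gray_frame):
        lm = predictor(gray_frame, face)
        left_eye = [lm.part(i) for i in range(36, 42)]
        right_eye = [lm.part(i) for i in range(42, 48)]
        # six of the twenty mouth points: corners 48/54, upper 50/52, lower 58/56
        mouth = [lm.part(i) for i in (48, 50, 52, 54, 56, 58)]
        alpha = (aspect_ratio(left_eye) + aspect_ratio(right_eye)) / 2.0
        beta = aspect_ratio(mouth)
        yield alpha, beta
```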
the method comprises the following steps in terms of sound detection:
1) acquiring an audio stream through a microphone in a baby incubator;
2) a sound segment of t₅ seconds is selected from the audio stream; here t₅ = 10;
3) inputting the sound segments into a trained sound classification model, and outputting sound classification and confidence;
4) if the output sound is classified as baby crying and the confidence coefficient is more than 80%, the main controller sends a related alarm to the computer;
5) if the output sound is classified as an incubator alarm sound and the confidence is greater than 80%, the main controller sends a relevant alarm to the computer;
6) if the output sound is classified as normal environmental sound, the sound inside the incubator is considered normal;
7) further, the training process of the sound classification model specifically includes:
8) first a data set of 3000 infant-incubator audio clips is collected; the data set comprises 3 categories, with manually labelled clips of baby crying, incubator alarm sounds and normal-condition sounds;
9) the neural network model CNN14 is then called; the model comprises 12 convolutional layers and 2 fully connected layers. The training parameters of the sound-classification neural network are set, including the number of training epochs (Epoch) and the learning rate (Learning_rate); here Epoch is 5000 and Learning_rate is 0.2;
10) and importing the audio data set into the neural network model, finishing training and storing the trained model.
11) After the detection module is called, if the abnormity is found, the corresponding alarm content is sent to a computer end for displaying;
12) if no exception is found, the above flow continues from the beginning.
When the sound detection module uses the neural network model to identify sound in the incubator, it can be disturbed by the crying of babies in other incubators in the same room, so the neural network output alone cannot be used for the final judgment. It is therefore combined with the baby mouth detection module: if the sound classification module detects crying, the mouth detection module checks whether the baby's mouth shows crying movement. If both modules indicate crying, the related abnormal condition in the incubator is confirmed; if the sound classification module detects crying but the mouth detection module judges that the baby's mouth is not open, the crying is attributed to infants in other incubators, avoiding erroneous alarm content (a minimal sketch of this cross-check follows).
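The cross-check sketched in Python; the label string and confidence threshold are illustrative:

```python
def confirm_crying(sound_label: str, sound_conf: float,
                   mouth_moving: bool, threshold: float = 0.8) -> bool:
    """Confirm a crying alarm only when audio and mouth detection agree."""
    audio_crying = sound_label == "baby_crying" and sound_conf > threshold
    # Crying audio with a closed mouth likely comes from a neighbouring
    # incubator, so the alarm is suppressed in that case.
    return audio_crying and mouth_moving
```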

Claims (8)

1. An artificial intelligence-based infant incubator system, comprising an infant incubator main body and a conventional sensor module, the conventional sensor module comprising an air temperature sensor, a skin temperature sensor, a humidity sensor, an oxygen sensor, a weighing sensor and a noise sensor; characterized in that the system further comprises a main controller, a camera module, a microphone module and a computer;
the camera module is used for collecting video pictures in the infant incubator and transmitting the video pictures to the main controller in a video stream mode;
the microphone module is used for collecting sound in the infant incubator and transmitting the sound to the main controller;
the main controller comprises a baby limb action detection module, a baby face detection module, a sound detection module, a sensor data acquisition module and a transmission module;
the baby limb motion detection module: analyzes the video stream file, judges whether the baby in the incubator shows abnormal limb movement, and, if so, sends related alarm information to the computer;
the infant face detection module: analyzes the video stream file and detects the state of the infant's face, including whether the eyes and mouth are open and, if so, their respective degrees of opening; if the expression is judged to be crying, related alarm information is sent to the computer;
the sound detection module: analyzes the audio stream file by a deep learning method and judges whether the sound in the incubator is normal, whether baby crying is present, and whether an incubator alarm sound is present;
the sensor data acquisition module: collects the data of the incubator's conventional sensors;
the transmission module: sends the judgment results and the collected data to the computer;
the computer is used for simultaneously displaying the video pictures of a plurality of infant incubators, the alarm information produced by the artificial intelligence analysis, and the data acquired by the various sensors;
wherein the baby limb motion detection module operates as follows:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) inputting each frame of image into the trained neural network model in sequence, and detecting the positions and confidence degrees of key points of the limbs of the baby, wherein the key points of the limbs comprise left and right wrists, left and right elbows, left and right ankles, left and right knees, left and right shoulders and left and right hips;
3) connecting specific limb key points into line segments to mark out the left and right forearms, the left and right upper arms, and the left and right thighs and calves; the number of limb key points is at least 12, and when there are 12 limb key points: (x₁,y₁) are the coordinates of the left-wrist key point; (x₂,y₂) of the left-elbow key point; (x₃,y₃) of the right-wrist key point; (x₄,y₄) of the right-elbow key point; (x₅,y₅) of the left-ankle key point; (x₆,y₆) of the left-knee key point; (x₇,y₇) of the right-ankle key point; (x₈,y₈) of the right-knee key point; (x₉,y₉) of the left-shoulder key point; (x₁₀,y₁₀) of the right-shoulder key point; (x₁₁,y₁₁) of the left-hip key point; (x₁₂,y₁₂) of the right-hip key point; when there are more than 12 limb key points, the additional key points may be selected at random between the above 12 limb key points;
4) the angles formed between related line segments are calculated and recorded, including the angle θ₁ between the left forearm and the left upper arm, the angle θ₂ between the right forearm and the right upper arm, the angle θ₃ between the left upper arm and the body, the angle θ₄ between the right upper arm and the body, the angle θ₅ between the left thigh and calf, the angle θ₆ between the right thigh and calf, the angle θ₇ between the left thigh and the line segment connecting the left and right hips, and the angle θ₈ between the right thigh and the line segment connecting the left and right hips; θ₁ to θ₈ are calculated as follows:
taking θ₁ as an example, first calculate the lengths a, b and c of the three line segments connecting the three key points related to θ₁, where a is the length of the segment connecting the left-elbow and left-shoulder key points, b the length of the segment connecting the left-elbow and left-wrist key points, and c the length of the segment connecting the left-shoulder and left-wrist key points; θ₁ is then obtained from the cosine law:
$$a=\sqrt{(x_2-x_9)^2+(y_2-y_9)^2}$$
$$b=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$$
$$c=\sqrt{(x_9-x_1)^2+(y_9-y_1)^2}$$
$$\theta_1=\arccos\frac{a^2+b^2-c^2}{2ab}$$
5) the key point positions and θ₁ to θ₈ over N consecutive frames are recorded, and the related calculations and comparisons are made with the first frame as the initial position state; N is a preset value related to the frame rate and the time span: at a frame rate of g, checking f seconds of footage gives N = g × f;
6) among the N consecutive frames, for the n-th frame (1 < n ≤ N), the coordinates of the key point corresponding to the left or right wrist, or the left or right ankle, are computed in real time by the formula below to obtain a parameter ε; taking the left-wrist key point as an example, its coordinates are $(x_1^{(1)}, y_1^{(1)})$ in frame 1 and $(x_1^{(n)}, y_1^{(n)})$ in frame n, and the ε for that frame is obtained from the formula [equation image FDA0003538448700000027: ε as a weighted function of the key point's displacement between frame 1 and frame n, with weights μ₁ and μ₂]. When ε ≥ ε₀, the key point is judged to have moved a large distance; when ε < ε₁, the key point is judged to have moved only slightly; when, within a preset number of seconds, ε rises from ε₁ to ε₀ and then rapidly falls back to ε₁, the limb associated with the key point has swung back and forth with large amplitude; ε₀ and ε₁ are preset values, ε₀ ranging from 0.2 to 0.5 and ε₁ from 0.01 to 0.1; μ₁ and μ₂ in the formula are preset values, μ₁ ranging from 0.5 to 0.8 and μ₂ from 0.2 to 0.5;
7) if, among θ₁ to θ₈, the variation Δθ of one angle θ exceeds the fraction K_θ of that angle, i.e. Δθ > K_θ × θ, where K_θ is a preset decimal between 0 and 1, the limb part related to that θ is considered to have made a large movement;
8) if, within a preset number of seconds, the recorded number of hand or foot movements exceeds a preset count (that is, the hands or feet move at high frequency), the baby is judged to show abnormal behavior;
9) if, within a preset number of seconds, the recorded number of large limb movements exceeds a preset count (that is, the limbs swing at high frequency), the baby is judged to show abnormal behavior;
10) otherwise, no limb movement abnormality is detected.
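For steps 4) and 6) above, a minimal NumPy sketch: the joint-angle function follows the cosine law exactly as stated, while the exact ε formula exists only as an image in the claim, so the weighted-displacement form below is an illustrative assumption built from the named weights μ₁ and μ₂:

```python
import numpy as np

def joint_angle(elbow, shoulder, wrist):
    """Angle theta_1 at the left elbow via the cosine law (step 4):
    a = elbow-shoulder, b = elbow-wrist, c = shoulder-wrist."""
    a = np.linalg.norm(elbow - shoulder)
    b = np.linalg.norm(elbow - wrist)
    c = np.linalg.norm(shoulder - wrist)
    cos_t = (a**2 + b**2 - c**2) / (2 * a * b)
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

def displacement_epsilon(p_frame1, p_frame_n, frame_size, mu1=0.6, mu2=0.3):
    """Illustrative guess at the claim's epsilon (step 6): wrist/ankle
    displacement between frame 1 and frame n, normalised by the frame
    size and weighted by mu1, mu2 (patent ranges: 0.5-0.8 and 0.2-0.5)."""
    w, h = frame_size
    dx = abs(p_frame_n[0] - p_frame1[0]) / w
    dy = abs(p_frame_n[1] - p_frame1[1]) / h
    return mu1 * dx + mu2 * dy

# Example usage with 2-D pixel coordinates (elbow, shoulder, wrist).
theta1 = joint_angle(np.array([120.0, 200.0]),
                     np.array([100.0, 120.0]),
                     np.array([200.0, 210.0]))
```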
2. The artificial intelligence based infant incubator system of claim 1, wherein the infant face detection module operates as follows:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) inputting each frame into a trained face-detection neural network model, detecting the positions and confidences of the key points of the baby's face, and selecting S₁ key points around the left and right eyes, where S₁ is an even number;
3) the aspect ratio α of the eye is used to represent the degree to which the eye is open; when S₁ = n, with n ≥ 4, α is calculated by the following formula; P₁ to Pₙ are the key points around the eye, where P₁ and Pₙ are the left and right corners of the eye, P₂, P₄, P₆, …, Pₙ₋₂ are the key points on the upper half of the eye contour, and P₃, P₅, P₇, …, Pₙ₋₁ on the lower half; P₂ and P₃ form a vertically symmetric pair, as do P₄ and P₅, P₆ and P₇, and so on up to Pₙ₋₂ and Pₙ₋₁:
$$\alpha=\frac{\|P_2-P_3\|+\|P_4-P_5\|+\cdots+\|P_{n-2}-P_{n-1}\|}{\frac{n-2}{2}\,\|P_1-P_n\|}$$
4) when α remains at a constant α₀ greater than 0, the baby's eyes are in an open state; when α remains at a constant α₁ close to 0, the eyes are in a closed state; when α drops rapidly from α₀ to α₁ within a preset number of seconds, the baby has performed a blinking action; α₀ and α₁ are preset manually according to the actual condition of the infant, with α₀ ranging from 0.2 to 0.3 and α₁ from 0 to 0.1.
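Under the reconstruction of the α formula assumed above (vertical key-point gaps averaged over the corner-to-corner width, which reduces to the standard eye-aspect-ratio for n = 6), the computation is a few lines of NumPy; the same function serves the mouth ratio β of claim 3 by passing the Q points instead:

```python
import numpy as np

def aspect_ratio(points):
    """points: (n, 2) array ordered as in the claim, n even and >= 4:
    points[0] and points[-1] are the left/right corners; the interior
    points come in vertically symmetric (upper, lower) pairs P2/P3,
    P4/P5, ... Works for eye key points P1..Pn or mouth points Q1..Qn."""
    pts = np.asarray(points, dtype=float)
    width = np.linalg.norm(pts[0] - pts[-1])
    pairs = pts[1:-1].reshape(-1, 2, 2)               # (upper, lower) pairs
    gaps = np.linalg.norm(pairs[:, 0] - pairs[:, 1], axis=1)
    return gaps.mean() / width                        # openness ratio
```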
3. The artificial intelligence based infant incubator system of claim 1, wherein the main controller further comprises an infant mouth detection module that operates as follows:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) inputting each frame into a trained mouth-detection neural network model, detecting the positions and confidences of the key points of the baby's mouth, and selecting S₂ key points around the mouth; several of these key points are used to represent the degree of mouth opening, denoted β; when S₂ = n, with n ≥ 4 and n even, β is calculated by the following formula; Q₁ to Qₙ are the key points around the mouth, where Q₁ and Qₙ are the left and right corners of the mouth, Q₂, Q₄, Q₆, …, Qₙ₋₂ are the key points on the upper half of the mouth contour, and Q₃, Q₅, Q₇, …, Qₙ₋₁ on the lower half; Q₂ and Q₃ form a vertically symmetric pair, as do Q₄ and Q₅, Q₆ and Q₇, and so on up to Qₙ₋₂ and Qₙ₋₁:
$$\beta=\frac{\|Q_2-Q_3\|+\|Q_4-Q_5\|+\cdots+\|Q_{n-2}-Q_{n-1}\|}{\frac{n-2}{2}\,\|Q_1-Q_n\|}$$
3) likewise, when β remains at a constant β₀ greater than 0, the baby's mouth is in an open state; when β remains at a constant β₁ close to 0, the mouth is in a closed state; β₀ and β₁ can each be preset manually according to the actual condition of the infant, with β₀ ranging from 0.2 to 0.5 and β₁ from 0 to 0.1;
4) when the degree of mouth opening repeatedly alternates between β₀ and β₁, the state is judged to be abnormal.
4. The artificial intelligence based infant incubator system of claim 1, wherein the sound detection module operates as follows:
1) acquiring an audio stream through a microphone in a baby incubator;
2) selecting from the audio stream sound segments whose length is within a preset number of seconds;
3) inputting the sound segments into a trained sound classification model, and outputting sound classification and confidence;
4) when the output sound is classified as baby crying and the confidence exceeds X₁, the main controller sends a corresponding alarm to the computer; X₁ is a preset value between 80% and 90%;
5) when the output sound is classified as an incubator alarm sound and the confidence exceeds X₂, the main controller sends a corresponding alarm to the computer; X₂ is a preset value between 80% and 90%;
6) when the output sound is classified as normal ambient sound, the sound in the incubator is considered to have no abnormality;
7) further, the training process of the sound classification model specifically includes:
8) first, an audio data set related to infant incubators is collected; the data set comprises 3 or more audio categories, including manually labeled baby crying, incubator alarm sound, and other categories of sound clips;
9) a sound classification neural network model is then built, comprising convolutional layers and fully connected layers;
10) the training parameters of the sound classification network are set, including the number of training epochs (Epoch) and the learning rate (Learning_rate);
11) the audio data set is imported into the sound classification network, training is completed, and the trained model is saved.
5. An artificial intelligence-based infant incubator method using the system of claim 1, characterized in that the method comprises the following baby limb motion detection steps:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) inputting each frame of image into the trained neural network model in sequence, and detecting the positions and confidence degrees of key points of the limbs of the baby, wherein the key points of the limbs comprise left and right wrists, left and right elbows, left and right ankles, left and right knees, left and right shoulders and left and right hips;
3) connecting specific limb key points into line segments to mark out the left and right forearms, the left and right upper arms, and the left and right thighs and calves; the number of limb key points is at least 12, and when there are 12 limb key points: (x₁,y₁) are the coordinates of the left-wrist key point; (x₂,y₂) of the left-elbow key point; (x₃,y₃) of the right-wrist key point; (x₄,y₄) of the right-elbow key point; (x₅,y₅) of the left-ankle key point; (x₆,y₆) of the left-knee key point; (x₇,y₇) of the right-ankle key point; (x₈,y₈) of the right-knee key point; (x₉,y₉) of the left-shoulder key point; (x₁₀,y₁₀) of the right-shoulder key point; (x₁₁,y₁₁) of the left-hip key point; (x₁₂,y₁₂) of the right-hip key point; when there are more than 12 limb key points, the additional key points may be selected at random between the above 12 limb key points;
4) the angles formed between related line segments are calculated and recorded, including the angle θ₁ between the left forearm and the left upper arm, the angle θ₂ between the right forearm and the right upper arm, the angle θ₃ between the left upper arm and the body, the angle θ₄ between the right upper arm and the body, the angle θ₅ between the left thigh and calf, the angle θ₆ between the right thigh and calf, the angle θ₇ between the left thigh and the line segment connecting the left and right hips, and the angle θ₈ between the right thigh and the line segment connecting the left and right hips; θ₁ to θ₈ are calculated as follows:
taking θ₁ as an example, first calculate the lengths a, b and c of the three line segments connecting the three key points related to θ₁, where a is the length of the segment connecting the left-elbow and left-shoulder key points, b the length of the segment connecting the left-elbow and left-wrist key points, and c the length of the segment connecting the left-shoulder and left-wrist key points; θ₁ is then obtained from the cosine law:
$$a=\sqrt{(x_2-x_9)^2+(y_2-y_9)^2}$$
$$b=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$$
$$c=\sqrt{(x_9-x_1)^2+(y_9-y_1)^2}$$
$$\theta_1=\arccos\frac{a^2+b^2-c^2}{2ab}$$
5) the key point positions and θ₁ to θ₈ over N consecutive frames are recorded, and the related calculations and comparisons are made with the first frame as the initial position state; N is a preset value related to the frame rate and the time span: at a frame rate of g, checking f seconds of footage gives N = g × f;
6) among the N consecutive frames, for the n-th frame (1 < n ≤ N), the coordinates of the key point corresponding to the left or right wrist, or the left or right ankle, are computed in real time by the formula below to obtain a parameter ε; taking the left-wrist key point as an example, its coordinates are $(x_1^{(1)}, y_1^{(1)})$ in frame 1 and $(x_1^{(n)}, y_1^{(n)})$ in frame n, and the ε for that frame is obtained from the formula [equation image FDA0003538448700000057: ε as a weighted function of the key point's displacement between frame 1 and frame n, with weights μ₁ and μ₂]. When ε ≥ ε₀, the key point is judged to have moved a large distance; when ε < ε₁, the key point is judged to have moved only slightly; when, within a preset number of seconds, ε rises from ε₁ to ε₀ and then rapidly falls back to ε₁, the limb associated with the key point has swung back and forth with large amplitude; ε₀ and ε₁ are preset values, ε₀ ranging from 0.2 to 0.5 and ε₁ from 0.01 to 0.1; μ₁ and μ₂ in the formula are preset values, μ₁ ranging from 0.5 to 0.8 and μ₂ from 0.2 to 0.5;
7) if, among θ₁ to θ₈, the variation Δθ of one angle θ exceeds the fraction K_θ of that angle, i.e. Δθ > K_θ × θ, where K_θ is a preset decimal between 0 and 1, the limb part related to that θ is considered to have made a large movement;
8) if, within a preset number of seconds, the recorded number of hand or foot movements exceeds a preset count (that is, the hands or feet move at high frequency), the baby is judged to show abnormal behavior;
9) if, within a preset number of seconds, the recorded number of large limb movements exceeds a preset count (that is, the limbs swing at high frequency), the baby is judged to show abnormal behavior;
10) otherwise, no limb movement abnormality is detected.
6. The artificial intelligence based infant incubator method of claim 5, characterized in that the method further comprises the following infant face detection steps:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) inputting each frame into a trained face-detection neural network model, detecting the positions and confidences of the key points of the baby's face, and selecting S₁ key points around the left and right eyes, where S₁ is an even number;
3) the aspect ratio α of the eye is used to represent the degree to which the eye is open; when S₁ = n, with n ≥ 4, α is calculated by the following formula; P₁ to Pₙ are the key points around the eye, where P₁ and Pₙ are the left and right corners of the eye, P₂, P₄, P₆, …, Pₙ₋₂ are the key points on the upper half of the eye contour, and P₃, P₅, P₇, …, Pₙ₋₁ on the lower half; P₂ and P₃ form a vertically symmetric pair, as do P₄ and P₅, P₆ and P₇, and so on up to Pₙ₋₂ and Pₙ₋₁:
$$\alpha=\frac{\|P_2-P_3\|+\|P_4-P_5\|+\cdots+\|P_{n-2}-P_{n-1}\|}{\frac{n-2}{2}\,\|P_1-P_n\|}$$
4) when α remains at a constant α₀ greater than 0, the baby's eyes are in an open state; when α remains at a constant α₁ close to 0, the eyes are in a closed state; when α drops rapidly from α₀ to α₁ within a preset number of seconds, the baby has performed a blinking action; α₀ and α₁ are preset manually according to the actual condition of the infant, with α₀ ranging from 0.2 to 0.3 and α₁ from 0 to 0.1.
7. The artificial intelligence based infant incubator method of claim 5, characterized in that the method further comprises the following baby mouth detection steps:
1) acquiring a video stream through a camera in the infant incubator, and reading each frame of image in the video stream;
2) inputting each frame into a trained mouth-detection neural network model, detecting the positions and confidences of the key points of the baby's mouth, and selecting S₂ key points around the mouth; several of these key points are used to represent the degree of mouth opening, denoted β; when S₂ = n, with n ≥ 4 and n even, β is calculated by the following formula; Q₁ to Qₙ are the key points around the mouth, where Q₁ and Qₙ are the left and right corners of the mouth, Q₂, Q₄, Q₆, …, Qₙ₋₂ are the key points on the upper half of the mouth contour, and Q₃, Q₅, Q₇, …, Qₙ₋₁ on the lower half; Q₂ and Q₃ form a vertically symmetric pair, as do Q₄ and Q₅, Q₆ and Q₇, and so on up to Qₙ₋₂ and Qₙ₋₁:
$$\beta=\frac{\|Q_2-Q_3\|+\|Q_4-Q_5\|+\cdots+\|Q_{n-2}-Q_{n-1}\|}{\frac{n-2}{2}\,\|Q_1-Q_n\|}$$
3) likewise, when β remains at a constant β₀ greater than 0, the baby's mouth is in an open state; when β remains at a constant β₁ close to 0, the mouth is in a closed state; β₀ and β₁ can each be preset manually according to the actual condition of the infant, with β₀ ranging from 0.2 to 0.5 and β₁ from 0 to 0.1;
4) when the degree of mouth opening repeatedly alternates between β₀ and β₁, the state is judged to be abnormal.
8. The artificial intelligence based infant incubator method of claim 5, wherein:
the method further comprises the following sound detection steps:
1) acquiring an audio stream through a microphone in a baby incubator;
2) selecting from the audio stream sound segments whose length is within a preset number of seconds;
3) inputting the sound segments into a trained sound classification model, and outputting sound classification and confidence;
4) when the output sound is classified as baby crying and the confidence exceeds X₁, the main controller sends a corresponding alarm to the computer; X₁ is a preset value between 80% and 90%;
5) when the output sound is classified as an incubator alarm sound and the confidence exceeds X₂, the main controller sends a corresponding alarm to the computer; X₂ is a preset value between 80% and 90%;
6) when the output sound is classified as normal ambient sound, the sound in the incubator is considered to have no abnormality;
7) further, the training process of the sound classification model specifically includes:
8) first, an audio data set related to infant incubators is collected; the data set comprises 3 or more audio categories, including manually labeled baby crying, incubator alarm sound, and other categories of sound clips;
9) a sound classification neural network model is then built, comprising convolutional layers and fully connected layers;
10) the training parameters of the sound classification network are set, including the number of training epochs (Epoch) and the learning rate (Learning_rate);
11) the audio data set is imported into the sound classification network, training is completed, and the trained model is saved.
CN202110917475.7A 2021-08-11 2021-08-11 Artificial intelligence-based infant incubator system and method Active CN113762085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110917475.7A CN113762085B (en) 2021-08-11 2021-08-11 Artificial intelligence-based infant incubator system and method


Publications (2)

Publication Number Publication Date
CN113762085A (en) 2021-12-07
CN113762085B (en) 2022-04-19

Family

ID=78788978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110917475.7A Active CN113762085B (en) 2021-08-11 2021-08-11 Artificial intelligence-based infant incubator system and method

Country Status (1)

Country Link
CN (1) CN113762085B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694269A (en) * 2022-02-28 2022-07-01 江西中业智能科技有限公司 Human behavior monitoring method, system and storage medium
CN114596279B (en) * 2022-03-08 2023-09-22 江苏省人民医院(南京医科大学第一附属医院) Non-contact respiration detection method based on computer vision
CN114821674B (en) * 2022-06-28 2022-11-18 合肥的卢深视科技有限公司 Sleep state monitoring method, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN204256568U (en) * 2014-12-03 2015-04-08 南京金陵自动调温床有限公司 A kind of far and near range monitoring, there is the infant incubator of drawing and pulling type air duct board
CN204274880U (en) * 2014-12-03 2015-04-22 南京金陵自动调温床有限公司 A kind of far and near range monitoring, there is the infant incubator of built-in blue-ray light
CN106710599A (en) * 2016-12-02 2017-05-24 深圳撒哈拉数据科技有限公司 Particular sound source detection method and particular sound source detection system based on deep neural network
CN110368237A (en) * 2019-08-12 2019-10-25 宁波戴维医疗器械股份有限公司 A kind of infant incubator monitoring system
CN113158918A (en) * 2021-04-26 2021-07-23 深圳市商汤科技有限公司 Video processing method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110193964A1 (en) * 2010-02-07 2011-08-11 Mcleod Gregory F Method and System for Wireless Monitoring


Also Published As

Publication number Publication date
CN113762085A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN113762085B (en) Artificial intelligence-based infant incubator system and method
Chambers et al. Computer vision to automatically assess infant neuromotor risk
US20210290106A1 (en) Systems and methods of tracking patient movement
CN113257440A (en) ICU intelligent nursing system based on patient video identification
JP4189298B2 (en) Infant movement analysis system
TW200933538A (en) Nursing system
CN106553201A (en) Lifesaving robot and its implementation
CN109887238A (en) A kind of fall detection system and detection alarm method of view-based access control model and artificial intelligence
CN110853753A (en) Cognitive dysfunction old man rehabilitation and nursing system at home
CN114999646B (en) Newborn exercise development assessment system, method, device and storage medium
US20210225489A1 (en) Determining the likelihood of patient self-extubation
US20210287792A1 (en) Care system and automatic care method
CN109757923A (en) A kind of mattress with infantile state monitoring system
WO2018147252A1 (en) Bed monitoring system
CN113593693A (en) Remote health management platform
Wang et al. Vision analysis in detecting abnormal breathing activity in application to diagnosis of obstructive sleep apnoea
CN111652192A (en) Tumble detection system based on kinect sensor
Jung et al. Walking-in-place characteristics-based geriatric assessment using deep convolutional neural networks
Kitzig et al. MoveHN-A database to support the development of motion based biosignal processing systems
Mitas et al. Wearable system for activity monitoring of the elderly
CN110534209B (en) Choke rescue guidance system and method
Huang et al. Infant Contact-less Non-Nutritive Sucking Pattern Quantification via Facial Gesture Analysis.
Dusarlapudi et al. COVID-19 patient breath monitoring and assessment with MEMS accelerometer-based DAQ-a machine learning approach.
US11497418B2 (en) System and method for neuroactivity detection in infants
Jeon et al. SleePS: Sleep position tracking system for screening sleep quality by wristbands

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant