CN111931550A - Infant monitoring method based on intelligent distance perception technology - Google Patents


Info

Publication number
CN111931550A
CN111931550A
Authority
CN
China
Prior art keywords
image
distance
camera
pooling
baby
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010443078.6A
Other languages
Chinese (zh)
Inventor
吕森
陈仁海
王晋超
冯志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010443078.6A priority Critical patent/CN111931550A/en
Publication of CN111931550A publication Critical patent/CN111931550A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85 Stereo camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/02 Alarms for ensuring the safety of persons
    • G08B21/04 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
    • G08B21/0438 Sensor means for detecting
    • G08B21/0476 Cameras to detect unsafe condition, e.g. video cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Gerontology & Geriatric Medicine (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an infant monitoring method based on intelligent distance perception technology, which captures the environment around the infant on video, segments the infant in the acquired image, and determines the distance of the segmented infant from the bed. A pooling-index method is proposed to improve the upsampling part of the image segmentation algorithm, so that more feature information is obtained during upsampling and the accuracy of semantic segmentation along image boundary contours is improved. Experiments show that the pooling-index method outperforms currently popular semantic segmentation algorithms in edge-contour accuracy; meanwhile, a stereoscopic vision algorithm intelligently senses the actual distance of the infant in the real scene, improving the efficiency of infant monitoring.

Description

Infant monitoring method based on intelligent distance perception technology
Technical Field
The invention relates to the technical field of computer vision, in particular to stereoscopic vision and image segmentation, and specifically to an infant monitoring method based on intelligent distance perception technology.
Background
With the rollout of the two-child policy, there is a growing need for more intelligent techniques that ease the strain on parents while reducing risks to the baby. An infant's organs are not fully developed, its self-care ability is poor, and it cannot express emotion in words; at the same time, parents cannot accompany the infant at all times, so the infant is exposed to a higher risk of accidents. There is therefore a need to strengthen the monitoring of infants to reduce this risk, and this application proposes an intelligent infant monitoring scheme based on stereoscopic vision using cameras.
Infant monitors currently on the market simply connect a sensor or camera remotely to a device such as a mobile phone. Parents observe the infant's condition through the terminal device, while temperature, humidity and air quality are monitored by sensors. Some non-contact, camera-based monitors prevent a child from kicking off the quilt: using an infrared scope, the state of the quilt on the child's body is judged from the difference in infrared energy emitted when the child is covered, uncovered or partially covered. The known infant monitoring software BabbyCam uses image recognition to detect whether the infant's face appears in the crib, and parents can watch the infant's real-time state through a mobile phone app.
To keep an infant from rolling over the edge of the bed, the main existing approach uses a camera to monitor whether the infant is in bed. The main idea is: images captured by the camera are saved periodically and the difference between two adjacent images is compared; when the number of changed points reaches a certain level, an alarm is issued. Another proposal mounts an infrared sensing device on the bed body to warn when the infant tries to climb out of the crib. However, these schemes for preventing the infant from rolling over the bed edge cannot act preventively, are costly, and are unsuitable for wide adoption. It has also been proposed to segment the infant from the background with a conventional image segmentation method, and then monitor the infant according to the spatial distance between the segmented infant image and the crib.
To separate the target from the rest of the image, the main traditional image segmentation methods are edge-based, graph-theory-based and clustering-based segmentation.
1) Edge-based segmentation. Based on the principle that the gray values on the two sides of the contour pixels of different object boundaries change markedly, the contour of an object is found by testing whether the second derivative of the gray image crosses zero. Common edge-detection differential operators include the Sobel and Laplace operators, but these methods cannot guarantee that the segmented object edges are continuous or form a closed loop, and they perform poorly on small regions of the image.
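The zero-crossing test described above can be illustrated with a minimal sketch (a generic background illustration, not part of the claimed method; the kernel and helper names are ours). A synthetic step edge produces a positive then negative Laplacian response, and the sign change marks the edge:

```python
import numpy as np

# 3x3 discrete Laplacian kernel (approximates the second derivative).
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]])

def laplacian_response(gray):
    """Convolve a grayscale image with the 3x3 Laplacian kernel (valid region only)."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(gray[i:i+3, j:j+3] * LAPLACIAN)
    return out

def zero_crossings(resp):
    """Mark pixels where the Laplacian response changes sign horizontally."""
    return (resp[:, :-1] * resp[:, 1:]) < 0

# Synthetic image with a vertical step edge between columns 3 and 4.
img = np.zeros((6, 8))
img[:, 4:] = 10.0
resp = laplacian_response(img)
edges = zero_crossings(resp)  # True only along the step edge
```

The single True column in `edges` sits exactly where the response flips from +10 to -10, which is the zero-crossing criterion the paragraph describes.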
2) Graph-theory-based segmentation. Using graph theory, the image is regarded as a weighted undirected graph and is divided into different subsets by iterating over the weight graph. The GraphCut algorithm adds two so-called terminal vertices on this basis and converts the segmentation problem into a minimum-cut problem. There are also algorithms such as GrabCut that use the gray level of the image together with region boundary information.
3) Clustering-based segmentation. Pixels are first grouped into superpixel blocks, and regions are then fused. The Meanshift algorithm, based on kernel density gradient estimation, forms superpixel blocks by building a high-dimensional sphere and iterating along the mean-shift vector toward the density maximum of the image. The SLIC (simple linear iterative clustering) algorithm is an application of the K-means algorithm to image segmentation: each pixel becomes a five-dimensional vector, and a distance metric is constructed from the spatial distance between pixels and the similarity of their colors. However, clustering algorithms are sensitive to the choice of initial points, which affects the segmentation result, and their robustness to interference is poor.
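The five-dimensional distance metric mentioned for SLIC can be sketched as follows (an assumed form of the metric combining color and spatial distance, with an illustrative compactness weight `m`; this is background illustration, not code from the patent):

```python
import math

def slic_distance(p, q, S, m=10.0):
    """SLIC-style distance between two pixels given as (l, a, b, x, y)
    five-dimensional vectors. Color distance and spatial distance are
    combined as described above; S is the expected superpixel grid
    spacing and m weights spatial compactness against color similarity."""
    d_color = math.sqrt(sum((p[i] - q[i]) ** 2 for i in range(3)))
    d_space = math.sqrt((p[3] - q[3]) ** 2 + (p[4] - q[4]) ** 2)
    return math.sqrt(d_color ** 2 + (d_space / S) ** 2 * m ** 2)
```

With a larger `m`, spatial proximity dominates and the superpixels become more compact; with a smaller `m`, color similarity dominates and superpixels follow image boundaries more closely.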
Disclosure of Invention
The invention provides an infant monitoring method based on intelligent distance perception technology, aiming to optimize the image semantic segmentation method and to improve the efficiency of the infant monitoring system by means of stereoscopic vision.
To achieve this purpose, the infant monitoring method based on intelligent distance perception technology provided by the invention comprises the following steps. Step one: capture the environment around the infant on video. Step two: segment the infant in the acquired image. Step three: judge the distance between the segmented infant and the bed, so as to judge whether the infant is in danger of falling; when the judged distance is smaller than a preset threshold, raise an alarm. The acquired image is segmented by a semantic segmentation method.
Furthermore, the acquired image is segmented by a semantic segmentation method based on a fully convolutional neural network encoder-decoder structure: a deep convolutional neural network is improved, the convolution-pooling part of the VGG-19 network is used as the feature extractor of the image, the pooling-index structure is improved, and the infant image is segmented semantically.
Furthermore, the fully convolutional neural network is divided into two stages, encoding and decoding. In the encoding stage, the architecture of the deep convolutional neural network VGG-19 model is used with the fully connected layers removed, and the information recording the max-pooling positions during pooling is retained; each Encoder layer corresponds to a Decoder layer. In the decoding stage, the maximum-value information of the corresponding positions is restored using the max-pooling information saved during encoding; the image is upsampled in this way until the resolution of the feature map is enlarged to the same size as the original image. A softmax classifier attached to the back end of the decoder network generates a class probability for each pixel, forming a probability map whose number of channels equals the number of classes, and the predicted class is the one with the maximum network output probability.
Furthermore, during upsampling, the weight at the index position recorded in the pooling index is set equal to the value being upsampled, and the weight at the other positions covered by the pooling kernel is set to half of the maximum feature value, so that although pooling filtered the image down to the maximum-feature positions, the features around each maximum are also restored.
Further, ranging is performed with a binocular camera. In the projection geometry of the binocular camera on the X-O-Y plane of the world coordinate system, O_R and O_T are the optical centers of the two cameras; the projections of any spatial point P onto the two image planes are p and p'; X_R and X_T are their distances from the origins of the respective image planes; the distance between the two cameras, i.e. the baseline length, is b; Z is the distance from point P to the baseline; and F is the distance from the optical center to the image plane, i.e. the focal length. Since \Delta Ppp' \sim \Delta PO_RO_T, the following relationship holds:
\frac{b - (X_R - X_T)}{Z - F} = \frac{b}{Z}
Letting D = X_R - X_T, we obtain:
Z = \frac{b \cdot F}{D}
similarly, other coordinate information of the point P can be obtained:
X = \frac{b \cdot x_R}{D}
Y = \frac{b \cdot y_R}{D}
where (x_R, y_R) and (x_T, y_T) are the coordinates of the projections of the spatial point P on the two image planes of the binocular camera, and D is the disparity of the two images. The distance of the object from the camera can thus be calculated from the disparity. The camera is then calibrated; calibration mainly acquires the intrinsic parameters, extrinsic parameters and distortion coefficients of the camera. The spatial three-dimensional information of the object is computed by establishing a geometric model of camera imaging through calibration; stereo calibration computes the geometric relationship between the two cameras at different positions in space, obtains the rotation and translation between them, and finally yields the three-dimensional spatial information of the object.
Furthermore, a MATLAB toolbox is used for calibration, computing the intrinsic matrix, radial distortion, tangential distortion, rotation matrix and translation vector of the camera. During calibration, the selected calibration-board images are distributed evenly over all corners of the image, so that the radial and tangential distortion of the lens can be computed effectively.
Further, the distances of different objects in the image are calculated as follows:
First, the specific coordinate values of a spatial object are obtained through the binocular camera:
\begin{cases} u - u_0 = f_x \, X_w / Z_w \\ v - v_0 = f_x \, Y_w / Z_w \\ d = u_L - u_R = f_x \, b / Z_w \end{cases}
where u_L, u_R denote the pixel distances of the projections in the left and right cameras from the left edge of their respective image planes; f_x is an intrinsic value obtained through calibration; d = u_L - u_R; and b is the optical-center distance of the two cameras. Taking the optical center of the left camera as the origin of the world coordinate system, u and v denote the coordinates of the point in the pixel coordinate system, and u_0, v_0 are the coordinates of the center origin of the left camera image plane in the pixel coordinate system. Converting the simultaneous equations into homogeneous coordinates gives the matrix form:
\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & -u_0 \\ 0 & 1 & 0 & -v_0 \\ 0 & 0 & 0 & f_x \\ 0 & 0 & -\frac{1}{T_x} & \frac{u_0 - u_0'}{T_x} \end{bmatrix} \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}
where X = W \cdot X_w, Y = W \cdot Y_w, Z = W \cdot Z_w. Since the x-component of the translation vector T between the binocular cameras represents the amount of translation of camera 2 in the horizontal direction of camera 1, b = -T_x. Let u_0, u_0' be the coordinate values of the horizontal component of the origins of the left and right camera image planes in the pixel coordinate system; since the difference between the two values is small, it is taken as approximately 0, and the following equation is obtained:
Let
Q = \begin{bmatrix} 1 & 0 & 0 & -u_0 \\ 0 & 1 & 0 & -v_0 \\ 0 & 0 & 0 & f_x \\ 0 & 0 & \frac{1}{b} & 0 \end{bmatrix}
Then it can be deduced that:
\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = Q \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}
Q is usually called the reprojection matrix. Through the 4 x 4 reprojection matrix Q, two-dimensional plane coordinates can be converted into spatial three-dimensional coordinates, the coordinates of the spatial point being (X_w = X/W, Y_w = Y/W, Z_w = Z/W). By combining stereoscopic vision with image segmentation, the spatial distance between different objects in the image can be determined.
Furthermore, the expression features of the infant are detected by a convolutional neural network model to judge the infant's state; when crying occurs, the alarm unit is triggered to give a timely warning.
Compared with the prior art, the beneficial effect of the method is that the pooling-index method improves the upsampling part of the image segmentation algorithm, so that more feature information is obtained during upsampling and the accuracy of semantic segmentation along image boundary contours is improved. Experiments show that the pooling-index method outperforms currently popular semantic segmentation algorithms in edge-contour accuracy; meanwhile, a stereoscopic vision algorithm intelligently senses the actual distance of the infant in the real scene, improving the efficiency of infant monitoring.
Drawings
FIG. 1 is a schematic diagram of an infant monitoring protocol arrangement according to the present application;
FIG. 2 is a general flow chart of the operation of the infant monitoring system set forth herein;
FIG. 3 is a network architecture employed by the present application;
FIG. 4 is an improved pooling process proposed herein;
FIG. 5 is a flow chart of the binocular camera stereo ranging of the present application;
FIG. 6 is a two-dimensional block diagram of the binocular range finding of the present application;
FIG. 7 is a schematic view of the epipolar constraint of the present application;
FIG. 8 is an original image captured by a binocular camera according to the present application;
FIG. 9 is a corrected image of the present application;
FIG. 10 is a comparison of an original image and a semantically segmented image according to the present application;
FIG. 11 is a line inspection of the baby contour convex hull and bed contour of the present application;
FIG. 12 is a graph of experimental data and error calculations for the present application.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in figs. 1 to 11, in this application a camera is mounted above the infant lying in bed and used to capture the environment around the infant. Whether the infant is in danger of falling is judged from the distance between the infant and the bed in the image. A semantic segmentation method is adopted to classify the target object pixel by pixel, rather than marking its position with a detection box (Bounding Box) as in image detection; pixel-level segmentation captures the positional relationship between the target subject and other objects in the image more precisely. Judging the distance from the infant's pixels to the bed is more accurate than judging it from a detection box, so this method reduces misjudgment of the actual distance.
To design the critical alarm condition, the actual height is measured with the camera and the actual minimum distance is obtained by the similar-triangle principle, so as to judge whether the infant's state is dangerous. Meanwhile, the expression features of the infant are detected by a convolutional neural network model to judge the infant's state; when crying occurs, an early warning is given promptly to prevent accidents.
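The similar-triangle conversion from a pixel distance to an actual distance can be sketched as follows (function name and all numeric values are illustrative; the calibrated focal length and measured depth of the actual system are not given in this paragraph):

```python
def pixel_to_real_distance(pixel_dist, depth, focal_px):
    """Convert an on-image pixel distance to a real-world distance at a
    given depth, by similar triangles: real / pixel = depth / focal length.
    focal_px is the focal length expressed in pixels."""
    return pixel_dist * depth / focal_px

# E.g. 50 px measured between the infant and the bed edge, at 1400 mm
# depth, with a 700 px focal length (illustrative numbers only):
gap = pixel_to_real_distance(50.0, 1400.0, 700.0)  # 100.0 mm
```

The alarm condition then reduces to comparing `gap` against the preset threshold.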
The method is based on a fully convolutional network (FCN) encoder-decoder structure: a deep convolutional neural network is improved, the convolution-pooling part of the VGG-19 network is used as the image feature extractor, the pooling-index structure is improved, and the infant image is segmented semantically. The minimum pixel distance in the image is obtained by image operations, the distance of the infant from the camera is then obtained through the stereoscopic vision of the binocular camera, and finally the actual size of the minimum boundary distance is obtained indirectly.
The design of the monitoring system is described in more detail below, beginning with the image segmentation method. In this application, images of the specific scene contain only three categories: infant, bed and background. The target objects are of moderate size relative to the whole image, so the multi-scale problem of targets in images does not need to be handled; the aim is to make the model process the segmentation task as fast as possible while keeping a certain fine granularity. Therefore, the design uses a pretrained VGG network as the backbone. In the FCN model, the feature maps of the encoding stage are fused with the upsampled feature maps, and a skip-connection structure reduces the coarseness of the segmentation. However, in the pooling layers of the convolution stage, downsampling continually loses semantic information of the image, and the resulting segmentation is coarse.
The network adopted in this application is divided into two stages: an encoding (Encoder) stage and a decoding (Decoder) stage. In the encoding stage, the architecture of the deep convolutional neural network VGG-19 model is used with the fully connected layers removed, and information recording the max-pooling positions during pooling is retained. Each Encoder layer corresponds to a Decoder layer. In the decoding stage, the maximum-value information at the corresponding positions is restored using the max-pooling information saved during encoding; in this way the image is upsampled and the feature-map resolution is enlarged to the size of the original image. A softmax classifier attached to the back end of the decoder network generates a class probability for each pixel, forming a probability map with as many channels as classes; the predicted class is the one with the maximum output probability. In the decoding stage, the image must be upsampled so that it can be restored to the same size as the original. There are generally three ways to upsample. One is interpolation, e.g. bilinear interpolation, which scales the image directly with an interpolation algorithm. Another is transposed convolution (Transposed Convolution), which upsamples the image and restores the convolved feature map to the spatial size of the original image. The last is the unpooling operation, which keeps the maximum-value information from the original feature map and enlarges the resolution by zero-filling the other positions.
When transposed convolution is used for upsampling, the exact positional relationships between pixels must be learned through matrix computation, which adds many parameters to the network, greatly affects its performance, and increases the space overhead of storing low-level feature-map information. Another method restores pixel values during upsampling by storing the positions of the features kept during pooling and setting the nearest-neighbor positions to the same value. This simply restores the positional information of the original image, but because adjacent positions are set to the same value, the restored image cannot effectively describe or distinguish edge details, and overall detail is poor.
This application adopts the pooling-index method to improve upsampling and thereby the segmentation result. During upsampling, the weight at the index position recorded in the pooling index is set equal to the value being upsampled.
At the other positions covered by the pooling kernel, the weight is set to half of the maximum feature value. In this way, although pooling filtered the image down to the maximum-feature positions, the features around each maximum are also restored, so more feature information is obtained during upsampling and semantic segmentation is more accurate along the boundary contours of the image.
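The pooling-index upsampling described above can be sketched in a minimal form (a NumPy toy on a single-channel map; in the real network this runs per channel on learned feature maps, and the function names are ours):

```python
import numpy as np

def max_pool_with_index(x, k=2):
    """k x k max pooling that also records the argmax position inside
    each window (the pooling index)."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    index = np.zeros((h // k, w // k), dtype=int)  # flat index within window
    for i in range(h // k):
        for j in range(w // k):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            index[i, j] = int(win.argmax())
            pooled[i, j] = win.max()
    return pooled, index

def unpool_half_fill(pooled, index, k=2):
    """Upsampling as described above: the stored index position receives
    the pooled value; the other positions of the pooling kernel receive
    half of that maximum (instead of the zeros of plain unpooling)."""
    h, w = pooled.shape
    out = np.zeros((h * k, w * k))
    for i in range(h):
        for j in range(w):
            out[i*k:(i+1)*k, j*k:(j+1)*k] = pooled[i, j] / 2.0
            di, dj = divmod(index[i, j], k)
            out[i*k + di, j*k + dj] = pooled[i, j]
    return out

x = np.array([[1., 2, 0, 1],
              [3, 4, 1, 0],
              [0, 0, 2, 1],
              [1, 0, 1, 8]])
pooled, idx = max_pool_with_index(x)
restored = unpool_half_fill(pooled, idx)
```

Compared with zero-filling unpooling, the half-max fill keeps a coarse version of the surroundings of each maximum, which is the extra feature information the text claims improves boundary contours.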
The ranging method adopted in this design is described next. First, the type of ranging camera is selected; considering the ranging principle, precision, hardware cost and other factors, a binocular-camera-based approach is finally chosen.
Consider the projection geometry of the binocular camera on the X-O-Y plane of the world coordinate system. O_R and O_T are the optical centers of the two cameras. The projections of any spatial point P onto the two image planes are p and p'; X_R and X_T are their distances from the origins of the respective image planes. The distance between the two cameras, i.e. the baseline length, is b. Z is the distance from point P to the baseline. F is the distance from the optical center to the image plane, i.e. the focal length. It can be seen that \Delta Ppp' \sim \Delta PO_RO_T, so the following relationship holds:
\frac{b - (X_R - X_T)}{Z - F} = \frac{b}{Z}
Letting D = X_R - X_T, we obtain:
Z = \frac{b \cdot F}{D}
similarly, other coordinate information of the point P can be obtained:
X = \frac{b \cdot x_R}{D}
Y = \frac{b \cdot y_R}{D}
where (x_R, y_R) and (x_T, y_T) are the coordinates of the projections of the spatial point P on the two image planes of the binocular camera, and D is the disparity of the two images. The distance of the object from the camera can be calculated from the disparity.
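The relation Z = bF/D derived above can be checked numerically with a small sketch (the focal length, baseline and disparity values are illustrative only, not the calibrated values of the actual system):

```python
def depth_from_disparity(f, b, d):
    """Distance Z of a point from the baseline, from Z = f * b / d as
    derived above. f: focal length in pixels, b: baseline, d: disparity
    in pixels (must be positive for a finite depth)."""
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f * b / d

# Illustrative numbers: a 700 px focal length, 60 mm baseline and
# 30 px disparity give a depth of 1400 mm.
Z = depth_from_disparity(f=700.0, b=60.0, d=30.0)
```

Note the inverse relationship: halving the disparity doubles the computed depth, which is why ranging precision degrades for distant objects.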
The camera is then calibrated. Calibration mainly acquires the intrinsic parameters, extrinsic parameters and distortion coefficients of the camera; the spatial three-dimensional information of the object is computed by establishing a geometric model of camera imaging through calibration. Stereo calibration computes the geometric relationship between the two cameras at different positions in space, obtains the rotation and translation between them, and finally yields the three-dimensional spatial information of the object. Calibration methods include manual calibration, OpenCV-function-based and MATLAB-toolbox-based approaches. MATLAB is more accurate and more robust than the other methods, so this design calibrates the camera with the MATLAB toolbox, computing parameters such as the intrinsic matrix (Intrinsic Matrix), radial distortion (Radial Distortion), tangential distortion (Tangential Distortion), rotation matrices (Rotation Matrices) and translation vectors (Translation Vectors). During calibration, the selected calibration-board images are distributed evenly over all corners of the image, so that the radial and tangential distortion of the lens can be computed effectively. The obtained parameters are used to correct the image and avoid optical distortion. In practice, the images captured by the binocular camera should be strictly aligned horizontally; however, because of mechanical errors a perfectly aligned binocular camera does not exist, so the two captured images are stereo-rectified. By studying the relationship between the two camera images, the three-dimensional scene acquires a geometric relationship on the two-dimensional image planes, which is called epipolar geometry.
First, the specific coordinate values of a spatial object are obtained through the binocular camera:
\begin{cases} u - u_0 = f_x \, X_w / Z_w \\ v - v_0 = f_x \, Y_w / Z_w \\ d = u_L - u_R = f_x \, b / Z_w \end{cases}
where u_L, u_R denote the pixel distances of the projections in the left and right cameras from the left edge of their respective image planes. f_x is an intrinsic value obtained through calibration. d = u_L - u_R. b is the optical-center distance of the two cameras. Taking the optical center of the left camera as the origin of the world coordinate system, u and v denote the coordinates of the point in the pixel coordinate system, and u_0, v_0 are the coordinates of the center origin of the left camera image plane in the pixel coordinate system. Converting the simultaneous equations into homogeneous coordinates gives the matrix form:
\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & -u_0 \\ 0 & 1 & 0 & -v_0 \\ 0 & 0 & 0 & f_x \\ 0 & 0 & -\frac{1}{T_x} & \frac{u_0 - u_0'}{T_x} \end{bmatrix} \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}
where X = W \cdot X_w, Y = W \cdot Y_w, Z = W \cdot Z_w. Since the x-component of the translation vector T between the binocular cameras represents the amount of translation of camera 2 in the horizontal direction of camera 1, b = -T_x. Let u_0, u_0' be the coordinate values of the horizontal component of the origins of the left and right camera image planes in the pixel coordinate system; since the difference between the two values is small, it is taken as approximately 0. The following equation is obtained:
order to
Figure BDA0002504858990000103
Figure BDA0002504858990000104
Then it can be deduced that:
Figure BDA0002504858990000111
usually Q is called a reprojection matrix (reprojection matrix). By means of a 4 x 4 reprojection matrix Q, we can convert two-dimensional planar coordinates into spatial three-dimensional coordinates. The coordinates of the space point are (X)w=X/W,Yw=Y/W,ZwZ/W). By combining stereo vision with image segmentation, the spatial position distance between different objects in the image can be determined.
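Reprojection through $Q$ can be sketched numerically in a few lines of pure Python (the focal length, baseline, principal point, and pixel values below are illustrative calibration numbers, not the patent's):

```python
# Reproject a pixel (u, v) with disparity d to 3-D through the 4x4
# reprojection matrix Q. f_x, b, u0, v0 are example values only.

f_x, b = 800.0, 0.06          # focal length in pixels, baseline in metres
u0, v0 = 320.0, 240.0         # principal point of the left image

Q = [
    [1.0, 0.0, 0.0, -u0],
    [0.0, 1.0, 0.0, -v0],
    [0.0, 0.0, 0.0, f_x],
    [0.0, 0.0, 1.0 / b, 0.0],
]

def reproject(u, v, d):
    """Return world coordinates (Xw, Yw, Zw) for pixel (u, v) at disparity d."""
    vec = [u, v, d, 1.0]
    X, Y, Z, W = [sum(row[i] * vec[i] for i in range(4)) for row in Q]
    return X / W, Y / W, Z / W   # divide out the homogeneous coordinate

# Depth must agree with the closed form Zw = f_x * b / d.
Xw, Yw, Zw = reproject(400.0, 260.0, 16.0)
assert abs(Zw - f_x * b / 16.0) < 1e-9   # 3.0 m
```

The same $Q$ is what OpenCV's stereo rectification produces for its `reprojectImageTo3D` step, which is one reason this homogeneous formulation is convenient in practice.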
In the invention, an intelligent distance-perception infant monitoring system is designed based on the stereoscopic-vision principle. A pooling-index method is proposed to improve the upsampling part of the image segmentation algorithm, so that more feature information is recovered during upsampling and the accuracy of semantic segmentation along object boundary contours is improved. Experiments show that the pooling-index method outperforms currently popular semantic segmentation algorithms in edge-contour accuracy; at the same time, a stereoscopic-vision algorithm intelligently perceives the actual distance of the baby in the real scene, improving the efficiency of infant monitoring.
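The described upsampling rule, restore the pooled value at the remembered max position and fill the rest of each pooling kernel with half of that value rather than zeros, can be sketched on a toy single-channel map (2×2 kernels; the function names and example values are mine, not the patent's):

```python
# Max pooling with stored indices, plus the pooling-index upsampling
# described above: the remembered max position gets the pooled value
# back exactly, the other positions of each 2x2 kernel get half of it.

def max_pool_2x2(fmap):
    """Return (pooled map, index map) for non-overlapping 2x2 max pooling."""
    pooled, idx = [], []
    for i in range(0, len(fmap), 2):
        prow, irow = [], []
        for j in range(0, len(fmap[0]), 2):
            cells = [(fmap[i + di][j + dj], (i + di, j + dj))
                     for di in (0, 1) for dj in (0, 1)]
            val, pos = max(cells)
            prow.append(val)
            irow.append(pos)
        pooled.append(prow)
        idx.append(irow)
    return pooled, idx

def unpool_with_index(pooled, idx, h, w):
    """Upsample back to h x w using the stored indices and half-max fill."""
    out = [[0.0] * w for _ in range(h)]
    for pi, row in enumerate(pooled):
        for pj, val in enumerate(row):
            for di in (0, 1):
                for dj in (0, 1):
                    out[2 * pi + di][2 * pj + dj] = val / 2.0
            mi, mj = idx[pi][pj]
            out[mi][mj] = val          # restore the exact max position
    return out

fmap = [[1.0, 3.0],
        [2.0, 0.0]]
pooled, idx = max_pool_2x2(fmap)
# The max (3.0 at (0, 1)) is restored exactly; its neighbours get
# half the max instead of the zeros plain index unpooling would leave.
assert unpool_with_index(pooled, idx, 2, 2) == [[1.5, 3.0], [1.5, 1.5]]
```

Compared with standard SegNet-style unpooling, which leaves all non-max positions at zero, the half-max fill keeps some context around each restored maximum, which is what the text credits for the sharper boundary contours.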
The experimental environment of fig. 12 is briefly described below. The camera used in the experiment is a 1080p binocular camera module with a factory focal length of 2.6 mm and a field of view of 100°. The sensor is a 1/3" CMOS with a pixel size of 2.0 µm (H) × 2.0 µm (V). The adjustable range of the distance between the lens centre points is 25 mm–70 mm. The experimental equipment runs the Ubuntu 18.04.3 LTS operating system on an Intel i5-9600K processor with a main frequency of 3.70 GHz and 16 GB of RAM. The GPU is a GeForce RTX 2070 with 8 GB of video memory; the CUDA version is 9.0.176, the cuDNN version is 7.3.1, and the tensorflow-gpu version is 1.12.3. Over the experimental data, the average error between the real minimum distance and the calculated value is 2.52%, and the error during the experiments stays essentially within three millimetres, which meets the standard for judging the relative position of the baby and the bed. When the shortest distance falls below a preset size, an early warning is issued, improving the efficiency of infant monitoring.
The technical means not described in detail in the present application are known techniques.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.

Claims (8)

1. An infant monitoring method based on an intelligent distance perception technology is characterized by comprising the following steps:
the method comprises the following steps: step one: capturing images of the environment around the infant; step two: segmenting the infant in the acquired image; step three: judging the distance between the segmented infant and the bed so as to judge whether the infant is in danger of falling, and raising an alarm when the judged distance is smaller than a preset threshold value; wherein the acquired image is segmented by a semantic segmentation method.
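The decision logic of step three reduces to a per-frame threshold check. A minimal sketch, assuming a hypothetical 0.05 m threshold and made-up distance readings (segmentation and distance measurement are treated as already done):

```python
# Step-three decision logic of claim 1: alarm when the measured minimum
# baby-to-bed-edge distance drops below a preset threshold.
# The 0.05 m threshold and the distance sequence are illustrative only.

ALARM_THRESHOLD_M = 0.05

def check_frame(min_distance_m):
    """Return True if this frame should trigger a fall-risk alarm."""
    return min_distance_m < ALARM_THRESHOLD_M

distances = [0.40, 0.21, 0.08, 0.03]   # metres, hypothetical per-frame values
assert [check_frame(d) for d in distances] == [False, False, False, True]
```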
2. The infant monitoring method based on intelligent distance perception technology as claimed in claim 1, wherein the acquired image is segmented by a semantic segmentation method based on a fully convolutional encoder-decoder structure; the semantic segmentation of the infant image is performed by an improved deep convolutional neural network that uses the convolution and pooling part of the VGG-19 network as the feature extractor of the image, the improvement residing in the pooling-index structure.
3. The infant monitoring method based on the intelligent distance perception technology according to claim 2, characterized in that the full convolutional neural network is divided into two stages, an encoding stage and a decoding stage; the encoding stage uses the deep convolutional neural network VGG-19 model architecture with the fully connected layers removed, and the positions of the maxima are stored during each pooling operation; each layer of the Encoder corresponds to a layer of the Decoder; in the decoding stage, the stored max-pooling information is used to restore the maximum value to its original position, the image is upsampled in this way until the resolution of the feature map is enlarged to that of the original image, a softmax classifier connected to the rear end of the decoder network generates a class probability for each pixel, forming a probability map with as many channels as classes, and the class corresponding to the maximum network output probability is taken as the prediction.
4. The infant monitoring method based on the intelligent distance perception technology according to claim 3, wherein in the upsampling process, by means of the pooling index, the weight at the recorded index position in the feature map is set equal to the value to be upsampled, and the weights at the other positions of the corresponding pooling kernel are set to half of that maximum feature value, so that the features around the maximum-feature position are restored in addition to the maximum-position information that pooling preserved.
5. The infant monitoring method based on the intelligent distance perception technology according to claim 3, characterized in that distance measurement is performed by means of a binocular camera; in the projection geometry of the binocular camera on the X-O-Y plane of the world coordinate system, $O_R$ and $O_T$ are the optical centres of the two cameras, the projections of an arbitrary spatial point $P$ on the two image planes are $p$ and $p'$, and $X_R$, $X_T$ are the distances of the two projections from the origins of their respective image planes; the distance between the two cameras, i.e. the baseline length, is $b$, $Z$ is the distance from the point $P$ to the baseline, and $f$ is the distance from the optical centre to the image plane, i.e. the focal length; since $\Delta P p p' \sim \Delta P O_R O_T$, the following relationship holds:

$$\frac{b - (X_R - X_T)}{b} = \frac{Z - f}{Z}$$

Letting $D = X_R - X_T$, the following can be obtained:

$$Z = \frac{f\, b}{D}$$

Similarly, the other coordinate information of the point $P$ can be obtained:

$$X = \frac{b\, x_R}{D}$$

$$Y = \frac{b\, y_R}{D}$$

wherein $(x_R, y_R)$ and $(x_T, y_T)$ are the coordinates of the projections of the spatial point $P$ on the two image planes of the binocular camera, and $D$ is the disparity between the two images. The distance of the object from the camera can thus be calculated from the disparity; the camera is then calibrated, the calibration mainly serving to acquire the intrinsic parameters, extrinsic parameters and distortion coefficients of the camera; the spatial three-dimensional information of the object is calculated by establishing a geometric model of camera imaging through calibration, and stereo calibration computes the geometric relationship between the two cameras at different spatial positions, obtains the rotation and translation between them, and finally yields the three-dimensional spatial information of the object.
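The similar-triangle relations of claim 5 can be checked numerically. A sketch with illustrative values (a 2.6 mm focal length, a 60 mm baseline, and made-up image-plane projections, all in metres; none of these numbers come from the patent's experiments):

```python
# Similar-triangle triangulation of claim 5: depth from disparity
# D = X_R - X_T, then the remaining coordinates from the left-image
# projection. All numeric values below are example inputs.

def triangulate(x_r, y_r, x_t, f, b):
    """Return (X, Y, Z) of point P from its projections on the two image planes."""
    D = x_r - x_t                 # disparity between the two images
    Z = f * b / D                 # depth from the similar-triangle relation
    X = b * x_r / D
    Y = b * y_r / D
    return X, Y, Z

X, Y, Z = triangulate(x_r=0.004, y_r=0.002, x_t=0.001, f=0.0026, b=0.06)
# D = 0.003 m, so Z = 0.0026 * 0.06 / 0.003 = 0.052 m
assert abs(Z - 0.052) < 1e-9
```

Note the inverse relationship: halving the disparity doubles the computed depth, which is why depth error grows with distance from the camera.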
6. The infant monitoring method based on the intelligent distance perception technology according to claim 5, characterized in that calibration is performed with the MATLAB toolbox, the intrinsic matrix, radial distortion, tangential distortion, rotation matrix and translation vector of the camera being obtained by calculation; during the calibration process, the selected calibration-board images are distributed evenly across the corners of the image, so that the radial and tangential distortion of the lens can be calculated effectively.
7. The infant monitoring method based on intelligent distance perception technology as claimed in claim 5, wherein the calculation of the distance between different objects in the image is performed by the following method:
firstly, obtaining specific coordinate values of a space object through a binocular camera:
$$Z_w = \frac{f_x\, b}{d}, \qquad X_w = \frac{b\,(u - u_0)}{d}, \qquad Y_w = \frac{b\,(v - v_0)}{d}$$

wherein $u_L$ and $u_R$ represent the pixel distances of the projected point from the left edge of the left and right image planes; $f_x$ is an intrinsic parameter obtained through calibration; $d = u_L - u_R$ is the disparity; $b$ is the distance between the two optical centres; the optical centre of the left camera is taken as the origin of the world coordinate system, $(u, v)$ represent the coordinates of the point in the pixel coordinate system, and $(u_0, v_0)$ are the pixel coordinates of the centre of the left camera image plane; the equations are combined and converted into homogeneous coordinates, obtaining the matrix form:

$$\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & -u_0 \\ 0 & 1 & 0 & -v_0 \\ 0 & 0 & 0 & f_x \\ 0 & 0 & -\dfrac{1}{T_x} & \dfrac{u_0 - u_0'}{T_x} \end{bmatrix} \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}$$

wherein $X = W X_w$, $Y = W Y_w$, $Z = W Z_w$; since the $x$ component $T_x$ of the translation between the binocular cameras represents the displacement of camera 2 along the horizontal direction of camera 1, $b = -T_x$; letting $u_0$ and $u_0'$ be the horizontal pixel coordinates of the image-plane origins of the left and right cameras, and noting that the difference between the two values is small, $(u_0 - u_0')/T_x \approx 0$; letting

$$Q = \begin{bmatrix} 1 & 0 & 0 & -u_0 \\ 0 & 1 & 0 & -v_0 \\ 0 & 0 & 0 & f_x \\ 0 & 0 & \dfrac{1}{b} & 0 \end{bmatrix}$$

it can then be deduced that:

$$\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = Q \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}$$

$Q$ is usually called the reprojection matrix; by means of the 4 × 4 reprojection matrix $Q$, two-dimensional plane coordinates can be converted into spatial three-dimensional coordinates, the coordinates of the spatial point being $(X_w = X/W,\ Y_w = Y/W,\ Z_w = Z/W)$; by combining stereo vision with image segmentation, the spatial position distance between different objects in the image can be determined.
8. The infant monitoring method based on the intelligent distance perception technology as claimed in claim 1, wherein the expression characteristics of the infant are detected through a convolutional neural network model to judge the state of the infant, and when crying and screaming occur, an alarm unit is triggered to give an early warning in time.
CN202010443078.6A 2020-05-22 2020-05-22 Infant monitoring method based on intelligent distance perception technology Pending CN111931550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010443078.6A CN111931550A (en) 2020-05-22 2020-05-22 Infant monitoring method based on intelligent distance perception technology


Publications (1)

Publication Number Publication Date
CN111931550A true CN111931550A (en) 2020-11-13

Family

ID=73317273


Country Status (1)

Country Link
CN (1) CN111931550A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354975A (en) * 2015-08-26 2016-02-24 马晓阳 Infanette monitoring and alarming system capable of real time communication with mobile terminal
CN108009629A (en) * 2017-11-20 2018-05-08 天津大学 A kind of station symbol dividing method based on full convolution station symbol segmentation network
CN108682112A (en) * 2018-05-15 2018-10-19 京东方科技集团股份有限公司 A kind of infant monitoring device, terminal, system, method and storage medium
CN110049471A (en) * 2018-01-16 2019-07-23 南京邮电大学 A kind of long-range infant monitoring system based on wireless sense network
CN110334678A (en) * 2019-07-12 2019-10-15 哈尔滨理工大学 A kind of pedestrian detection method of view-based access control model fusion


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
大奥特曼打小怪兽: "Spatial coordinate calculation in binocular vision" (in Chinese), blog post, www.cnblogs.com/zyly/p/9373991.html *
路文平 (Lu Wenping): "Face detection and recognition technology and its application in infant food monitoring" (in Chinese), China Masters' Theses Full-text Database *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034613A (en) * 2021-03-25 2021-06-25 中国银联股份有限公司 External parameter calibration method of camera and related device
CN113034613B (en) * 2021-03-25 2023-09-19 中国银联股份有限公司 External parameter calibration method and related device for camera
CN113538350A (en) * 2021-06-29 2021-10-22 河北深保投资发展有限公司 Method for identifying depth of foundation pit based on multiple cameras
CN113538350B (en) * 2021-06-29 2022-10-04 河北深保投资发展有限公司 Method for identifying depth of foundation pit based on multiple cameras
CN114209509A (en) * 2021-12-30 2022-03-22 常德职业技术学院 Timely response system suitable for mother and infant nurses
CN114209509B (en) * 2021-12-30 2023-04-14 常德职业技术学院 Timely response system suitable for mother and infant nurses

Similar Documents

Publication Publication Date Title
CN111931550A (en) Infant monitoring method based on intelligent distance perception technology
CN110799991B (en) Method and system for performing simultaneous localization and mapping using convolution image transformations
WO2020035002A1 (en) Methods and devices for acquiring 3d face, and computer readable storage media
CN111462172B (en) Three-dimensional panoramic image self-adaptive generation method based on driving scene estimation
CN107274483A (en) A kind of object dimensional model building method
US8917317B1 (en) System and method for camera calibration
CN102714695A (en) Image processing device, image processing method and program
WO2007139067A1 (en) Image high-resolution upgrading device, image high-resolution upgrading method, image high-resolution upgrading program and image high-resolution upgrading system
WO2012028866A1 (en) Physical three-dimensional model generation apparatus
CN111160232B (en) Front face reconstruction method, device and system
JP2015100066A (en) Imaging apparatus, control method of the same, and program
CN111480164A (en) Head pose and distraction estimation
CN107809610B (en) Camera parameter set calculation device, camera parameter set calculation method, and recording medium
CN112083403B (en) Positioning tracking error correction method and system for virtual scene
JP6192507B2 (en) Image processing apparatus, control method thereof, control program, and imaging apparatus
JP5901447B2 (en) Image processing apparatus, imaging apparatus including the same, image processing method, and image processing program
KR20150074544A (en) Method of tracking vehicle
JPWO2019107180A1 (en) Encoding device, coding method, decoding device, and decoding method
CN114998328A (en) Workpiece spraying defect detection method and system based on machine vision and readable storage medium
JP6803570B2 (en) Camera parameter set calculation device, camera parameter set calculation method, and program
Buquet et al. Evaluating the impact of wide-angle lens distortion on learning-based depth estimation
CN117058183A (en) Image processing method and device based on double cameras, electronic equipment and storage medium
CN113647093A (en) Image processing device, 3D model generation method, and program
Wang et al. RGB-guided depth map recovery by two-stage coarse-to-fine dense CRF models
AU2020294259A1 (en) Object association method, apparatus and system, electronic device, storage medium and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201113