CN101795400B - Method for actively tracking and monitoring infants and realization system thereof - Google Patents

Method for actively tracking and monitoring infants and realization system thereof

Info

Publication number
CN101795400B
Authority
CN
China
Prior art keywords
infant
face
people
module
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201010125536
Other languages
Chinese (zh)
Other versions
CN101795400A (en)
Inventor
王绍宇
杨松绍
廖小勇
罗友军
张茵
盛秀梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI FUKONG HUALONG MICROSYSTEM TECHNOLOGY Co Ltd
Original Assignee
SHANGHAI FUKONG HUALONG MICROSYSTEM TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI FUKONG HUALONG MICROSYSTEM TECHNOLOGY Co Ltd filed Critical SHANGHAI FUKONG HUALONG MICROSYSTEM TECHNOLOGY Co Ltd
Priority to CN 201010125536 priority Critical patent/CN101795400B/en
Publication of CN101795400A publication Critical patent/CN101795400A/en
Application granted granted Critical
Publication of CN101795400B publication Critical patent/CN101795400B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method for actively tracking and monitoring infants. The method first detects moving regions, then extracts smaller candidate regions containing the face, hands and similar skin areas, and then detects faces within them. Each detected face is matched with an active shape model to extract its contour, from which two features are computed: the distance d from the midpoint of the line connecting the eye centers to the geometric center of the chin contour, and the area s of the region enclosed by the nose contour. Linear discriminant analysis is used to find the projection direction w that best separates infant faces from non-infant faces. The features d and s are projected onto w, and the projected value, called the baby value, expresses how closely the face to be detected resembles an infant face. The midpoint of the line connecting the two class centers is taken as the threshold K separating non-infant faces from infant faces. Whether the target to be detected is an infant is judged by comparing the baby value with the threshold; if it is, the target is taken as the tracking object. The invention also provides an embedded realization system for actively tracking infants.

Description

Method for actively tracking and monitoring infants and realization system thereof
Technical field:
The present invention relates to the field of image information processing, and in particular to a method for actively tracking and monitoring infants in the home and an embedded hardware system realizing the method.
Background technology:
Intelligent video processing technology is being applied ever more widely in the surveillance field. Existing video surveillance systems for infants in a home environment generally connect a camera to the network through a PC, or connect to the network directly as a stand-alone network camera. Such devices usually provide detection functions such as sound detection. Some are equipped with a pan-tilt unit that the user can steer manually from a client (such as a browser), or that rotates toward a moving infant according to preset points defined in advance by the user. In practice, however, this kind of home surveillance system has the following shortcomings: the user cannot watch the infant from the client for long periods, and the camera can only guard the infant from a fixed viewpoint, so it cannot automatically track and film an infant moving about the home in real time, and the goals of intelligent care and alerting cannot be achieved.
Among existing industrial-grade surveillance systems, for example the Chinese invention patent with publication number CN1658570A (entitled "Intelligent tracking surveillance system with multiple cameras"), a master camera is mainly used to obtain global scene information and run target detection and tracking algorithms, and the resulting target information then drives a slave camera that rotates to track the target and display its details. In a home application, however, monitoring an infant with such a master-slave camera arrangement requires installing two cameras and a pan-tilt unit in the house, which makes deployment more difficult and also greatly increases the cost; it is neither economical nor practical.
Among existing surveillance systems aimed at home applications, the utility model patent with publication number CN201262766Y (entitled "Multifunctional intelligent household monitoring system") uses an infrared human-body sensor to perceive whether someone has entered; when someone enters, the intelligent early-warning unit drives the camera to record video. Its camera, however, carries no pan-tilt unit and therefore cannot rotate to multiple angles, nor does it have the ability to automatically identify and select the infant in the scene and track it.
Summary of the invention:
In view of the above technical problems, and in order to let the user monitor in real time through a surveillance system the safety of an infant who has no sense of self-protection and take timely precautions, the primary purpose of the present invention is to provide a monitoring method that uses a surveillance system to actively track an infant. The method can be applied in a home environment to actively track and monitor an infant target that has no sense of self-protection, improving the pertinence and automation of existing home video surveillance systems and helping parents guard against accidents and monitor the infant's safety in time.
The technical scheme of the inventive method is as follows. After the analog video signal is captured and A/D converted, the video signal is processed algorithmically: moving regions are extracted by a three-frame difference method on consecutive frames, a skin-color detection model is then used to filter out non-face areas within the moving regions, and the AdaBoost algorithm is applied within the reduced candidate regions to detect faces. Each detected face is matched with an active shape model to extract the contour of the face to be detected, and from this contour the distance d from the midpoint of the line connecting the eye centers to the geometric center of the chin contour, and the area s of the region enclosed by the nose contour, are obtained. Linear discriminant analysis is used to find the projection direction w that best distinguishes infant faces from non-infant faces. The features d and s are projected onto the direction w: with the point x = [d, s]^T, the two-dimensional space spanned by the d and s directions is converted by w into a one-dimensional feature space, babyvalue = w^T x. The babyvalue expresses how closely the face to be identified resembles an infant face. The threshold separating non-infant faces from infant faces is taken as the midpoint of the line connecting the two class centers, giving the threshold k; whether the target to be detected is an infant is judged by comparing babyvalue with the threshold k, and if the target is an infant it is taken as the tracking object. After the above processing, a PID algorithm and the PELCO-D protocol are used to control the pan-tilt unit to rotate up, down, left and right, thereby automatically realizing active real-time tracking and monitoring of the infant and greatly enhancing the initiative and practicality of existing home video surveillance systems.
In addition, in accordance with the above method, the invention also proposes an embedded hardware system for actively tracking infants that uses the method. Through modules such as video acquisition, A/D conversion, the FVID interface and pan-tilt control, the system realizes automatic detection, localization and tracking of an infant in the home environment. The concrete structure of the system is as follows, comprising:
a master control processing unit for applying to the video input signal the algorithmic processing comprising the three-frame difference method, skin-color segmentation, face detection, statistical shape model matching, infant target identification and PID control; according to the processing result, this unit sends action commands to the pan-tilt unit to adjust the monitoring viewing angle, while the processed video signal, carrying a mark indicating the monitored object, is displayed in real time on a display unit or output through an output interface;
a CMOS camera module for capturing and generating the analog video signal of the home scene, connected to the A/D conversion module;
an A/D conversion module that converts the analog video signal from the CMOS camera module into a digital signal and sends it to the master control processing unit;
a random access memory module, connected to the master control processing unit, for storing the code while the system runs and the temporary video data;
a read-only memory module, connected to the master control processing unit, for storing the system start-up code and the system program code;
a pan-tilt unit on which the CMOS camera module is mounted, which adjusts the shooting view angle according to the action commands sent by the master control processing unit so as to track the target object in real time.
In the above scheme, the master control processing unit (DM6437) comprises a video interface module (CCDC controller), an EMIF interface module, an algorithm processing module, an on-screen display module (OSD), a video encoding module (VENC) and at least one DAC interface. The video interface module is connected to the A/D conversion module and receives the digital video signal processed by that module. From the received digital video signal, the algorithm processing module extracts moving regions by a three-frame difference method on consecutive frames, uses the skin-color detection model to filter out non-face areas within the moving regions, and applies the AdaBoost algorithm within the reduced candidate regions to detect faces. Each detected face is matched with an active shape model to extract the contour of the face to be detected, and from this contour the distance d from the midpoint of the line connecting the eye centers to the geometric center of the chin contour, and the area s of the region enclosed by the nose contour, are obtained; linear discriminant analysis is used to find the projection direction w that best distinguishes infant faces from non-infant faces. The features d and s are projected onto the direction w: with the point x = [d, s]^T, the two-dimensional space spanned by d and s is converted by w into a one-dimensional feature space, babyvalue = w^T x. The babyvalue expresses how closely the face to be identified resembles an infant face; the threshold separating non-infant faces from infant faces is taken as the midpoint of the line connecting the two class centers, giving the threshold k, and whether the target is an infant is judged by comparing babyvalue with k. If the target is an infant, it is taken as the tracking object. The algorithm processing module then sends action commands to the pan-tilt unit through the UART interface module according to this judgment to adjust the camera angle, thereby realizing automatic detection and localization of the infant target in the home monitoring scene. At the same time, the processed digital video signal is sent to the on-screen display module for real-time display of the video image, or is output after video compression coding by the video encoding module.
In the above scheme, the system is also provided with a multimedia coding module for audio/video coding; connected through the I2C bus and an Ethernet transceiver, it sends the digital video signal delivered by the A/D conversion module, or the digital video signal processed by the algorithm processing module, to a remote site over the Internet. The multimedia coding module may adopt a C627 multimedia co-processor chip.
In the above scheme, the CMOS camera module specifically adopts a CMOS130 camera. The random access memory module is connected to the EMIF interface of the master control processing unit and adopts a 128M SDRAM. The read-only memory module is connected to the EMIF interface of the master control processing unit and adopts a 64M NOR-type Flash chip.
In the above scheme, the system is also provided with an LED display connected to the DAC interface of the master control processing unit, adopting a JV-M50D display. The system is powered by a 5 V adapter, from which three voltages are derived for the different devices: 1.2 V for the DSP core, 3.3 V for the DSP I/O ports and 1.8 V for the memory.
In the above scheme, the pan-tilt unit adopts an ST-SP8206PS model and is connected through an RS485 interface to the UART interface module of the algorithm processing module. The pan-tilt control commands are sent to the pan-tilt unit according to the PELCO-D protocol, and the direction and position of the camera are actively adjusted by controlling the rotation of the pan-tilt unit.
In the above scheme, the UART interface module adopts the SC16C550 chip, whose internal receiver and transmitter each have a 16-byte FIFO and which can handle serial signals at rates up to 3 Mbps.
In the above scheme, the system obtains video data from the CMOS camera module through the preview engine and converts it into the YUV422 format. Four 54 MHz DAC interfaces are provided by the VENC encoder, supporting NTSC, PAL, S-Video and YPbPr video output, and a 24-bit digital video output to an RGB interface is also provided.
The main operation of the inventive system is as follows: an analog video signal is first obtained and converted into a YUV signal by the A/D conversion chip; the video signal is fed into the DM6437 chip for intelligent algorithmic processing so as to automatically select the infant target to be tracked; a PID algorithm and the PELCO-D protocol are then used to control the rotation of the pan-tilt unit so that the camera is always aimed at the moving infant target, thereby realizing active real-time tracking and monitoring of the infant and greatly enhancing the initiative and practicality of existing home video surveillance systems.
In summary, compared with the existing related art, the inventive method and realization system proposed above have the following evident features and remarkable advantages:
1. They overcome the shortcoming that existing home-oriented surveillance systems cannot automatically distinguish an infant target.
2. Existing home surveillance systems are still simple network cameras; even those equipped with a pan-tilt unit have the following shortcoming: the user must manually control the rotation remotely through a web interface, and there is no intelligent target detection and tracking function.
3. Only the CMOS sensor is mounted on the pan-tilt unit, while the video processing chip and peripherals are kept separate from the pan-tilt unit and the sensor, which reduces the pan-tilt load and improves its response speed and service life.
4. The pan-tilt device is connected to the UART interface of the video processing chip through an RS485 interface, and via the PELCO-D protocol the CMOS sensor can be moved in 8 directions (up, upper-right, right, lower-right, down, lower-left, left, upper-left), realizing tracking and filming of the infant.
5. To simplify control of the rotation direction, the pan-tilt field of view is divided into 9 regions of equal size; the algorithm only needs to determine which region the current target falls in, and then performs the movement in the corresponding one of the 8 directions according to the region number (no movement for the central region).
The present invention overcomes the shortcoming of traditional home monitoring systems that need two cameras to actively track a target: using a single camera combined with a pan-tilt unit, the three-frame difference method and related techniques, it realizes a method and system for detecting and actively tracking an infant target.
Brief description of the drawings:
The present invention is further illustrated below in conjunction with the drawings and specific embodiments.
Fig. 1 is a schematic diagram of the active infant tracking and monitoring method of the invention.
Fig. 2 is a schematic diagram of the face contour landmark points and of the features s and d.
Fig. 3(a)-Fig. 3(d) are schematic diagrams of face contour samples used to train the statistical shape model.
Fig. 4(a) is a schematic diagram of 8 unaligned face shapes.
Fig. 4(b) is a schematic diagram of the face shapes after alignment.
Fig. 5 is a schematic diagram of the system structure of the invention.
Fig. 6 is a flow chart of the three-frame difference method.
Fig. 7 is a schematic diagram of the video partitioning.
Fig. 8 is a flow chart of the pan-tilt PID control.
Embodiment:
To make the technical means, creative features, purposes and effects achieved by the present invention easy to understand, the invention is further described below in conjunction with the specific figures.
(i) Method for actively tracking and monitoring infants in the home
Referring to Fig. 1, the concrete steps are as follows:
(1) Video acquisition: the system captures an analog video signal in PAL or NTSC format from the camera.
(2) A/D conversion: the TVP5146 chip converts the signal into a digital video signal in YUV422 format, the video signal is stored in SDRAM, the collected data are accessed through the FVID interface and sent into the DM6437 chip for the concrete algorithmic processing. The original resolution of the captured signal is 640 × 480; to improve algorithm efficiency it is converted to 320 × 240.
(3) Moving target detection: to cope with the effect of pan-tilt rotation during active tracking, a three-frame difference method is used to detect moving regions. Each of three adjacent frames is first denoised by Gaussian smoothing, then consecutive frame differences are computed, and the two resulting difference images are thresholded to obtain binary images. The two binary difference images are each eroded and then dilated, and the two binary images obtained by this morphological processing are combined with a logical AND. Contours are extracted from the AND image, and the regions enclosed by the contours are the result of moving target detection.
(4) To reduce the processing region, smaller candidate object regions containing the face, hands and similar skin areas are extracted from the three-frame difference result according to a skin-color model. After converting the original image through YUV -> RGB -> YCrCb, an image based on the YCrCb color model is obtained; suitable Cr and Cb thresholds are chosen to divide the image into skin regions and non-skin regions.
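As an illustration of this step, the sketch below thresholds the Cr and Cb channels with OpenCV. The numeric thresholds (Cr in [133, 173], Cb in [77, 127]) are commonly used illustrative values, not values specified by the patent, which only says that "suitable" thresholds are chosen.

```python
import cv2
import numpy as np

def skin_mask(frame_bgr, motion_mask=None):
    """Return a binary mask of candidate skin regions in a BGR frame."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Illustrative Cr/Cb thresholds; tune for the actual camera and lighting.
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    if motion_mask is not None:
        # Restrict skin candidates to the moving regions found by frame differencing.
        mask = cv2.bitwise_and(mask, motion_mask)
    return mask
```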
(5) The AdaBoost face detection algorithm is applied within the skin regions, which reduces the search area of the face detector and improves both its speed and its accuracy, making it better suited to real-time processing on embedded hardware. For face detection, AdaBoost extracts a large number of simple one-dimensional features from the face; each such feature has some ability to discriminate faces from non-faces, and the final system combines thousands of simple one-dimensional classifiers to reach a good classification result at a speed that also satisfies the real-time requirement.
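A minimal sketch of this step, using OpenCV's Haar cascade (a Viola-Jones detector whose stages are trained with AdaBoost) as a stand-in for the patent's detector; the cascade file is OpenCV's stock model, not something specified by the patent.

```python
import cv2

# OpenCV's bundled frontal-face cascade, used here as a stand-in AdaBoost detector.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces_in_skin_regions(gray, skin_mask):
    """Run the cascade only inside bounding boxes of skin-colored blobs."""
    faces = []
    contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w < 20 or h < 20:          # skip tiny blobs
            continue
        roi = gray[y:y + h, x:x + w]
        for (fx, fy, fw, fh) in face_cascade.detectMultiScale(roi, 1.1, 3):
            faces.append((x + fx, y + fy, fw, fh))   # back to full-image coordinates
    return faces
```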
(6) According to the face detection result, the face is matched with the active shape model and the corresponding contour information is extracted; each contour corresponds to 58 landmark points, forming one 116-dimensional vector.
To build the active shape model, landmark points must be calibrated on the face images in the training samples; as shown in Fig. 2, there are 58 landmark points. The contour determined by the landmarks has 7 segments in total: the facial outline (13 landmarks), the left-eye contour (8 landmarks), the right-eye contour (8 landmarks), the left-eyebrow contour (5 landmarks), the right-eyebrow contour (5 landmarks), the mouth contour (8 landmarks) and the nose contour (11 landmarks). The landmarks are chosen as follows: first, key points that can be distinguished directly by the naked eye, such as the eye corners and the mouth corners, are taken; second, the other landmarks are distributed as evenly as possible between these key points; finally, the landmark density must take the subsequent embedded hardware implementation into account: too high a density increases the calibration workload and lowers the computation speed, while too low a density cannot achieve the desired effect.
For a calibrated face image X_i, its shape can be represented by the x and y coordinates of the 58 landmark points:
X_i = [x_i1, y_i1, x_i2, y_i2, ..., x_i58, y_i58]^T
where the facial outline part is [x_i1, y_i1, ..., x_i13, y_i13]^T; the left-eye contour is [x_i14, y_i14, ..., x_i21, y_i21]^T; the right-eye contour is [x_i22, y_i22, ..., x_i29, y_i29]^T; the left-eyebrow contour is [x_i30, y_i30, ..., x_i34, y_i34]^T; the right-eyebrow contour is [x_i35, y_i35, ..., x_i39, y_i39]^T; the mouth contour is [x_i40, y_i40, ..., x_i47, y_i47]^T; and the nose contour is [x_i48, y_i48, ..., x_i58, y_i58]^T. The shapes of the N calibrated training images can then be represented by the training set {X_i : i = 1, ..., N}. Fig. 3(a)-Fig. 3(d) show part of the face shape training samples.
The calibrated face shapes differ in 3 respects: 1) the absolute position of the face in its image; 2) the size of the face image; 3) its orientation. Directly building a statistical shape model from the training shapes therefore cannot truly reflect how face shapes vary, and the training shapes must first be brought into approximate correspondence by shape alignment. A suitable translation, scaling and rotation is chosen for each face shape so that all training shapes lie in the same comparable Cartesian coordinate system and the difference between each aligned shape and the mean shape is minimal (in the least-squares sense), where the mean shape is computed over the whole training set.
The shape alignment procedure is: first, a fairly ideal shape is chosen as the initial reference and all other shapes are aligned to it; then the mean of the aligned shapes is computed, normalized, and taken as the new reference; finally, the previously aligned shapes are re-aligned to the new reference. This alignment process is repeated until the mean shapes of two consecutive iterations differ by less than a threshold.
During alignment of the face shapes, to prevent the mean shape from gradually growing or shrinking over the iterations, the mean shape must be normalized in every iteration. Concretely: the distance between two chosen points of the calibrated shape is kept at a fixed constant, the mean shape is always set to a fixed orientation, and the mean shape is translated to a fixed position.
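The following sketch, under the assumption that each shape is stored as a (58, 2) NumPy array, implements the iterative alignment just described with a standard 2-D similarity (Procrustes) fit; function names are illustrative, not taken from the patent, and for brevity only position and scale of the mean are fixed.

```python
import numpy as np

def align_to(shape, ref):
    """Similarity-align `shape` (N,2) to `ref` (N,2) in the least-squares sense."""
    sc, rc = shape - shape.mean(0), ref - ref.mean(0)
    denom = (sc ** 2).sum()
    a = (sc * rc).sum() / denom                                     # scale * cos(theta)
    b = (sc[:, 0] * rc[:, 1] - sc[:, 1] * rc[:, 0]).sum() / denom   # scale * sin(theta)
    return sc @ np.array([[a, b], [-b, a]]) + ref.mean(0)

def align_training_set(shapes, tol=1e-6, max_iter=100):
    """Iteratively align all shapes; return (aligned shapes, normalized mean shape)."""
    ref = shapes[0].copy()
    for _ in range(max_iter):
        aligned = [align_to(s, ref) for s in shapes]
        mean = np.mean(aligned, axis=0)
        mean -= mean.mean(0)                  # fix position of the mean shape
        mean /= np.linalg.norm(mean)          # fix scale so the mean cannot drift
        if np.linalg.norm(mean - ref) < tol:
            break
        ref = mean
    return aligned, ref
```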
Fig. 4(a) shows 8 unaligned face shapes and Fig. 4(b) shows the result after they are aligned. As can be seen from the figures, after shape alignment the distribution of each corresponding landmark point is more concentrated and more reasonable. Once all the shape samples in the training set have been aligned, the statistical shape model of the face can be built on this basis.
After the face shapes in the training set have been aligned, N aligned shapes {X_i : i = 1, ..., N} are obtained. Each face shape is given by the x and y coordinates of its 58 landmark points, and the mean shape of the training samples is
X̄ = (1/N) Σ_{i=1}^{N} X_i
Its covariance matrix C is:
C = (1/N) Σ_{i=1}^{N} (X_i − X̄)(X_i − X̄)^T
The aligned face shapes are 116-dimensional (2 × 58). If these shapes are plotted in a 116-dimensional space, their variation along some directions will be larger than along others, and these directions are of course not necessarily aligned with the original coordinate axes. Which directions they are, and their relative importance, can be obtained by an orthogonal decomposition of the covariance matrix, i.e. by solving the equation
C P_i = λ_i P_i
which yields the eigenvalues (λ_1, λ_2, ..., λ_116) and their corresponding eigenvectors (P_1, P_2, ..., P_116). Since the directions of the eigenvectors associated with larger eigenvalues correspond to larger variances of the training data, they contain more of the shape variation information, so the eigenvectors corresponding to the largest eigenvalues can be used to approximate any face shape vector. The first k largest eigenvalues are chosen so that
(Σ_{i=1}^{k} λ_i) / (Σ_{i=1}^{116} λ_i) ≥ 98%
This gives the eigenvectors P = (P_1, P_2, ..., P_k) corresponding to the first k largest eigenvalues, and any face shape X can be approximately expressed as:
X ≈ X̄ + P b
where b is the shape parameter vector of the face shape. From the formula above it can be obtained as:
b = P^T (X − X̄)
Different values of b represent different variations of the face shape. The two-dimensional face shape data set can therefore be approximated by a statistical shape model with only the parameter b, and varying b within a certain range generates reasonable new face shape samples.
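A compact sketch of building this statistical shape model, assuming `aligned` is an (N, 116) array of aligned shape vectors; the 98% energy criterion follows the text above, while the names and the eigh-based decomposition are illustrative choices.

```python
import numpy as np

def build_shape_model(aligned, energy=0.98):
    """aligned: (N, 116) array of aligned shape vectors. Returns (mean, P, eigvals)."""
    mean = aligned.mean(axis=0)
    diffs = aligned - mean
    C = diffs.T @ diffs / aligned.shape[0]              # 116 x 116 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)                # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest eigenvalues first
    k = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), energy) + 1
    return mean, eigvecs[:, :k], eigvals[:k]

def shape_params(x, mean, P):
    """b = P^T (X - X_bar): shape parameters of a 116-d shape vector x."""
    return P.T @ (x - mean)

def reconstruct(b, mean, P):
    """X ~= X_bar + P b: rebuild an approximate shape from parameters b."""
    return mean + P @ b
```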
In the active shape model, a local gray-level model represents the local image features near each landmark point. It is generally represented by the derivative of the image gray values along the line perpendicular to the shape contour, normalized by the integral value, which removes the effect of illumination changes to some extent.
In each face region detected by the AdaBoost algorithm, the trained initial shape model is placed at the center of the region, and the scale, orientation and displacement parameters of the initial shape are estimated from the AdaBoost face detection result. Given this initial value of the face shape model, the active shape model iterates using the gray-level features of the contour. In each iteration, the position and shape of the current model are changed by adjusting the relevant parameters, producing a new model instance, until the model finally matches the contour in the test image. The iterative algorithm is: first, in the face image being searched, the point that best matches the gray-level statistical model is found in the neighborhood of each landmark, giving a new face shape; second, the changes of the pose parameters and shape parameters of the statistical face shape model are computed from the landmark displacements; finally, the pose and shape parameters are updated and the algorithm returns to the start, until it converges or the maximum number of iterations is reached.
(7) From the extracted contour, the distance d from the midpoint of the line connecting the eye centers to the geometric center of the chin contour and the area s of the region enclosed by the nose contour are computed; these two features are taken as the input of linear discriminant analysis to find the projection direction w that best distinguishes infant faces from non-infant faces. The point representing the two features d and s of the target to be detected is projected onto the direction w: with x = [d, s]^T, the two-dimensional space spanned by d and s is converted by w into a one-dimensional feature space, babyvalue = w^T x. The babyvalue expresses how closely the face to be identified resembles an infant face; the threshold separating non-infant faces from infant faces is taken as the midpoint of the line connecting the two class centers, giving the threshold k, and whether the target to be detected is an infant is judged by comparing babyvalue with k.
In general, the facial organs of an infant, mainly the eyes and eyebrows, sit relatively low, so the line connecting the eyes lies close to the central region of the face, and an infant's nose is usually smaller and shorter than an adult's. Based on this analysis, the invention uses the statistical shape model to obtain the distance d from the line connecting the two eyes to the geometric center of the chin contour and the area s of the region enclosed by the nose contour, and analyzes these two features, as shown in Fig. 2.
For feature d: the smaller d is, the shorter the distance from the eye line to the chin, i.e. the lower the eye line sits, and the closer the face is to an infant face. Correspondingly for feature s: the smaller and shorter the nose, the smaller the area s enclosed by the nose contour, and the closer the face is to an infant face. These two facial features, combined with linear discriminant analysis, can therefore be used to detect an infant target.
The main idea of linear discriminant analysis is: through a linear mapping, the sample data are mapped to a feature space in which samples of the same class lie close together and samples of different classes lie far apart. In the feature space, samples of the same class should be as compact as possible and samples of different classes as separated as possible; that is, the within-class scatter should be as small as possible and the between-class scatter as large as possible.
(8) Target tracking: mainly, according to the PELCO-D protocol and a PID control method, the pan-tilt unit is rotated in different directions so that the infant target always stays in the central region of the video picture.
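As an illustration of the command side of this step, the sketch below builds standard PELCO-D frames (sync byte 0xFF, address, two command bytes, pan and tilt speeds, modulo-256 checksum) for the eight directions. The command bytes are those of the public PELCO-D specification rather than values quoted in the patent, and the serial-port path in the usage comment is an assumption.

```python
# Standard PELCO-D command-2 bits: 0x02 right, 0x04 left, 0x08 up, 0x10 down.
DIRECTIONS = {
    "stop": 0x00, "right": 0x02, "left": 0x04, "up": 0x08, "down": 0x10,
    "up_right": 0x0A, "up_left": 0x0C, "down_right": 0x12, "down_left": 0x14,
}

def pelco_d_frame(direction, address=1, pan_speed=0x20, tilt_speed=0x20):
    """Build a 7-byte PELCO-D frame for the given direction."""
    cmd2 = DIRECTIONS[direction]
    body = [address, 0x00, cmd2, pan_speed, tilt_speed]
    checksum = sum(body) % 256
    return bytes([0xFF] + body + [checksum])

# Example: send "up_right" over an RS485/UART link (port name is hypothetical).
# import serial
# with serial.Serial("/dev/ttyS0", 9600) as port:
#     port.write(pelco_d_frame("up_right"))
```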
(ii) Embedded hardware system for actively tracking and monitoring infants in the home
As shown in Fig. 5, the system mainly comprises a TI DM6437 chip, an analog CMOS camera, an A/D conversion chip, a 128M SDRAM, a 64M Flash ROM, a UART chip, a C627 multimedia coding chip, an LED display and a pan-tilt unit.
DM6437 chip: its function in the system is to carry out the main algorithmic processing of the video input signal, including the three-frame difference method, skin-color segmentation, face detection, statistical shape model matching, infant target identification and PID control; it is the core of the system.
CMOS camera: its function in the system is to generate the analog video signal of the home scene; it is connected to the A/D conversion chip. Specifically, a CMOS130 camera (ov9650 chip) can be adopted.
A/D conversion chip: its function in the system is to convert the analog video signal into a digital signal; it is connected to the Video IN interface of the CCDC controller of the DM6437 board. Specifically, the TVP5146M2 chip from TI can be adopted; it supports Composite or S-Video input, has a sampling precision up to 10 bits, supports the CCIR-656 and BT656 output formats, and converts the analog signal into an 8/16-bit YCbCr 4:2:2 digital video stream with line and field sync signals, which is then fed into the DSP for processing.
Random access memory (RAM): its function in the system is to store the code while the system runs and the temporary video data; it is connected to the EMIF interface of the DM6437 board. Specifically, a 128M SDRAM from MICRON (Micron Technology) can be adopted.
Read-only memory (ROM): its function in the system is to store the system start-up code and the system program code; it is connected to the EMIF interface of the DM6437 board. Specifically, a 64M NOR-type Flash chip from Spansion can be adopted.
Multimedia coding chip: its function in the system is to realize audio/video coding; it is connected to the I2C bus. Specifically, a C627 multimedia co-processor chip can be adopted.
LED display: its function in the system is to display the infant tracking result on site; it is connected to the DAC D interface of the DM6437 board. Specifically, a JV-M50D display can be adopted.
Pan-tilt unit: its function in the system is to rotate the CMOS camera according to the infant target detection result so as to track the target in real time; it is connected to the UART interface of the video processing chip through an RS485 interface. Specifically, an ST-SP8206PS pan-tilt unit can be adopted.
Besides the above components, the invention also includes some general peripherals. The system is powered by a 5 V adapter, from which three voltages are derived for the different devices: 1.2 V for the DSP core, 3.3 V for the DSP I/O ports and 1.8 V for the memory. The UART chip is an SC16C550, whose internal receiver and transmitter each have a 16-byte FIFO and which can handle serial signals at rates up to 3 Mbps. The pan-tilt control commands are sent to the pan-tilt unit according to the PELCO-D protocol, and the direction and position of the camera are actively adjusted by controlling its rotation. To prevent software plagiarism, a SAM chip can be adopted as an encryption unit. A network module can also be added, mainly to handle network transmission tasks, video compression, protocol stack initialization, DHCP set-up, web page loading and similar functions.
The video front end (VPFE) of the system obtains video data from the CMOS sensor through the preview engine and converts it into the YUV422 format. Four 54 MHz DACs are provided by the VENC encoder of the video back end (VPBE), supporting NTSC, PAL, S-Video and YPbPr video output, and a 24-bit digital video output to an RGB interface is also provided.
The video output of the system has 4 output DACs that work with different output standards, so multiple output standards are supported. The DACs can be programmed to support composite video, component (color-difference) video or RGB. S-Video output can be obtained through connector P1, which is driven by DACs B and C.
After the system starts, the bootloader is executed; the boot program copies the intelligent video handling program from FLASH into SDRAM and runs it. The program, developed on the FVID interface, realizes on the DM6437 the relevant intelligent processing algorithms, such as detection and tracking of the infant target in the home environment, and drives the pan-tilt unit in real time to track and film the target according to its position information. The technical features of the intelligent algorithm modules are:
1. Moving target detection:
Each of the three adjacent frames (frame0, frame1, frame2) is first denoised by Gaussian smoothing, and then consecutive frame differences are computed over the three adjacent frames (frame1 - frame0; frame2 - frame1). The two resulting difference images are thresholded to obtain binary images. Each of the two binary difference images is first eroded 1 to 2 times and then dilated 2 to 10 times to eliminate noise and fill holes inside the regions. The two binary images obtained by this morphological processing are combined with a logical AND. Contours are then extracted on the AND result, and the regions enclosed by the contours are the result of moving target detection. The concrete flow is shown in Fig. 6.
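A minimal OpenCV sketch of this flow, assuming grayscale frames of equal size; the kernel size, threshold value and the erosion/dilation counts are illustrative picks within the ranges given above.

```python
import cv2
import numpy as np

def three_frame_difference(frame0, frame1, frame2, thresh=25):
    """Return (motion_mask, contours) from three consecutive grayscale frames."""
    blur = lambda f: cv2.GaussianBlur(f, (5, 5), 0)
    f0, f1, f2 = blur(frame0), blur(frame1), blur(frame2)
    d1 = cv2.absdiff(f1, f0)                      # frame1 - frame0
    d2 = cv2.absdiff(f2, f1)                      # frame2 - frame1
    _, b1 = cv2.threshold(d1, thresh, 255, cv2.THRESH_BINARY)
    _, b2 = cv2.threshold(d2, thresh, 255, cv2.THRESH_BINARY)
    kernel = np.ones((3, 3), np.uint8)
    clean = lambda b: cv2.dilate(cv2.erode(b, kernel, iterations=1),
                                 kernel, iterations=3)    # erode, then dilate
    mask = cv2.bitwise_and(clean(b1), clean(b2))          # AND of the two results
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return mask, contours
```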
2. Skin-color model to reduce the processing region:
Based on the three-frame difference result, smaller candidate object regions containing the face, hands and similar skin areas are extracted according to the skin-color model.
3. According to the face detection result, the face is matched with the active shape model and the corresponding contour information is extracted.
A statistical face shape model must be built first. Since obtaining the features s and d only requires information about the eyes, the nose and the facial outline, this realization system selects four contour segments to build the statistical shape model: 1) the facial outline part ([x_i1, y_i1, ..., x_i13, y_i13]^T); 2) the left-eye contour ([x_i14, y_i14, ..., x_i21, y_i21]^T); 3) the right-eye contour ([x_i22, y_i22, ..., x_i29, y_i29]^T); 4) the nose contour ([x_i48, y_i48, ..., x_i58, y_i58]^T). In the statistical shape training, each face is calibrated with 40 landmark points (13 points of the facial outline, 8 points of the left-eye contour, 8 points of the right-eye contour and 11 points of the nose contour), the length of the local gray-level model is set to 7 (3 on each side), and the search width is set to 2 pixels. A multi-resolution feature search, realized in 4 levels, is adopted in the matching process: the search starts at the lowest resolution and then rises step by step to the next higher resolution. At low resolution mainly the global information of the image is considered and the search range is larger; once a fairly ideal matching result is obtained, the process proceeds to the next level and the search range is reduced for matching.
Through the active shape model, the coordinates of the center L of the left-eye contour (x_lefteye, y_lefteye), of the center R of the right-eye contour (x_righteye, y_righteye) and of the center C of the nose contour (x_nose, y_nose) are obtained as:
x_lefteye = (1/8) Σ_{i=14}^{21} x_i;  x_righteye = (1/8) Σ_{i=22}^{29} x_i;  x_nose = (1/11) Σ_{i=48}^{58} x_i
y_lefteye = (1/8) Σ_{i=14}^{21} y_i;  y_righteye = (1/8) Σ_{i=22}^{29} y_i;  y_nose = (1/11) Σ_{i=48}^{58} y_i
Using the coordinates of the eye contour centers and of the nose contour center, the face can be normalized: the standard face size is defined as 64 × 64, with the face image centered at the nose contour midpoint; the distance between the two eye contour centers is normalized to 28, and the distance from the nose contour center C to the line LR connecting the two eye centers is 8. The normalization of the face can thus be realized from the points L, R and C. After the normalized face is obtained, the face background is removed according to the facial outline [x_i1, y_i1, ..., x_i13, y_i13]^T.
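A sketch of this normalization step: given the left-eye, right-eye and nose contour centers L, R, C in the source image, an affine warp maps them to canonical positions consistent with the constraints above (eyes 28 apart, eye line 8 above the nose center, nose at the center of a 64 × 64 image); the exact canonical coordinates chosen here are an assumption, not values quoted by the patent.

```python
import cv2
import numpy as np

def normalize_face(image, L, R, C, size=64):
    """Warp a face so that eye centers L, R and nose center C land on canonical spots."""
    # Canonical positions: nose at the image center, eye line 8 px above it,
    # eyes 28 px apart (assumed layout consistent with the constraints above).
    dst = np.float32([[size / 2 - 14, size / 2 - 8],   # left-eye contour center
                      [size / 2 + 14, size / 2 - 8],   # right-eye contour center
                      [size / 2,      size / 2]])      # nose contour center
    src = np.float32([L, R, C])
    M = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(image, M, (size, size))
```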
4. Infant target identification:
Infant target detection can be regarded as a two-class problem: one class is the adult face, the other is the infant face. A face to be identified can be regarded as one point x in a two-dimensional space whose two directions represent the values of d and s respectively (x = [d, s]^T). The scatter matrices S_b and S_w can therefore be expressed as:
S_b = (m_baby − m_nobaby)(m_baby − m_nobaby)^T
S_w = 0.5 ( Σ_{x ∈ C_baby} (x − m_baby)(x − m_baby)^T + Σ_{x ∈ C_nobaby} (x − m_nobaby)(x − m_nobaby)^T )
The direction w that maximizes the discriminant criterion J(w) is obtained as:
w = S_w^{-1} (m_baby − m_nobaby)
Through w, the two-dimensional space spanned by the d and s directions is converted into a one-dimensional feature space:
babyvalue = w^T x
The babyvalue expresses how closely the face to be identified resembles an infant face. 50 infant faces are used as the genuine (infant) class and 50 non-infant faces as the impostor class; the projection direction w is obtained by the LDA method, and the threshold separating non-infant faces from infant faces is taken as the midpoint of the line connecting the two class centers, giving the threshold k. If babyvalue is greater than k, the target can be judged to be an infant, and the degree of certainty is proportional to the magnitude of babyvalue. In this embodiment, the comparison of the obtained babyvalue with the threshold k judges the face to be identified to be a non-infant target.
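A sketch of this two-class LDA step with NumPy, assuming `baby` and `nobaby` are arrays of [d, s] feature pairs (e.g. the 50 training faces of each class mentioned above); the function names are illustrative.

```python
import numpy as np

def train_infant_lda(baby, nobaby):
    """baby, nobaby: (n, 2) arrays of [d, s] features. Returns (w, k)."""
    m_b, m_n = baby.mean(axis=0), nobaby.mean(axis=0)
    scatter = lambda X, m: (X - m).T @ (X - m)
    Sw = 0.5 * (scatter(baby, m_b) + scatter(nobaby, m_n))
    w = np.linalg.solve(Sw, m_b - m_n)      # w = Sw^-1 (m_baby - m_nobaby)
    k = 0.5 * (w @ m_b + w @ m_n)           # midpoint of the projected class centers
    # With Sw positive definite, the infant class mean projects above k,
    # so babyvalue > k indicates an infant.
    return w, k

def is_infant(d, s, w, k):
    babyvalue = w @ np.array([d, s])
    return babyvalue > k, babyvalue
```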
If an infant target is detected in the scene, the system takes that region as the tracking target and drives the pan-tilt unit to track it; during tracking the infant recognition algorithm above is no longer called, and tracking of this moving region is maintained. If no infant target is detected in the scene, the system selects and tracks the largest-area region among those extracted by the three-frame difference method; in that case real-time monitoring of other targets in the home is achieved, so the system can also be used for general home scene video monitoring applications.
5. Video partitioning and pan-tilt PID control:
Referring to Fig. 7, to simplify the tracking computation, the video picture is divided into 9 zones. If the target is in zone 9, the pan-tilt unit moves down-right; zone 8: move down; zone 7: move down-left; zone 6: move right; zone 5: remain unchanged; zone 4: move left; zone 3: move up-right; zone 2: move up; zone 1: move up-left.
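A sketch of this zone-to-direction mapping for a 320 × 240 picture, with zone 1 taken as the top-left cell and zone 9 as the bottom-right cell of the 3 × 3 grid, as the list above implies; that numbering orientation is an assumption.

```python
# Directions indexed by zone number; zone 5 (the center) means "stay".
ZONE_MOVES = {1: "up_left", 2: "up", 3: "up_right",
              4: "left", 5: "stop", 6: "right",
              7: "down_left", 8: "down", 9: "down_right"}

def zone_of(cx, cy, width=320, height=240):
    """Map a target centroid (cx, cy) to its zone number in the 3 x 3 grid."""
    col = min(int(cx * 3 / width), 2)    # 0, 1 or 2
    row = min(int(cy * 3 / height), 2)
    return row * 3 + col + 1

def move_for_target(cx, cy):
    return ZONE_MOVES[zone_of(cx, cy)]   # e.g. feed this to pelco_d_frame(...)
```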
At the same time, to ensure that the pan-tilt unit moves quickly and accurately so that the target falls into zone 5, jitter and overshoot of the pan-tilt unit must be eliminated. The system uses a PID algorithm to control the pan-tilt unit precisely. P (proportional) control responds rapidly to the error and reduces the steady-state error, but it cannot eliminate the steady-state error, and if P is chosen too large the system becomes unstable. The effect of I (integral) control is that, as long as a system error exists, the integral controller keeps accumulating and outputs a control quantity to eliminate the error, driving the system error to 0; but too strong an integral action causes the system to oscillate. D (derivative) control is therefore used to reduce overshoot and suppress oscillation.
The integral-separation PID control algorithm requires setting an integral-separation threshold ε. While the pan-tilt unit rotates to track the infant object, when |e(k)| > ε, i.e. when the difference e(k) between the current pan-tilt position and the position of the infant target is large, PD control is adopted to reduce overshoot and give the system a faster response; when |e(k)| ≤ ε, i.e. when the difference is small, PID control is adopted to ensure that the pan-tilt unit settles accurately. (e is the difference between the current pan-tilt position and the target position.) The controlled quantity of the system is e(k) and the output variable is U(k); the control flow is shown in Fig. 8.
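A sketch of such an integral-separation PID step, assuming a scalar error e(k) per axis; the gains and ε are illustrative, and only the structure (skip the integral term while |e| > ε) is taken from the text above.

```python
class IntegralSeparationPID:
    """Discrete PID that disables the integral term while the error is large."""

    def __init__(self, kp=1.0, ki=0.1, kd=0.05, epsilon=20.0):
        self.kp, self.ki, self.kd, self.epsilon = kp, ki, kd, epsilon
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        derivative = error - self.prev_error
        if abs(error) > self.epsilon:
            u = self.kp * error + self.kd * derivative           # PD while far away
        else:
            self.integral += error                               # PID near the target
            u = self.kp * error + self.ki * self.integral + self.kd * derivative
        self.prev_error = error
        return u     # U(k): drives the pan/tilt speed or step size
```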
The above has shown and described the basic principles, main features and advantages of the present invention. Those skilled in the art should understand that the invention is not limited to the above embodiments; the embodiments and the description merely illustrate the principles of the invention, and various changes and improvements can be made without departing from the spirit and scope of the invention, all of which fall within the claimed scope. The claimed scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A method for actively tracking and monitoring an infant, characterized in that, after the analog video signal is captured and A/D converted, the video signal is processed algorithmically: moving regions are extracted by a three-frame difference method on consecutive frames, a skin-color detection model is then used to filter out non-face areas within the moving regions, and the AdaBoost algorithm is applied within the reduced candidate regions to detect faces; each detected face is matched with an active shape model to extract the contour of the face to be detected, and from this contour the distance d from the midpoint of the line connecting the eye centers to the geometric center of the chin contour and the area s of the region enclosed by the nose contour are obtained, and linear discriminant analysis is used to find the projection direction w that best distinguishes infant faces from non-infant faces; d and s are projected onto the direction w, and with the point x = [d, s]^T the two-dimensional space spanned by the d and s directions is converted by w into a one-dimensional feature space, babyvalue = w^T x; the babyvalue expresses how closely the face to be identified resembles an infant face, and the threshold separating non-infant faces from infant faces is taken as the midpoint of the line connecting the two class centers, giving the threshold k; if babyvalue is greater than k, the target can be judged to be an infant, with a degree of certainty proportional to the magnitude of babyvalue; if the target is an infant, it is taken as the tracking object; after the above processing, a PID algorithm and the PELCO-D protocol are used to control the pan-tilt unit to rotate up, down, left and right, thereby automatically realizing active real-time tracking and monitoring of the infant.
2. The method for actively tracking and monitoring an infant according to claim 1, characterized in that the method specifically further comprises the following steps:
(1) video acquisition: the system captures an analog video signal in PAL or NTSC format from the camera;
(2) A/D conversion: the TVP5146 chip converts the signal into a digital video signal in YUV422 format, the video signal is stored in SDRAM, the collected data are accessed through the FVID interface and sent into the DM6437 chip for the concrete algorithmic processing; the original resolution of the captured signal is 640 × 480, and to improve algorithm efficiency it is converted to 320 × 240;
(3) moving target detection: to cope with the effect of pan-tilt rotation during active tracking, a three-frame difference method is used to detect moving regions; each of the three adjacent frames is denoised by Gaussian smoothing, then consecutive frame differences are computed, and the two resulting difference images are thresholded to obtain binary images; the two binary difference images are each eroded and then dilated, and the two binary images obtained by this morphological processing are combined with a logical AND; contours are extracted from the AND image, and the regions enclosed by the contours are the result of moving target detection;
(4) to reduce the processing region, smaller candidate object regions containing the face and hand parts are extracted from the three-frame difference result according to a skin-color model; after converting the original image through YUV -> RGB -> YCrCb, an image based on the YCrCb color model is obtained, and suitable Cr and Cb thresholds are chosen to divide the image into skin regions and non-skin regions;
(5) the AdaBoost face detection algorithm is applied within the skin regions, which reduces the search area of the face detector and improves both its speed and its accuracy, making it better suited to real-time processing on embedded hardware; for face detection, AdaBoost extracts a large number of simple one-dimensional features from the face; each such feature has some ability to discriminate faces from non-faces, and the final system combines thousands of simple one-dimensional classifiers to reach a good classification result at a speed that also satisfies the real-time requirement;
(6) according to the face detection result, the face is matched with the active shape model and the corresponding contour information is extracted; each contour corresponds to 58 landmark points, forming one 116-dimensional vector;
(7) the distance d from the midpoint of the line connecting the eye centers of the extracted contour to the geometric center of the chin contour and the area s of the region enclosed by the nose contour are used as the input of linear discriminant analysis to find the projection direction w that best distinguishes infant faces from non-infant faces; the two features d and s of the target to be detected are projected onto the direction w, and with x = [d, s]^T the two-dimensional space spanned by d and s is converted by w into a one-dimensional feature space, babyvalue = w^T x; the babyvalue expresses how closely the face to be identified resembles an infant face, the threshold separating non-infant faces from infant faces is taken as the midpoint of the line connecting the two class centers, giving the threshold k, and whether the target to be detected is an infant is judged by comparing babyvalue with k;
(8) target tracking: mainly, according to the PELCO-D protocol and a PID control method, the pan-tilt unit is rotated in different directions so that the infant target always stays in the central region of the video picture.
3. A realization system of the method for actively tracking and monitoring an infant according to claim 1, characterized in that the system comprises:
a master control processing unit for applying to the video input signal the algorithmic processing comprising the three-frame difference method, skin-color segmentation, face detection, statistical shape model matching, infant target identification and PID control; according to the processing result, this unit sends action commands to the pan-tilt unit to adjust the monitoring viewing angle, while the processed video signal, carrying a mark indicating the monitored object, is displayed in real time on a display unit or output through an output interface;
a CMOS camera module for capturing and generating the analog video signal of the home scene, connected to the A/D conversion module;
an A/D conversion module that converts the analog video signal from the CMOS camera module into a digital signal and sends it to the master control processing unit;
a random access memory module, connected to the master control processing unit, for storing the code while the system runs and the temporary video data;
a read-only memory module, connected to the master control processing unit, for storing the system start-up code and the system program code;
a pan-tilt unit on which the CMOS camera module is mounted, which adjusts the shooting view angle according to the action commands sent by the master control processing unit so as to track the target object in real time.
4. The realization system according to claim 3, characterized in that the master control processing unit comprises a video interface module, an EMIF interface module, an algorithm processing module, an on-screen display module, a video encoding module and at least one DAC interface; the video interface module is connected to the A/D conversion module and receives the digital video signal processed by that module; from the received digital video signal, the algorithm processing module extracts moving regions by a three-frame difference method on consecutive frames, uses the skin-color detection model to filter out non-face areas within the moving regions, and applies the AdaBoost algorithm within the reduced candidate regions to detect faces; each detected face is matched with an active shape model to extract the contour of the face to be detected, and from this contour the distance d from the midpoint of the line connecting the eye centers to the geometric center of the chin contour and the area s of the region enclosed by the nose contour are obtained, and linear discriminant analysis is used to find the projection direction w that best distinguishes infant faces from non-infant faces; d and s are projected onto the direction w, and with the point x = [d, s]^T the two-dimensional space spanned by d and s is converted by w into a one-dimensional feature space, babyvalue = w^T x; the babyvalue expresses how closely the face to be identified resembles an infant face, the threshold separating non-infant faces from infant faces is taken as the midpoint of the line connecting the two class centers, giving the threshold k, and whether the target is an infant is judged by comparing babyvalue with k; if the target is an infant, it is taken as the tracking object; the algorithm processing module then sends action commands to the pan-tilt unit through the UART interface module according to the judgment result to adjust the camera angle, thereby realizing automatic detection and localization of the infant target in the home monitoring scene; and at the same time the processed digital video signal is sent to the on-screen display module for real-time display of the video image, or is output after video compression coding by the video encoding module.
5. The realization system according to claim 3, characterized in that the system is also provided with a multimedia coding module for audio/video coding; connected through the I2C bus and an Ethernet transceiver, it sends the digital video signal delivered by the A/D conversion module, or the digital video signal processed by the algorithm processing module, to a remote site over the Internet; the multimedia coding module may adopt a C627 multimedia co-processor chip.
6. The realization system of the method for actively tracking and monitoring infants according to claim 3, characterized in that the CMOS camera module specifically adopts a CMOS130 camera; the random access memory module is connected with the EMIF interface of the master control processing unit and adopts 128M SDRAM; the read-only memory module is connected with the EMIF interface of the master control processing unit and adopts a 64M NOR-type Flash chip.
7. The realization system of the method for actively tracking and monitoring infants according to claim 3, characterized in that the system is further provided with an LED display, which is connected with the DAC interface of the master control processing unit and adopts a JV-M50D model display; the system is powered by a 5 V rectifier transformer supply, and this 5 V supply is converted into three voltages for different devices: 1.2 V for the DSP core, 3.3 V for the DSP I/O ports and 1.8 V for the memory.
8. The realization system of the method for actively tracking and monitoring infants according to claim 3, characterized in that the pan-tilt head adopts an ST-SP8206PS model pan-tilt head and is connected with the UART interface module of the algorithm processing module through an RS485 interface; control commands are sent to the pan-tilt head according to the PELCO-D protocol, and the direction and position of the camera are actively adjusted by controlling the rotation of the pan-tilt head.
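To illustrate how such commands are framed, below is a minimal sketch of building a standard 7-byte PELCO-D message for the RS485 link; the camera address, speed values and the pyserial usage in the comment are assumptions, not values taken from the patent.

```python
def pelco_d_frame(address, cmd1, cmd2, pan_speed, tilt_speed):
    """Build a PELCO-D frame: sync (0xFF), address, cmd1, cmd2, data1, data2, checksum."""
    body = [address & 0xFF, cmd1 & 0xFF, cmd2 & 0xFF, pan_speed & 0xFF, tilt_speed & 0xFF]
    checksum = sum(body) % 256            # checksum covers all bytes except the sync byte
    return bytes([0xFF] + body + [checksum])

# Example (illustrative values): pan right at medium speed, camera address 1
PAN_RIGHT = 0x02                          # standard PELCO-D command-2 bit for panning right
frame = pelco_d_frame(address=0x01, cmd1=0x00, cmd2=PAN_RIGHT, pan_speed=0x20, tilt_speed=0x00)
# The frame would then be written to the UART driving the RS485 transceiver,
# e.g. with pyserial: serial.Serial("/dev/ttyS1", 9600).write(frame)
```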
9. The realization system of the method for actively tracking and monitoring infants according to claim 4, characterized in that the UART interface module adopts an SC16C550 chip, whose internal receiver and transmitter each have a 16-byte FIFO and which can handle serial signals at rates of up to 3 Mbps.
10. The realization system of the method for actively tracking and monitoring infants according to claim 3, characterized in that the system obtains video data from the CMOS camera module through a preview engine and converts it into the YUV422 format; the VENC encoder provides four 54 MHz DAC interfaces offering NTSC, PAL, S-Video and YPbPr video outputs, and also provides a 24-bit digital video output to an RGB interface.
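To make the YUV422 format concrete, below is a minimal sketch of converting an RGB frame into packed YUYV (a common YUV 4:2:2 layout) using the BT.601 coefficients; the use of NumPy and the exact packing order are assumptions for illustration and do not describe the hardware preview engine.

```python
import numpy as np

def rgb_to_yuyv(rgb):
    """Convert an HxWx3 uint8 RGB image (W even) to packed YUYV (YUV 4:2:2, BT.601).

    Each pair of horizontally adjacent pixels keeps its own Y samples but shares
    one U and one V sample, which is what the 4:2:2 subsampling means.
    """
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128.0
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128.0
    u2 = u.reshape(u.shape[0], -1, 2).mean(axis=2)   # average chroma over each pixel pair
    v2 = v.reshape(v.shape[0], -1, 2).mean(axis=2)
    out = np.empty((rgb.shape[0], rgb.shape[1] * 2), dtype=np.uint8)
    out[:, 0::4] = np.clip(y[:, 0::2], 0, 255)       # Y0
    out[:, 1::4] = np.clip(u2, 0, 255)               # U
    out[:, 2::4] = np.clip(y[:, 1::2], 0, 255)       # Y1
    out[:, 3::4] = np.clip(v2, 0, 255)               # V
    return out
```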
CN 201010125536 2010-03-16 2010-03-16 Method for actively tracking and monitoring infants and realization system thereof Active CN101795400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010125536 CN101795400B (en) 2010-03-16 2010-03-16 Method for actively tracking and monitoring infants and realization system thereof


Publications (2)

Publication Number Publication Date
CN101795400A CN101795400A (en) 2010-08-04
CN101795400B (en) 2013-03-27

Family

ID=42587791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010125536 Active CN101795400B (en) 2010-03-16 2010-03-16 Method for actively tracking and monitoring infants and realization system thereof

Country Status (1)

Country Link
CN (1) CN101795400B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102148921B (en) * 2011-05-04 2012-12-12 中国科学院自动化研究所 Multi-target tracking method based on dynamic group division
CN103368135A (en) * 2012-03-26 2013-10-23 鸿富锦精密工业(深圳)有限公司 Children auxiliary system and method thereof
CN102630033A (en) * 2012-03-31 2012-08-08 彩虹集团公司 Method for converting 2D (Two Dimension) into 3D (Three Dimension) based on dynamic object detection
CN102833465A (en) * 2012-07-24 2012-12-19 武汉大千信息技术有限公司 Criminal investigation video pretreatment method based on movement detection
CN102831408A (en) * 2012-08-29 2012-12-19 华南理工大学 Human face recognition method
CN102932635A (en) * 2012-11-22 2013-02-13 南京明德软件有限公司 Real-time image processing system and method
CN103108124B (en) * 2012-12-28 2017-07-18 上海鼎为电子科技(集团)有限公司 Image acquiring method, device and mobile terminal
CN103391426B (en) * 2013-07-10 2016-09-21 青岛歌尔声学科技有限公司 A kind of video camera and safety protection method thereof
CN104732413B (en) * 2013-12-20 2017-11-21 中国科学院声学研究所 A kind of intelligent personalized video ads method for pushing and system
US9215411B2 (en) * 2014-02-03 2015-12-15 Google Inc. Enhancing video conferences
CN105303150B (en) * 2014-06-26 2019-06-25 腾讯科技(深圳)有限公司 Realize the method and system of image procossing
CN104217529B (en) * 2014-08-27 2017-01-11 东北大学 Intelligent infant monitoring system and method based on temperature, humidity and images
WO2016044996A1 (en) * 2014-09-23 2016-03-31 王铭 Household monitoring system
CN109089077A (en) * 2014-10-09 2018-12-25 中控智慧科技股份有限公司 A kind of method remotely monitored and monitoring client
CN104361772B (en) * 2014-11-17 2017-06-09 一统安易(北京)科技有限公司 Self-service living management based on child, record teaching aid toy
CN105809096A (en) * 2014-12-31 2016-07-27 中兴通讯股份有限公司 Figure labeling method and terminal
CN104967776B (en) * 2015-06-11 2018-03-27 广东欧珀移动通信有限公司 One kind is taken pictures method to set up and user terminal
FR3047410A1 (en) * 2016-02-09 2017-08-11 S E L A R L De Medecins Plasticiens Paris ASSEMBLY OF NOSE MODELING INSTRUMENTS, METHOD FOR CARRYING OUT THE METHOD AND METHOD FOR MODELING THE NOSE
WO2018027973A1 (en) * 2016-08-12 2018-02-15 魏羽婕 Facial recognition verification based automatic-following assistant
CN106227059A (en) * 2016-10-08 2016-12-14 三星电子(中国)研发中心 Intelligent home furnishing control method based on indoor threedimensional model and equipment
CN106507045B (en) * 2016-11-07 2018-09-18 立德高科(北京)数码科技有限责任公司 The method and system, terminal, server platform of the crowd that wanders away are positioned with identification code
CN107424258B (en) * 2017-06-23 2019-08-23 深圳市盛路物联通讯技术有限公司 A kind of control method for electronic lock and system based on Internet of Things
CN107480607B (en) * 2017-07-28 2020-04-07 青岛大学 Method for detecting and positioning standing face in intelligent recording and broadcasting system
CN108090908B (en) * 2017-12-07 2020-02-04 深圳云天励飞技术有限公司 Image segmentation method, device, terminal and storage medium
CN108366265B (en) * 2018-03-08 2021-12-31 南京邮电大学 Distributed video side information generation method based on space-time correlation
CN111369578B (en) * 2020-02-25 2023-06-30 四川新视创伟超高清科技有限公司 Intelligent tracking method and system for cradle head transaction
CN111640141B (en) * 2020-05-20 2023-10-24 山东神戎电子股份有限公司 Low-speed target positioning and tracking method based on thermal imaging
CN112232219A (en) * 2020-10-19 2021-01-15 武汉理工大学 Face recognition check-in system based on LBP (local binary pattern) feature algorithm
CN112784676A (en) * 2020-12-04 2021-05-11 中国科学院深圳先进技术研究院 Image processing method, robot, and computer-readable storage medium
CN113065454B (en) * 2021-03-30 2023-01-17 青岛海信智慧生活科技股份有限公司 High-altitude parabolic target identification and comparison method and device
CN114554161A (en) * 2022-03-08 2022-05-27 佛山市顺富元亨电子有限公司 Baby monitor and method for alarming in dangerous area

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101232606A (en) * 2007-01-23 2008-07-30 全成电子企业有限公司 Wireless monitoring system and method
CN101572803A (en) * 2009-06-18 2009-11-04 中国科学技术大学 Customizable automatic tracking system based on video monitoring

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0415219D0 (en) * 2004-07-07 2004-08-11 Koninkl Philips Electronics Nv Improvements in or relating to time-of-flight ranging systems



Similar Documents

Publication Publication Date Title
CN101795400B (en) Method for actively tracking and monitoring infants and realization system thereof
CN109522793B (en) Method for detecting and identifying abnormal behaviors of multiple persons based on machine vision
Wu et al. Visual place categorization: Problem, dataset, and algorithm
US9965865B1 (en) Image data segmentation using depth data
CN106598226B (en) A kind of unmanned plane man-machine interaction method based on binocular vision and deep learning
Kliper-Gross et al. Motion interchange patterns for action recognition in unconstrained videos
US20170091953A1 (en) Real-time cascaded object recognition
Zhang et al. Code4d: color-depth local spatio-temporal features for human activity recognition from rgb-d videos
CN106682603B (en) Real-time driver fatigue early warning system based on multi-source information fusion
CN101201695A (en) Mouse system for extracting and tracing based on ocular movement characteristic
KR20150021526A (en) Self learning face recognition using depth based tracking for database generation and update
CN101587485B (en) Face information automatic login method based on face recognition technology
CN105243386A (en) Face living judgment method and system
CN102542572B (en) Method for detecting moving object image in video sequence and image processing system
CN101526997A (en) Embedded infrared face image identifying method and identifying device
CN110135277B (en) Human behavior recognition method based on convolutional neural network
JP4975801B2 (en) Monitoring method and monitoring apparatus using hierarchical appearance model
CN108737785B (en) Indoor automatic detection system that tumbles based on TOF 3D camera
CN103955671A (en) Human behavior recognition method based on rapid discriminant common vector algorithm
CN103020655A (en) Remote identity authentication method based on single training sample face recognition
CN103020589A (en) Face recognition method for single training sample
Faisal et al. Depth estimation from video using computer vision and machine learning with hyperparameter optimization
CN112700568B (en) Identity authentication method, equipment and computer readable storage medium
Ye et al. Object detection in rgb-d indoor scenes
CN110674751A (en) Device and method for detecting head posture based on monocular camera

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant