CN108171212A - Method and apparatus for detecting a target - Google Patents

Method and apparatus for detecting a target

Info

Publication number
CN108171212A
Authority
CN
China
Prior art keywords
detected
target
feature information
image
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810055152.XA
Other languages
Chinese (zh)
Inventor
杜康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810055152.XA
Publication of CN108171212A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/56 - Extraction of image or video features relating to colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10024 - Color image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Abstract

Embodiments of the present application disclose a method and apparatus for detecting a target. One specific embodiment of the method includes: acquiring a color image to be detected and a depth image to be detected, where both images include an image region of a target to be detected; extracting first feature information of the target from the color image and second feature information of the target from the depth image; fusing the first feature information and the second feature information to obtain fused feature information of the target; and inputting the fused feature information into a pre-trained target detection model to obtain the category and position of the target, where the target detection model characterizes the correspondence between a target's fused feature information and its category and position. By combining the color image and the depth image for target detection, this embodiment improves the accuracy of target detection.

Description

Method and apparatus for detecting a target
Technical field
Embodiments of the present application relate to the field of computer technology, in particular to the technical field of image recognition, and more particularly to a method and apparatus for detecting a target.
Background technology
Image target detection refers to detecting the category and/or position of a target contained in an image. Existing image target detection methods typically use a camera to capture a color image and then use software to perform target detection based on the grayscale of the color image.
Summary of the Invention
Embodiments of the present application propose a method and apparatus for detecting a target.
In a first aspect, an embodiment of the present application provides a method for detecting a target. The method includes: acquiring a color image to be detected and a depth image to be detected, where both images include an image region of a target to be detected; extracting first feature information of the target from the color image and second feature information of the target from the depth image; fusing the first feature information and the second feature information to obtain fused feature information of the target; and inputting the fused feature information into a pre-trained target detection model to obtain the category and position of the target, where the target detection model characterizes the correspondence between a target's fused feature information and its category and position.
In some embodiments, before fusing the first feature information and the second feature information of the target to obtain the fused feature information, the method further includes: acquiring an infrared image to be detected, where the infrared image includes an image region of the target to be detected; and extracting third feature information of the target from the infrared image. Accordingly, fusing the first feature information and the second feature information of the target to obtain the fused feature information includes: fusing the first feature information, the second feature information, and the third feature information of the target to obtain the fused feature information of the target.
In some embodiments, extracting the first feature information of the target from the color image and the second feature information from the depth image includes: inputting the color image into a pre-trained first convolutional neural network to obtain the first feature information of the target, where the first convolutional neural network is used to extract first feature information of a target; and inputting the depth image into a pre-trained second convolutional neural network to obtain the second feature information of the target, where the second convolutional neural network is used to extract second feature information of a target.
In some embodiments, extracting the third feature information of the target from the infrared image includes: inputting the infrared image into a pre-trained third convolutional neural network to obtain the third feature information of the target, where the third convolutional neural network is used to extract third feature information of a target.
In some embodiments, the target detection model is obtained through the following training steps: acquiring training samples, where a training sample includes a sample color image, a sample depth image, a sample infrared image, and the category and position of a sample target, and the sample color image, sample depth image, and sample infrared image each include an image region of the sample target; extracting first feature information of the sample target from the sample color image, second feature information from the sample depth image, and third feature information from the sample infrared image; fusing the first, second, and third feature information of the sample target to obtain fused feature information of the sample target; and training the target detection model with the fused feature information of the sample target as input and the category and position of the sample target as output.
In a second aspect, an embodiment of the present application provides an apparatus for detecting a target. The apparatus includes: an acquiring unit configured to acquire a color image to be detected and a depth image to be detected, where both images include an image region of a target to be detected; an extraction unit configured to extract first feature information of the target from the color image and second feature information of the target from the depth image; a fusion unit configured to fuse the first feature information and the second feature information to obtain fused feature information of the target; and a detection unit configured to input the fused feature information into a pre-trained target detection model to obtain the category and position of the target, where the target detection model characterizes the correspondence between a target's fused feature information and its category and position.
In some embodiments, the acquiring unit is further configured to acquire an infrared image to be detected, where the infrared image includes an image region of the target to be detected; the extraction unit is further configured to extract third feature information of the target from the infrared image; and the fusion unit is further configured to fuse the first feature information, the second feature information, and the third feature information of the target to obtain the fused feature information of the target.
In some embodiments, the extraction unit includes: a first extraction module configured to input the color image into a pre-trained first convolutional neural network to obtain the first feature information of the target, where the first convolutional neural network is used to extract first feature information of a target; and a second extraction module configured to input the depth image into a pre-trained second convolutional neural network to obtain the second feature information of the target, where the second convolutional neural network is used to extract second feature information of a target.
In some embodiments, the extraction unit further includes: a third extraction module configured to input the infrared image into a pre-trained third convolutional neural network to obtain the third feature information of the target, where the third convolutional neural network is used to extract third feature information of a target.
In some embodiments, the target detection model is obtained through the following training steps: acquiring training samples, where a training sample includes a sample color image, a sample depth image, a sample infrared image, and the category and position of a sample target, and the sample color image, sample depth image, and sample infrared image each include an image region of the sample target; extracting first feature information of the sample target from the sample color image, second feature information from the sample depth image, and third feature information from the sample infrared image; fusing the first, second, and third feature information of the sample target to obtain fused feature information of the sample target; and training the target detection model with the fused feature information as input and the category and position of the sample target as output.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method described in any implementation of the first aspect is implemented.
According to the method and apparatus for detecting a target provided by the embodiments of the present application, first feature information of a target to be detected is first extracted from a color image to be detected, and second feature information of the target is extracted from a depth image to be detected; the first and second feature information are then fused to obtain fused feature information of the target; finally, the fused feature information is input into a target detection model that characterizes the correspondence between a target's fused feature information and its category and position, thereby obtaining the category and position of the target. Performing target detection by combining a color image and a depth image improves the accuracy of target detection.
Description of the drawings
Other features, objects, and advantages of the present application will become more apparent by reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is an exemplary system architecture diagram to which the present application may be applied;
Fig. 2 is a flowchart of one embodiment of a method for detecting a target according to the present application;
Fig. 3 is a flowchart of another embodiment of a method for detecting a target according to the present application;
Fig. 4 is a structural diagram of one embodiment of an apparatus for detecting a target according to the present application;
Fig. 5 is a structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention and do not limit the invention. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.
It should be noted that, unless there is a conflict, the embodiments in the present application and the features in the embodiments may be combined with one another. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for detecting a target or the apparatus for detecting a target of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications, such as photo and video applications, image processing applications, and search applications, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to smartphones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, for example, an image processing server that processes images uploaded by the terminal devices 101, 102, 103. The image processing server may analyze and otherwise process a received color image to be detected, a depth image to be detected, and the like, and feed the processing result (for example, the category and position of the target to be detected) back to the terminal device.
It should be noted that the method for detecting a target provided by the embodiments of the present application is generally performed by the server 105; accordingly, the apparatus for detecting a target is generally disposed in the server 105.
It should be pointed out that the server 105 may also directly store the color image to be detected and the depth image to be detected locally, and the server 105 may directly extract the locally stored color image and depth image for detection. In this case, the exemplary system architecture 100 may not include the terminal devices 101, 102, 103 and the network 104.
It should also be noted that an image processing application may be installed on the terminal devices 101, 102, 103, and the terminal devices 101, 102, 103 may perform target detection on the color image and depth image to be detected based on the image processing application. In this case, the method for detecting a target may also be performed by the terminal devices 101, 102, 103, and accordingly, the apparatus for detecting a target may also be disposed in the terminal devices 101, 102, 103. In this case, the exemplary system architecture 100 may not include the server 105 and the network 104.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of a method for detecting a target according to the present application is shown. The method for detecting a target includes the following steps:
Step 201: acquire a color image to be detected and a depth image to be detected.
In this embodiment, an electronic device (for example, the server 105 shown in Fig. 1) on which the method for detecting a target runs may acquire the color image to be detected and the depth image to be detected from a terminal device (for example, the terminal devices 101, 102, 103 shown in Fig. 1) through a wired or wireless connection.
Here, the terminal device may be any of various electronic devices with photo and video capabilities; for example, the terminal device may be an electronic device with a depth camera. The depth camera, also called an RGB-D camera, may be used to capture RGB-D images. An RGB-D image may include a color image (RGB image) and a depth image. The pixel value of each pixel of the color image may be the color value of each point of the captured target surface. In general, all colors perceivable by human vision are obtained by varying the three color channels of red (R), green (G), and blue (B) and superimposing them on one another. The pixel value of each pixel of the depth image may be the distance between the depth camera and each point of the captured target surface. In general, the color image and the depth image are registered, so there is a one-to-one correspondence between the pixels of the color image and the depth image.
Here, the color image to be detected and the depth image to be detected may include an image region of the target to be detected. The target to be detected may be any one or more of the many targets in the captured color image and depth image to be detected.
Step 202: extract first feature information of the target to be detected from the color image to be detected, and extract second feature information of the target to be detected from the depth image to be detected.
In this embodiment, based on the color image and depth image to be detected acquired in step 201, the electronic device may extract the first feature information of the target from the color image and the second feature information of the target from the depth image. The first feature information may be information describing features of the target in the color image, including but not limited to color features, texture features, two-dimensional shape features, two-dimensional spatial relationship features, and the like. The second feature information may be information describing features of the target in the depth image, including but not limited to three-dimensional shape features, three-dimensional spatial relationship features, and the like.
In some optional implementations of this embodiment, the electronic device may first detect the position of the target to be detected in the color image, and then perform image analysis on the region where the target is located using a mathematical model in combination with image processing techniques, so as to extract at least one kind of feature information of the target as the first feature information. The feature information may include but is not limited to face shape information, shape information of the facial features, position and proportion information of the facial features, shape information of the limbs, position and proportion information of the limbs, and the like.
In some optional implementations of this embodiment, the electronic device may input the color image to be detected into a pre-trained first convolutional neural network to obtain the first feature information of the target, where the first convolutional neural network may be used to extract first feature information of a target. In practice, a convolutional neural network (CNN) may be a feedforward neural network whose artificial neurons respond to surrounding units within a local coverage area, and it performs well for large-scale image processing. In general, the basic structure of a convolutional neural network includes two layers: the first is a feature extraction layer, in which the input of each neuron is connected to the local receptive field of the previous layer and extracts the features of that local region; once a local feature is extracted, its positional relationship with other features is determined as well. The second is a feature mapping layer, which uses an activation function so that the feature map has shift invariance.
It should be noted that the above first convolutional neural network may be obtained by performing supervised training on an existing convolutional neural network using various machine learning methods and training samples. Specifically, the first convolutional neural network may be obtained through the following training steps:
First, acquire training samples.
Here, a training sample may include a sample color image and the feature information of the sample color image.
Second, train the first convolutional neural network with the sample color image as input and the feature information of the sample color image as output.
In practice, each network parameter of the convolutional neural network (for example, weight parameters and bias parameters) may be initialized with different small random numbers. The "small random numbers" ensure that the network does not enter a saturated state because of excessively large weights, which would cause training to fail; "different" ensures that the network can learn normally. The network parameters of the convolutional neural network may be adjusted continuously during training until a first convolutional neural network that characterizes the correspondence between a color image and its feature information is obtained. For example, the BP (Back Propagation) algorithm or the SGD (Stochastic Gradient Descent) algorithm may be used to adjust the network parameters of the convolutional neural network.
It should be noted that the specific operations of extracting the second feature information of the target from the depth image to be detected may refer to the above two optional implementations and are not repeated here. The electronic device may input the depth image to be detected into a pre-trained second convolutional neural network to obtain the second feature information of the target, where the second convolutional neural network is used to extract second feature information of a target; its training method is similar to that of the first convolutional neural network and is not repeated here.
Step 203: fuse the first feature information of the target to be detected and the second feature information of the target to be detected to obtain fused feature information of the target.
In this embodiment, based on the first and second feature information of the target extracted in step 202, the electronic device may fuse the first feature information and the second feature information of the target to obtain the fused feature information of the target.
Here, if the first feature information of the target is a feature image corresponding to the color image to be detected and the second feature information is a feature image corresponding to the depth image to be detected, then the fused feature information of the target may be a fused feature image of the target. It can be understood that, ideally, the pixel distributions of the feature image corresponding to the color image and the feature image corresponding to the depth image are identical; in that case, the two feature images can be fused directly to obtain the fused feature image of the target. If the pixel distributions of the two feature images differ, the feature image corresponding to the color image and the feature image corresponding to the depth image can first be transformed to the same pixel distribution, after which the fusion operation is performed. Various image transformation methods may be used; image transformation is a well-known technique in the art and is not the focus of the present application, so it is not described further here.
In some optional implementations of this embodiment, assume the pixel distribution of the feature image corresponding to the color image to be detected is w × h, where w and h are positive integers, that is, the feature image has w pixels horizontally and h pixels vertically, and each pixel of that feature image contains data for three channels, namely the R, G, and B channels, representing the color intensity values of red, green, and blue respectively. Assume further that the pixel distribution of the feature image corresponding to the depth image is also w × h, and each pixel of that feature image contains data for one channel, the D channel, representing the depth intensity value. Then the data of corresponding pixels of the two feature images are combined to generate the data of the corresponding pixel in the fused feature image of the target; that is, each pixel of the fused feature image contains data for four channels, namely the R, G, B, and D channels, representing respectively the color intensity values of the R, G, and B channels and the depth intensity value of the D channel.
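A minimal sketch of this channel-wise fusion, assuming the two feature images are available as tensors of shape (3, h, w) and (1, h, w); PyTorch, and the bilinear resize used when the pixel distributions differ, are illustrative choices rather than part of the patent:

```python
import torch
import torch.nn.functional as F

def fuse_features(color_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
    """Combine per-pixel channel data: (3, h, w) + (1, h, w) -> (4, h, w)."""
    if color_feat.shape[1:] != depth_feat.shape[1:]:
        # If the pixel distributions differ, transform the depth feature image
        # to the same w x h grid before fusing (illustrative: bilinear resize).
        depth_feat = F.interpolate(depth_feat.unsqueeze(0),
                                   size=color_feat.shape[1:],
                                   mode="bilinear",
                                   align_corners=False).squeeze(0)
    # Each fused pixel now carries four channels: R, G, B, D.
    return torch.cat([color_feat, depth_feat], dim=0)
```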
Step 204: input the fused feature information of the target to be detected into a pre-trained target detection model to obtain the category and position of the target.
In this embodiment, based on the fused feature information of the target obtained in step 203, the electronic device may input the fused feature information into a target detection model capable of characterizing the correspondence between a target's fused feature information and its category and position, thereby obtaining the category and position of the target.
In this embodiment, targets may be divided into different categories according to different classification schemes. As an example, a target may be a person, an article, an animal, a plant, a building, or a place in the physical world. As an example, a target may also be a specific body part of a person or animal, for example a human face or an animal's head. As an example, a target may also be a specific kind of animal or plant, for example a monkey, an elephant, or a bush.
In this embodiment, the position of the target may be the position of the target in the color image and/or the depth image. Because there is a one-to-one correspondence between the pixels of the color image and the depth image, the position of the same target in the color image and the depth image is identical. The position of the target may be represented in various ways. For example, it may be represented by a closed curve such that the pixels belonging to the target mostly fall inside the curve and the pixels not belonging to the target mostly fall outside it.
It should be noted that the above target detection model may be obtained by performing supervised training on an existing machine learning model (for example, various artificial neural networks) using various machine learning methods and training samples. Specifically, the target detection model may be obtained through the following training steps:
First, acquire training samples.
Here, a training sample may include a sample color image, a sample depth image, and the category and position of a sample target, where the sample color image and sample depth image may include an image region of the sample target.
Second, extract the first feature information of the sample target from the sample color image, and extract the second feature information of the sample target from the sample depth image.
It should be noted that the specific operations of extracting the first feature information from the sample color image and the second feature information from the sample depth image may refer to the relevant description of step 202 in the embodiment shown in Fig. 2 and are not described here.
Third, fuse the first feature information and the second feature information of the sample target to obtain the fused feature information of the sample target.
It should be noted that the specific operations of fusing the first and second feature information of the sample target may refer to the relevant description of step 203 in the embodiment shown in Fig. 2 and are not described here.
Fourth, train the target detection model with the fused feature information of the sample target as input and the category and position of the sample target as output.
In practice, each network parameter of the artificial neural network (for example, weight parameters and bias parameters) may be initialized with different small random numbers. The "small random numbers" ensure that the network does not enter a saturated state because of excessively large weights, which would cause training to fail; "different" ensures that the network can learn normally. The network parameters of the artificial neural network may be adjusted continuously during training until a target detection model that characterizes the correspondence between a target's fused feature information and its category and position is obtained.
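As an illustration of these training steps only, a supervised training sketch might look as follows; the detection head, loss functions, optimizer, and number of classes are assumptions, since the patent does not specify them:

```python
import torch
import torch.nn as nn

class DetectionModel(nn.Module):
    """Hypothetical detection head over 4-channel (R, G, B, D) fused feature images."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.class_head = nn.Linear(32, num_classes)  # target category
        self.box_head = nn.Linear(32, 4)              # target position (x, y, w, h)

    def forward(self, fused):
        feats = self.backbone(fused)
        return self.class_head(feats), self.box_head(feats)

def train_step(model, optimizer, fused_batch, class_labels, box_labels):
    """One supervised update: fused feature information in, category and position out."""
    pred_cls, pred_box = model(fused_batch)
    loss = nn.functional.cross_entropy(pred_cls, class_labels) \
         + nn.functional.smooth_l1_loss(pred_box, box_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = DetectionModel(num_classes=10)  # parameters start as small random numbers
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # SGD, as mentioned above
```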
In some optional implementations of this embodiment, the electronic device may also feed the category and position of the target back to the terminal device. For example, the electronic device may feed back to the terminal device a text message recording the category and position of the target, or may feed back the color image and/or depth image to be detected annotated with the category and position of the target.
According to the method for detecting a target provided by the embodiments of the present application, first feature information of a target to be detected is first extracted from a color image to be detected, and second feature information of the target is extracted from a depth image to be detected; the first and second feature information are then fused to obtain fused feature information of the target; finally, the fused feature information is input into a target detection model characterizing the correspondence between a target's fused feature information and its category and position, thereby obtaining the category and position of the target. Performing target detection by combining a color image and a depth image improves the accuracy of target detection.
With further reference to Fig. 3, a flow 300 of another embodiment of a method for detecting a target according to the present application is shown. The flow 300 of the method for detecting a target includes the following steps:
Step 301: acquire a color image to be detected and a depth image to be detected.
In this embodiment, the specific operations of step 301 are substantially the same as those of step 201 in the embodiment shown in Fig. 2 and are not described here.
Step 301': acquire an infrared image to be detected.
In this embodiment, an electronic device (for example, the server 105 shown in Fig. 1) on which the method for detecting a target runs may acquire the infrared image to be detected from a terminal device (for example, the terminal devices 101, 102, 103 shown in Fig. 1) through a wired or wireless connection.
Here, the terminal device may be any of various electronic devices with photo and video capabilities; for example, the terminal device may be an electronic device with a depth camera and an infrared camera. The depth camera, also called an RGB-D camera, may be used to capture RGB-D images. An RGB-D image may include a color image and a depth image. The pixel value of each pixel of the color image may be the color value of each point of the captured target surface. The pixel value of each pixel of the depth image may be the distance between the depth camera and each point of the captured target surface. The infrared camera may be used to capture infrared images. An infrared image is an image formed from infrared radiation reflected or emitted by the captured target. The pixel value of each pixel of the infrared image may be the intensity value of the infrared radiation reflected or emitted by each point of the captured target surface. In general, the color image, depth image, and infrared image are registered, so there is a one-to-one correspondence between the pixels of the color image, depth image, and infrared image.
Here, the color image to be detected, the depth image to be detected, and the infrared image to be detected may include an image region of the target to be detected. The target to be detected may be any one or more of the many targets in the captured color image, depth image, and infrared image to be detected.
Step 302: extract first feature information of the target to be detected from the color image to be detected, and extract second feature information of the target to be detected from the depth image to be detected.
In this embodiment, the specific operations of step 302 are substantially the same as those of step 202 in the embodiment shown in Fig. 2 and are not described here.
Step 302': extract third feature information of the target to be detected from the infrared image to be detected.
In this embodiment, based on the infrared image to be detected acquired in step 301', the electronic device may extract the third feature information of the target from the infrared image. The third feature information of the target may be information describing features of the target in the infrared image, including but not limited to two-dimensional shape features, two-dimensional spatial relationship features, and the like.
In some optional implementations of this embodiment, the electronic device may first detect the position of the target to be detected in the infrared image, and then perform image analysis on the region where the target is located using a mathematical model in combination with image processing techniques, so as to extract at least one kind of feature information of the target as the third feature information. The feature information may include but is not limited to face shape information, shape information of the facial features, position and proportion information of the facial features, shape information of the limbs, position and proportion information of the limbs, and the like.
In some optional implementations of this embodiment, the electronic device may input the infrared image to be detected into a pre-trained third convolutional neural network to obtain the third feature information of the target, where the third convolutional neural network may be used to extract third feature information of a target.
It should be noted that the above third convolutional neural network may be obtained by performing supervised training on an existing convolutional neural network using various machine learning methods and training samples; its training method is similar to that of the first convolutional neural network and is not repeated here.
Step 303: fuse the first feature information, the second feature information, and the third feature information of the target to be detected to obtain fused feature information of the target.
In this embodiment, based on the first and second feature information of the target extracted in step 302 and the third feature information of the target extracted in step 302', the electronic device may fuse the first feature information, the second feature information, and the third feature information of the target to obtain the fused feature information of the target.
Here, if the first feature information of the target is a feature image corresponding to the color image to be detected, the second feature information is a feature image corresponding to the depth image to be detected, and the third feature information is a feature image corresponding to the infrared image to be detected, then the fused feature information of the target may be a fused feature image of the target. It can be understood that, ideally, the pixel distributions of the feature images corresponding to the color image, the depth image, and the infrared image are identical; in that case, the three feature images can be fused directly to obtain the fused feature image of the target. If the pixel distributions of the three feature images differ, the feature images corresponding to the color image, the depth image, and the infrared image can first be transformed to the same pixel distribution, after which the fusion operation is performed. Various image transformation methods may be used; image transformation is a well-known technique in the art and is not the focus of the present application, so it is not described further here.
In some optional implementations of this embodiment, assume the pixel distribution of the feature image corresponding to the color image to be detected is w × h, where w and h are positive integers, that is, the feature image has w pixels horizontally and h pixels vertically, and each pixel of that feature image contains data for three channels, namely the R, G, and B channels, representing the color intensity values of red, green, and blue respectively. Assume further that the pixel distribution of the feature image corresponding to the depth image is also w × h, and each pixel of that feature image contains data for one channel, the D channel, representing the depth intensity value. Assume further that the pixel distribution of the feature image corresponding to the infrared image is also w × h, and each pixel of that feature image contains data for one channel, the H channel, representing the infrared intensity value. Then the data of corresponding pixels of the feature images corresponding to the color image, the depth image, and the infrared image are combined to generate the data of the corresponding pixel in the fused feature image of the target; that is, each pixel of the fused feature image contains data for five channels, namely the R, G, B, D, and H channels, representing respectively the color intensity values of the R, G, and B channels, the depth intensity value of the D channel, and the infrared intensity value of the H channel.
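Extending the earlier two-branch fusion sketch to this five-channel case (the tensor shapes and the framework remain illustrative assumptions):

```python
import torch

def fuse_three(color_feat: torch.Tensor,
               depth_feat: torch.Tensor,
               infrared_feat: torch.Tensor) -> torch.Tensor:
    """(3, h, w) + (1, h, w) + (1, h, w) -> (5, h, w): R, G, B, D, H channels."""
    # Assumes all three feature images already share the same w x h pixel
    # distribution; otherwise transform them first, as described above.
    return torch.cat([color_feat, depth_feat, infrared_feat], dim=0)
```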
Step 304: input the fused feature information of the target to be detected into a pre-trained target detection model to obtain the category and position of the target.
In this embodiment, based on the fused feature information of the target obtained in step 303, the electronic device may input the fused feature information into a target detection model capable of characterizing the correspondence between a target's fused feature information and its category and position, thereby obtaining the category and position of the target.
In this embodiment, targets may be divided into different categories according to different classification schemes. As an example, a target may be a person, an article, an animal, a plant, a building, or a place in the physical world. As an example, a target may also be a specific body part of a person or animal, for example a human face or an animal's head. As an example, a target may also be a specific kind of animal or plant, for example a monkey, an elephant, or a bush.
In this embodiment, the position of the target may be the position of the target in the color image and/or the depth image and/or the infrared image. Because there is a one-to-one correspondence between the pixels of the color image, the depth image, and the infrared image, the position of the same target in the color image, the depth image, and the infrared image is identical. The position of the target may be represented in various ways. For example, it may be represented by a closed curve such that the pixels belonging to the target mostly fall inside the curve and the pixels not belonging to the target mostly fall outside it.
It should be noted that the above target detection model may be obtained by performing supervised training on an existing machine learning model (for example, various artificial neural networks) using various machine learning methods and training samples. Specifically, the target detection model may be obtained through the following training steps:
First, acquire training samples.
Here, a training sample may include a sample color image, a sample depth image, a sample infrared image, and the category and position of a sample target, where the sample color image, sample depth image, and sample infrared image may include an image region of the sample target.
Second, extract the first feature information of the sample target from the sample color image, extract the second feature information of the sample target from the sample depth image, and extract the third feature information of the sample target from the sample infrared image.
It should be noted that the specific operations of extracting the first feature information from the sample color image, the second feature information from the sample depth image, and the third feature information from the sample infrared image may refer to the relevant descriptions of step 302 and step 302' in the embodiment shown in Fig. 3 and are not described here.
Third, fuse the first feature information, the second feature information, and the third feature information of the sample target to obtain the fused feature information of the sample target.
It should be noted that the specific operations of fusing the first, second, and third feature information of the sample target may refer to the relevant description of step 303 in the embodiment shown in Fig. 3 and are not described here.
Fourth, train the target detection model with the fused feature information of the sample target as input and the category and position of the sample target as output.
In practice, each network parameter of the artificial neural network (for example, weight parameters and bias parameters) may be initialized with different small random numbers. The "small random numbers" ensure that the network does not enter a saturated state because of excessively large weights, which would cause training to fail; "different" ensures that the network can learn normally. The network parameters of the artificial neural network may be adjusted continuously during training until a target detection model that characterizes the correspondence between a target's fused feature information and its category and position is obtained.
As can be seen from Fig. 3, compared with the embodiment corresponding to Fig. 2, the flow 300 of the method for detecting a target in this embodiment adds a step of extracting third feature information from the infrared image. The scheme described in this embodiment thus combines the color image, the depth image, and the infrared image to perform target detection, which improves the accuracy of target detection for images captured in low-light conditions.
With further reference to Fig. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for detecting a target. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied to various electronic devices.
As shown in Fig. 4, the apparatus 400 for detecting a target of this embodiment may include: an acquiring unit 401, an extraction unit 402, a fusion unit 403, and a detection unit 404. The acquiring unit 401 is configured to acquire a color image to be detected and a depth image to be detected, where both images include an image region of a target to be detected; the extraction unit 402 is configured to extract first feature information of the target from the color image and second feature information of the target from the depth image; the fusion unit 403 is configured to fuse the first feature information and the second feature information to obtain fused feature information of the target; and the detection unit 404 is configured to input the fused feature information into a pre-trained target detection model to obtain the category and position of the target, where the target detection model characterizes the correspondence between a target's fused feature information and its category and position.
In this embodiment, for the specific processing of the acquiring unit 401, the extraction unit 402, the fusion unit 403, and the detection unit 404 in the apparatus 400 for detecting a target and the technical effects they bring, reference may be made to the relevant descriptions of steps 201, 202, 203, and 204 in the embodiment corresponding to Fig. 2, which are not described here.
In some optional implementations of this embodiment, the acquiring unit 401 may also be configured to acquire an infrared image to be detected, where the infrared image includes an image region of the target to be detected; the extraction unit 402 may also be configured to extract third feature information of the target from the infrared image; and the fusion unit 403 may also be configured to fuse the first feature information, the second feature information, and the third feature information of the target to obtain the fused feature information of the target.
In some optional implementations of this embodiment, the extraction unit 402 may include: a first extraction module (not shown) configured to input the color image to be detected into a pre-trained first convolutional neural network to obtain the first feature information of the target, where the first convolutional neural network is used to extract first feature information of a target; and a second extraction module (not shown) configured to input the depth image to be detected into a pre-trained second convolutional neural network to obtain the second feature information of the target, where the second convolutional neural network is used to extract second feature information of a target.
In some optional implementations of this embodiment, the extraction unit 402 may further include: a third extraction module (not shown) configured to input the infrared image to be detected into a pre-trained third convolutional neural network to obtain the third feature information of the target, where the third convolutional neural network is used to extract third feature information of a target.
In some optional implementations of this embodiment, the target detection model may be obtained through the following training steps: acquiring training samples, where a training sample includes a sample color image, a sample depth image, a sample infrared image, and the category and position of a sample target, and the sample color image, sample depth image, and sample infrared image include an image region of the sample target; extracting the first feature information of the sample target from the sample color image, the second feature information from the sample depth image, and the third feature information from the sample infrared image; fusing the first, second, and third feature information of the sample target to obtain the fused feature information of the sample target; and training the target detection model with the fused feature information of the sample target as input and the category and position of the sample target as output.
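To illustrate how the units described above might be wired together, the following hypothetical composition reuses per-modality feature extractors, the channel-wise fusion, and a pre-trained detection model; the interfaces and the end-to-end wiring are assumptions rather than the patent's apparatus:

```python
import torch

class TargetDetector:
    """Hypothetical composition of the extraction, fusion, and detection units."""
    def __init__(self, color_cnn, depth_cnn, infrared_cnn, detection_model):
        self.color_cnn = color_cnn          # extraction unit, first module
        self.depth_cnn = depth_cnn          # extraction unit, second module
        self.infrared_cnn = infrared_cnn    # extraction unit, third module (optional)
        self.detection_model = detection_model

    def detect(self, color_img, depth_img, infrared_img=None):
        # Extraction: one feature image per modality, pixel-aligned.
        feats = [self.color_cnn(color_img), self.depth_cnn(depth_img)]
        if infrared_img is not None:
            feats.append(self.infrared_cnn(infrared_img))
        # Fusion: channel-wise concatenation of the aligned feature images.
        fused = torch.cat(feats, dim=0)
        # Detection: pre-trained model maps fused features to category and position.
        category_logits, position = self.detection_model(fused.unsqueeze(0))
        return category_logits, position
```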
Referring now to Fig. 5, a structural diagram of a computer system 500 suitable for implementing an electronic device of an embodiment of the present application is shown. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 502 or a program loaded into a random access memory (RAM) 503 from a storage portion 508. The RAM 503 also stores various programs and data required by the operations of the system 500. The CPU 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, etc.; an output portion 507 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD) and a speaker; a storage portion 508 including a hard disk, etc.; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processes via a network such as the Internet. A driver 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, may be mounted on the driver 510 as needed, so that a computer program read therefrom is installed into the storage portion 508 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flow chart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above-mentioned functions defined in the method of the present application are performed. It should be noted that the computer-readable medium in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more conductors, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, the computer-readable storage medium may be any tangible medium containing or storing a program, which may be used by, or in connection with, an instruction execution system, apparatus or device. In the present application, the computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and such a computer-readable medium may send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for executing the operations of the present application may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages, such as Java, Smalltalk and C++, and also include conventional procedural programming languages, such as the "C" language or similar programming languages. The program code may be executed entirely on a user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or a server. In a scenario involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flow charts and block diagrams in the figures illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each box in a flow chart or block diagram may represent a module, a program segment, or a portion of code, and the module, the program segment, or the portion of code contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the boxes may occur in an order different from that noted in the figures. For example, two boxes shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram and/or flow chart, and combinations of boxes in a block diagram and/or flow chart, may be implemented by a dedicated hardware-based system that executes the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by means of software or by means of hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquiring unit, an extraction unit, a fusion unit and a detection unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit for acquiring a color image to be detected and a depth image to be detected".
As another aspect, the present application further provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist alone without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a color image to be detected and a depth image to be detected, where the color image to be detected and the depth image to be detected include an image region of a target to be detected; extract first feature information of the target to be detected from the color image to be detected, and extract second feature information of the target to be detected from the depth image to be detected; fuse the first feature information of the target to be detected and the second feature information of the target to be detected to obtain fused feature information of the target to be detected; and input the fused feature information of the target to be detected into a target detection model trained in advance to obtain the category and position of the target to be detected, where the target detection model is used to characterize the correspondence between the fused feature information of a target and the category and position of the target.
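The program carried on the computer-readable medium thus executes the base flow of the embodiments: acquire the two images, extract the first and second feature information, fuse them, and input the fused feature information into the pre-trained target detection model. A compact sketch of that flow is given below; the helper callables extract_color_features, extract_depth_features and detection_model, as well as the PyTorch framework, are illustrative assumptions.

```python
# A minimal sketch of the two-modality flow described above (assumptions:
# PyTorch; extract_color_features, extract_depth_features and detection_model
# stand in for the pre-trained first/second CNNs and the target detection model).
import torch

@torch.no_grad()
def detect_target(color_image: torch.Tensor,
                  depth_image: torch.Tensor,
                  extract_color_features,
                  extract_depth_features,
                  detection_model):
    first_feat = extract_color_features(color_image)     # first feature information
    second_feat = extract_depth_features(depth_image)    # second feature information
    fused = torch.cat([first_feat, second_feat], dim=1)  # fused feature information
    class_scores, position = detection_model(fused)      # category and position
    return class_scores.argmax(dim=1), position
```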
The above description is only a preferred embodiment of the present application and an explanation of the applied technical principles. It should be understood by those skilled in the art that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combinations of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present application.

Claims (12)

1. A method for detecting a target, comprising:
acquiring a color image to be detected and a depth image to be detected, wherein the color image to be detected and the depth image to be detected include an image region of a target to be detected;
extracting first feature information of the target to be detected from the color image to be detected, and extracting second feature information of the target to be detected from the depth image to be detected;
fusing the first feature information of the target to be detected and the second feature information of the target to be detected to obtain fused feature information of the target to be detected; and
inputting the fused feature information of the target to be detected into a target detection model trained in advance to obtain a category and a position of the target to be detected, wherein the target detection model is used to characterize a correspondence between fused feature information of a target and a category and a position of the target.
2. The method according to claim 1, wherein,
before the fusing the first feature information of the target to be detected and the second feature information of the target to be detected to obtain the fused feature information of the target to be detected, the method further comprises:
acquiring an infrared image to be detected, wherein the infrared image to be detected includes an image region of the target to be detected; and
extracting third feature information of the target to be detected from the infrared image to be detected;
and the fusing the first feature information of the target to be detected and the second feature information of the target to be detected to obtain the fused feature information of the target to be detected comprises:
fusing the first feature information of the target to be detected, the second feature information of the target to be detected and the third feature information of the target to be detected to obtain the fused feature information of the target to be detected.
3. The method according to claim 2, wherein the extracting first feature information of the target to be detected from the color image to be detected and extracting second feature information of the target to be detected from the depth image to be detected comprises:
inputting the color image to be detected into a first convolutional neural network trained in advance to obtain the first feature information of the target to be detected, wherein the first convolutional neural network is used to extract first feature information of a target; and
inputting the depth image to be detected into a second convolutional neural network trained in advance to obtain the second feature information of the target to be detected, wherein the second convolutional neural network is used to extract second feature information of a target.
4. The method according to claim 3, wherein the extracting third feature information of the target to be detected from the infrared image to be detected comprises:
inputting the infrared image to be detected into a third convolutional neural network trained in advance to obtain the third feature information of the target to be detected, wherein the third convolutional neural network is used to extract third feature information of a target.
5. The method according to any one of claims 2-4, wherein the target detection model is obtained through the following training steps:
acquiring a training sample, wherein the training sample includes a sample color image, a sample depth image, a sample infrared image, and a category and a position of a sample target, and wherein the sample color image, the sample depth image and the sample infrared image include an image region of the sample target;
extracting first feature information of the sample target from the sample color image, extracting second feature information of the sample target from the sample depth image, and extracting third feature information of the sample target from the sample infrared image;
fusing the first feature information of the sample target, the second feature information of the sample target and the third feature information of the sample target to obtain fused feature information of the sample target; and
training the target detection model by taking the fused feature information of the sample target as input and the category and the position of the sample target as output.
6. An apparatus for detecting a target, comprising:
an acquiring unit, configured to acquire a color image to be detected and a depth image to be detected, wherein the color image to be detected and the depth image to be detected include an image region of a target to be detected;
an extraction unit, configured to extract first feature information of the target to be detected from the color image to be detected, and extract second feature information of the target to be detected from the depth image to be detected;
a fusion unit, configured to fuse the first feature information of the target to be detected and the second feature information of the target to be detected to obtain fused feature information of the target to be detected; and
a detection unit, configured to input the fused feature information of the target to be detected into a target detection model trained in advance to obtain a category and a position of the target to be detected, wherein the target detection model is used to characterize a correspondence between fused feature information of a target and a category and a position of the target.
7. The apparatus according to claim 6, wherein
the acquiring unit is further configured to acquire an infrared image to be detected, wherein the infrared image to be detected includes an image region of the target to be detected;
the extraction unit is further configured to extract third feature information of the target to be detected from the infrared image to be detected; and
the fusion unit is further configured to fuse the first feature information of the target to be detected, the second feature information of the target to be detected and the third feature information of the target to be detected to obtain the fused feature information of the target to be detected.
8. The apparatus according to claim 7, wherein the extraction unit comprises:
a first extraction module, configured to input the color image to be detected into a first convolutional neural network trained in advance to obtain the first feature information of the target to be detected, wherein the first convolutional neural network is used to extract first feature information of a target; and
a second extraction module, configured to input the depth image to be detected into a second convolutional neural network trained in advance to obtain the second feature information of the target to be detected, wherein the second convolutional neural network is used to extract second feature information of a target.
9. The apparatus according to claim 8, wherein the extraction unit further comprises:
a third extraction module, configured to input the infrared image to be detected into a third convolutional neural network trained in advance to obtain the third feature information of the target to be detected, wherein the third convolutional neural network is used to extract third feature information of a target.
10. The apparatus according to any one of claims 7-9, wherein the target detection model is obtained through the following training steps:
acquiring a training sample, wherein the training sample includes a sample color image, a sample depth image, a sample infrared image, and a category and a position of a sample target, and wherein the sample color image, the sample depth image and the sample infrared image include an image region of the sample target;
extracting first feature information of the sample target from the sample color image, extracting second feature information of the sample target from the sample depth image, and extracting third feature information of the sample target from the sample infrared image;
fusing the first feature information of the sample target, the second feature information of the sample target and the third feature information of the sample target to obtain fused feature information of the sample target; and
training the target detection model by taking the fused feature information of the sample target as input and the category and the position of the sample target as output.
11. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-5.
CN201810055152.XA 2018-01-19 2018-01-19 For detecting the method and apparatus of target Pending CN108171212A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810055152.XA CN108171212A (en) 2018-01-19 2018-01-19 For detecting the method and apparatus of target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810055152.XA CN108171212A (en) 2018-01-19 2018-01-19 For detecting the method and apparatus of target

Publications (1)

Publication Number Publication Date
CN108171212A true CN108171212A (en) 2018-06-15

Family

ID=62515519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810055152.XA Pending CN108171212A (en) 2018-01-19 2018-01-19 For detecting the method and apparatus of target

Country Status (1)

Country Link
CN (1) CN108171212A (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376667A (en) * 2018-10-29 2019-02-22 北京旷视科技有限公司 Object detection method, device and electronic equipment
CN110276265A (en) * 2019-05-27 2019-09-24 魏运 Pedestrian monitoring method and device based on intelligent three-dimensional solid monitoring device
CN110298281A (en) * 2019-06-20 2019-10-01 汉王科技股份有限公司 Video structural method, apparatus, electronic equipment and storage medium
CN110585626A (en) * 2019-09-10 2019-12-20 国网河北省电力有限公司经济技术研究院 Fire-retardant transformer fire extinguishing system
WO2020034963A1 (en) * 2018-08-15 2020-02-20 杭州萤石软件有限公司 Charging device identification method, mobile robot and charging device identification system
CN110853127A (en) * 2018-08-20 2020-02-28 浙江宇视科技有限公司 Image processing method, device and equipment
CN111091063A (en) * 2019-11-20 2020-05-01 北京迈格威科技有限公司 Living body detection method, device and system
WO2020088588A1 (en) * 2018-11-01 2020-05-07 长沙小钴科技有限公司 Deep learning-based static three-dimensional method for detecting whether face belongs to living body
CN111144298A (en) * 2019-12-26 2020-05-12 北京华捷艾米科技有限公司 Pedestrian identification method and device
CN111192305A (en) * 2018-11-15 2020-05-22 百度在线网络技术(北京)有限公司 Method and apparatus for generating three-dimensional image
CN111210410A (en) * 2019-12-31 2020-05-29 深圳市优必选科技股份有限公司 Signal lamp state detection method and device
CN111340886A (en) * 2020-02-25 2020-06-26 深圳市商汤科技有限公司 Method and device for detecting picked point of object, equipment, medium and robot
CN111428729A (en) * 2019-01-09 2020-07-17 北京京东尚科信息技术有限公司 Target detection method and device
CN111626085A (en) * 2019-02-28 2020-09-04 中科院微电子研究所昆山分所 Detection method, device, equipment and medium
CN111767843A (en) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 Three-dimensional position prediction method, device, equipment and storage medium
CN111784659A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Image detection method and device, electronic equipment and storage medium
CN111832338A (en) * 2019-04-16 2020-10-27 北京市商汤科技开发有限公司 Object detection method and device, electronic equipment and storage medium
CN111881706A (en) * 2019-11-27 2020-11-03 马上消费金融股份有限公司 Living body detection, image classification and model training method, device, equipment and medium
CN111931743A (en) * 2020-10-09 2020-11-13 杭州科技职业技术学院 Building violation monitoring method and system and electronic equipment
CN112036267A (en) * 2020-08-14 2020-12-04 珠海格力电器股份有限公司 Target detection method, device, equipment and computer readable storage medium
CN112115864A (en) * 2020-09-18 2020-12-22 北京航空航天大学 Infrared image and depth image bimodal target segmentation method and device
WO2020259118A1 (en) * 2019-06-28 2020-12-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and device for image processing, method and device for training object detection model
CN112395922A (en) * 2019-08-16 2021-02-23 杭州海康威视数字技术股份有限公司 Face action detection method, device and system
CN113033557A (en) * 2021-04-16 2021-06-25 北京百度网讯科技有限公司 Method and device for training image processing model and detecting image
CN113155852A (en) * 2021-04-08 2021-07-23 煤炭科学研究总院 Transmission band detection method and device and electronic equipment
CN113536841A (en) * 2020-04-15 2021-10-22 普天信息技术有限公司 Human body structural information analysis method and system
WO2024002396A3 (en) * 2022-12-02 2024-02-22 浙江安吉智电控股有限公司 Vehicle charging port recognition method and related device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599827A (en) * 2016-12-09 2017-04-26 浙江工商大学 Small target rapid detection method based on deep convolution neural network
US9646212B2 (en) * 2012-09-12 2017-05-09 Avigilon Fortress Corporation Methods, devices and systems for detecting objects in a video
CN106774856A (en) * 2016-08-01 2017-05-31 深圳奥比中光科技有限公司 Exchange method and interactive device based on lip reading
CN107368810A (en) * 2017-07-20 2017-11-21 北京小米移动软件有限公司 Method for detecting human face and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646212B2 (en) * 2012-09-12 2017-05-09 Avigilon Fortress Corporation Methods, devices and systems for detecting objects in a video
CN106774856A (en) * 2016-08-01 2017-05-31 深圳奥比中光科技有限公司 Exchange method and interactive device based on lip reading
CN106599827A (en) * 2016-12-09 2017-04-26 浙江工商大学 Small target rapid detection method based on deep convolution neural network
CN107368810A (en) * 2017-07-20 2017-11-21 北京小米移动软件有限公司 Method for detecting human face and device

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11715293B2 (en) 2018-08-15 2023-08-01 Hangzhou Ezviz Software Co., Ltd. Methods for identifying charging device, mobile robots and systems for identifying charging device
WO2020034963A1 (en) * 2018-08-15 2020-02-20 杭州萤石软件有限公司 Charging device identification method, mobile robot and charging device identification system
CN110853127A (en) * 2018-08-20 2020-02-28 浙江宇视科技有限公司 Image processing method, device and equipment
CN109376667A (en) * 2018-10-29 2019-02-22 北京旷视科技有限公司 Object detection method, device and electronic equipment
WO2020088588A1 (en) * 2018-11-01 2020-05-07 长沙小钴科技有限公司 Deep learning-based static three-dimensional method for detecting whether face belongs to living body
CN111192305B (en) * 2018-11-15 2023-11-21 百度在线网络技术(北京)有限公司 Method and apparatus for generating three-dimensional image
CN111192305A (en) * 2018-11-15 2020-05-22 百度在线网络技术(北京)有限公司 Method and apparatus for generating three-dimensional image
CN111428729A (en) * 2019-01-09 2020-07-17 北京京东尚科信息技术有限公司 Target detection method and device
CN111626085A (en) * 2019-02-28 2020-09-04 中科院微电子研究所昆山分所 Detection method, device, equipment and medium
CN111832338A (en) * 2019-04-16 2020-10-27 北京市商汤科技开发有限公司 Object detection method and device, electronic equipment and storage medium
CN110276265A (en) * 2019-05-27 2019-09-24 魏运 Pedestrian monitoring method and device based on intelligent three-dimensional solid monitoring device
CN110298281B (en) * 2019-06-20 2021-10-12 汉王科技股份有限公司 Video structuring method and device, electronic equipment and storage medium
CN110298281A (en) * 2019-06-20 2019-10-01 汉王科技股份有限公司 Video structural method, apparatus, electronic equipment and storage medium
WO2020259118A1 (en) * 2019-06-28 2020-12-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and device for image processing, method and device for training object detection model
US11457138B2 (en) 2019-06-28 2022-09-27 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and device for image processing, method for training object detection model
CN112395922A (en) * 2019-08-16 2021-02-23 杭州海康威视数字技术股份有限公司 Face action detection method, device and system
CN110585626B (en) * 2019-09-10 2021-10-08 国网河北省电力有限公司经济技术研究院 Fire-retardant transformer fire extinguishing system
CN110585626A (en) * 2019-09-10 2019-12-20 国网河北省电力有限公司经济技术研究院 Fire-retardant transformer fire extinguishing system
CN111091063A (en) * 2019-11-20 2020-05-01 北京迈格威科技有限公司 Living body detection method, device and system
CN111091063B (en) * 2019-11-20 2023-12-29 北京迈格威科技有限公司 Living body detection method, device and system
CN111881706A (en) * 2019-11-27 2020-11-03 马上消费金融股份有限公司 Living body detection, image classification and model training method, device, equipment and medium
CN111144298A (en) * 2019-12-26 2020-05-12 北京华捷艾米科技有限公司 Pedestrian identification method and device
CN111210410A (en) * 2019-12-31 2020-05-29 深圳市优必选科技股份有限公司 Signal lamp state detection method and device
CN111340886B (en) * 2020-02-25 2023-08-15 深圳市商汤科技有限公司 Method and device for detecting pick-up point of object, equipment, medium and robot
CN111340886A (en) * 2020-02-25 2020-06-26 深圳市商汤科技有限公司 Method and device for detecting picked point of object, equipment, medium and robot
CN113536841A (en) * 2020-04-15 2021-10-22 普天信息技术有限公司 Human body structural information analysis method and system
CN111784659A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Image detection method and device, electronic equipment and storage medium
CN111767843A (en) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 Three-dimensional position prediction method, device, equipment and storage medium
CN111767843B (en) * 2020-06-29 2024-01-02 阿波罗智联(北京)科技有限公司 Three-dimensional position prediction method, device, equipment and storage medium
CN112036267A (en) * 2020-08-14 2020-12-04 珠海格力电器股份有限公司 Target detection method, device, equipment and computer readable storage medium
CN112115864A (en) * 2020-09-18 2020-12-22 北京航空航天大学 Infrared image and depth image bimodal target segmentation method and device
CN111931743A (en) * 2020-10-09 2020-11-13 杭州科技职业技术学院 Building violation monitoring method and system and electronic equipment
CN113155852A (en) * 2021-04-08 2021-07-23 煤炭科学研究总院 Transmission band detection method and device and electronic equipment
CN113155852B (en) * 2021-04-08 2023-08-01 煤炭科学研究总院有限公司 Detection method and device for transmission belt and electronic equipment
CN113033557A (en) * 2021-04-16 2021-06-25 北京百度网讯科技有限公司 Method and device for training image processing model and detecting image
WO2024002396A3 (en) * 2022-12-02 2024-02-22 浙江安吉智电控股有限公司 Vehicle charging port recognition method and related device

Similar Documents

Publication Publication Date Title
CN108171212A (en) For detecting the method and apparatus of target
CN108154196B (en) Method and apparatus for exporting image
CN108229575A (en) For detecting the method and apparatus of target
CN109902659B (en) Method and apparatus for processing human body image
CN107491771A (en) Method for detecting human face and device
CN108197623A (en) For detecting the method and apparatus of target
CN108416324A (en) Method and apparatus for detecting live body
CN108986169A (en) Method and apparatus for handling image
CN108133201B (en) Face character recognition methods and device
CN108171274A (en) For identifying the method and apparatus of animal
CN107590807A (en) Method and apparatus for detection image quality
CN108780508A (en) System and method for normalized image
CN108509892B (en) Method and apparatus for generating near-infrared image
CN109359676A (en) Method and apparatus for generating vehicle damage information
CN108197618A (en) For generating the method and apparatus of Face datection model
CN108171203A (en) For identifying the method and apparatus of vehicle
CN108229485A (en) For testing the method and apparatus of user interface
CN109344752A (en) Method and apparatus for handling mouth image
CN108491823A (en) Method and apparatus for generating eye recognition model
CN110443824A (en) Method and apparatus for generating information
CN108280413A (en) Face identification method and device
CN109472264A (en) Method and apparatus for generating object detection model
CN108509921A (en) Method and apparatus for generating information
CN108389172A (en) Method and apparatus for generating information
CN109285181A (en) The method and apparatus of image for identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20180615)