CN109165644A - Object detection method and device, electronic equipment, storage medium, program product - Google Patents


Info

Publication number
CN109165644A
CN109165644A (application CN201810770381.XA)
Authority
CN
China
Prior art keywords
interest
area
target
prediction
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810770381.XA
Other languages
Chinese (zh)
Inventor
李聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority: CN201810770381.XA
Publication: CN109165644A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/232: Non-hierarchical techniques
    • G06F 18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Abstract

Embodiments of the present application disclose an object detection method and apparatus, an electronic device, a storage medium, and a program product. The method includes: obtaining at least one region of interest based on feature information of an image; obtaining, based on the region of interest, at least two target areas corresponding to the region of interest, the region of interest lying within the target areas; and determining an object detection result corresponding to the region of interest based on the feature information of the at least two target areas. The embodiments fuse more contextual information into the region of interest, improving its expressive power and the detection accuracy.

Description

Object detection method and device, electronic equipment, storage medium, program product
Technical field
This application relates to computer vision technology, and in particular to an object detection method and apparatus, an electronic device, a storage medium, and a program product.
Background technique
With the launch of satellites of ever higher spatial, temporal, and spectral resolution, satellite-based remote-sensing data has come to be widely applied across many fields, significantly increasing the efficiency of information acquisition. Extracting the target information people care about from remote-sensing data is significant to many industries, especially fields such as the military, finance, and security that bear on key factors like national security and economic development.
Summary of the invention
An object detection technique is provided by the embodiments of the present application.
According to one aspect of the embodiments of the present application, an object detection method is provided, comprising:
obtaining at least one region of interest based on feature information of an image;
obtaining, based on the region of interest, at least two target areas corresponding to the region of interest, the region of interest lying within the target areas;
determining an object detection result corresponding to the region of interest based on the feature information of the at least two target areas.
Optionally, obtaining at least two target areas based on the region of interest comprises:
performing at least two enlargement operations on the region of interest to obtain the at least two target areas.
Optionally, determining the object detection result corresponding to the region of interest based on the feature information of the at least two target areas comprises:
performing feature-connection processing on the feature information of the at least two target areas to obtain a connection feature;
determining the object detection result corresponding to the region of interest based on the connection feature.
Optionally, determining the object detection result corresponding to the region of interest based on the connection feature comprises:
obtaining a first detection result based on the connection feature;
obtaining a second detection result based on the feature information of the region of interest and the feature information of the image;
determining the object detection result corresponding to the region of interest based on the first detection result and the second detection result.
Optionally, determining the object detection result corresponding to the region of interest based on the first detection result and the second detection result comprises:
averaging the first detection result and the second detection result to obtain the object detection result corresponding to the region of interest.
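As a non-authoritative illustration of the averaging step above, the sketch below averages the two detection branches' outputs element-wise; representing each detection result as a list of class scores, and using a plain mean, are assumptions, since the patent does not fix a concrete data format.

```python
def fuse_results(first, second):
    # Element-wise mean of the first and second detection results,
    # a minimal reading of the "averaging processing" step above.
    return [(a + b) / 2.0 for a, b in zip(first, second)]

# hypothetical class scores from the two detection branches
fused = fuse_results([1.0, 0.0], [0.0, 1.0])
```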
Optionally, before obtaining the at least one region of interest based on the feature information of the image, the method further comprises:
performing feature extraction on the image to obtain the feature information of the image.
Optionally, the method is implemented with an object detection neural network;
before obtaining the at least one region of interest based on the feature information of the image, the method further comprises:
training the object detection neural network based on sample images with annotation information.
Optionally, the object detection neural network includes a feature extraction network, a region-of-interest extraction network, a first detection neural network, and a second detection neural network;
training the object detection neural network based on the sample images with annotation information comprises:
performing feature extraction on a sample image using the feature extraction network to obtain sample features;
processing the sample features using the region-of-interest extraction network to obtain at least one predicted region of interest and the feature information of the at least two predicted target areas corresponding to each predicted region of interest;
processing the feature information of the at least two predicted target areas corresponding to each predicted region of interest using the first detection neural network to obtain a first prediction result for each predicted region of interest;
obtaining a first loss for each predicted region of interest based on its first prediction result and the annotation information of the sample image;
adjusting network parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network based on the first losses of the predicted regions of interest;
obtaining a second loss based on the first losses and the at least one predicted region of interest;
adjusting network parameters of the feature extraction network and the second detection neural network based on the second loss.
Optionally, obtaining the second loss based on the first losses of the predicted regions of interest and the at least one predicted region of interest comprises:
screening at least one error region of interest from the at least one predicted region of interest based on the first losses;
processing the feature information of the at least one error region of interest and the sample features using the second detection neural network to obtain a second prediction result for each error region of interest;
obtaining the second loss based on the second prediction results of the error regions of interest and the annotation information of the sample image.
Optionally, obtaining the first loss of each predicted region of interest based on its first prediction result and the annotation information of the sample image comprises:
obtaining, for each predicted region of interest, a softmax (index-normalization) loss and a regression ("Reg") loss based on its first prediction result and the annotation information of the sample image;
determining the first loss of the predicted region of interest as the sum of its softmax loss and regression loss.
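The first loss above is the sum of a softmax classification loss and a regression loss. The sketch below is a minimal Python reading of that sum; using Smooth-L1 as the box-regression loss is an assumption for illustration, as the patent names the regression loss only by transliteration ("Reg") and fixes no formula.

```python
import math

def softmax_cross_entropy(logits, label):
    # the "index normalization" (softmax) classification loss
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[label] / sum(exps))

def smooth_l1(pred_box, target_box):
    # Smooth-L1, an assumed stand-in for the regression ("Reg") loss
    total = 0.0
    for p, t in zip(pred_box, target_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def first_loss(logits, label, pred_box, target_box):
    # first loss = classification loss + regression loss, per the embodiment
    return softmax_cross_entropy(logits, label) + smooth_l1(pred_box, target_box)
```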
Optionally, after obtaining the first loss of each predicted region of interest based on its first prediction result and the annotation information of the sample image, the method further comprises:
obtaining a center loss for each predicted region of interest based on its first prediction result;
in this case, adjusting the network parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network based on the first losses comprises:
adjusting those network parameters based on both the first loss and the center loss of each predicted region of interest.
Optionally, obtaining the center loss of each predicted region of interest based on its first prediction result comprises:
clustering the first prediction results of the predicted regions of interest to obtain a cluster center;
obtaining the center loss of each predicted region of interest based on the distance between its first prediction result and the cluster center.
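A minimal sketch of the center-loss step just described: cluster the first prediction results and score each result by its distance to the cluster centre. Using a single cluster (the mean) and squared Euclidean distance are simplifying assumptions; the patent leaves the clustering method open.

```python
def center_loss(first_predictions):
    # Single-cluster sketch: take the mean of the first prediction results
    # as the cluster centre, then score each predicted ROI by its squared
    # Euclidean distance to that centre.
    n = len(first_predictions)
    dim = len(first_predictions[0])
    centre = [sum(p[i] for p in first_predictions) / n for i in range(dim)]
    losses = [sum((p[i] - c) ** 2 for i, c in enumerate(centre))
              for p in first_predictions]
    return centre, losses
```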
Optionally, screening the at least one error region of interest from the at least one predicted region of interest based on the first losses comprises:
determining as error regions of interest those predicted regions of interest whose first loss exceeds a preset loss threshold.
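The threshold-based screening step can be sketched directly; `loss_threshold` is the preset loss threshold mentioned above, and representing ROIs and losses as parallel lists is an assumption for illustration.

```python
def select_error_rois(rois, first_losses, loss_threshold):
    # Predicted ROIs whose first loss exceeds the preset threshold are
    # treated as "error" (hard) ROIs for the second detection network.
    return [roi for roi, loss in zip(rois, first_losses) if loss > loss_threshold]
```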
According to another aspect of the embodiments of the present application, an object detection apparatus is provided, comprising:
a region-of-interest extraction unit configured to obtain at least one region of interest based on feature information of an image;
a target area obtaining unit configured to obtain, based on the region of interest, at least two target areas corresponding to the region of interest, the region of interest lying within the target areas;
a result detection unit configured to determine an object detection result corresponding to the region of interest based on the feature information of the at least two target areas.
Optionally, the region-of-interest extraction unit is configured to perform at least two enlargement operations on the region of interest to obtain the at least two target areas.
Optionally, the result detection unit includes:
a connection module configured to perform feature-connection processing on the feature information of the at least two target areas to obtain a connection feature;
an object detection module configured to determine the object detection result corresponding to the region of interest based on the connection feature.
Optionally, the object detection module is configured to obtain a first detection result based on the connection feature; obtain a second detection result based on the feature information of the region of interest and the feature information of the image; and determine the object detection result corresponding to the region of interest based on the first detection result and the second detection result.
Optionally, when determining the object detection result based on the first and second detection results, the object detection module averages the first detection result and the second detection result to obtain the object detection result corresponding to the region of interest.
Optionally, the apparatus further includes:
a feature extraction unit configured to perform feature extraction on the image to obtain the feature information of the image.
Optionally, the apparatus implements object detection with an object detection neural network;
the apparatus further includes:
a network training unit configured to train the object detection neural network based on sample images with annotation information.
Optionally, the object detection neural network includes a feature extraction network, a region-of-interest extraction network, a first detection neural network, and a second detection neural network;
the network training unit comprises:
a sample processing module configured to perform feature extraction on the sample image using the feature extraction network to obtain sample features; process the sample features using the region-of-interest extraction network to obtain at least one predicted region of interest and the feature information of the at least two predicted target areas corresponding to each predicted region of interest; and process that feature information using the first detection neural network to obtain a first prediction result for each predicted region of interest;
a first loss module configured to obtain a first loss for each predicted region of interest based on its first prediction result and the annotation information of the sample image;
a first loss training module configured to adjust network parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network based on the first losses of the predicted regions of interest;
a second loss module configured to obtain a second loss based on the first losses and the at least one predicted region of interest;
a second loss training module configured to adjust network parameters of the feature extraction network and the second detection neural network based on the second loss.
Optionally, the second loss module is specifically configured to: screen at least one error region of interest from the at least one predicted region of interest based on the first losses; process the feature information of the at least one error region of interest and the sample features using the second detection neural network to obtain a second prediction result for each error region of interest; and obtain the second loss based on the second prediction results of the error regions of interest and the annotation information of the sample image.
Optionally, the first loss module is specifically configured to: obtain, for each predicted region of interest, a softmax (index-normalization) loss and a regression ("Reg") loss based on its first prediction result and the annotation information of the sample image; and determine the first loss of the predicted region of interest as the sum of its softmax loss and regression loss.
Optionally, the network training unit further comprises:
a center loss module configured to obtain a center loss for each predicted region of interest based on its first prediction result;
in this case the first loss training module adjusts the network parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network based on both the first loss and the center loss of each predicted region of interest.
Optionally, the center loss module is specifically configured to cluster the first prediction results of the predicted regions of interest to obtain a cluster center, and to obtain the center loss of each predicted region of interest from the distance between its first prediction result and the cluster center.
Optionally, when screening error regions of interest based on the first losses, the second loss module determines as error regions of interest those predicted regions of interest whose first loss exceeds a preset loss threshold.
According to another aspect of the embodiments of the present application, an electronic device is provided, including a processor that includes the object detection apparatus of any of the above items.
According to another aspect of the embodiments of the present application, an electronic device is provided, comprising: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions so as to perform the operations of any of the above object detection methods.
According to another aspect of the embodiments of the present application, a computer-readable storage medium is provided for storing computer-readable instructions that, when executed, perform the operations of any of the above object detection methods.
According to another aspect of the embodiments of the present application, a computer program product is provided, including computer-readable code; when the code runs on a device, a processor in the device executes instructions for implementing any of the above object detection methods.
According to yet another aspect of the embodiments of the present application, another computer program product is provided for storing computer-readable instructions that, when executed, cause a computer to perform the operations of the object detection method of any of the above possible implementations.
In one optional embodiment, the computer program product is specifically a computer storage medium; in another optional embodiment, it is specifically a software product, such as an SDK.
Embodiments of the present application also provide another object detection method and apparatus, electronic device, computer storage medium, and computer program product, wherein: at least one region of interest is obtained based on feature information of an image; at least two target areas corresponding to the region of interest are obtained based on the region of interest; and an object detection result corresponding to the region of interest is determined based on the feature information of the at least two target areas.
Based on the object detection method and apparatus, electronic device, storage medium, and program product provided by the above embodiments, at least one region of interest is obtained from the feature information of an image; at least two corresponding target areas are obtained for the region of interest; and the object detection result corresponding to the region of interest is determined from the feature information of the at least two target areas. More contextual information is thereby fused into the region of interest, improving its expressive power and the detection accuracy.
The technical solutions of the present application are described in further detail below with reference to the drawings and embodiments.
Detailed description of the invention
The accompanying drawings, which form part of the specification, illustrate embodiments of the present application and, together with the description, explain the principles of the application.
The present application can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 is a flow diagram of the object detection method of an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a specific example of obtaining a connection feature by multi-pooling in an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an example of training the object detection neural network in an embodiment of the present application.
Fig. 4 is a diagram of an application example of the object detection method of an embodiment of the present application.
Fig. 5 is a schematic structural diagram of the object detection apparatus of an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an electronic device suitable for implementing a terminal device or a server of an embodiment of the present application.
Specific embodiment
Various exemplary embodiments of the present application are now described in detail with reference to the drawings. Note that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the values set forth in these embodiments do not limit the scope of the application.
Meanwhile, it should be understood that, for ease of description, the sizes of the parts shown in the drawings are not drawn according to actual proportions.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the application or its use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
Note also that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
Acquired remote-sensing data is usually seriously affected by the sensor and by weather conditions, so image quality varies markedly, while the targets of interest in the image are relatively small or densely arranged. Large feature differences therefore exist even between targets of the same kind, and the small actual size of a target constrains an algorithm's feature-extraction ability. Traditional detection methods thus struggle to adapt to target recognition under varied scenes: recall is low and serious false alarms occur, failing to meet practical application requirements.
Fig. 1 is a flow diagram of the object detection method of an embodiment of the present application. The method may be executed by an object detection apparatus, e.g., a terminal device or a server; the embodiments of the present application do not limit the specific implementation of the apparatus. As shown in Fig. 1, the method of this embodiment includes:
Step 110: obtain at least one region of interest based on feature information of an image.
Optionally, the feature information of the image may include a feature map of the image; region-of-interest (Region of Interest, ROI) extraction is performed on the feature map of the image to be processed, yielding one or more regions of interest. A region of interest may correspond to a region containing target features. In some implementations, ROI extraction may be implemented with a neural network, the structure of which the application does not limit; alternatively, ROI extraction may be implemented with other machine-learning methods, which the embodiments likewise do not limit.
In some embodiments, the feature information may be obtained by performing feature extraction on the image, or obtained from another device, e.g., a server receiving the feature information of an image sent by a terminal device; the embodiments do not limit this.
Optionally, the image in this embodiment may be a remote-sensing image. Since remote-sensing images usually cover a large area, the targets of interest are relatively small or densely arranged, and directly performing object detection on the full image leads to low recall and serious false alarms, failing to meet practical requirements. By extracting regions of interest, this embodiment splits the areas that may contain targets out of the original image, highlighting the positions of targets. The embodiments may also be applied to other cases where targets occupy a small proportion of the image; this is not limited.
Step 120: obtain, based on the region of interest, at least two target areas corresponding to the region of interest.
Here the region of interest lies within the target areas.
In some implementations, the at least two target areas include the region of interest itself and at least one enlarged area obtained by enlarging the region of interest; alternatively, they include at least two enlarged areas obtained by enlarging the region of interest; the embodiments do not limit this.
In one or more optional embodiments, at least two enlargement operations are performed on the region of interest to obtain the at least two target areas.
Since a region of interest is only a partial region of the image, a target may not lie, or may not lie entirely, within it (part of the target may fall outside the region of interest). To avoid missing targets, this embodiment adds the feature information of a larger region (the target area after enlargement) to guarantee the accuracy of the detection result. Optionally, the enlargement is performed about the center of the region of interest; that is, the enlarged area and the region of interest share the same center. The at least two enlargements use different magnification factors, e.g., enlarging the region of interest by 1.2x and 1.5x respectively to obtain two target areas. In some implementations, the region of interest itself may also serve as a target area, which can be understood as enlargement by a factor of 1.
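The center-preserving enlargement described above can be sketched as follows. The (x1, y1, x2, y2) box format and the example factors 1.0/1.2/1.5 follow the text; everything else is an illustrative assumption.

```python
def enlarge_roi(box, factor):
    # box = (x1, y1, x2, y2); scale width and height by `factor` about the
    # centre, so the enlarged target area keeps the ROI's centre point.
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * factor / 2.0, (y2 - y1) * factor / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def target_areas(box, factors=(1.0, 1.2, 1.5)):
    # factor 1.0 keeps the ROI itself as one of the target areas
    return [enlarge_roi(box, f) for f in factors]
```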
To locate targets more accurately, this embodiment uses multi-pool (Multi-Pool) contextual semantic information fusion: with the region corresponding to the region of interest as the base frame, the region is enlarged by different factors so that information outside the region of interest can be extracted, realizing extraction and fusion of context (content) information. For example, when observing only the features of the ROI cannot determine a target's category, observing information over a larger range usually yields a more accurate judgment.
Step 130: determine the object detection result corresponding to the region of interest based on the feature information of the at least two target areas.
Combining the feature information of at least two target areas brings in more contextual information, which improves the accuracy of the detection result.
The feature information of a target area can be cropped from the feature information of the image.
Based on the above embodiments, an object detection method is provided: obtain at least one region of interest from the feature information of an image; obtain, based on the region of interest, at least two corresponding target areas; and determine the object detection result corresponding to the region of interest from the feature information of the at least two target areas. More contextual information is fused into the region of interest, improving its expressive power and the detection accuracy.
In one or more optional embodiments, to obtain a more accurate object detection result, step 130 optionally performs feature concatenation on the feature information of the at least two target regions to obtain a concatenated feature, and determines the object detection result corresponding to the region of interest based on the concatenated feature. In some implementations, the feature information of the multiple target regions may be fused, or stacked along the channel dimension, etc., to obtain the concatenated feature; the embodiment of the present application does not limit the specific implementation of the concatenation.
In some implementations, before obtaining the concatenated feature, the method may further include: processing the feature information of the at least two target regions (e.g., feature vectors or feature maps) so that each piece of processed feature information has the same size (e.g., feature vectors of the same dimension, or feature maps of the same spatial size); correspondingly, the feature information of the at least two size-matched target regions is concatenated to obtain the concatenated feature.
Fig. 2 is a schematic diagram of a specific example in which the embodiment of the present application obtains the concatenated feature by multi-pooling. As shown in Fig. 2, the smallest box in the left image indicates the region of interest; enlarging the region of interest by different factors yields three target regions of different sizes. A down-sampling operation is then applied to the three target regions to obtain regions of the same size as the region of interest. Alternatively, down-sampling is applied to the target regions while an interpolation operation (e.g., bilinear interpolation) is applied to the region of interest, so that the target regions and the region of interest reach the same size. The target result is then determined based on the size-matched target regions and region of interest. This embodiment can be applied to object detection in remote-sensing imagery. Because targets in remote-sensing imagery are small, features extracted only from the region where the target lies often fail to describe the target well and cause large errors. By applying multi-pooling (Multi Pool) to the target region of interest (ROI) to extract and fuse features at different scales, environmental information around the target can be integrated into the target feature, realizing the combination with contextual semantic information, reducing errors, and improving detection accuracy.
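The pool-to-common-size-and-concatenate step can be sketched with NumPy as follows. This is an illustrative sketch only: the adaptive average pooling, the channel-first (C, H, W) layout, and the 7x7 output size are assumptions for the example, not details fixed by the patent.

```python
import numpy as np

def adaptive_avg_pool(feat, out_h, out_w):
    """Average-pool a (C, H, W) feature map down to (C, out_h, out_w)."""
    c, h, w = feat.shape
    out = np.zeros((c, out_h, out_w), dtype=feat.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Each output cell averages its (possibly uneven) input window.
            h0 = i * h // out_h
            h1 = max((i + 1) * h // out_h, h0 + 1)
            w0 = j * w // out_w
            w1 = max((j + 1) * w // out_w, w0 + 1)
            out[:, i, j] = feat[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

def multi_pool_concat(region_feats, out_size=7):
    """Pool each target-region feature map to the same spatial size and
    concatenate them along the channel dimension."""
    pooled = [adaptive_avg_pool(f, out_size, out_size) for f in region_feats]
    return np.concatenate(pooled, axis=0)

# Three target regions of different spatial sizes, each with 256 channels.
feats = [np.random.rand(256, s, s).astype(np.float32) for s in (7, 9, 11)]
joint = multi_pool_concat(feats, out_size=7)  # shape (768, 7, 7)
```

In practice a framework primitive such as ROI pooling/align would be used; the loop above only makes the size-matching explicit.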
Optionally, determining the object detection result corresponding to the region of interest based on the concatenated feature includes:
obtaining a first detection result based on the concatenated feature;
obtaining a second detection result based on the feature information of the region of interest and the feature information of the image; and
determining the object detection result corresponding to the region of interest based on the first detection result and the second detection result.
To obtain a more accurate object detection result, two neural networks may process the concatenated feature and the feature information of the region of interest and of the image, respectively, to obtain the first detection result and the second detection result. Specifically, a first detection neural network may process the concatenated feature to obtain the first detection result, and a second detection neural network may process the feature information of the region of interest and the feature information of the image to obtain the second detection result; the first and second detection results are then combined into the object detection result of the region of interest. For example, the first detection neural network is a Region Convolutional Neural Network (RCNN) and the second detection neural network is a Region-based Fully Convolutional Network (RFCN); object detection is performed by RCNN and RFCN respectively to obtain the first and second detection results, and the object detection result obtained by combining the outputs of the two networks is more accurate.
Optionally, obtaining the object detection result may include: averaging the first detection result and the second detection result to obtain the object detection result corresponding to the region of interest.
The averaging may include, but is not limited to, computing a geometric mean, an arithmetic mean, or a weighted mean; the present application does not limit the specific averaging method. The object detection result obtained by averaging combines the first and second detection results, reducing the false-alarm rate while maintaining a high recall, and improving detection accuracy.
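The weighted-arithmetic-mean variant of the averaging above can be sketched as follows; the function name `average_detections`, the score/box dictionary format, and the equal weights are assumptions for the example.

```python
import numpy as np

def average_detections(det1, det2, w1=0.5, w2=0.5):
    """Combine two detectors' outputs for the same region of interest by a
    weighted arithmetic mean of the class scores and box coordinates."""
    scores = w1 * det1["scores"] + w2 * det2["scores"]
    box = w1 * det1["box"] + w2 * det2["box"]
    return {"scores": scores, "box": box, "label": int(np.argmax(scores))}

# First and second detection results for one region of interest.
first = {"scores": np.array([0.1, 0.7, 0.2]), "box": np.array([10., 10., 50., 40.])}
second = {"scores": np.array([0.2, 0.6, 0.2]), "box": np.array([12., 8., 48., 42.])}
fused = average_detections(first, second)
```

A geometric or otherwise weighted mean would follow the same pattern, only changing how the two score vectors are combined.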
It should be understood that the above description of a region of interest can be applied to some or all regions of interest of the image; for example, the above processing flow may be performed for each region of interest of the image. The embodiment of the present application does not limit this.
In one or more optional embodiments, before step 110, the method may further include:
performing feature extraction on the image to obtain the feature information of the image.
Feature extraction on the image may be implemented by a convolutional neural network or by at least one convolutional layer, etc.; the present application does not limit the specific way the image features are obtained. To process the image features faster, however, a neural network with a simpler structure and faster processing may optionally be chosen for feature extraction, making the overall network lightweight and improving the image-processing capability and processing speed of the object detection neural network.
In one or more optional embodiments, the method of this embodiment is implemented with an object detection neural network. Accordingly, before performing object detection on the image to be processed with the object detection neural network, the network may be trained with sample images carrying annotation information, using the annotation information of the sample images as supervision. Alternatively, the object detection neural network may be trained in an unsupervised manner; the embodiment of the present application does not limit this.
In some optional implementations, the object detection neural network includes a feature extraction network, a region-of-interest extraction network, a first detection neural network, and a second detection neural network. Correspondingly, the feature extraction network, the region-of-interest extraction network, the first detection neural network, and the second detection neural network may be trained based on the sample images. Alternatively, the object detection neural network may also include other networks; the embodiment of the present application does not limit this.
In the embodiment of the present application, the network parameters of the object detection neural network may be adjusted based on the annotation information of the sample images in the sample image set and on the object detection results obtained by performing object detection on the sample images with the object detection neural network. For example, the network parameters of at least one of the feature extraction network, the region-of-interest extraction network, the first detection neural network, and the second detection neural network in the object detection neural network may be adjusted.
A neural network needs to be trained before use so that it is better adapted to the current task. For example, for a classification task, the object detection neural network is trained on a set of sample image features with known classification results; the trained network can then predict classification results more accurately.
In some implementations, the object detection neural network processes a sample image in the same way as it processes an image to be processed.
In some examples, the feature extraction network in the object detection neural network may be used to perform feature extraction on the sample image to obtain sample features;
the sample features are input into the region-of-interest extraction network in the object detection neural network to obtain at least one predicted region of interest and the feature information of the at least two predicted target regions corresponding to each predicted region of interest;
the first detection neural network processes the feature information of the at least two predicted target regions corresponding to each predicted region of interest to obtain a first prediction result for each predicted region of interest;
a first loss of each predicted region of interest is obtained based on the first prediction result of each predicted region of interest in the at least one predicted region of interest and the annotation information of the sample image;
the network parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network are adjusted based on the first loss of each predicted region of interest in the at least one predicted region of interest.
In some embodiments, the network parameters of the second detection neural network may also be adjusted based on the first loss. For example, a second loss may be obtained based on the first loss of each predicted region of interest in the at least one predicted region of interest and on the at least one predicted region of interest; the parameters of the feature extraction network and the second detection neural network are then adjusted based on the second loss.
The first detection neural network and the second detection neural network are trained separately during training. A sample image passed through the feature extraction network and the first detection neural network yields the first loss, based on which the feature extraction network and the first detection neural network are trained. The input of the second detection neural network is determined from the first loss together with the predicted regions of interest, and the feature extraction network and the second detection neural network are trained with the second loss. After training, the first and second detection neural networks of the object detection neural network can each output a more accurate first and second detection result, so that a more accurate object detection result is obtained.
Optionally, obtaining the second loss based on the first loss and the at least one predicted region of interest includes:
screening, based on the first loss, at least one error region of interest from the at least one predicted region of interest.
Optionally, a predicted region of interest in the at least one predicted region of interest whose first loss exceeds a preset loss value is determined to be an error region of interest. Alternatively, the one or more predicted regions of interest with the largest first losses are determined to be error regions of interest.
Whether the first loss corresponding to a predicted region of interest is large is judged against the preset value. A large first loss indicates that the difference between the prediction result obtained for that predicted region of interest by the first detection neural network and the annotation result is too large; in that case, besides being processed by the first detection neural network, the predicted region of interest is also fed into the second detection neural network for processing. Processing the screened error regions of interest with the second detection neural network avoids the slow convergence that would result from handling these predicted regions of interest with the first detection neural network alone, accelerating training.
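The two screening rules above (loss threshold, or largest losses first) can be sketched as follows; the function name and the plain-list representation of the per-ROI first losses are assumptions for the example.

```python
def select_error_rois(first_losses, loss_threshold=None, top_k=None):
    """Screen error ROIs (hard examples) by first loss: either every predicted
    ROI whose loss exceeds a preset threshold, or the top-k highest-loss ROIs.
    Returns indices into first_losses, highest loss first."""
    ranked = sorted(range(len(first_losses)),
                    key=lambda i: first_losses[i], reverse=True)
    if loss_threshold is not None:
        return [i for i in ranked if first_losses[i] > loss_threshold]
    return ranked[:top_k]

# First losses of five predicted ROIs; indices 1 and 3 are the hard ones.
losses = [0.3, 2.1, 0.7, 1.5, 0.2]
hard_by_threshold = select_error_rois(losses, loss_threshold=1.0)  # [1, 3]
hard_by_topk = select_error_rois(losses, top_k=2)                  # [1, 3]
```

The selected indices identify which predicted regions of interest are additionally routed to the second detection neural network.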
The second detection neural network processes the feature information of the at least one error region of interest and the sample features to obtain a second prediction result for the at least one error region of interest;
the second loss is obtained based on the second prediction result of the at least one error region of interest and the annotation information of the sample image.
Optionally, this embodiment screens the at least one error region of interest from the at least one predicted region of interest by online hard example mining.
The core idea of the online hard example mining (OHEM) algorithm is to screen the input samples according to their losses, filtering out the hard examples, i.e., the samples that have a large influence on classification and detection; the screened samples are then applied to training by stochastic gradient descent. In practice, the original single region-of-interest network (ROI Network) is expanded into two ROI Networks that share parameters. The first ROI Network performs only the forward pass and is mainly used to compute losses; the second ROI Network takes the hard examples as input and performs both forward and backward passes, computing the losses and back-propagating the gradients.
In this embodiment, OHEM uses the first loss to select, from the predicted regions of interest, those regions of interest that are inaccurately predicted; these are fed into the second detection neural network as error regions of interest to obtain the second loss, with which the second detection neural network and the feature extraction network are trained.
Fig. 3 is a structural schematic diagram of an example of training the object detection neural network in the embodiment of the present application. As shown in Fig. 3, small targets in remote-sensing imagery serve as the sample image features, and the network detects these small targets. The network structure mainly includes: a feature extraction network (conv feature), a region-of-interest extraction network (ROI), a multi-pooling (Multi Pool) network, a region-based detection neural network (RCNN), an online hard example mining (OHEM) network, and a Region-based Fully Convolutional Network (RFCN). RCNN and RFCN (where R stands for region) are two classic object detection architectures, corresponding respectively to the first detection neural network and the second detection neural network in the above embodiments. First, the RCNN network performs preliminary target category classification and bounding-box regression on the initially detected regions of interest. Then, based on the OHEM network, the preliminary classification results are sorted in descending order of the combined classification and regression loss, and the error regions of interest most likely to contain detection mistakes are selected. Finally, the RFCN network performs further category classification and bounding-box regression on the selected error regions of interest most likely to contain detection mistakes, reducing the probability of detection mistakes. This strengthens the learning of ambiguous hard samples while performing secondary detection on them, noticeably improving object detection accuracy.
Optionally, obtaining the first loss of each predicted region of interest in the at least one predicted region of interest based on the first prediction result of each predicted region of interest and the annotation information of the sample image includes:
obtaining a softmax (normalized exponential) loss and a regression loss of each predicted region of interest based on the first prediction result of each predicted region of interest in the at least one predicted region of interest and the annotation information of the sample image;
determining the first loss of a predicted region of interest based on the sum of its softmax loss and regression loss.
Optionally, the softmax loss and the regression loss (reg loss) are obtained respectively based on the first prediction result and the annotation result; the first loss is determined based on the sum of the softmax loss and the reg loss. In this implementation, one softmax loss and one reg loss are obtained for each predicted region of interest, so the first loss of each predicted region of interest can be obtained; the network parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network are adjusted through each first loss.
Obtaining the softmax loss and the reg loss respectively based on the first prediction result and the annotated target result belongs to the prior art. By taking the sum of the softmax loss and the reg loss as the first loss, this embodiment accelerates training, and the training effect on the feature extraction network, the region-of-interest extraction network, and the first detection neural network is better.
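One common concrete form of this sum is sketched below: cross-entropy over softmax-normalized class logits plus a smooth L1 box-regression term. The patent specifies only "softmax loss" and "reg loss", so the choice of smooth L1, and the function names, are assumptions for the example.

```python
import numpy as np

def softmax_loss(logits, label):
    """Cross-entropy of the softmax-normalized class logits at the true label."""
    z = logits - logits.max()                 # numerical stability shift
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def smooth_l1(pred_box, gt_box):
    """Smooth L1 regression loss over the box offsets."""
    d = np.abs(pred_box - gt_box)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def first_loss(logits, label, pred_box, gt_box):
    """First loss of one predicted ROI: softmax loss + reg loss."""
    return softmax_loss(logits, label) + smooth_l1(pred_box, gt_box)

loss = first_loss(np.array([2.0, 0.5, 0.1]), 0,
                  np.array([0.1, 0.2, 0.0, 0.1]),
                  np.array([0.0, 0.0, 0.0, 0.0]))
```

In training, one such loss is computed per predicted region of interest and back-propagated through the first detection, ROI-extraction, and feature-extraction networks.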
Optionally, after obtaining the first loss of each predicted region of interest in the at least one predicted region of interest based on the first prediction result of each predicted region of interest and the annotation information of the sample image, the method further includes:
obtaining a center loss of each predicted region of interest in the at least one predicted region of interest based on the first prediction result of each predicted region of interest;
specifically, clustering the first prediction results of the predicted regions of interest in the at least one predicted region of interest to obtain cluster centers;
and obtaining the center loss of each predicted region of interest based on the distance between its first prediction result and the cluster center.
Each of the at least one predicted region of interest corresponds to at least one first prediction result. By clustering the first prediction results, the difference between the prediction results of different predicted regions of interest can be obtained (determined by a distance, e.g., cosine distance or Euclidean distance). This difference serves as the center loss added alongside the first loss in multi-task supervised learning, so that the supervision makes targets of different categories preserve inter-class differences during classification while reducing intra-class differences, improving object detection accuracy.
Optionally, adjusting the network parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network based on the first loss of each predicted region of interest in the at least one predicted region of interest includes:
adjusting the parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network based on the first loss and the center loss of each predicted region of interest in the at least one predicted region of interest.
Optionally, the parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network are adjusted based on the sum of the first loss and the center loss.
Adding the center loss can accelerate the convergence of the network. Optionally, training targets continually reducing the center loss of the predicted regions of interest, that is, making the first prediction results of the same classification category have similar features, realizing the reduction of intra-class differences. Training the network with the first loss and the center loss can improve the classification accuracy of the feature extraction network, the region-of-interest extraction network, and the first detection neural network.
In one or more optional embodiments, screening at least one error region of interest from the at least one predicted region of interest based on the first loss includes:
determining a predicted region of interest in the at least one predicted region of interest whose first loss exceeds a preset loss threshold to be an error region of interest.
This embodiment can be realized by the online hard example mining (OHEM) algorithm: OHEM selects the predicted regions of interest (hard examples) most likely to contain detection mistakes. Which predicted regions of interest are most likely to contain detection mistakes is determined, in this implementation, by the first loss; for example, a larger first loss indicates that the corresponding predicted region of interest is more likely to contain a detection mistake, so it can be determined to be an error region of interest. Finally, the RFCN network detects the selected regions of interest most likely to contain detection mistakes, noticeably improving object detection accuracy.
Fig. 4 is an application example diagram of the object detection method of the embodiment of the present application. As shown in Fig. 4, a remote-sensing image is detected by the object detection method based on the object detection neural network described in any of the above embodiments of the present application to obtain at least one detection box position, realizing the detection of the positions of the aircraft in the remote-sensing image; each box in the figure indicates one detection result.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium; when the program is executed, the steps of the above method embodiments are performed. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disk, or optical disk.
Fig. 5 is a structural schematic diagram of an embodiment of the object detection apparatus of the present application. The apparatus of this embodiment can be used to realize the above method embodiments of the present application. As shown in Fig. 5, the apparatus of this embodiment includes:
a region-of-interest extraction unit 51, configured to obtain at least one region of interest based on the feature information of an image.
Optionally, the feature information of the image may include a feature map of the image; region-of-interest (Region of Interest, ROI) extraction is performed on the feature map of the image to be processed to obtain one or more regions of interest. A region of interest may correspond to a region including the features of a target. In some implementations, ROI extraction may be realized by a neural network; the present application does not limit the specific structure of the neural network that obtains the regions of interest. Alternatively, ROI extraction may be realized by other machine learning methods; the embodiment of the present application does not limit this.
In some embodiments, the feature information of the image may be obtained by performing feature extraction on the image; alternatively, the feature information of the image may be obtained from another device, for example, a server receives the feature information of the image sent by a terminal device, etc. The embodiment of the present application does not limit this.
Optionally, the image in this embodiment may be a remote-sensing image. Because a remote-sensing image usually covers a large area, the targets of interest in the image are relatively small or densely arranged; performing object detection directly on remote-sensing images leads to a low recall rate and severe false alarms, failing to meet practical application requirements. By obtaining regions of interest, this embodiment splits the regions in the remote-sensing image that may contain targets out of the original image, highlighting the positions of the targets in the image. Alternatively, the embodiment of the present application can also be applied to other situations where targets occupy a small proportion of the image; the embodiment of the present application does not limit this.
a target region obtaining unit 52, configured to obtain, based on the region of interest, at least two target regions corresponding to the region of interest.
The region of interest lies within the target regions.
In some implementations, the at least two target regions include the region of interest and at least one enlarged region obtained by enlarging the region of interest. Alternatively, the at least two target regions include at least two enlarged regions obtained by enlarging the region of interest, etc.; the embodiment of the present application does not limit this.
In one or more optional embodiments, the region of interest is enlarged at least twice to obtain the at least two target regions.
Optionally, the enlargement is performed about the center of the region of interest; that is, the center of the enlarged region is identical to the center of the region of interest. The at least two enlargement operations correspond to different magnification factors; for example, the region of interest may be enlarged by 1.2x and 1.5x respectively, yielding two target regions. In some implementations, the region of interest itself may also serve as one target region, which can be understood as the target region obtained by enlarging the region of interest by a factor of 1.
To locate the target more accurately, this embodiment uses multi-pool (Multi-Pool) contextual semantic information fusion: taking the region corresponding to the region of interest as the base frame, the region is enlarged by different factors so that information outside the region of interest can also be extracted, realizing the extraction and fusion of context information. For example, when the target category cannot be judged from the ROI features alone, observing information over a larger range usually yields a more accurate judgment.
a result detection unit 53, configured to determine, based on the feature information of the at least two target regions, the object detection result corresponding to the region of interest.
By combining the feature information of at least two target regions, more contextual information is obtained, which helps improve the accuracy of the detection result.
The object detection apparatus provided by the above embodiment of the present application fuses more contextual information into the region of interest, enhancing the expressive power of the region of interest and improving detection accuracy.
In one or more optional embodiments, to obtain a more accurate object detection result, the result detection unit 53 optionally includes: a concatenation module, configured to perform feature concatenation on the feature information of the at least two target regions to obtain a concatenated feature; and an object detection module, configured to determine the object detection result corresponding to the region of interest based on the concatenated feature. In some implementations, the feature information of the multiple target regions may be fused, or stacked along the channel dimension, etc., to obtain the concatenated feature; the embodiment of the present application does not limit the specific implementation of the concatenation.
Optionally, the object detection module is configured to obtain a first detection result based on the concatenated feature; obtain a second detection result based on the feature information of the region of interest and the feature information of the image; and determine the object detection result corresponding to the region of interest based on the first detection result and the second detection result.
Optionally, when determining the object detection result corresponding to the region of interest based on the first detection result and the second detection result, the object detection module is configured to average the first detection result and the second detection result to obtain the object detection result corresponding to the region of interest.
In one or more optional embodiments, the apparatus further includes:
a feature extraction unit, configured to perform feature extraction on the image to obtain the feature information of the image.
Feature extraction on the image may be implemented by a convolutional neural network or by at least one convolutional layer, etc.; the present application does not limit the specific way the image features are obtained. To process the image features faster, however, a neural network with a simpler structure and faster processing may optionally be chosen for feature extraction, making the overall network lightweight and improving the image-processing capability and processing speed of the object detection neural network.
In one or more optional embodiments, the apparatus realizes object detection in combination with an object detection neural network;
the apparatus of this embodiment further includes:
a network training unit, configured to train the object detection neural network based on sample images with annotation information, using the annotation information of the sample images as supervision. Alternatively, the object detection neural network may be trained in an unsupervised manner; the embodiment of the present application does not limit this.
In some optional implementations, target detection neural network includes that feature extraction network, area-of-interest mention Take network, the first detection neural network and the second detection neural network;
Network training unit, comprising:
Sample process module carries out feature extraction processing to sample image using feature extraction network, obtains sample characteristics; Sample characteristics are handled using region of interesting extraction network, obtain each prediction at least one prediction area-of-interest The characteristic information of area-of-interest corresponding at least two prediction target area;Using the first detection neural network to each prediction The characteristic information of area-of-interest corresponding at least two prediction target area is handled, and each prediction area-of-interest is obtained The first prediction result;
First-loss module, for first based on each prediction area-of-interest at least one prediction area-of-interest The markup information of prediction result and sample image obtains each prediction area-of-interest at least one prediction area-of-interest First-loss;
First-loss training module, for based on each prediction area-of-interest at least one prediction area-of-interest First-loss, the network parameter of adjustment feature extraction network, region of interesting extraction network and the first detection neural network.
In some embodiments, the network training unit may further include: a second-loss module, configured to obtain a second loss based on the at least one prediction region of interest and the first loss of each prediction region of interest among the at least one prediction region of interest;
and a second-loss training module, configured to adjust the network parameters of the feature extraction network and the second detection neural network based on the second loss.
The first detection neural network and the second detection neural network are trained separately during the training process. The sample image passes through the feature extraction network and the first detection neural network to obtain the first loss, and the feature extraction network and the first detection neural network are trained based on that first loss. The input of the second detection neural network is determined from the prediction regions of interest in combination with the first loss, and the feature extraction network and the second detection neural network are trained through the second loss. After training, the first detection neural network and the second detection neural network of the target detection neural network can each output a more accurate first detection result and second detection result respectively, thereby yielding a more accurate target detection result.
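A toy sketch of this separate training scheme, under the assumption of scalar stand-in "networks" and simple quadratic losses: the first loss updates the feature, ROI-extraction, and first-detector parameters, while the second loss updates the feature and second-detector parameters, so the two branches share only the feature extractor.

```python
# Scalar stand-ins for the four sub-networks' parameters (assumed toy model).
params = {"feature": 1.0, "roi": 1.0, "first": 1.0, "second": 1.0}
lr = 0.1

def first_loss(p):
    """Depends on the feature, ROI-extraction and first-detector parameters."""
    return (p["feature"] + p["roi"] + p["first"] - 1.0) ** 2

def second_loss(p):
    """Depends on the feature and second-detector parameters."""
    return (p["feature"] + p["second"] - 1.0) ** 2

def numeric_grad(loss, p, key, eps=1e-6):
    q = dict(p)
    q[key] += eps
    return (loss(q) - loss(p)) / eps

for _ in range(200):
    # First loss trains feature extraction + ROI extraction + first detector.
    for k in ("feature", "roi", "first"):
        params[k] -= lr * numeric_grad(first_loss, params, k)
    # Second loss trains feature extraction + second detector.
    for k in ("feature", "second"):
        params[k] -= lr * numeric_grad(second_loss, params, k)

print(round(first_loss(params), 6), round(second_loss(params), 6))
```

The second-detector parameters are never touched by the first loss (and vice versa for the ROI/first-detector parameters), which is the structural point of the two separate training passes.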
Optionally, the second-loss module is specifically configured to: screen at least one error region of interest from the at least one prediction region of interest based on the first loss; process the characteristic information of the at least one error region of interest and the sample features using the second detection neural network, to obtain a second prediction result for each error region of interest among the at least one error region of interest; and obtain the second loss based on the second prediction result of each error region of interest among the at least one error region of interest and the corresponding annotation information of the sample image.
Optionally, the first-loss module is specifically configured to: obtain, based on the first prediction result of each prediction region of interest among the at least one prediction region of interest and the annotation information of the sample image, an exponential normalization (softmax) loss and a regression loss for each prediction region of interest; and determine the first loss of each prediction region of interest based on the sum of its exponential normalization loss and regression loss.
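If the per-ROI first loss is read as a classification loss plus a box-regression loss, it could be computed as below. Softmax cross-entropy for the classification part and smooth-L1 for the regression part are common concrete choices assumed here; the logits, boxes, and label are illustrative.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Exponentially normalized (softmax) classification loss."""
    z = logits - logits.max()          # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def smooth_l1(pred_box, gt_box):
    """Smooth-L1 regression loss over the four box coordinates."""
    d = np.abs(np.asarray(pred_box) - np.asarray(gt_box))
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

logits = np.array([2.0, 0.5, -1.0])    # class scores for one predicted ROI (assumed)
cls_loss = softmax_cross_entropy(logits, label=0)
reg_loss = smooth_l1([10.2, 10.1, 20.0, 19.8], [10.0, 10.0, 20.0, 20.0])
first_loss = cls_loss + reg_loss       # sum of the two losses, as in the embodiment
print(first_loss)
```

The embodiment only fixes that the first loss is the sum of the two terms; the exact regression-loss form is a design choice of this sketch.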
Optionally, the network training unit further includes:
a center-loss module, configured to obtain a center loss for each prediction region of interest among the at least one prediction region of interest, based on the first prediction result of each prediction region of interest;
the first-loss training module is specifically configured to adjust the network parameters of the feature extraction network, the region-of-interest extraction network, and the first detection neural network based on the first loss and the center loss of each prediction region of interest among the at least one prediction region of interest.
Optionally, the center-loss module is specifically configured to: cluster the first prediction results of the prediction regions of interest among the at least one prediction region of interest to obtain cluster centres; and obtain the center loss of each prediction region of interest based on the distance between its first prediction result and the cluster centre.
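One possible reading of this center loss — cluster the first prediction results, then take each prediction's squared distance to its assigned centre — can be sketched as follows. The 1-D predictions, the cluster count, and the use of k-means are all assumptions of this sketch.

```python
import numpy as np

def center_loss(predictions, n_clusters=2, iters=20, seed=0):
    """K-means over the first prediction results, then the squared distance
    of each prediction to its assigned cluster centre."""
    rng = np.random.default_rng(seed)
    x = np.asarray(predictions, dtype=float)
    centres = x[rng.choice(len(x), size=n_clusters, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centres[c] = x[assign == c].mean()
    assign = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
    return (x - centres[assign]) ** 2   # per-ROI center loss

# Illustrative first prediction results for five predicted ROIs.
preds = [0.10, 0.12, 0.90, 0.88, 0.11]
losses = center_loss(preds)
print(losses)
```

Predictions that sit close to their cluster centre contribute a small center loss, so minimizing it pulls the first prediction results of similar ROIs together.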
Optionally, when screening at least one error region of interest from the at least one prediction region of interest based on the first loss, the second-loss module determines as error regions of interest those prediction regions of interest whose first loss exceeds a preset loss threshold.
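The threshold-based screening of error regions of interest reduces to a simple filter; `loss_threshold` below stands for the preset loss threshold of the embodiment, and the ROI names and loss values are illustrative.

```python
def error_rois(rois, first_losses, loss_threshold):
    """Keep the predicted ROIs whose first loss exceeds the preset threshold."""
    return [r for r, loss in zip(rois, first_losses) if loss > loss_threshold]

rois = ["roi_a", "roi_b", "roi_c"]
hard = error_rois(rois, [0.2, 1.5, 0.9], loss_threshold=0.8)
print(hard)  # ['roi_b', 'roi_c']
```

This is a form of hard-example selection: only the ROIs the first detector handled poorly are passed on to the second detection network.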
According to another aspect of the embodiments of the present application, there is provided an electronic device including a processor, the processor including the object detection device described in any one of the above embodiments.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including: a memory for storing executable instructions;
and a processor, configured to communicate with the memory to execute the executable instructions, thereby completing the operations of the object detection method described in any one of the above embodiments.
According to another aspect of the embodiments of the present application, there is provided a computer-readable storage medium for storing computer-readable instructions which, when executed, perform the operations of the object detection method described in any one of the above embodiments.
According to another aspect of the embodiments of the present application, there is provided a computer program product including computer-readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the object detection method described in any one of the above embodiments.
According to yet another aspect of the embodiments of the present application, there is provided another computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the operations of the object detection method described in any one of the above possible implementations.
In one or more optional embodiments, the embodiments of the present application further provide a computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the operations of the object detection method described in any one of the above possible implementations.
The computer program product may be implemented in hardware, software, or a combination thereof. In an optional example, the computer program product is embodied as a computer storage medium; in another optional example, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
The embodiments of the present application also provide object detection methods and devices, electronic equipment, computer storage media, and computer program products, wherein: at least one region of interest is obtained based on the characteristic information of an image; at least two target areas corresponding to the region of interest are obtained based on the region of interest; and the target detection result corresponding to the region of interest is determined based on the characteristic information of the at least two target areas.
In some embodiments, the target detection instruction may specifically be a call instruction: a first device may instruct a second device to perform target detection by way of a call, and accordingly, in response to receiving the call instruction, the second device may execute the steps and/or processes of any embodiment of the above object detection method.
It should be understood that terms such as "first" and "second" in the embodiments of the present application are used only for distinction and should not be construed as limiting the embodiments of the present application.
It should also be understood that, in the present application, "multiple" may refer to two or more, and "at least one" may refer to one, two, or more.
It should also be understood that any component, data, or structure mentioned in the present application may generally be understood as one or more, unless explicitly limited or the context indicates otherwise.
It should also be understood that the description of each embodiment emphasizes the differences between the embodiments; for their identical or similar aspects, the embodiments may be referred to mutually and, for brevity, are not described again.
The embodiments of the present application also provide an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. Referring to Fig. 6, there is shown a structural schematic diagram of an electronic device 600 suitable for implementing a terminal device or a server of the embodiments of the present application. As shown in Fig. 6, the electronic device 600 includes one or more processors, a communication unit, and the like. The one or more processors are, for example, one or more central processing units (CPUs) 601 and/or one or more graphics processors (GPUs) 613, etc. The processor may execute various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 602 or loaded from a storage section 608 into a random access memory (RAM) 603. The communication unit 612 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card.
The processor may communicate with the read-only memory 602 and/or the random access memory 603 to execute the executable instructions, connect with the communication unit 612 through a bus 604, and communicate with other target devices through the communication unit 612, thereby completing the operations corresponding to any method provided by the embodiments of the present application, for example: obtaining at least one region of interest based on the characteristic information of an image; obtaining, based on the region of interest, at least two target areas corresponding to the region of interest, the region of interest being within the target areas; and determining, based on the characteristic information of the at least two target areas, the target detection result corresponding to the region of interest.
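Since the region of interest lies within each target area, one natural way to obtain the at least two target areas is to enlarge the ROI about its centre. The scale factors below are illustrative assumptions, not values fixed by this application.

```python
def enlarge(roi, scale):
    """Scale a box (x1, y1, x2, y2) about its centre;
    the original ROI stays inside the result for scale >= 1."""
    x1, y1, x2, y2 = roi
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def target_areas(roi, scales=(1.5, 2.0)):
    """At least two target areas per ROI, one per (assumed) scale factor."""
    return [enlarge(roi, s) for s in scales]

roi = (10.0, 10.0, 20.0, 20.0)
areas = target_areas(roi)
print(areas)  # [(7.5, 7.5, 22.5, 22.5), (5.0, 5.0, 25.0, 25.0)]
```

The enlarged areas bring surrounding context into the per-ROI features, which is what lets the detector decide on the ROI using information from beyond its own boundary.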
In addition, the RAM 603 may also store various programs and data required for the operation of the device. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through the bus 604. Where the RAM 603 is present, the ROM 602 is an optional module. The RAM 603 stores executable instructions, or executable instructions are written into the ROM 602 at runtime, and the executable instructions cause the central processing unit 601 to perform the operations corresponding to the above communication method. An input/output (I/O) interface 605 is also connected to the bus 604. The communication unit 612 may be integrally provided, or may be provided with multiple sub-modules (e.g., multiple IB network cards) linked on the bus.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including, for example, a cathode-ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, etc.; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a LAN card, a modem, etc. The communication section 609 performs communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the driver 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
It should be noted that the architecture shown in Fig. 6 is only an optional implementation. In concrete practice, the number and types of the components in Fig. 6 may be selected, deleted, added, or replaced according to actual needs. Different functional components may also be provided separately or integrally: for example, the GPU 613 and the CPU 601 may be provided separately, or the GPU 613 may be integrated on the CPU 601; the communication unit may be provided separately, or may be integrally provided on the CPU 601 or the GPU 613; and so on. These interchangeable embodiments all fall within the protection scope disclosed in the present application.
In particular, according to the embodiments of the present application, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program including program code for executing the method shown in the flowchart; the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: obtaining at least one region of interest based on the characteristic information of an image; obtaining, based on the region of interest, at least two target areas corresponding to the region of interest, the region of interest being within the target areas; and determining, based on the characteristic information of the at least two target areas, the target detection result corresponding to the region of interest. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the operations of the above functions defined in the method of the present application are executed.
The methods and devices of the present application may be implemented in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is merely for illustration; the steps of the method of the present application are not limited to the order specifically described above, unless otherwise specifically stated. In addition, in some embodiments, the present application may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the method according to the present application. Thus, the present application also covers a recording medium storing a program for executing the method according to the present application.
The description of the present application is given for the purposes of illustration and description, and is not intended to be exhaustive or to limit the present application to the disclosed form. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments were chosen and described in order to better illustrate the principles and practical applications of the present application, and to enable those skilled in the art to understand the present application so as to design various embodiments, with various modifications, suited to particular uses.

Claims (10)

1. An object detection method, characterized by comprising:
obtaining at least one region of interest based on characteristic information of an image;
obtaining, based on the region of interest, at least two target areas corresponding to the region of interest, the region of interest being within the target areas;
determining, based on characteristic information of the at least two target areas, a target detection result corresponding to the region of interest.
2. The method according to claim 1, wherein the obtaining, based on the region of interest, at least two target areas comprises:
performing enlargement processing on the region of interest at least twice to obtain the at least two target areas.
3. The method according to claim 1 or 2, wherein the determining, based on the characteristic information of the at least two target areas, the target detection result corresponding to the region of interest comprises:
performing feature connection processing on the characteristic information of the at least two target areas to obtain connection features;
determining, based on the connection features, the target detection result corresponding to the region of interest.
4. The method according to claim 3, wherein the determining, based on the connection features, the target detection result corresponding to the region of interest comprises:
obtaining a first detection result based on the connection features;
obtaining a second detection result based on the characteristic information of the region of interest and the characteristic information of the image;
determining, based on the first detection result and the second detection result, the target detection result corresponding to the region of interest.
5. The method according to claim 4, wherein the determining, based on the first detection result and the second detection result, the target detection result corresponding to the region of interest comprises:
averaging the first detection result and the second detection result to obtain the target detection result corresponding to the region of interest.
6. An object detection device, characterized by comprising:
a region-of-interest extraction unit, configured to obtain at least one region of interest based on characteristic information of an image;
a target area obtaining unit, configured to obtain, based on the region of interest, at least two target areas corresponding to the region of interest, the region of interest being within the target areas;
a result detection unit, configured to determine, based on characteristic information of the at least two target areas, a target detection result corresponding to the region of interest.
7. An electronic device, characterized by comprising a processor, the processor including the object detection device according to claim 6.
8. An electronic device, characterized by comprising: a memory for storing executable instructions;
and a processor, configured to communicate with the memory to execute the executable instructions to complete the operations of the object detection method according to any one of claims 1 to 5.
9. A computer-readable storage medium for storing computer-readable instructions, characterized in that the instructions, when executed, perform the operations of the object detection method according to any one of claims 1 to 5.
10. A computer program product, comprising computer-readable code, characterized in that, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the object detection method according to any one of claims 1 to 5.
CN201810770381.XA 2018-07-13 2018-07-13 Object detection method and device, electronic equipment, storage medium, program product Pending CN109165644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810770381.XA CN109165644A (en) 2018-07-13 2018-07-13 Object detection method and device, electronic equipment, storage medium, program product


Publications (1)

Publication Number Publication Date
CN109165644A true CN109165644A (en) 2019-01-08

Family

ID=64897861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810770381.XA Pending CN109165644A (en) 2018-07-13 2018-07-13 Object detection method and device, electronic equipment, storage medium, program product

Country Status (1)

Country Link
CN (1) CN109165644A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345738A (en) * 2013-06-04 2013-10-09 北京百度网讯科技有限公司 Object detection method and device based on area of interest
CN104067314A (en) * 2014-05-23 2014-09-24 中国科学院自动化研究所 Human-shaped image segmentation method
US20170206431A1 (en) * 2016-01-20 2017-07-20 Microsoft Technology Licensing, Llc Object detection and classification in images
CN107610113A (en) * 2017-09-13 2018-01-19 北京邮电大学 The detection method and device of Small object based on deep learning in a kind of image
CN108229524A (en) * 2017-05-25 2018-06-29 北京航空航天大学 A kind of chimney and condensing tower detection method based on remote sensing images


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BOWEN CAI ET AL: "Airport Detection Using End-to-End Convolutional Neural Network with Hard Example Mining", Remote Sensing *
SHIGUANG WANG ET AL: "PCN: Part and Context Information for Pedestrian Detection with CNNs", arXiv:1804.04483v1 *
Third Research Institute of the Ministry of Public Security: "Multi-camera collaborative target detection and tracking technology", Nanjing: Southeast University Press *
Zeng Yiqi, Guan Shengxiao: "A facial expression recognition method based on an isolation loss function", Information Technology and Network Security *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458087A (en) * 2019-08-07 2019-11-15 湖南省华芯医疗器械有限公司 A kind of patient problems position mask method based on deep learning
CN110458087B (en) * 2019-08-07 2022-03-11 湖南省华芯医疗器械有限公司 Patient problem part labeling method based on deep learning
CN110738123A (en) * 2019-09-19 2020-01-31 创新奇智(北京)科技有限公司 Method and device for identifying densely displayed commodities
CN110738123B (en) * 2019-09-19 2020-10-23 创新奇智(北京)科技有限公司 Method and device for identifying densely displayed commodities
CN110837789A (en) * 2019-10-31 2020-02-25 北京奇艺世纪科技有限公司 Method and device for detecting object, electronic equipment and medium
CN111950551A (en) * 2020-08-14 2020-11-17 长春理工大学 Target detection method based on convolutional neural network
CN111950551B (en) * 2020-08-14 2024-03-08 长春理工大学 Target detection method based on convolutional neural network
CN114004963A (en) * 2021-12-31 2022-02-01 深圳比特微电子科技有限公司 Target class identification method and device and readable storage medium
CN114004963B (en) * 2021-12-31 2022-03-29 深圳比特微电子科技有限公司 Target class identification method and device and readable storage medium
CN116385952A (en) * 2023-06-01 2023-07-04 华雁智能科技(集团)股份有限公司 Distribution network line small target defect detection method, device, equipment and storage medium
CN116385952B (en) * 2023-06-01 2023-09-01 华雁智能科技(集团)股份有限公司 Distribution network line small target defect detection method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190108