CN110147753A - Method and device for detecting small objects in an image - Google Patents
Method and device for detecting small objects in an image
- Publication number
- CN110147753A CN110147753A CN201910410363.5A CN201910410363A CN110147753A CN 110147753 A CN110147753 A CN 110147753A CN 201910410363 A CN201910410363 A CN 201910410363A CN 110147753 A CN110147753 A CN 110147753A
- Authority
- CN
- China
- Prior art keywords
- candidate region
- image
- small object
- network
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The application provides a method and device for detecting small objects in an image, for improving the accuracy of small-object detection. The method comprises: obtaining an image to be processed; according to a pre-trained candidate-region generation network, performing multiple different convolution operations on the image and fusing the convolution results to generate candidate regions in the image, where a candidate region contains a small object in the image; performing size normalization and super-resolution processing on the candidate region to obtain a super-resolved candidate region; fusing the features of the super-resolved candidate region with the features of the candidate region's neighborhood to obtain a fused region feature; and recognizing the fused region feature to obtain the small-object information corresponding to the candidate region, where the small-object information includes the category of the small object and its position in the image to be processed.
Description
Technical field
This application relates to the field of image processing and neural network techniques, and in particular to a method and device for detecting small objects in an image.
Background technique
With the continued development of deep learning, deep learning plays an increasingly significant role in computer vision. Deep neural networks have made breakthrough progress in tasks such as image classification, object detection, and semantic segmentation, with especially notable achievements in object detection.

However, existing research usually detects medium- or large-sized objects in an image. For small objects in an image, the resolution is low and the extracted features are indistinct, so the accuracy of small-object detection is low.
Summary of the invention
The application provides a method and device for detecting small objects in an image, for improving the accuracy of small-object detection.
In a first aspect, a method for detecting small objects in an image is provided, comprising:

obtaining an image to be processed;

according to a pre-trained candidate-region generation network, performing multiple different convolution operations on the image to be processed, and fusing the results of the multiple convolution operations to generate candidate regions in the image to be processed, where a candidate region contains a small object in the image;

performing size normalization and super-resolution processing on the candidate region to obtain a super-resolved candidate region;

fusing the features of the super-resolved candidate region with the features of the candidate region's neighborhood to obtain a fused region feature;

recognizing the fused region feature to obtain the small-object information corresponding to the candidate region, where the small-object information includes the category of the small object and its position in the image to be processed.
In the embodiments of the application, different convolution operations are applied to the image to be processed. Because different layers of a convolutional neural network have different receptive fields, each layer attends to different regions; fusing the features from two different receptive fields into one fused region feature makes that feature clearer, which helps improve the accuracy of the detection result. Judging with the contextual information of the candidate region makes the fused region feature more comprehensive, further improving accuracy. Super-resolution processing makes the candidate region clearer, again improving the result. Using super-resolution and feature fusion during classification and bounding-box regression greatly raises small-object detection accuracy at low resolution. Combining these techniques ensures accurate, reliable detection at low resolution. In short, the fused region feature in the embodiments ensures that more accurate small-object features are extracted, so that the detection result is more accurate.
Optionally, before generating the candidate regions in the image to be processed according to the pre-trained candidate-region generation network, the method comprises:

obtaining sample images;

according to a preset candidate-region generation network, extracting features at N semantic scales from the sample images to obtain sample features at N semantic scales, N being a positive integer;

according to the preset candidate-region generation network, fusing the sample region features of M of the N semantic scales to obtain fused sample features, and predicting the object information in the sample images from the fused sample features to obtain the probability corresponding to the current prediction, M being a positive integer less than or equal to N;

computing a loss function value from the probability, and adjusting the parameters of the preset candidate-region generation network according to the change of the loss value until the loss function converges, thereby obtaining the pre-trained candidate-region generation network.
In the embodiments of the application, candidate regions are generated by a pre-trained candidate-region generation network, so the extracted candidate regions have higher recall and precision, and the efficiency of candidate-region generation is also improved.
Optionally, adjusting the parameters of the network according to the change of the loss value comprises:

adjusting the parameters of the network by a preset gradient-descent step according to the change of the loss function value.
Optionally, before fusing the sample region features of the M semantic scales according to the candidate-region generation network, the method comprises:

fusing the sample region features of every two of the N semantic scales to obtain multiple predicted fused sample region features;

evaluating the confidence of each predicted fused sample region feature to obtain multiple confidences;

taking the two semantic scales corresponding to the maximum confidence as the M semantic scales.

In the embodiments of the application, the semantic scales are screened and two are selected, which on the one hand ensures the reliability of the fused result, and on the other hand greatly reduces computation.
Optionally, performing size normalization and super-resolution processing on the candidate region to obtain a super-resolved candidate region comprises:

normalizing the candidate region to obtain a normalized candidate region;

convolving the normalized candidate region to obtain a convolution result, and deconvolving the normalized candidate region to obtain a deconvolution result;

fusing the convolution result and the deconvolution result to obtain a super-resolved candidate region of the same size as the candidate region.
Optionally, recognizing the fused region feature to obtain the small-object information corresponding to the candidate region comprises:

inputting the fused region feature into a pre-trained classification and bounding-box prediction network, performing box prediction and class prediction, and outputting the small-object information corresponding to the candidate region.
Optionally, the neighborhood of the candidate region is the region obtained by expanding the candidate region threefold in length and width about its center, excluding the candidate region itself.
Second aspect provides a kind of device of wisp in detection image, comprising:
Module is obtained, for obtaining image to be processed;
Processing module, the generation network trained in advance for basis, the processing image carry out repeatedly different process of convolution,
Fusion treatment is carried out to multiple convolution results treated result, generates the candidate region in the image to be processed;Wherein, institute
Stating candidate region includes the wisp in the image to be processed;
The processing module is also used to carry out size normalization and super-resolution processing to the candidate region, be surpassed
Candidate region after resolution processes;
The processing module is also used to feature and the candidate region to the candidate region after the super-resolution processing
Neighborhood feature carry out fusion treatment, obtain fused provincial characteristics;
The processing module is also used to identify the fused provincial characteristics, obtains the candidate region pair
The wisp information answered;Wherein, the generic and the wisp that the wisp information includes the wisp are described
Position in image to be processed.
Optionally, the obtaining module is further used for obtaining sample images before the candidate regions in the image to be processed are generated;

the processing module is further used for extracting features at N semantic scales from the sample images according to a preset candidate-region generation network, obtaining sample features at N semantic scales, N being a positive integer;

the processing module is further used for fusing the sample region features of M of the N semantic scales according to the preset candidate-region generation network, obtaining fused sample features, and predicting the object information in the sample images from the fused sample features to obtain the probability corresponding to the current prediction, M being a positive integer less than or equal to N;

the processing module is further used for computing a loss function value from the probability and adjusting the parameters of the preset candidate-region generation network according to the change of the loss value until the loss function converges, obtaining the pre-trained candidate-region generation network.
Optionally, the processing module is specifically used for adjusting the parameters of the network by a preset gradient-descent step according to the change of the loss function value.
Brief description of the drawings

To illustrate the technical solutions in the embodiments of the application more clearly, the drawings required in the embodiments are briefly described below. Obviously, the drawings described below are only some embodiments of the application; those of ordinary skill in the art can obtain other drawings from these without creative effort.

Fig. 1 is a flowchart of a method for detecting small objects in an image, provided by an embodiment of the application;

Fig. 2 is a schematic diagram of a candidate-region generation network processing an image to be processed, provided by an embodiment of the application;

Fig. 3 is a simplified diagram of a method for detecting small objects in an image, provided by an embodiment of the application;

Fig. 4 is a schematic diagram of super-resolution processing, provided by an embodiment of the application;

Fig. 5 is a schematic diagram of the effect after processing an image to be processed, provided by an embodiment of the application;

Fig. 6 is a structural diagram of a device for detecting small objects in an image, provided by an embodiment of the application.
Detailed description of the embodiments

To make the purposes, technical solutions, and advantages of the embodiments of the application clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the drawings of the embodiments of the application.
In scenarios such as detecting whether a driver is smoking in public transport, detecting whether students are attending class normally in a classroom, and detecting micro-lesions in medical imaging, the detected object occupies less than 10% of the image area. Existing image-processing methods struggle to extract the features of such small objects, so the accuracy of small-object detection is relatively low.

In view of this, an embodiment of the application provides a method for detecting small objects in an image. The method is executed by a device for detecting small objects in an image, which can be implemented by equipment with a graphics processing unit (GPU), such as a personal computer, a server, or a camera; the application does not limit the concrete type of the device.
Referring to Fig. 1, the detailed process by which the device executes the method is introduced below. The method comprises:

Step 11: obtain an image to be processed;

Step 12: according to a pre-trained candidate-region generation network, perform multiple different convolution operations on the image to be processed, fuse the results of the multiple convolution operations, and generate candidate regions in the image, where a candidate region contains a small object in the image;

Step 13: perform size normalization and super-resolution processing on the candidate region to obtain a super-resolved candidate region;

Step 14: fuse the features of the super-resolved candidate region with the features of the candidate region's neighborhood to obtain a fused region feature;

Step 15: recognize the fused region feature to obtain the small-object information corresponding to the candidate region, where the small-object information includes the category of the small object and its position in the image to be processed.
Each step is described in detail below.

The device first executes step 11, obtaining the image to be processed. The device may acquire and process images in real time, or another image-acquisition device may send collected images to it, which is equivalent to the device obtaining the image to be processed.
After obtaining the image to be processed, step 12 is executed: according to the pre-trained candidate-region generation network, multiple different convolution operations are performed on the image, the results are fused, and the candidate regions in the image are generated.

Specifically, the embodiments of the application use a pre-trained candidate-region generation network to predict and mark the candidate regions in the image to be processed, so that small objects possibly present in the candidate regions can then be detected. The candidate-region generation network performs multiple convolution operations on the image, and the receptive fields of the different convolutions differ, so each convolution result emphasizes different image information; fusing the convolution results therefore yields a clearer image, which helps improve the accuracy of subsequent small-object detection. Each convolution operation differs from the others in ways that may include, but are not limited to, the number of convolution layers and the image features convolved.
As for how the candidate-region generation network is trained, there are many ways; the training process is illustrated below.

One way of training the candidate-region generation network is as follows:
Obtain sample images;

according to a preset candidate-region generation network, extract features at N semantic scales from the sample images to obtain sample features at N semantic scales, N being a positive integer;

according to the preset candidate-region generation network, fuse the sample region features of M of the N semantic scales to obtain fused sample features, and predict the object information in the sample images from the fused sample features to obtain the probability corresponding to the current prediction, M being a positive integer less than or equal to N;

compute a loss function value from the probability and adjust the parameters of the preset candidate-region generation network according to the change of the loss value until the loss function converges, obtaining the pre-trained candidate-region generation network.
Specifically, the device first obtains a large number of sample images; each sample image includes category labels and position labels for the objects it contains. The sample images can be obtained by manual annotation, or from existing data.
Optionally, since the sizes of the sample images may differ, the sample images are normalized to a uniform size to improve training accuracy. Normalization may use bilinear interpolation, for example sampling each image to 224 × 224 by bilinear interpolation.
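The bilinear-interpolation normalization described above can be sketched as follows. This is an illustrative implementation, not the patent's code; only the interpolation method and the 224 × 224 target size come from the text, and the sample image here is random placeholder data:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize an (H, W) or (H, W, C) image with bilinear interpolation."""
    h, w = img.shape[:2]
    # Sample positions in the source image (align-corners style for simplicity)
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]          # vertical blend weights
    wx = (xs - x0)[None, :]          # horizontal blend weights
    if img.ndim == 3:                # broadcast over channels if present
        wy = wy[..., None]; wx = wx[..., None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

sample = np.random.rand(37, 51, 3)          # arbitrary-size sample image
normalized = bilinear_resize(sample, 224, 224)
print(normalized.shape)                     # (224, 224, 3)
```

Bilinear sampling keeps the resized region smooth, which matters because the later super-resolution step has to recover detail from it.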
After obtaining the sample images, the device builds the candidate-region generation network and extracts features at N semantic scales from the sample images, obtaining sample features at N semantic scales. The sample features at the N semantic scales can be fused to obtain fused sample features; the probability corresponding to the fused sample features is computed, the training loss of the candidate-region generation network is computed from that probability, and the network parameters are adjusted continually according to the loss value until the loss converges, at which point the network is the pre-trained candidate-region generation network used in step 12. Any two of the N semantic scales differ. The semantic scales include, but are not limited to, one or more of image contour distribution, image grayscale distribution, and image texture distribution.
As one embodiment, training with all N semantic-scale features each time is computationally expensive, and it cannot be determined in advance which fused semantic-scale features give the best result. Therefore, to reduce computation while preserving training accuracy as far as possible, M semantic-scale features can be selected from the N semantic-scale features for the fusion training process.
There are many ways to screen the M semantic-scale features, illustrated below.

Mode one:

Select M semantic-scale features arbitrarily.

Mode two:

Fuse the sample region features of every two of the N semantic scales to obtain multiple predicted fused sample region features; evaluate the confidence of each predicted fused sample region feature to obtain multiple confidences; take the two semantic scales corresponding to the maximum confidence as the M semantic scales.
Specifically, every two of the sample region features at the N semantic scales are fused and the confidence of each pairwise fusion result is evaluated; the two semantic scales with the highest confidence are selected as the M semantic scales. In this way, the accuracy of the fusion result is preserved while the computation in the fusion process is greatly reduced. Confidence can be characterized in many ways, for example as a probability, and the probability can be computed in many ways, for example with the common logistic function.
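Mode two above can be sketched as a pairwise search. Only the "fuse every pair, score with a logistic-based confidence, keep the best pair" structure follows the text; the averaging fusion, the scoring function, and the random feature vectors are illustrative placeholders:

```python
import itertools
import numpy as np

def logistic(x):
    """Common logistic function, mapping a score to a (0, 1) probability."""
    return 1.0 / (1.0 + np.exp(-x))

def select_scale_pair(scale_features, score_fn):
    """Fuse every pair of per-scale features, score each fusion with a
    confidence estimate, and return the best-scoring pair of scale indices."""
    best_pair, best_conf = None, -np.inf
    for i, j in itertools.combinations(range(len(scale_features)), 2):
        fused = (scale_features[i] + scale_features[j]) / 2   # placeholder fusion
        conf = logistic(score_fn(fused)).mean()               # confidence as a probability
        if conf > best_conf:
            best_pair, best_conf = (i, j), conf
    return best_pair, best_conf

# N = 4 semantic scales, each summarised as a feature vector (hypothetical data)
rng = np.random.default_rng(0)
feats = [rng.normal(size=16) for _ in range(4)]
pair, conf = select_scale_pair(feats, score_fn=lambda f: f.sum())
print(pair, float(conf))
```

With N scales the search evaluates N(N−1)/2 fusions once, after which training only ever fuses the chosen pair, which is the computation saving the text describes.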
After screening out the sample region features of the M semantic scales, the sample region features of the M semantic scales can be fused to obtain fused sample features, and the object information in the sample images is predicted from the fused sample features, obtaining the probability corresponding to the current prediction.

The loss function (also called the cost function) of the candidate-region generation network is computed from this probability, and the parameters of the candidate-region generation network are adjusted according to the loss value. The adjustment can be arbitrary; alternatively, to improve training speed, the parameters can be adjusted by a preset gradient-descent step, for example using gradient descent with the Adam optimizer. The network at the point where the loss function converges is the pre-trained candidate-region generation network model.
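The train-until-convergence loop can be sketched as below, assuming a PyTorch model. Only the Adam optimizer and the stop-at-loss-convergence criterion come from the text; the toy linear model, MSE loss, tolerance, and step cap are illustrative stand-ins for the candidate-region generation network and its loss:

```python
import torch

def train_until_converged(model, loss_fn, data, targets,
                          lr=1e-3, tol=1e-5, max_steps=500):
    """Adjust the network parameters by Adam gradient descent, stopping when
    the change in loss between steps falls below `tol` (loss convergence)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = float("inf")
    loss = None
    for _ in range(max_steps):
        opt.zero_grad()
        loss = loss_fn(model(data), targets)
        loss.backward()
        opt.step()
        if abs(prev - loss.item()) < tol:   # loss has stopped changing
            break
        prev = loss.item()
    return loss.item()

# Toy stand-in for the region-proposal network: a single linear layer.
torch.manual_seed(0)
model = torch.nn.Linear(8, 2)
x, y = torch.randn(32, 8), torch.randn(32, 2)
final_loss = train_until_converged(model, torch.nn.functional.mse_loss, x, y)
print(final_loss)
```

In practice the convergence test would typically be applied to a validation loss rather than the raw training loss, but the control flow is the same.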
Since the pre-trained candidate-region generation network includes M semantic-scale feature extraction, when the image to be processed is input into the candidate-region generation network, the network naturally performs multiple convolution operations on the image, fuses the convolution results, and generates the candidate regions corresponding to the image. There may be multiple candidate regions; some of them may contain small objects and some may not.
For example, referring to Fig. 2, the image to be processed is passed through Conv4_3 of the VGG16 backbone to obtain a first semantic result (512*38*38), and through Conv5_3 to obtain a second semantic result (512*19*19). The first semantic result (512*38*38) is convolved (kernel e.g. 3*3*512) to obtain feature map A (512*38*38) shown in Fig. 2, and the second semantic result (512*19*19) is deconvolved (kernel e.g. 3*3*512) to obtain feature map B shown in Fig. 2. Feature maps A and B are concatenated into a 1024*38*38 feature matrix, which is then fused into a 512*38*38 fusion feature matrix. A sliding window of preset size (also called a matching window, e.g. 3*3) traverses the fusion feature matrix, each feature point generating K (e.g. 9) candidate boxes, so that 38*38*K candidate regions can finally be generated from the fusion feature.
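The Fig. 2 fusion can be sketched in PyTorch. The tensor sizes (512×38×38 and 512×19×19), the 3×3 convolution, the deconvolution, the 1024-channel concatenation, and K = 9 anchors per feature point follow the text; the specific layer hyperparameters (stride, padding, the 1×1 fusion convolution) are assumptions:

```python
import torch
import torch.nn as nn

class ProposalFusion(nn.Module):
    """Conv4_3-level features (512x38x38) pass through a 3x3 conv to give
    feature map A; Conv5_3-level features (512x19x19) pass through a
    stride-2 deconv to give feature map B (512x38x38); A and B are
    concatenated (1024x38x38) and fused back to 512x38x38."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(512, 512, 3, padding=1)
        self.deconv = nn.ConvTranspose2d(512, 512, 2, stride=2)
        self.fuse = nn.Conv2d(1024, 512, 1)

    def forward(self, c4, c5):
        a = self.conv(c4)                # feature map A: 512 x 38 x 38
        b = self.deconv(c5)              # feature map B: 512 x 38 x 38 (from 19 x 19)
        return self.fuse(torch.cat([a, b], dim=1))

net = ProposalFusion()
c4 = torch.randn(1, 512, 38, 38)         # first semantic result (Conv4_3 level)
c5 = torch.randn(1, 512, 19, 19)         # second semantic result (Conv5_3 level)
fused = net(c4, c5)
K = 9                                    # candidate boxes per feature point
num_proposals = fused.shape[2] * fused.shape[3] * K   # 38*38*K candidate regions
print(fused.shape, num_proposals)
```

A 3×3 sliding window over the fused map with K anchors per position is the same anchor scheme used by standard region-proposal networks, which is why 38·38·K proposals fall out of the spatial size of the fused feature.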
After generating the candidate regions, the device executes step 13: size normalization and super-resolution processing are performed on the candidate region, obtaining the super-resolved candidate region.

Specifically, the candidate region is first size-normalized, again for example by bilinear interpolation; the normalization may enlarge the candidate region. Enlargement, however, can introduce noise or blur, so super-resolution processing is then applied to the normalized candidate region. There are many ways to perform super-resolution processing.

One way:
Convolve the normalized candidate region to obtain a convolution result, and deconvolve the normalized candidate region to obtain a deconvolution result;

fuse the convolution result and the deconvolution result to obtain a super-resolved candidate region of the same size as the candidate region.
Specifically, a convolution–deconvolution network performs super-resolution processing on the normalized candidate region, producing a clearer region of the same size as the candidate region. The convolution–deconvolution network is also pre-trained; its training process can refer to the training of the candidate-region generation network discussed in step 12 and is not repeated here.
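A minimal sketch of the convolution–deconvolution super-resolution step: parallel convolution and deconvolution branches whose outputs are fused into a result the same size as the input candidate region, as the text describes. All channel counts and layer hyperparameters here are assumptions:

```python
import torch
import torch.nn as nn

class ConvDeconvSR(nn.Module):
    """Convolve and deconvolve the normalized candidate region in parallel
    branches, then fuse the two results into an output with the same
    spatial size as the input region."""
    def __init__(self, ch=3):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1))
        self.deconv_branch = nn.Sequential(
            nn.Conv2d(ch, 32, 3, stride=2, padding=1),           # downsample
            nn.ReLU(),
            nn.ConvTranspose2d(32, ch, 4, stride=2, padding=1))  # restore size
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, region):
        a = self.conv_branch(region)
        b = self.deconv_branch(region)
        return self.fuse(torch.cat([a, b], dim=1))   # same H x W as the input

sr = ConvDeconvSR()
region = torch.randn(1, 3, 64, 64)    # normalized candidate region
out = sr(region)
print(out.shape)                      # torch.Size([1, 3, 64, 64])
```

The deconvolution branch here assumes an even input size so the transposed convolution exactly restores the original resolution; a production version would pad or crop to handle odd sizes.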
When there are multiple candidate regions, each naturally has a corresponding super-resolved candidate region. For example, taking image a in Fig. 3 as the image to be processed: after it passes through the candidate-region generation network, a candidate region is obtained (shown in b of Fig. 3), and super-resolution processing of b in Fig. 3 yields the clearer candidate region shown in c of Fig. 3. Fig. 3 is only a simple illustration of the process and does not limit the actual image-processing effect.
After the device executes step 13, it can execute step 14: the features of the super-resolved candidate region and the features of the candidate region's neighborhood are fused, obtaining a fused region feature.

Specifically, the features of a super-resolved candidate region may include all of a small object's features, only part of them, or none of them. The features of the super-resolved candidate region can be fused with the neighborhood features corresponding to that candidate region, so as to combine more image feature information and obtain a more complete small-object feature image. The neighborhood features of a candidate region can also be understood as the contextual information of that candidate region.

It should be noted that when there are multiple candidate regions, the features of each candidate region are fused with that candidate region's neighborhood features; with K candidate regions, K fused region features are correspondingly obtained.
As one embodiment, choosing too large a neighborhood greatly increases computation, while choosing too small a neighborhood adds too few features to achieve a good fusion effect. Therefore, in the embodiments of the application the neighborhood of a candidate region is defined as the region obtained by expanding the candidate region threefold in length and width about its center, excluding the candidate region itself.
For example, referring to Fig. 4, the candidate region of the image to be processed undergoes super-resolution processing; the super-resolved candidate region is input into a convolution layer to obtain the candidate-region features, the neighborhood of the candidate region is input into a convolution layer to obtain the neighborhood features, and a fully connected operation then yields the fused feature.
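The threefold neighborhood defined above reduces to a simple box computation before any features are extracted. This sketch uses (x1, y1, x2, y2) corner coordinates and clips the expanded box to the image, both of which are implementation assumptions not specified in the text:

```python
def neighborhood_box(box, img_w, img_h):
    """Given a candidate region (x1, y1, x2, y2), return the box expanded to
    3x its width and height about the same center, clipped to the image.
    The neighborhood proper is this expanded box minus the candidate region,
    which a caller can realise by masking out the original box."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2     # center stays fixed
    w, h = (x2 - x1) * 3, (y2 - y1) * 3       # threefold in length and width
    ex1 = max(0, cx - w / 2)
    ey1 = max(0, cy - h / 2)
    ex2 = min(img_w, cx + w / 2)
    ey2 = min(img_h, cy + h / 2)
    return (ex1, ey1, ex2, ey2)

expanded = neighborhood_box((40, 40, 60, 60), img_w=300, img_h=300)
print(expanded)   # (20.0, 20.0, 80.0, 80.0) -- 3x width and height, same center
```

Clipping matters for candidate regions near the image border, where the full 3× expansion would otherwise fall outside the image.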
The device of wisp executes step 15, i.e., to fusion after obtaining fused provincial characteristics in detection image
Provincial characteristics afterwards is identified, the corresponding wisp information in candidate region is obtained.
Specifically, after the fused region features are obtained, they can be input into a pre-trained classification and bounding-box prediction network to predict the small-object information corresponding to the candidate region; the small-object information includes the category of the small object and the position of the small object in the image.
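A minimal sketch of such a prediction head, assuming a softmax classification branch and a four-value box branch; the category count, feature dimension, and single linear layer per branch are assumptions, not the patented architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_head(fused_feat, w_cls, b_cls, w_box, b_box):
    """Classification-and-box prediction head: one branch outputs
    class probabilities via softmax, the other outputs four box
    values locating the small object in the image."""
    logits = w_cls @ fused_feat + b_cls
    probs = np.exp(logits - logits.max())       # numerically stable softmax
    probs /= probs.sum()
    box = w_box @ fused_feat + b_box            # (x, y, w, h) of the small object
    return probs, box

# toy head: 128-d fused features, 5 object categories
fused = rng.standard_normal(128)
probs, box = predict_head(
    fused,
    rng.standard_normal((5, 128)) * 0.01, np.zeros(5),
    rng.standard_normal((4, 128)) * 0.01, np.zeros(4),
)
print(probs.shape, box.shape)   # (5,) (4,)
```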
The training process of the classification and bounding-box prediction network is described below. The constructed classification and bounding-box prediction network produces a prediction result for a sample image, and the probability of the prediction result is calculated. The loss function value corresponding to the network is obtained from this probability, and the parameters of the network are adjusted according to the loss function value using the Adam optimizer, until the loss function corresponding to the network converges, yielding the trained classification and bounding-box prediction network.
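The Adam parameter-update rule used in this training procedure can be written out explicitly; the toy quadratic stand-in loss, the learning rate, and the iteration count below are illustrative assumptions, not the actual detection loss:

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first/second moment estimates
    of the gradient drive the parameter adjustment."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# toy convex stand-in loss L(p) = ||p||^2 with gradient 2p;
# iterate until the loss is approximately converged
p = np.array([1.0, -2.0])
m, v = np.zeros_like(p), np.zeros_like(p)
for t in range(1, 2001):
    p, m, v = adam_step(p, 2 * p, m, v, t)
print(float(np.abs(p).max()))   # driven near 0 as the loss converges
```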
After the fused region features are input into the classification and bounding-box prediction network, the category of the small object corresponding to the candidate region and the position of the small object are generated accordingly, thereby realizing the detection of small objects. When there are multiple candidate regions, small-object information is generated for each small object. The output small-object information may take many forms, such as an annotated picture or a set of records, which is not specifically limited here.
For example, continuing with Fig. 4, after the fused region features pass through the classification and bounding-box prediction network (a fully connected network in Fig. 4), the categories and positions of the small objects in the image to be processed are output. Referring to Fig. 5, the image to be processed is shown in Fig. 5(a); after the above processing, the small objects output for the image to be processed include the moon and the bird shown in Fig. 5(b), and the positions of the small objects are indicated by the rectangular boxes in Fig. 5(b).
On the basis of the method for detecting small objects in an image discussed above, an embodiment of the present application further provides an apparatus for detecting small objects in an image, which performs the method discussed above. Referring to Fig. 6, the apparatus includes:
an acquisition module 601, configured to obtain an image to be processed;
a processing module 602, configured to perform a plurality of different convolution processes on the image according to a pre-trained generation network, fuse the results of the plurality of convolution processes, and generate a candidate region in the image to be processed, wherein the candidate region contains a small object in the image to be processed;
the processing module 602 is further configured to perform size normalization and super-resolution processing on the candidate region, obtaining a super-resolution-processed candidate region;
the processing module 602 is further configured to fuse the features of the super-resolution-processed candidate region with the features of the neighborhood of the candidate region, obtaining fused region features;
the processing module 602 is further configured to identify the fused region features, obtaining the small-object information corresponding to the candidate region, wherein the small-object information includes the category of the small object and the position of the small object in the image to be processed.
Optionally, the acquisition module 601 is further configured to obtain a sample image before the candidate region in the image to be processed is generated, that is, before the plurality of different convolution processes are performed on the image to be processed according to the pre-trained candidate region generation network and the convolution results are fused;
the processing module 602 is further configured to perform feature extraction at N semantic scales on the sample image according to a preset candidate region generation network, obtaining sample features at the N semantic scales, N being a positive integer;
the processing module 602 is further configured to fuse, according to the preset candidate region generation network, the sample region features of M semantic scales among the sample features of the N semantic scales, obtaining fused sample features; predict the object information in the sample image according to the fused sample features; and obtain the probability corresponding to the current prediction result, M being a positive integer less than or equal to N;
the processing module 602 is further configured to calculate a loss function value from the probability and adjust the parameters of the preset candidate region generation network according to the variation of the loss function value, until the loss function converges, obtaining the pre-trained candidate region generation network.
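The N-semantic-scale feature extraction above could be sketched as a simple pooling pyramid; using 2x2 average pooling as a stand-in for the network's convolution stages is an assumption made purely for illustration:

```python
import numpy as np

def multi_scale_features(img, n=3):
    """Extract sample features at N semantic scales: each scale is a
    2x2 average-pooled version of the previous one, standing in for
    the convolution stages of the candidate region generation network."""
    feats = [img.astype(float)]
    for _ in range(n - 1):
        f = feats[-1]
        h, w = f.shape[0] // 2 * 2, f.shape[1] // 2 * 2   # even crop
        f = f[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        feats.append(f)
    return feats

img = np.arange(256, dtype=float).reshape(16, 16)
scales = multi_scale_features(img, n=3)
print([f.shape for f in scales])   # [(16, 16), (8, 8), (4, 4)]
```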
Optionally, the processing module 602 is specifically configured to: adjust the parameters of the generation network by a preset gradient descent according to the variation of the loss function value.
Optionally, the processing module 602 is further configured to:
before the sample region features of the M semantic scales are fused according to the candidate region generation network to obtain the fused sample region features, fuse the sample region features of every two of the N semantic scales, obtaining a plurality of predicted fused sample region features;
perform confidence evaluation on the plurality of predicted fused sample region features, obtaining a plurality of confidence values;
take the two semantic scales corresponding to the maximum confidence value as the M semantic scales.
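The pairwise fusion and confidence-based selection of the M = 2 scales might look like the following sketch; the averaging fusion and the norm-based confidence function are placeholder assumptions, since the patent does not fix either operation here:

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)

def select_scales(scale_feats, confidence):
    """Fuse the sample region features of every pair of the N semantic
    scales, score each fused result with a confidence function, and
    keep the pair (the M = 2 scales) with the highest confidence."""
    best_pair, best_conf, best_fused = None, -np.inf, None
    for i, j in combinations(range(len(scale_feats)), 2):
        fused = (scale_feats[i] + scale_feats[j]) / 2.0  # averaging stands in for the fusion op
        conf = confidence(fused)
        if conf > best_conf:
            best_pair, best_conf, best_fused = (i, j), conf, fused
    return best_pair, best_fused

# N = 4 semantic scales, 16-d features each; feature norm stands in for confidence
feats = [rng.standard_normal(16) for _ in range(4)]
pair, fused = select_scales(feats, confidence=lambda f: float(np.linalg.norm(f)))
print(pair)
```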
Optionally, the processing module 602 is specifically configured to:
normalize the candidate region, obtaining a normalized candidate region;
perform convolution processing on the normalized candidate region, obtaining a convolution processing result, and perform deconvolution processing on the normalized candidate region, obtaining a deconvolution processing result;
fuse the convolution processing result and the deconvolution processing result, obtaining a super-resolution-processed candidate region of the same size as the candidate region.
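The conv-branch / deconv-branch super-resolution step could be sketched as follows; the nearest-neighbour size normalization, the specific kernels, and the equal-weight fusion are all illustrative assumptions, not the trained filters of the disclosure:

```python
import numpy as np

def resize_nearest(img, size):
    """Size normalization: nearest-neighbour resize to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]

def conv3x3(img, k):
    """'Same' 3x3 convolution with zero padding."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def deconv_x2(img, k):
    """Stride-2 transposed convolution with a 2x2 kernel (upsamples x2)."""
    h, w = img.shape
    out = np.zeros((2 * h, 2 * w))
    for i in range(2):
        for j in range(2):
            out[i::2, j::2] = k[i, j] * img
    return out

def super_resolve(region, size=32):
    """Sketch of the described pipeline: normalize the candidate region
    to a fixed size, run a convolution branch and a deconvolution
    branch, then fuse the two results into a super-resolution-processed
    region of the same (normalized) size."""
    norm = resize_nearest(region, size).astype(float)
    conv_out = conv3x3(norm, np.full((3, 3), 1.0 / 9.0))
    up = deconv_x2(norm, np.full((2, 2), 0.25))
    deconv_out = resize_nearest(up, size)   # bring the x2 map back to the fused size
    return 0.5 * conv_out + 0.5 * deconv_out

region = np.arange(64, dtype=float).reshape(8, 8)
sr = super_resolve(region, size=32)
print(sr.shape)   # (32, 32)
```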
Optionally, the processing module 602 is specifically configured to:
input the fused region features into a pre-trained classification and bounding-box prediction network, perform bounding-box prediction and classification prediction, and output the small-object information corresponding to the candidate region.
Optionally, the neighborhood of the candidate region is the region obtained by expanding the candidate region three times in length and width about its center, excluding the candidate region itself.
It should be noted that Fig. 6 is illustrated in terms of the software modules of the apparatus for detecting small objects in an image. In practice, the acquisition module and the processing module in Fig. 6 may be implemented by a processor in the apparatus, and the processor may be implemented by a CPU, an integrated circuit, or the like.
The above embodiments merely describe the technical solutions of the present application in detail; their description is only intended to help understand the embodiments of the present application and should not be construed as limiting them. Any changes or substitutions that can readily be conceived by those skilled in the art shall fall within the protection scope of the embodiments of the present application.
Claims (10)
1. A method for detecting small objects in an image, characterized by comprising:
obtaining an image to be processed;
performing a plurality of different convolution processes on the image to be processed according to a pre-trained candidate region generation network, and fusing the results of the plurality of convolution processes to generate a candidate region in the image to be processed, wherein the candidate region contains a small object in the image to be processed;
performing size normalization and super-resolution processing on the candidate region, obtaining a super-resolution-processed candidate region;
fusing the features of the super-resolution-processed candidate region with the features of the neighborhood of the candidate region, obtaining fused region features;
identifying the fused region features, obtaining small-object information corresponding to the candidate region, wherein the small-object information includes the category of the small object and the position of the small object in the image to be processed.
2. The method according to claim 1, characterized in that, before the plurality of different convolution processes are performed on the image to be processed according to the pre-trained candidate region generation network and the convolution results are fused to generate the candidate region in the image to be processed, the method comprises:
obtaining a sample image;
performing feature extraction at N semantic scales on the sample image according to a preset candidate region generation network, obtaining sample features at the N semantic scales, N being a positive integer;
fusing, according to the preset candidate region generation network, the sample region features of M semantic scales among the sample features of the N semantic scales, obtaining fused sample features; predicting the object information in the sample image according to the fused sample features, obtaining the probability corresponding to the current prediction result, M being a positive integer less than or equal to N;
calculating a loss function value from the probability, and adjusting the parameters of the preset candidate region generation network according to the variation of the loss function value, until the loss function converges, obtaining the pre-trained candidate region generation network.
3. The method according to claim 2, characterized in that adjusting the parameters of the generation network according to the variation of the loss function value comprises:
adjusting the parameters of the generation network by a preset gradient descent according to the variation of the loss function value.
4. The method according to claim 3, characterized in that, before the sample region features of the M semantic scales are fused according to the candidate region generation network to obtain the fused sample region features, the method comprises:
fusing the sample region features of every two of the N semantic scales among the sample region features at the N semantic scales, obtaining a plurality of predicted fused sample region features;
performing confidence evaluation on the plurality of predicted fused sample region features, obtaining a plurality of confidence values;
taking the two semantic scales corresponding to the maximum confidence value as the M semantic scales.
5. The method according to any one of claims 1 to 4, characterized in that performing size normalization and super-resolution processing on the candidate region to obtain the super-resolution-processed candidate region comprises:
normalizing the candidate region, obtaining a normalized candidate region;
performing convolution processing on the normalized candidate region, obtaining a convolution processing result, and performing deconvolution processing on the normalized candidate region, obtaining a deconvolution processing result;
fusing the convolution processing result and the deconvolution processing result, obtaining a super-resolution-processed candidate region of the same size as the candidate region.
6. The method according to claim 5, characterized in that identifying the fused region features to obtain the small-object information corresponding to the candidate region comprises:
inputting the fused region features into a pre-trained classification and bounding-box prediction network, performing bounding-box prediction and classification prediction, and outputting the small-object information corresponding to the candidate region.
7. The method according to claim 5, characterized in that the neighborhood of the candidate region is the region obtained by expanding the candidate region three times in length and width about its center, excluding the candidate region itself.
8. An apparatus for detecting small objects in an image, characterized by comprising:
an acquisition module, configured to obtain an image to be processed;
a processing module, configured to perform a plurality of different convolution processes on the image according to a pre-trained generation network, fuse the results of the plurality of convolution processes, and generate a candidate region in the image to be processed, wherein the candidate region contains a small object in the image to be processed;
the processing module is further configured to perform size normalization and super-resolution processing on the candidate region, obtaining a super-resolution-processed candidate region;
the processing module is further configured to fuse the features of the super-resolution-processed candidate region with the features of the neighborhood of the candidate region, obtaining fused region features;
the processing module is further configured to identify the fused region features, obtaining the small-object information corresponding to the candidate region, wherein the small-object information includes the category of the small object and the position of the small object in the image to be processed.
9. The apparatus according to claim 8, characterized in that:
the acquisition module is further configured to obtain a sample image before the candidate region in the image to be processed is generated according to the pre-trained candidate region generation network;
the processing module is further configured to perform feature extraction at N semantic scales on the sample image according to a preset candidate region generation network, obtaining sample features at the N semantic scales, N being a positive integer;
the processing module is further configured to fuse, according to the preset candidate region generation network, the sample region features of M semantic scales among the sample features of the N semantic scales, obtaining fused sample features; predict the object information in the sample image according to the fused sample features, obtaining the probability corresponding to the current prediction result, M being a positive integer less than or equal to N;
the processing module is further configured to calculate a loss function value from the probability and adjust the parameters of the preset candidate region generation network according to the variation of the loss function value, until the loss function converges, obtaining the pre-trained candidate region generation network.
10. The apparatus according to claim 9, characterized in that the processing module is specifically configured to:
adjust the parameters of the generation network by a preset gradient descent according to the variation of the loss function value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910410363.5A CN110147753A (en) | 2019-05-17 | 2019-05-17 | The method and device of wisp in a kind of detection image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110147753A true CN110147753A (en) | 2019-08-20 |
Family
ID=67594156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910410363.5A Pending CN110147753A (en) | 2019-05-17 | 2019-05-17 | The method and device of wisp in a kind of detection image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110147753A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106127684A (en) * | 2016-06-22 | 2016-11-16 | 中国科学院自动化研究所 | Image super-resolution Enhancement Method based on forward-backward recutrnce convolutional neural networks |
US20170147905A1 (en) * | 2015-11-25 | 2017-05-25 | Baidu Usa Llc | Systems and methods for end-to-end object detection |
US20180039853A1 (en) * | 2016-08-02 | 2018-02-08 | Mitsubishi Electric Research Laboratories, Inc. | Object Detection System and Object Detection Method |
US20180165551A1 (en) * | 2016-12-08 | 2018-06-14 | Intel Corporation | Technologies for improved object detection accuracy with multi-scale representation and training |
CN108509978A (en) * | 2018-02-28 | 2018-09-07 | 中南大学 | The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN |
CN108647682A (en) * | 2018-05-17 | 2018-10-12 | 电子科技大学 | A kind of brand Logo detections and recognition methods based on region convolutional neural networks model |
CN108898078A (en) * | 2018-06-15 | 2018-11-27 | 上海理工大学 | A kind of traffic sign real-time detection recognition methods of multiple dimensioned deconvolution neural network |
CN108960074A (en) * | 2018-06-07 | 2018-12-07 | 西安电子科技大学 | Small size pedestrian target detection method based on deep learning |
CN109117876A (en) * | 2018-07-26 | 2019-01-01 | 成都快眼科技有限公司 | A kind of dense small target deteection model building method, model and detection method |
CN109284669A (en) * | 2018-08-01 | 2019-01-29 | 辽宁工业大学 | Pedestrian detection method based on Mask RCNN |
CN109583321A (en) * | 2018-11-09 | 2019-04-05 | 同济大学 | The detection method of wisp in a kind of structured road based on deep learning |
CN109753946A (en) * | 2019-01-23 | 2019-05-14 | 哈尔滨工业大学 | A kind of real scene pedestrian's small target deteection network and detection method based on the supervision of body key point |
Non-Patent Citations (4)
Title |
---|
SHAOQING REN等: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", 《 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 * |
YINGXIN LOU等: "Improve object detection via a multi-feature and multi-task CNN model", 《2017 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP)》 * |
李华清: "基于SSD的航拍图像小目标快速检测算法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
王昊然: "基于多层卷积特征高阶融合的多任务目标检测系统研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443229A (en) * | 2019-08-22 | 2019-11-12 | 国网四川省电力公司信息通信公司 | A kind of equipment display content identification method based on artificial intelligence |
CN110533105A (en) * | 2019-08-30 | 2019-12-03 | 北京市商汤科技开发有限公司 | A kind of object detection method and device, electronic equipment and storage medium |
CN110533105B (en) * | 2019-08-30 | 2022-04-05 | 北京市商汤科技开发有限公司 | Target detection method and device, electronic equipment and storage medium |
CN113223059A (en) * | 2021-05-17 | 2021-08-06 | 浙江大学 | Weak and small airspace target detection method based on super-resolution feature enhancement |
CN113505256A (en) * | 2021-07-02 | 2021-10-15 | 北京达佳互联信息技术有限公司 | Feature extraction network training method, image processing method and device |
KR102599190B1 (en) * | 2022-06-24 | 2023-11-07 | 주식회사 포딕스시스템 | Apparatus and method for object detection based on image super-resolution of an integrated region of interest |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190820 |