CN109255352A - Object detection method, apparatus and system - Google Patents
Object detection method, apparatus and system
- Publication number
- CN109255352A CN109255352A CN201811049034.4A CN201811049034A CN109255352A CN 109255352 A CN109255352 A CN 109255352A CN 201811049034 A CN201811049034 A CN 201811049034A CN 109255352 A CN109255352 A CN 109255352A
- Authority
- CN
- China
- Prior art keywords
- feature
- information
- layer
- first feature
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/255—Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The present invention provides an object detection method, apparatus and system, relating to the field of artificial intelligence. The method comprises: obtaining a target image to be detected; performing feature extraction on the target image to generate a first feature map, wherein the first feature map contains feature information of different scales; performing region proposal identification on the first feature map to obtain candidate region information of the target image; and generating a detection result according to the candidate region information and the first feature map, the detection result comprising a target category and/or a target position in the target image. The present invention can effectively improve detection performance.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to an object detection method, apparatus and system.
Background art
Object detection is a very important task in computer vision and is the basis of many complex visual tasks such as face detection, target tracking and instance segmentation. Most existing object detection methods are implemented with convolutional neural networks; they can detect the object categories contained in an image and can also locate the positions of target objects in the image, and they are widely used in fields such as security and traffic systems. It can be understood that object detection results are of great significance to each such application, yet the detection performance of existing object detection methods is unsatisfactory.
Summary of the invention
In view of this, an object of the present invention is to provide an object detection method, apparatus and system that can better improve detection performance.
To achieve the above object, the technical solutions adopted in the embodiments of the present invention are as follows:
According to a first aspect, an embodiment of the present invention provides an object detection method, comprising: obtaining a target image to be detected; performing feature extraction on the target image to generate a first feature map, wherein the first feature map contains feature information of different scales; performing region proposal identification on the first feature map to obtain candidate region information of the target image; and generating a detection result according to the candidate region information and the first feature map, the detection result comprising a target category and/or a target position in the target image.
Further, the step of obtaining the target image to be detected comprises: obtaining an initial image to be detected; and preprocessing the initial image to obtain the target image, wherein the preprocessing includes a whitening operation.
Further, the step of performing feature extraction on the target image to generate the first feature map comprises: inputting the target image into a base neural network; performing multi-stage feature extraction on the target image through the base neural network to obtain feature information of different scales, wherein the scale of the feature information extracted at each stage is different; and fusing the feature information corresponding to a plurality of specified stages to form the first feature map.
Further, the step of fusing the feature information corresponding to the plurality of specified stages to form the first feature map comprises: obtaining first feature information extracted at the penultimate stage of the base neural network; obtaining second feature information extracted at the last stage of the base neural network; performing a global pooling operation on the second feature information to obtain third feature information; and fusing the first feature information, the second feature information and the third feature information through a context enhancement network to form the first feature map.
Further, the context enhancement network includes a first convolutional layer, a second convolutional layer and a third convolutional layer arranged in parallel, wherein the output of the second convolutional layer is further connected to an upsampling layer, and the output of the third convolutional layer is further connected to a broadcast layer; the output of the first convolutional layer, the output of the upsampling layer and the output of the broadcast layer are jointly connected to an addition layer.
Further, the step of fusing the first feature information, the second feature information and the third feature information through the context enhancement network to form the first feature map comprises: inputting the first feature information into the first convolutional layer, inputting the second feature information into the second convolutional layer, and inputting the third feature information into the third convolutional layer; performing a convolution operation on the first feature information through the first convolutional layer to obtain first feature information with a specified scale; performing a convolution operation and an upsampling operation on the second feature information in sequence through the second convolutional layer and the upsampling layer to obtain second feature information with the specified scale; performing a convolution operation and a broadcast operation on the third feature information in sequence through the third convolutional layer and the broadcast layer to obtain third feature information with the specified scale; and summing, through the addition layer, the first feature information with the specified scale, the second feature information with the specified scale and the third feature information with the specified scale, to form the first feature map.
Further, the base neural network is a lightweight feature extraction network.
Further, the step of performing region proposal identification on the first feature map to obtain the candidate region information of the target image comprises: inputting the first feature map into a region proposal network; performing feature extraction on the first feature map through the region proposal network to obtain an intermediate feature map; and performing candidate region identification on the intermediate feature map to obtain the candidate region information of the target image.
Further, the region proposal network includes a channel-wise (depthwise) convolutional layer and a fourth convolutional layer connected in sequence.
Further, the step of generating the detection result according to the candidate region information and the first feature map comprises: inputting the first feature map and the intermediate feature map into a spatial attention network; fusing the first feature map and the intermediate feature map through the spatial attention network to form a second feature map, wherein the foreground features of the second feature map are more salient than its background features; and generating the detection result according to the candidate region information and the second feature map.
Further, the spatial attention network includes a fifth convolutional layer and an activation function layer connected in sequence; the output of the activation function layer is connected to a multiplication layer.
Further, a batch normalization layer is connected between the fifth convolutional layer and the activation function layer.
Further, the step of fusing the first feature map and the intermediate feature map through the spatial attention network to form the second feature map comprises: inputting the intermediate feature map into the fifth convolutional layer, and processing the intermediate feature map in sequence through the fifth convolutional layer, the batch normalization layer and the activation function layer to obtain a processed intermediate feature map output by the activation function layer, wherein the foreground features of the processed intermediate feature map are more salient than its background features; inputting the first feature map and the processed intermediate feature map into the multiplication layer; and performing element-wise multiplication on the first feature map and the processed intermediate feature map through the multiplication layer to generate the second feature map.
Further, the step of generating the detection result according to the candidate region information and the second feature map comprises: inputting the candidate region information and the second feature map into a candidate region feature extraction layer; extracting, through the candidate region feature extraction layer and based on the candidate region information, the region features of each candidate region from the second feature map; and performing object detection based on the region features of each candidate region to generate the detection result.
Further, the step of performing object detection based on the region features of each candidate region to generate the detection result comprises: performing classification processing on the region features of each candidate region through a classification sub-network to determine the target category in the target image; and/or performing regression processing on the region features of each candidate region through a regression sub-network to obtain the target position in the target image.
Further, the classification sub-network and the regression sub-network are each a single fully connected layer.
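As an illustrative sketch of the two single-fully-connected-layer heads described above (all sizes here, such as 200 regions, 1024-dimensional pooled features and 21 classes, are hypothetical assumptions, not values taken from the patent):

```python
import numpy as np

rng = np.random.default_rng(2)
num_regions, feat_dim, num_classes = 200, 1024, 21   # assumed sizes
region_feats = rng.standard_normal((num_regions, feat_dim))  # pooled region features

# Each sub-network is one fully connected layer, modeled as a matrix product.
w_cls = rng.standard_normal((feat_dim, num_classes))       # classification weights
w_reg = rng.standard_normal((feat_dim, 4 * num_classes))   # 4 box offsets per class

class_scores = region_feats @ w_cls   # classification sub-network output
box_deltas = region_feats @ w_reg     # regression sub-network output
print(class_scores.shape, box_deltas.shape)  # (200, 21) (200, 84)
```

A single matrix product per head keeps the detection head small, consistent with the lightweight design emphasized throughout the disclosure.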
According to a second aspect, an embodiment of the present invention further provides an object detection apparatus, comprising: an image acquisition module, configured to obtain a target image to be detected; a first feature map generation module, configured to perform feature extraction on the target image to generate a first feature map, wherein the first feature map contains feature information of different scales; a candidate identification module, configured to perform region proposal identification on the first feature map to obtain candidate region information of the target image; and a detection module, configured to generate a detection result according to the candidate region information and the first feature map, the detection result comprising a target category and/or a target position in the target image.
According to a third aspect, an embodiment of the present invention provides an object detection system, the system comprising: an image acquisition device, a processor and a storage device; the image acquisition device is configured to acquire a target image; a computer program is stored on the storage device, and when run by the processor, the computer program performs the method of any one of the above first aspect.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein when run by a processor, the computer program performs the steps of the method of any one of the above first aspect.
The embodiments of the present invention provide an object detection method, apparatus and system that can perform feature extraction on an acquired target image to generate a first feature map containing feature information of different scales; region proposal identification is then performed on the first feature map to obtain candidate region information, and a detection result can in turn be generated according to the candidate region information and the first feature map. The above approach provided in this embodiment can perform object detection using feature information of different scales, which effectively improves detection performance.
Other features and advantages of the present invention will be set forth in the following description; alternatively, some features and advantages can be inferred or unambiguously determined from the specification, or can be learned by implementing the above techniques of the present disclosure. To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
In order to more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 shows a structural schematic diagram of an electronic device provided by an embodiment of the present invention;
Fig. 2 shows an object detection flowchart provided by an embodiment of the present invention;
Fig. 3 shows a structural schematic diagram of a context enhancement network provided by an embodiment of the present invention;
Fig. 4 shows a structural schematic diagram of a spatial attention network provided by an embodiment of the present invention;
Fig. 5 shows a structural schematic diagram of an object detection model provided by an embodiment of the present invention;
Fig. 6 shows a structural block diagram of an object detection system provided by an embodiment of the present invention;
Fig. 7 shows a schematic diagram of first feature map generation provided by an embodiment of the present invention;
Fig. 8 shows a schematic diagram of second feature map generation provided by an embodiment of the present invention;
Fig. 9 shows a structural block diagram of an object detection apparatus provided by an embodiment of the present invention.
Detailed description of embodiments
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Considering that object detection in the prior art is not sufficiently effective, to address this problem, the embodiments of the present invention provide an object detection method, apparatus and system, which can be implemented with corresponding software or hardware. The embodiments of the present invention are described in detail below.
Embodiment one:
First, an exemplary electronic device 100 for implementing the object detection method, apparatus and system of the embodiments of the present invention is described with reference to Fig. 1.
As shown in the structural schematic diagram of Fig. 1, the electronic device 100 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108 and an image acquisition device 110, which are interconnected through a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are exemplary rather than limiting, and the electronic device may have other components and structures as needed.
The processor 102 may be implemented in hardware as at least one of a digital signal processor (DSP), a field programmable gate array (FPGA) and a programmable logic array (PLA). The processor 102 may be a central processing unit (CPU) or a combination of one or more other forms of processing units having data processing capability and/or instruction execution capability, and can control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 can run the program instructions to realize the client functions (implemented by the processor) in the embodiments of the present invention described below and/or other desired functions. Various application programs and various data, such as the data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen and the like.
The output device 108 may output various information (for example, images or sounds) to the outside (for example, a user), and may include one or more of a display, a loudspeaker and the like.
The image acquisition device 110 may capture images desired by the user (such as photos and videos), and store the captured images in the storage device 104 for use by other components.
Illustratively, the exemplary electronic device for implementing the object detection method, apparatus and system according to the embodiments of the present invention may be implemented as an intelligent terminal such as a smart phone, a tablet computer or a computer.
Embodiment two:
Referring to the object detection flowchart shown in Fig. 2, the method can be performed by the electronic device provided in the previous embodiment and specifically includes the following steps:
Step S202: obtain a target image to be detected, wherein the target image contains a target object to be detected. The category of the target object can be set according to actual needs; for example, the target object can be set to be a person, a cat, a vehicle and the like.
In one embodiment, an image frame directly acquired by an image capture device such as a camera can be used as the target image. In another embodiment, an initial image to be detected can first be obtained by the camera, and the initial image is then preprocessed to obtain the target image; that is, the image frame directly acquired by the camera is preprocessed, and the preprocessed image is used as the target image. The preprocessing may include image processing operations such as a whitening operation. The whitening operation can also be described as a mean-removal operation: its main procedure is to subtract, from each channel of the initial image, the preset mean value corresponding to that channel, and then divide by the variance corresponding to that channel, thereby obtaining the preprocessed image. By preprocessing the initial image, an image meeting the requirements is obtained, which can effectively speed up detection. For example, when the target image is detected using a neural network, a target image obtained through such processing helps speed up the convergence of the neural network.
Step S204: perform feature extraction on the target image to generate a first feature map, wherein the first feature map contains feature information of different scales.
In a specific implementation, the first feature map can be generated as follows: input the target image into a base neural network; perform multi-stage feature extraction on the target image through the base neural network to obtain feature information of different scales, wherein the scale of the feature information extracted at each stage is different; and finally fuse the feature information corresponding to a plurality of specified stages to form the first feature map.
The above base neural network is a backbone network capable of extracting image features; its main function is to extract image features and generate feature maps. In order to improve detection efficiency, a lightweight feature extraction network such as Xception or ShuffleNet can be chosen as the base neural network in this embodiment. A lightweight feature extraction network is characterized by a simple network structure, low memory requirements, a small amount of computation and high detection efficiency.
Step S206: perform region proposal identification on the first feature map to obtain candidate region information of the target image. The candidate region information may include the location information and confidence of multiple candidate regions, among others. A candidate region is a region of the target image that may contain a target object.
For example, the first feature map is input into a region proposal network (RPN); feature extraction is performed on the first feature map through the region proposal network to obtain an intermediate feature map, and candidate region identification is performed on the intermediate feature map to obtain the candidate region information of the target image. Step S206 shows that the object detection method provided by the embodiment of the present invention specifically follows the basic principle of two-stage detection algorithms, which is: first predict candidate regions (also called candidate boxes) in the image, and then, based on the candidate regions, predict the target regions (also called detection boxes) that contain target objects. Compared with one-stage detection algorithms that predict target regions directly, two-stage detection algorithms achieve better detection accuracy. A two-stage detection algorithm can be implemented with network models such as Faster R-CNN and R-FCN, in which the region proposal network is the network commonly used by various two-stage detection models to generate multiple candidate regions.
Step S208: generate a detection result according to the candidate region information and the first feature map; the detection result includes a target category and/or a target position in the target image.
The above object detection method provided by the embodiment of the present invention can perform feature extraction on the acquired target image to generate a first feature map containing feature information of different scales; region proposal identification is then performed on the first feature map to obtain candidate region information, and a detection result can in turn be generated according to the candidate region information and the first feature map. The above approach provided in this embodiment can perform object detection using feature information of different scales, which effectively improves detection performance.
When feature extraction is performed on the target image using the base neural network, it can be understood that the feature extraction process of a neural network generally includes multiple stages; each stage further extracts features from the feature information obtained at the previous stage (the feature information can be embodied in the form of a feature map) to obtain the feature information corresponding to that stage, and the scales of the feature information obtained at different stages are different. In order to incorporate semantic information and contextual information of different scales into the object detection process, this embodiment selects several stages from the multiple stages of the base neural network as specified stages, and fuses the feature information corresponding to the specified stages to form the first feature map.
In one embodiment, the specified stages can be the last two stages of the base neural network; for example, if the base neural network has four stages in total in the feature extraction process, the third stage and the fourth stage are selected as the specified stages. When fusing the feature information corresponding to the multiple specified stages into the first feature map, the following steps can be referred to:
Step 1: obtain the first feature information extracted at the penultimate stage of the base neural network;
Step 2: obtain the second feature information extracted at the last stage of the base neural network;
Step 3: perform a global pooling operation on the second feature information to obtain the third feature information;
Step 4: fuse the first feature information, the second feature information and the third feature information through a context enhancement network to form the first feature map.
In the above manner, the target image can be effectively represented at multiple scales. It can be understood that, when extracting image features, if a fixed-size feature detection approach is adopted, the detection result will be biased toward that scale, and features of other scales will be missed. Based on this, the embodiment of the present invention performs multi-stage feature extraction on the target image through the base neural network, so that the image can be detected and matched at multiple scales, making the feature information contained in the resulting first feature map more accurate.
In one embodiment, referring to the structural schematic diagram of a context enhancement network shown in Fig. 3, the context enhancement network may include a first convolutional layer, a second convolutional layer and a third convolutional layer arranged in parallel, wherein the output of the second convolutional layer is further connected to an upsampling layer, and the output of the third convolutional layer is further connected to a broadcast layer. The output of the first convolutional layer, the output of the upsampling layer and the output of the broadcast layer are jointly connected to an addition layer.
The parameters of the first, second and third convolutional layers can be identical or different; for example, each can be implemented as a convolutional layer containing 245 convolution kernels of size 1*1, so that the received feature information is compressed to 245 channels by the 1*1 convolution kernels.
Fisrt feature information, second feature information and third feature information are merged and to be formed enhancing network by context
When fisrt feature figure, it is referred to following steps realization:
(1) by fisrt feature information input to the first convolutional layer, by second feature information input to the second convolutional layer, and
By third feature information input to third convolutional layer;That is, the characteristic information that different phase is obtained be separately input to it is corresponding
In convolutional layer;
(2) convolution operation is carried out to fisrt feature information by the first convolutional layer, it is special obtains first with specified scale
Reference breath;Such as, fisrt feature Informational Expression is the characteristic pattern that size is 20*20, passes through the 1*1 convolution operation of the first convolutional layer
Afterwards, the characteristic pattern that specified size is 20*20 is obtained.
(3) convolution operation and up-sampling are successively carried out to second feature information by the second convolutional layer and up-sampling operation layer
Operation obtains the second feature information with specified scale;Such as, second feature Informational Expression is the feature that size is 10*10
Figure is obtained after the 1*1 convolution operation of the second convolutional layer and twice of up-sampling (2*Upsample) of up-sampling operation layer
The characteristic pattern that specified size is 20*20.
(4) convolution operation and setting-up exercises to music are successively carried out to third feature information by third convolutional layer and broadcast operation layer
Make, obtains the third feature information with specified scale;Such as, third feature Informational Expression is the characteristic pattern of 1*1, passes through third
After the 1*1 convolution operation of convolutional layer and the broadcast operation of broadcast operation layer, the characteristic pattern that specified size is 20*20 is obtained.
(5) The addition operation layer sums the first feature information, the second feature information and the third feature information, all now at the specified scale, to form the first feature map.
By converting the feature information obtained at the different stages to the same scale (size), the stage-wise feature information can be summed, and the resulting feature map ultimately contains a variety of semantic information and contextual information.
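The five steps above can be sketched as a small PyTorch module. This is a minimal illustration under assumptions: the 245-channel width and the 20*20/10*10/1*1 scales are taken from the examples in the text, nearest-neighbour interpolation is assumed for the 2x upsample, and all class and variable names are illustrative rather than the patent's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEnhancementModule(nn.Module):
    """Sketch of the context enhancement fusion: three 1*1 convolutions,
    2x upsampling on the middle branch, broadcast on the global branch,
    then an element-wise sum at the specified 20*20 scale."""

    def __init__(self, c4_channels, c5_channels, out_channels=245):
        super().__init__()
        self.conv_c4 = nn.Conv2d(c4_channels, out_channels, 1)   # first conv layer
        self.conv_c5 = nn.Conv2d(c5_channels, out_channels, 1)   # second conv layer
        self.conv_glb = nn.Conv2d(c5_channels, out_channels, 1)  # third conv layer

    def forward(self, c4, c5):
        glb = F.adaptive_avg_pool2d(c5, 1)            # global pooling -> 1*1 map
        lat4 = self.conv_c4(c4)                       # already at 20*20
        lat5 = F.interpolate(self.conv_c5(c5),
                             scale_factor=2, mode="nearest")  # 10*10 -> 20*20
        lat_glb = self.conv_glb(glb)                  # 1*1; broadcast on addition
        return lat4 + lat5 + lat_glb                  # element-wise sum -> CEM_fm
```

Note the broadcast operation needs no explicit layer here: adding a 1*1 map to a 20*20 map broadcasts it spatially.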
To further increase detection speed, this embodiment provides a region-proposal network with a simplified structure compared with a conventional region-proposal network. The region-proposal network includes a channel-wise convolutional layer and a fourth convolutional layer connected in sequence. For example, the channel-wise convolutional layer may contain one channel-wise (depthwise) convolution kernel of size 5*5, and the fourth convolutional layer may contain 256 convolution kernels of size 1*1, i.e., a 256-channel 1*1 standard convolution. This region-proposal network can readily identify the candidate regions in the feature map, for example generating up to 200 candidate regions.
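A hedged sketch of this simplified region-proposal network follows: one 5*5 depthwise (channel-wise) convolution followed by a 256-channel 1*1 standard convolution, as described above. The objectness/box-regression heads and the anchor count are illustrative additions, since the text does not specify them.

```python
import torch
import torch.nn as nn

class LightweightRPN(nn.Module):
    """Simplified region-proposal trunk: a 5*5 depthwise convolution
    (groups == in_channels) followed by a 256-channel 1*1 convolution.
    The cls/reg heads below are assumed, not taken from the text."""

    def __init__(self, in_channels=245, mid_channels=256, num_anchors=5):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, 5,
                                   padding=2, groups=in_channels)
        self.pointwise = nn.Conv2d(in_channels, mid_channels, 1)
        self.cls = nn.Conv2d(mid_channels, num_anchors, 1)      # objectness per anchor
        self.reg = nn.Conv2d(mid_channels, num_anchors * 4, 1)  # 4 box deltas per anchor

    def forward(self, x):
        mid = torch.relu(self.pointwise(self.depthwise(x)))  # intermediate feature map
        return mid, self.cls(mid), self.reg(mid)
```

The intermediate feature map returned here is what the spatial attention network described below consumes.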
To further improve detection accuracy, one implementation of generating the detection result from the candidate-region information and the first feature map is as follows. The intermediate feature map produced while the region-proposal network generates the candidate-region information from the first feature map is input to a spatial attention network. The spatial attention network merges the first feature map with the intermediate feature map to form a second feature map, in which the foreground features are stronger than the background features; in other words, the feature values of the foreground regions of the second feature map (foreground feature values, for short) are higher than the feature values of the background regions (background feature values, for short). The detection result is then generated according to the candidate-region information and the second feature map. It can be understood that the intermediate feature map produced by the region-proposal network during candidate-region generation implicitly contains both foreground information and background information, where foreground information describes regions containing the target object and background information describes regions that do not. Based on the intermediate feature map, the spatial attention network can enhance the features of the foreground regions in the first feature map (e.g., increase the foreground feature values) and weaken the features of the background regions (e.g., decrease the background feature values), yielding a second feature map whose foreground features dominate the background features. For ease of understanding, a simple example: suppose the foreground feature value in the first feature map is 0.5 and the background feature value is 0.4, so the foreground and background regions are barely distinguishable; after processing by the spatial attention network, the foreground feature value may be raised to 0.6 and the background feature value lowered to 0.1, so that the foreground features are significantly stronger than the background features, the contrast between foreground and background regions increases, and the foreground region is effectively highlighted. It should be noted that this is only an illustrative example of the case where the foreground feature value is slightly above the background feature value; the spatial attention network further strengthens the foreground and weakens the background to widen the gap between them. The first feature map may of course also have foreground feature values below the background feature values; in that case the spatial attention network can substantially raise the foreground feature values and lower the background feature values so that the foreground values exceed the background values, which is not repeated here. By enhancing foreground features and suppressing background features through the spatial attention network, the features of the candidate regions are strengthened — that is, the candidate-region features later extracted directly from the second feature map are more salient — so the detection result is more accurate.
In one embodiment, the spatial attention network includes a fifth convolutional layer and an activation-function layer connected in sequence; the output of the activation-function layer is connected to a multiplication operation layer. For example, the fifth convolutional layer may include 245 convolution kernels of size 1*1, and the activation-function layer may be implemented with the Sigmoid activation function. In another embodiment, a batch normalization layer (BatchNorm) is further connected between the fifth convolutional layer and the activation-function layer.
With reference to the structural diagram of the spatial attention network shown in Fig. 4, the generation of the second feature map by the spatial attention network is further described. The intermediate feature map is input to the fifth convolutional layer and processed successively by the fifth convolutional layer, the batch normalization layer and the activation-function layer; the activation-function layer outputs a processed intermediate feature map whose foreground features are stronger than its background features. The first feature map and the processed intermediate feature map are then input to the multiplication operation layer, which multiplies them element-wise to generate the second feature map. The processed intermediate feature map encodes a feature weight for each region, so the spatial attention network re-weights the features of the first feature map based on these weights; in the resulting second feature map the foreground features dominate the background features, which helps highlight the target region and improves the accuracy of the detection result.
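A minimal sketch of this attention path, assuming the layer sizes given in the examples (a fifth convolutional layer with 245 kernels of size 1*1, BatchNorm, Sigmoid) and a 256-channel intermediate feature map; all names are illustrative:

```python
import torch
import torch.nn as nn

class SpatialAttentionModule(nn.Module):
    """Re-weights the first feature map with the RPN intermediate map:
    1*1 conv -> batch norm -> sigmoid gives per-location weights in (0, 1),
    which multiply the first feature map element-wise."""

    def __init__(self, rpn_channels=256, cem_channels=245):
        super().__init__()
        self.conv = nn.Conv2d(rpn_channels, cem_channels, 1)  # fifth conv layer
        self.bn = nn.BatchNorm2d(cem_channels)                # batch normalization layer
        self.act = nn.Sigmoid()                               # activation-function layer

    def forward(self, cem_fm, rpn_fm):
        weight = self.act(self.bn(self.conv(rpn_fm)))  # processed intermediate map
        return cem_fm * weight                         # multiplication operation layer
```

Since the sigmoid output lies in (0, 1), the multiplication cannot raise feature values in absolute terms; what it changes is the relative contrast, suppressing background locations far more than foreground ones.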
After the second feature map is generated, the candidate-region information and the second feature map can be input to a candidate-region feature extraction layer, which, based on the candidate-region information, extracts the region features of each candidate region from the second feature map; target detection is then performed on the region features of each candidate region to generate the detection result. When extracting region features, the candidate-region feature extraction layer may apply one of the following operations to each candidate region: RoI pooling (Region of Interest pooling), PSRoI pooling (Position-Sensitive Region of Interest pooling), RoI align (Region of Interest align), or PSRoI align (Position-Sensitive Region of Interest align).
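For illustration only, a naive RoI-pooling variant can be written in a few lines: crop each candidate box from the feature map and adaptively max-pool it to a fixed 7*7 grid. Real systems would use the library implementations of the operations named above; this toy version only conveys the idea of mapping variable-size regions to fixed-size region features.

```python
import torch
import torch.nn.functional as F

def simple_roi_pool(feat, boxes, out_size=7):
    """Toy RoI pooling: boxes are (x1, y1, x2, y2) in feature-map
    coordinates; each crop is max-pooled to out_size x out_size.
    A production detector would use RoI Align / PSRoI variants instead."""
    regions = []
    for x1, y1, x2, y2 in boxes.round().long():
        crop = feat[:, :, y1:max(y2, y1 + 1), x1:max(x2, x1 + 1)]
        regions.append(F.adaptive_max_pool2d(crop, out_size))
    return torch.cat(regions, dim=0)  # (num_boxes, C, out_size, out_size)
```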
Once the region features of each candidate region have been extracted by the candidate-region feature extraction layer, target detection can be performed on them. Specifically, a classification sub-network may classify the region features of each candidate region to determine the target category in the target image; and/or a regression sub-network may perform regression on the region features of each candidate region to obtain the target position in the target image.
To further improve detection efficiency and shorten detection time, the classification sub-network and the regression sub-network used in this embodiment may each be a single fully connected layer, where the channel count of the fully connected layer serving as the classification sub-network equals the number of categories, and the channel count of the fully connected layer serving as the regression sub-network is 4. In addition, another fully connected layer may be connected before the classification and regression sub-networks to further extract features from the region features of each candidate region, so that the two sub-networks can perform classification and bounding-box regression better on the re-extracted features. In one embodiment, this fully connected layer preceding the classification and regression sub-networks has 1024 channels.
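The head just described — a shared 1024-channel fully connected layer followed by a single fully connected layer for classification (channels = number of categories) and one for box regression (4 channels) — can be sketched as follows; the region-feature size and class count used below are placeholder assumptions:

```python
import torch
import torch.nn as nn

class RCNNHead(nn.Module):
    """Shared 1024-channel FC layer, then two single-FC branches:
    classification (num_classes outputs) and box regression (4 outputs)."""

    def __init__(self, roi_feat_dim, num_classes, hidden=1024):
        super().__init__()
        self.shared = nn.Linear(roi_feat_dim, hidden)  # 1024-channel FC
        self.cls = nn.Linear(hidden, num_classes)      # classification sub-network
        self.reg = nn.Linear(hidden, 4)                # regression sub-network

    def forward(self, roi_feats):
        x = torch.relu(self.shared(roi_feats.flatten(1)))  # flatten region features
        return self.cls(x), self.reg(x)
```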
For ease of understanding, reference may be made to the structural diagram of a target detection model shown in Fig. 5. This model can be used to implement the above target detection method and specifically shows the connections among the base neural network, the context enhancement network, the spatial attention network, the region-proposal network, the candidate-region feature extraction layer, the classification sub-network and the regression sub-network; the specific function of each network is not repeated here. It should be noted that the target detection model shown in Fig. 5 is only an example: in practical applications, other network structures may be added to the model of Fig. 5, or parts of its network structure deleted, as appropriate.
In summary, the target detection method provided by this embodiment can use the context enhancement network to effectively combine semantic information and contextual information of different scales, so that the feature map contains feature information of multiple scales, and can use the spatial attention network to enhance the features of the candidate regions, so that target detection can be performed on the enhanced candidate-region features and a more accurate detection result obtained. Moreover, the base neural network used in this embodiment is a lightweight feature extraction network, and the context enhancement network, spatial attention network, classification sub-network and regression sub-network proposed here are structurally simple with a small computational load, which effectively improves detection efficiency. The target detection method proposed in this embodiment can therefore effectively improve both detection accuracy and detection speed.
Embodiment three:
This embodiment provides a concrete example of application of the foregoing target detection method, specifically a deep-neural-network-based target detection system (which may also be called a target detection model). The system mainly improves on current lightweight two-stage target detection algorithms (e.g., Light-Head R-CNN) to achieve efficient, high-accuracy target detection.
Overall, the target detection system provided by this embodiment comprises three main modules: an image preprocessing module, a region-proposal (Region Proposal) extraction module and a region-proposal identification module. The image preprocessing module preprocesses the input image (i.e., the aforementioned original image); the region-proposal extraction module mainly uses a convolutional neural network to generate potential target regions (i.e., the aforementioned candidate regions); and the region-proposal identification module mainly uses a neural network to identify the region proposals extracted by the extraction module and obtain the final detection result. In practical applications, the system may also omit the image preprocessing module and input the original image directly to the region-proposal extraction module; the main function of the preprocessing module is to speed up detection.
Specifically, this embodiment provides the structural block diagram of a target detection system shown in Fig. 6; Fig. 6 is a concrete, more visual realization of Fig. 5. The lightweight two-stage detection method provided by this embodiment is further described below with reference to Fig. 6:
Step 1: image processing
A whitening operation is applied to the image to be detected to obtain a target image that can be input to the neural network, and the target image is scaled to 320*320 pixels. Specifically, the input (Input) shown in Fig. 6 is the target image, of size 320*320*3.
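Step 1 can be sketched as follows, assuming per-image whitening (zero mean, unit variance); the nearest-neighbour resize keeps the sketch dependency-free, whereas a real pipeline would likely use bilinear interpolation:

```python
import numpy as np

def preprocess(img, size=320):
    """Whiten the image, then resize to size x size by nearest-neighbour
    index sampling. The whitening constant 1e-6 guards against flat images."""
    img = img.astype(np.float32)
    img = (img - img.mean()) / (img.std() + 1e-6)           # whitening operation
    ys = (np.arange(size) * img.shape[0] / size).astype(int)  # row indices
    xs = (np.arange(size) * img.shape[1] / size).astype(int)  # column indices
    return img[ys][:, xs]                                     # (size, size, C)
```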
Step 2: candidate region extraction
The target image is input to the base neural network (which may also be called the backbone network), through which the features of the target region are extracted. To improve detection efficiency, lightweight backbone networks such as Xception and ShuffleNet can be used.
To enhance the feature representation ability of the target detection system, Fig. 6 also shows the context enhancement network (which may also be called the context enhancement module, Context Enhancement Module, CEM), used to fuse semantic information and contextual information of different scales. With reference to the CEM structure of Fig. 3 from the previous embodiment, Fig. 7 shows how the first feature map is generated. It uses the feature map C4 produced by the third stage of the base neural network (scale 20*20), the feature map C5 produced by the fourth stage (scale 10*10), and the feature map Cglb obtained from C5 by global average pooling (global avg pooling, scale 1*1). C4 passes through the first convolutional layer of the context enhancement network, a 1*1 convolution that compresses it to 245 channels, yielding C4_lat at scale 20*20. C5 passes through the second convolutional layer, a 1*1 convolution compressing it to 245 channels, followed by a 2x upsampling operation (2*Upsample) in the upsampling operation layer, yielding C5_lat at scale 20*20. Cglb passes through the third convolutional layer, a 1*1 convolution compressing it to 245 channels, followed by a broadcast operation (Broadcast) in the broadcast operation layer, yielding Cglb_lat at scale 20*20. Finally, C4_lat, C5_lat and Cglb_lat are summed by the addition operation layer to obtain the first feature map CEM_fm (size 20*20*245).
Next, the first feature map CEM_fm is input to the region-proposal network RPN, which generates potential target boxes (bounding boxes); these potential target boxes are the aforementioned candidate regions. To improve computational efficiency, the region-proposal network consists only of a 5x5 channel-wise convolution (a depthwise convolution) and a 256-channel 1x1 standard convolution. In a specific implementation, the region-proposal network can generate up to 200 candidate regions per image. Specifically, Fig. 6 shows the region-proposal network generating the intermediate feature map RPN_fm (size 20*20*256) from the first feature map CEM_fm, and then generating RoIs (Regions of Interest) from RPN_fm, i.e., the candidate-region information.
Step 3: candidate region identification
To further enhance the feature representation ability of the target detection system, Fig. 6 also shows the spatial attention network (which may also be called the spatial attention module, Spatial Attention Module, SAM), used to re-weight the features of the first feature map generated by the context enhancement network. Specifically, combining the structural diagram of the spatial attention network shown in Fig. 4, this embodiment further illustrates in Fig. 8 how the second feature map is generated: the intermediate feature map RPN_fm output by the RPN passes successively through a 1*1 convolutional layer, a BatchNorm normalization layer and a Sigmoid activation layer, and is then multiplied element-wise with the first feature map CEM_fm to obtain the second feature map SAM_fm. As shown in Fig. 6, the size of the second feature map SAM_fm produced by the spatial attention network is 20*20*245.
Next, an operation such as RoI pooling, PSRoI pooling, RoI align or PSRoI align can be applied to the second feature map SAM_fm to extract region features. As shown in Fig. 6, a PSRoI Align operation is applied to SAM_fm based on the RoIs (for simplicity, the candidate-region feature extraction layer that performs the PSRoI Align operation is not drawn in Fig. 6), yielding the region features RoI_fm of each candidate region (size 7*7*5), and an R-CNN sub-network is used to identify each candidate region. Identification comprises two tasks, classification and bounding-box regression, which finally produce the classification result and the regression result. In practical applications, the R-CNN sub-network may first include a fully connected layer (FC, fully-connected layer) with 1024 channels, followed in parallel by two fully connected layers: one for classification, whose channel count equals the number of categories, and one for bounding-box regression, i.e., computing the coordinates of the target box, with 4 channels. For simplicity, Fig. 6 symbolically depicts only the 1024-channel fully connected layer FC, which can be used to re-extract features from the region features of the candidate regions; the re-extracted features are then separately classified and regressed to obtain the classification result and the regression result. To verify the performance of the lightweight two-stage target detection method provided by this embodiment of the present invention, it was compared with existing lightweight target detection methods on the MS COCO dataset; the results are shown in Table 1.
Table 1
In Table 1, AP (Average Precision) denotes the average detection precision of each target detection method, and MFLOPs denotes the computational cost of each method in producing a detection result. Table 1 shows that the target detection method provided by this embodiment of the present invention (Mobile Light-Head R-CNN, shown in the last three rows) achieves the same or even better detection accuracy with less than half the computation, and clearly better detection accuracy at a comparable computation budget. That is, the target detection method provided by this embodiment effectively improves both detection speed and detection accuracy.
Example IV:
For the target detection method provided in embodiment two, an embodiment of the present invention provides a target detection apparatus; see the structural block diagram of a target detection apparatus shown in Fig. 9, comprising:
an image acquisition module 902, configured to obtain a target image to be detected;
a first-feature-map generation module 904, configured to perform feature extraction on the target image and generate a first feature map, wherein the first feature map contains feature information of different scales;
a candidate identification module 906, configured to perform region-proposal identification on the first feature map to obtain candidate-region information of the target image; and
a detection module 908, configured to generate a detection result according to the candidate-region information and the first feature map, the detection result containing the target category and/or target position in the target image.
The above target detection apparatus provided by this embodiment of the present invention can perform feature extraction on the acquired target image to generate a first feature map containing feature information of different scales, then perform region-proposal identification on the first feature map to obtain candidate-region information, and finally generate a detection result according to the candidate-region information and the first feature map. In this way, target detection can use feature information of multiple scales, which effectively improves detection performance.
In one embodiment, the image acquisition module 902 is configured to: obtain an initial image to be detected; and preprocess the initial image to obtain the target image, wherein the preprocessing includes a whitening operation.
In one embodiment, the first-feature-map generation module 904 is configured to: input the target image to the base neural network; perform multi-stage feature extraction on the target image through the base neural network to obtain feature information of different scales, wherein each stage extracts feature information of a different scale; and fuse the feature information corresponding to multiple specified stages to form the first feature map.
In one embodiment, the first-feature-map generation module 904 is further configured to: obtain the first feature information extracted at the penultimate stage of the base neural network; obtain the second feature information extracted at the last stage of the base neural network; perform a global pooling operation on the second feature information to obtain third feature information; and merge the first feature information, second feature information and third feature information through the context enhancement network to form the first feature map.
In one embodiment, the context enhancement network includes a first convolutional layer, a second convolutional layer and a third convolutional layer in parallel; the output of the second convolutional layer is further connected to an upsampling operation layer, and the output of the third convolutional layer to a broadcast operation layer; the outputs of the first convolutional layer, the upsampling operation layer and the broadcast operation layer are jointly connected to an addition operation layer.
In one embodiment, the first-feature-map generation module 904 is further configured to: input the first feature information to the first convolutional layer, the second feature information to the second convolutional layer, and the third feature information to the third convolutional layer; perform a convolution operation on the first feature information through the first convolutional layer to obtain first feature information with the specified scale; successively perform a convolution operation and an upsampling operation on the second feature information through the second convolutional layer and the upsampling operation layer to obtain second feature information with the specified scale; successively perform a convolution operation and a broadcast operation on the third feature information through the third convolutional layer and the broadcast operation layer to obtain third feature information with the specified scale; and sum the first, second and third feature information, all now at the specified scale, through the addition operation layer to form the first feature map.
In one embodiment, the base neural network is a lightweight feature extraction network.
In one embodiment, the candidate identification module 906 is configured to: input the first feature map to the region-proposal network; perform feature extraction on the first feature map through the region-proposal network to obtain an intermediate feature map; and perform candidate-region identification on the intermediate feature map to obtain the candidate-region information of the target image.
In one embodiment, the region-proposal network includes a channel-wise convolutional layer and a fourth convolutional layer connected in sequence.
In one embodiment, the detection module 908 is configured to: input the first feature map and the intermediate feature map to the spatial attention network; merge the first feature map and the intermediate feature map through the spatial attention network to form a second feature map, wherein the foreground features of the second feature map are stronger than its background features; and generate the detection result according to the candidate-region information and the second feature map.
In one embodiment, the spatial attention network includes a fifth convolutional layer and an activation-function layer connected in sequence; the output of the activation-function layer is connected to a multiplication operation layer.
In one embodiment, a batch normalization layer is further connected between the fifth convolutional layer and the activation-function layer.
In one embodiment, the detection module 908 is further configured to: input the intermediate feature map to the fifth convolutional layer; process the intermediate feature map successively through the fifth convolutional layer, the batch normalization layer and the activation-function layer to obtain the processed intermediate feature map output by the activation-function layer, wherein the foreground features of the processed intermediate feature map are stronger than its background features; input the first feature map and the processed intermediate feature map to the multiplication operation layer; and multiply the first feature map and the processed intermediate feature map through the multiplication operation layer to generate the second feature map.
In one embodiment, the detection module 908 is further configured to: input the candidate-region information and the second feature map to the candidate-region feature extraction layer; extract the region features of each candidate region from the second feature map through the candidate-region feature extraction layer based on the candidate-region information; and perform target detection on the region features of each candidate region to generate the detection result.
In one embodiment, the detection module 908 is further configured to: classify the region features of each candidate region through the classification sub-network to determine the target category in the target image; and/or perform regression on the region features of each candidate region through the regression sub-network to obtain the target position in the target image.
In one embodiment, the classification sub-network and the regression sub-network are each a single fully connected layer.
The implementation principle and technical effects of the apparatus provided by this embodiment are the same as those of the preceding embodiment; for brevity, where this apparatus embodiment is silent, reference may be made to the corresponding content of the preceding method embodiment.
In addition, this embodiment provides a target detection system comprising an image acquisition device, a processor and a storage device. The image acquisition device is configured to acquire the image to be detected; the storage device stores a computer program that, when run by the processor, performs the foregoing target detection method.
Those skilled in the art will clearly appreciate that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the preceding embodiments and is not repeated here.
Further, this embodiment provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the method provided by embodiment two above.
A computer program product of the target detection method, apparatus and system provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions contained in the program code may be used to perform the method described in the preceding method embodiments, whose specific implementation may be found in the method embodiments and is not repeated here.
If the functions are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention — in essence, the part contributing over the prior art, or a part of the technical solution — may be embodied in the form of a software product; this computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk.
In the description of the present invention, it should be noted that orientation or position terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings, are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they are therefore not to be construed as limiting the present invention. In addition, the terms "first", "second" and "third" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, intended to illustrate rather than limit its technical solution, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations or replacements do not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. The protection scope of the present invention shall therefore be based on the protection scope of the claims.
Claims (19)
1. An object detection method, comprising:
acquiring a target image to be detected;
performing feature extraction on the target image to generate a first feature map, wherein the first feature map comprises feature information of different scales;
performing region candidate identification on the first feature map to obtain candidate region information of the target image; and
generating a detection result according to the candidate region information and the first feature map, the detection result comprising a target category and/or a target position in the target image.
2. The method according to claim 1, wherein the step of acquiring a target image to be detected comprises:
acquiring an initial image to be detected; and
preprocessing the initial image to obtain the target image, wherein the preprocessing comprises a whitening operation.
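Claim 2's whitening preprocessing can be illustrated with a minimal sketch. The per-channel zero-mean, unit-variance normalization below is one common form of whitening and is an assumption on our part; the claim itself does not fix the exact operation.

```python
import numpy as np

def whiten(image: np.ndarray) -> np.ndarray:
    """Whiten an H x W x C image: zero mean, unit variance per channel.

    Per-channel standardization is one common whitening variant; the
    claim does not specify the exact operation, so this is illustrative.
    """
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / (std + 1e-8)  # epsilon guards divide-by-zero

# Hypothetical initial image; shape and value range are arbitrary.
initial_image = np.random.rand(32, 32, 3).astype(np.float64) * 255.0
target_image = whiten(initial_image)
```

After this step the target image has approximately zero mean and unit variance in every channel, which stabilizes the subsequent feature extraction.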
3. The method according to claim 1, wherein the step of performing feature extraction on the target image to generate a first feature map comprises:
inputting the target image into a base neural network;
performing multi-stage feature extraction on the target image through the base neural network to obtain feature information of different scales, wherein the scale of the feature information extracted at each stage is different; and
fusing the feature information corresponding to a plurality of specified stages to form the first feature map.
4. The method according to claim 3, wherein the step of fusing the feature information corresponding to a plurality of specified stages to form the first feature map comprises:
obtaining first feature information extracted at the penultimate stage of the base neural network;
obtaining second feature information extracted at the last stage of the base neural network;
performing a global pooling operation on the second feature information to obtain third feature information; and
fusing, through a context enhancement network, the first feature information, the second feature information, and the third feature information to form the first feature map.
5. The method according to claim 4, wherein the context enhancement network comprises a first convolutional layer, a second convolutional layer, and a third convolutional layer arranged in parallel, wherein an output end of the second convolutional layer is further connected to an upsampling operation layer, and an output end of the third convolutional layer is further connected to a broadcast operation layer; and
an output end of the first convolutional layer, an output end of the upsampling operation layer, and an output end of the broadcast operation layer are jointly connected to an addition operation layer.
6. The method according to claim 5, wherein the step of fusing, through the context enhancement network, the first feature information, the second feature information, and the third feature information to form the first feature map comprises:
inputting the first feature information into the first convolutional layer, inputting the second feature information into the second convolutional layer, and inputting the third feature information into the third convolutional layer;
performing a convolution operation on the first feature information through the first convolutional layer to obtain first feature information having a specified scale;
performing a convolution operation and an upsampling operation on the second feature information successively through the second convolutional layer and the upsampling operation layer to obtain second feature information having the specified scale;
performing a convolution operation and a broadcast operation on the third feature information successively through the third convolutional layer and the broadcast operation layer to obtain third feature information having the specified scale; and
summing, through the addition operation layer, the first feature information having the specified scale, the second feature information having the specified scale, and the third feature information having the specified scale to form the first feature map.
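The fusion described in claims 4 to 6 can be sketched in NumPy. The sketch assumes 1x1 convolutions, nearest-neighbour 2x upsampling, and global average pooling; all shapes, channel widths, and the choice of average (rather than max) pooling are illustrative assumptions, not fixed by the claims.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel linear map: (H, W, Cin) @ (Cin, Cout).
    return x @ w

def upsample2x(x):
    # Nearest-neighbour upsampling by a factor of 2 in both spatial dims.
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Feature maps from the last two stages of a hypothetical backbone.
f1 = rng.standard_normal((20, 20, 96))    # first feature info (penultimate stage)
f2 = rng.standard_normal((10, 10, 192))   # second feature info (last stage)
f3 = f2.mean(axis=(0, 1), keepdims=True)  # third feature info: global (average) pooling

# Three parallel convolutional layers projecting everything to a shared width.
w1 = rng.standard_normal((96, 128))
w2 = rng.standard_normal((192, 128))
w3 = rng.standard_normal((192, 128))

a = conv1x1(f1, w1)                            # already at the specified 20x20 scale
b = upsample2x(conv1x1(f2, w2))                # conv, then upsample to 20x20
c = np.broadcast_to(conv1x1(f3, w3), a.shape)  # conv, then broadcast over 20x20
first_feature_map = a + b + c                  # addition layer forms the first feature map
```

Because the third branch is broadcast from a 1x1 global descriptor, it injects the same context vector at every spatial location of the fused map.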
7. The method according to any one of claims 3 to 6, wherein the base neural network is a lightweight feature extraction network.
8. The method according to claim 1, wherein the step of performing region candidate identification on the first feature map to obtain candidate region information of the target image comprises:
inputting the first feature map into a region candidate generation network; and
performing, through the region candidate generation network, feature extraction on the first feature map to obtain an intermediate feature map, and performing candidate region identification on the intermediate feature map to obtain the candidate region information of the target image.
9. The method according to claim 8, wherein the region candidate generation network comprises a channel-wise convolutional layer and a fourth convolutional layer connected in sequence.
10. The method according to claim 8, wherein the step of generating a detection result according to the candidate region information and the first feature map comprises:
inputting the first feature map and the intermediate feature map into a spatial attention network;
fusing the first feature map and the intermediate feature map through the spatial attention network to form a second feature map, wherein foreground features of the second feature map are more prominent than background features; and
generating the detection result according to the candidate region information and the second feature map.
11. The method according to claim 10, wherein the spatial attention network comprises a fifth convolutional layer and an activation function layer connected in sequence, and an output end of the activation function layer is connected to a multiplication operation layer.
12. The method according to claim 11, wherein a batch normalization layer is further connected between the fifth convolutional layer and the activation function layer.
13. The method according to claim 12, wherein the step of fusing the first feature map and the intermediate feature map through the spatial attention network to form a second feature map comprises:
inputting the intermediate feature map into the fifth convolutional layer, and processing the intermediate feature map successively through the fifth convolutional layer, the batch normalization layer, and the activation function layer to obtain a processed intermediate feature map output by the activation function layer, wherein foreground features of the processed intermediate feature map are more prominent than background features;
inputting the first feature map and the processed intermediate feature map into the multiplication operation layer; and
performing a multiplication operation on the first feature map and the processed intermediate feature map through the multiplication operation layer to generate the second feature map.
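The spatial attention path of claims 11 to 13 can be sketched as a conv → batch-norm → activation → multiply pipeline. The 1x1 convolution to a single channel, the sigmoid activation, and the inference-style normalization are assumptions chosen for a minimal illustration; the claims only fix the layer ordering.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

first_feature_map = rng.standard_normal((20, 20, 128))
intermediate_map = rng.standard_normal((20, 20, 128))

# Fifth convolutional layer, modeled here as a 1x1 conv to a single channel.
w5 = rng.standard_normal((128, 1))
logits = intermediate_map @ w5

# Batch normalization, simplified to a single standardization over the map.
norm = (logits - logits.mean()) / (logits.std() + 1e-8)

# Activation function layer: sigmoid squashes to a (0, 1) attention mask,
# so foreground locations can be weighted above background ones.
mask = sigmoid(norm)

# Multiplication layer: element-wise product forms the second feature map.
second_feature_map = first_feature_map * mask
```

The mask has one value per spatial location and broadcasts across channels, which is what lets the attention suppress background regions of the first feature map without changing its shape.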
14. The method according to claim 10, wherein the step of generating a detection result according to the candidate region information and the second feature map comprises:
inputting the candidate region information and the second feature map into a candidate region feature extraction layer;
extracting, through the candidate region feature extraction layer and based on the candidate region information, a region feature of each candidate region from the second feature map; and
performing target detection based on the region feature of each candidate region to generate the detection result.
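Claim 14's per-region feature extraction can be sketched as cropping each candidate box from the second feature map and pooling it to a fixed-length vector. A real detector would typically use RoI pooling or RoI align; the mean-pooled crop below is a simplified stand-in with the same interface, and the box format is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
second_feature_map = rng.standard_normal((20, 20, 128))

# Candidate region info as (x0, y0, x1, y1) boxes in feature-map coordinates
# (hypothetical format; the claim does not fix the encoding).
candidate_regions = [(0, 0, 8, 8), (4, 6, 12, 18)]

def extract_region_feature(fmap, box):
    """Crop one candidate region and average-pool it to a fixed-size vector."""
    x0, y0, x1, y1 = box
    crop = fmap[y0:y1, x0:x1, :]
    return crop.mean(axis=(0, 1))  # one fixed-length vector per region

region_features = np.stack(
    [extract_region_feature(second_feature_map, b) for b in candidate_regions]
)
```

Pooling to a fixed length is what allows regions of different sizes to share the same downstream classification and regression heads.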
15. The method according to claim 14, wherein the step of performing target detection based on the region feature of each candidate region to generate the detection result comprises:
performing classification processing on the region feature of each candidate region through a classification sub-network to determine the target category in the target image; and/or
performing regression processing on the region feature of each candidate region through a regression sub-network to obtain the target position in the target image.
16. The method according to claim 15, wherein each of the classification sub-network and the regression sub-network is a single fully connected layer.
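Per claim 16, each head is a single fully connected layer. The sketch below applies one linear layer per head to the pooled region features; the class count, offset parameterization, and weight values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
region_features = rng.standard_normal((2, 128))  # one pooled vector per candidate

num_classes = 5  # hypothetical number of target categories

# Classification sub-network: a single fully connected layer over each region.
w_cls = rng.standard_normal((128, num_classes))
b_cls = np.zeros(num_classes)
class_scores = region_features @ w_cls + b_cls
predicted_category = class_scores.argmax(axis=1)

# Regression sub-network: a single fully connected layer predicting 4 box offsets.
w_reg = rng.standard_normal((128, 4))
b_reg = np.zeros(4)
box_offsets = region_features @ w_reg + b_reg
```

Keeping both heads to one fully connected layer each is consistent with the lightweight design the claims emphasize: almost all capacity stays in the shared backbone and attention stages.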
17. An object detection apparatus, comprising:
an image acquisition module, configured to acquire a target image to be detected;
a first feature map generation module, configured to perform feature extraction on the target image to generate a first feature map, wherein the first feature map comprises feature information of different scales;
a candidate identification module, configured to perform region candidate identification on the first feature map to obtain candidate region information of the target image; and
a detection module, configured to generate a detection result according to a plurality of candidate regions and the first feature map, the detection result comprising a target category and/or a target position in the target image.
18. An object detection system, comprising an image acquisition device, a processor, and a storage device, wherein:
the image acquisition device is configured to acquire a target image; and
the storage device stores a computer program which, when run by the processor, performs the method according to any one of claims 1 to 16.
19. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when run by a processor, performs the steps of the method according to any one of claims 1 to 16.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811049034.4A CN109255352B (en) | 2018-09-07 | 2018-09-07 | Target detection method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109255352A true CN109255352A (en) | 2019-01-22 |
CN109255352B CN109255352B (en) | 2021-06-22 |
Family
ID=65048187
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811049034.4A Active CN109255352B (en) | 2018-09-07 | 2018-09-07 | Target detection method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109255352B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120219210A1 (en) * | 2011-02-28 | 2012-08-30 | Yuanyuan Ding | Multi-Scale, Perspective Context, and Cascade Features for Object Detection |
US20170206431A1 (en) * | 2016-01-20 | 2017-07-20 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
CN106910185A (en) * | 2017-01-13 | 2017-06-30 | 陕西师范大学 | A kind of DBCC disaggregated models and construction method based on CNN deep learnings |
CN106845499A (en) * | 2017-01-19 | 2017-06-13 | 清华大学 | A kind of image object detection method semantic based on natural language |
CN107093189A (en) * | 2017-04-18 | 2017-08-25 | 山东大学 | Method for tracking target and system based on adaptive color feature and space-time context |
CN107563290A (en) * | 2017-08-01 | 2018-01-09 | 中国农业大学 | A kind of pedestrian detection method and device based on image |
CN108038409A (en) * | 2017-10-27 | 2018-05-15 | 江西高创保安服务技术有限公司 | A kind of pedestrian detection method |
CN107945153A (en) * | 2017-11-07 | 2018-04-20 | 广东广业开元科技有限公司 | A kind of road surface crack detection method based on deep learning |
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
Non-Patent Citations (3)
Title |
---|
SEAN BELL ET AL.: "Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks", 《2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
XIAOWEI ZHANG ET AL.: "Scale-aware hierarchical loss: A multipath RPN for multi-scale pedestrian detection", 《2017 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP)》 *
WANG FEI: "Region-based convolutional neural network and its application in static object detection", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Cited By (50)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886155A (en) * | 2019-01-30 | 2019-06-14 | 华南理工大学 | Man power single stem rice detection localization method, system, equipment and medium based on deep learning |
CN109886155B (en) * | 2019-01-30 | 2021-08-10 | 华南理工大学 | Single-plant rice detection and positioning method, system, equipment and medium based on deep learning |
CN109815770A (en) * | 2019-01-31 | 2019-05-28 | 北京旷视科技有限公司 | Two-dimentional code detection method, apparatus and system |
CN109816036A (en) * | 2019-01-31 | 2019-05-28 | 北京字节跳动网络技术有限公司 | Image processing method and device |
CN109871890A (en) * | 2019-01-31 | 2019-06-11 | 北京字节跳动网络技术有限公司 | Image processing method and device |
CN109816036B (en) * | 2019-01-31 | 2021-08-27 | 北京字节跳动网络技术有限公司 | Image processing method and device |
CN109886230A (en) * | 2019-02-28 | 2019-06-14 | 中南大学 | A kind of image object detection method and device |
CN111666958A (en) * | 2019-03-05 | 2020-09-15 | 中科院微电子研究所昆山分所 | Method, device, equipment and medium for detecting equipment state based on image recognition |
CN109948611A (en) * | 2019-03-14 | 2019-06-28 | 腾讯科技(深圳)有限公司 | A kind of method and device that method, the information of information area determination are shown |
CN109948611B (en) * | 2019-03-14 | 2022-07-08 | 腾讯科技(深圳)有限公司 | Information area determination method, information display method and device |
CN110008951A (en) * | 2019-03-14 | 2019-07-12 | 深兰科技(上海)有限公司 | A kind of object detection method and device |
CN110008951B (en) * | 2019-03-14 | 2020-12-15 | 深兰科技(上海)有限公司 | Target detection method and device |
CN110111299A (en) * | 2019-03-18 | 2019-08-09 | 国网浙江省电力有限公司信息通信分公司 | Rust staining recognition methods and device |
CN110096960B (en) * | 2019-04-03 | 2021-06-08 | 罗克佳华科技集团股份有限公司 | Target detection method and device |
CN111797657A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Vehicle peripheral obstacle detection method, device, storage medium, and electronic apparatus |
CN110135307A (en) * | 2019-04-30 | 2019-08-16 | 北京邮电大学 | Method for traffic sign detection and device based on attention mechanism |
CN111914600A (en) * | 2019-05-08 | 2020-11-10 | 四川大学 | Group emotion recognition method based on space attention model |
CN110298821A (en) * | 2019-05-28 | 2019-10-01 | 昆明理工大学 | A kind of reinforcing bar detection method based on Faster R-CNN |
CN110287836A (en) * | 2019-06-14 | 2019-09-27 | 北京迈格威科技有限公司 | Image classification method, device, computer equipment and storage medium |
CN110287836B (en) * | 2019-06-14 | 2021-10-15 | 北京迈格威科技有限公司 | Image classification method and device, computer equipment and storage medium |
CN110135406A (en) * | 2019-07-09 | 2019-08-16 | 北京旷视科技有限公司 | Image-recognizing method, device, computer equipment and storage medium |
CN112241669A (en) * | 2019-07-18 | 2021-01-19 | 杭州海康威视数字技术股份有限公司 | Target identification method, device, system and equipment, and storage medium |
CN110427898A (en) * | 2019-08-07 | 2019-11-08 | 广东工业大学 | Wrap up safety check recognition methods, system, device and computer readable storage medium |
CN110532955A (en) * | 2019-08-30 | 2019-12-03 | 中国科学院宁波材料技术与工程研究所 | Example dividing method and device based on feature attention and son up-sampling |
CN110532955B (en) * | 2019-08-30 | 2022-03-08 | 中国科学院宁波材料技术与工程研究所 | Example segmentation method and device based on feature attention and sub-upsampling |
CN110533119A (en) * | 2019-09-04 | 2019-12-03 | 北京迈格威科技有限公司 | The training method of index identification method and its model, device and electronic system |
CN110674886B (en) * | 2019-10-08 | 2022-11-25 | 中兴飞流信息科技有限公司 | Video target detection method fusing multi-level features |
CN110674886A (en) * | 2019-10-08 | 2020-01-10 | 中兴飞流信息科技有限公司 | Video target detection method fusing multi-level features |
CN111008555A (en) * | 2019-10-21 | 2020-04-14 | 武汉大学 | Unmanned aerial vehicle image small and weak target enhancement extraction method |
CN110837789A (en) * | 2019-10-31 | 2020-02-25 | 北京奇艺世纪科技有限公司 | Method and device for detecting object, electronic equipment and medium |
CN110837789B (en) * | 2019-10-31 | 2023-01-20 | 北京奇艺世纪科技有限公司 | Method and device for detecting object, electronic equipment and medium |
WO2021083126A1 (en) * | 2019-10-31 | 2021-05-06 | 北京市商汤科技开发有限公司 | Target detection and intelligent driving methods and apparatuses, device, and storage medium |
JP2022535473A (en) * | 2019-10-31 | 2022-08-09 | ベイジン センスタイム テクノロジー デベロップメント シーオー.,エルティーディー | Target detection, intelligent driving methods, devices, equipment and storage media |
CN111144238A (en) * | 2019-12-11 | 2020-05-12 | 重庆邮电大学 | Article detection method and system based on Faster R-CNN |
CN111163294A (en) * | 2020-01-03 | 2020-05-15 | 重庆特斯联智慧科技股份有限公司 | Building safety channel monitoring system and method for artificial intelligence target recognition |
CN111340092A (en) * | 2020-02-21 | 2020-06-26 | 浙江大华技术股份有限公司 | Target association processing method and device |
CN111340092B (en) * | 2020-02-21 | 2023-09-22 | 浙江大华技术股份有限公司 | Target association processing method and device |
WO2021218037A1 (en) * | 2020-04-29 | 2021-11-04 | 北京迈格威科技有限公司 | Target detection method and apparatus, computer device and storage medium |
CN111598882B (en) * | 2020-05-19 | 2023-11-24 | 联想(北京)有限公司 | Organ detection method, organ detection device and computer equipment |
CN111598882A (en) * | 2020-05-19 | 2020-08-28 | 联想(北京)有限公司 | Organ detection method and device and computer equipment |
CN111797737A (en) * | 2020-06-22 | 2020-10-20 | 重庆高新区飞马创新研究院 | Remote sensing target detection method and device |
CN111914997A (en) * | 2020-06-30 | 2020-11-10 | 华为技术有限公司 | Method for training neural network, image processing method and device |
CN111914997B (en) * | 2020-06-30 | 2024-04-02 | 华为技术有限公司 | Method for training neural network, image processing method and device |
CN112036400B (en) * | 2020-07-09 | 2022-04-05 | 北京航空航天大学 | Method for constructing network for target detection and target detection method and system |
CN112036400A (en) * | 2020-07-09 | 2020-12-04 | 北京航空航天大学 | Method for constructing network for target detection and target detection method and system |
CN112016569A (en) * | 2020-07-24 | 2020-12-01 | 驭势科技(南京)有限公司 | Target detection method, network, device and storage medium based on attention mechanism |
CN111860413A (en) * | 2020-07-29 | 2020-10-30 | Oppo广东移动通信有限公司 | Target object detection method and device, electronic equipment and storage medium |
CN112016511A (en) * | 2020-09-08 | 2020-12-01 | 重庆市地理信息和遥感应用中心 | Remote sensing image blue top room detection method based on large-scale depth convolution neural network |
CN116580027A (en) * | 2023-07-12 | 2023-08-11 | 中国科学技术大学 | Real-time polyp detection system and method for colorectal endoscope video |
CN116580027B (en) * | 2023-07-12 | 2023-11-28 | 中国科学技术大学 | Real-time polyp detection system and method for colorectal endoscope video |
Also Published As
Publication number | Publication date |
---|---|
CN109255352B (en) | 2021-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109255352A (en) | Object detection method, apparatus and system | |
Shi et al. | A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection | |
US11321868B1 (en) | System for estimating a pose of one or more persons in a scene | |
Yu et al. | Semantic segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module | |
Raza et al. | Appearance based pedestrians’ head pose and body orientation estimation using deep learning | |
Li et al. | A densely attentive refinement network for change detection based on very-high-resolution bitemporal remote sensing images | |
Sirmacek et al. | Urban-area and building detection using SIFT keypoints and graph theory | |
Zhao et al. | Saliency detection by multi-context deep learning | |
CN109492638A (en) | Method for text detection, device and electronic equipment | |
CN109117879A (en) | Image classification method, apparatus and system | |
CN109815770A (en) | Two-dimentional code detection method, apparatus and system | |
Xu et al. | Effective face detector based on yolov5 and superresolution reconstruction | |
Lu et al. | Co-bootstrapping saliency | |
CN108280455A (en) | Human body critical point detection method and apparatus, electronic equipment, program and medium | |
Zhang et al. | Feature pyramid network for diffusion-based image inpainting detection | |
CN109711416A (en) | Target identification method, device, computer equipment and storage medium | |
CN112132739A (en) | 3D reconstruction and human face posture normalization method, device, storage medium and equipment | |
CN109492576A (en) | Image-recognizing method, device and electronic equipment | |
CN108875456A (en) | Object detection method, object detecting device and computer readable storage medium | |
CN109522970A (en) | Image classification method, apparatus and system | |
EP3642764A1 (en) | Learning unified embedding | |
CN110210480A (en) | Character recognition method, device, electronic equipment and computer readable storage medium | |
Luo et al. | Infrared and visible image fusion based on Multi-State contextual hidden Markov Model | |
Ouadiay et al. | Simultaneous object detection and localization using convolutional neural networks | |
Liu et al. | Double Mask R‐CNN for Pedestrian Detection in a Crowd |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||