CN109255352A - Object detection method, apparatus and system - Google Patents

Object detection method, apparatus and system

Info

Publication number
CN109255352A
CN109255352A (application CN201811049034.4A; granted publication CN109255352B)
Authority
CN
China
Prior art keywords
feature
information
layer
first feature
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811049034.4A
Other languages
Chinese (zh)
Other versions
CN109255352B (en)
Inventor
秦政
黎泽明
俞刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201811049034.4A priority Critical patent/CN109255352B/en
Publication of CN109255352A publication Critical patent/CN109255352A/en
Application granted granted Critical
Publication of CN109255352B publication Critical patent/CN109255352B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an object detection method, apparatus, and system, relating to the field of artificial intelligence. The method comprises: obtaining a target image to be detected; performing feature extraction on the target image to generate a first feature map, wherein the first feature map includes feature information of different scales; performing region-candidate identification on the first feature map to obtain candidate region information of the target image; and generating a detection result according to the candidate region information and the first feature map, the detection result including a target category and/or a target position in the target image. The present invention can effectively improve detection performance.

Description

Object detection method, apparatus and system
Technical field
The present invention relates to the field of artificial intelligence and, more particularly, to an object detection method, apparatus, and system.
Background technique
Object detection is a very important task in computer vision and is the basis of many complex visual tasks such as face detection, target tracking, and instance segmentation. Most existing object detection methods are implemented with convolutional neural networks; they can detect the object categories contained in an image and can also locate the positions of target objects in the image, and they are widely applied in fields such as security systems and traffic systems. It can be understood that object detection results are of great significance to each of these applications, yet the detection performance of existing object detection methods is unsatisfactory.
Summary of the invention
In view of this, the purpose of the present invention is to provide an object detection method, apparatus, and system that can better improve detection performance.
To achieve the above goals, the technical solutions adopted in the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides an object detection method, comprising: obtaining a target image to be detected; performing feature extraction on the target image to generate a first feature map, wherein the first feature map includes feature information of different scales; performing region-candidate identification on the first feature map to obtain candidate region information of the target image; and generating a detection result according to the candidate region information and the first feature map, the detection result including a target category and/or a target position in the target image.
Further, the step of obtaining the target image to be detected comprises: obtaining an initial image to be detected; and preprocessing the initial image to obtain the target image, wherein the preprocessing includes a whitening operation.
Further, the step of performing feature extraction on the target image to generate the first feature map comprises: inputting the target image into a base neural network; performing multi-stage feature extraction on the target image through the base neural network to obtain feature information of different scales, wherein the feature information extracted at each stage differs in scale; and fusing the feature information corresponding to multiple specified stages to form the first feature map.
Further, the step of fusing the feature information corresponding to the multiple specified stages to form the first feature map comprises: obtaining first feature information extracted at the penultimate stage of the base neural network; obtaining second feature information extracted at the last stage of the base neural network; performing a global pooling operation on the second feature information to obtain third feature information; and fusing the first feature information, the second feature information, and the third feature information through a context enhancement network to form the first feature map.
Further, the context enhancement network includes a first convolutional layer, a second convolutional layer, and a third convolutional layer arranged in parallel, wherein the output of the second convolutional layer is further connected to an upsampling layer, and the output of the third convolutional layer is further connected to a broadcast layer; the outputs of the first convolutional layer, the upsampling layer, and the broadcast layer are jointly connected to an addition layer.
Further, the step of fusing the first feature information, the second feature information, and the third feature information through the context enhancement network to form the first feature map comprises: inputting the first feature information into the first convolutional layer, inputting the second feature information into the second convolutional layer, and inputting the third feature information into the third convolutional layer; performing a convolution operation on the first feature information through the first convolutional layer to obtain first feature information with a specified scale; performing a convolution operation and an upsampling operation on the second feature information successively through the second convolutional layer and the upsampling layer to obtain second feature information with the specified scale; performing a convolution operation and a broadcast operation on the third feature information successively through the third convolutional layer and the broadcast layer to obtain third feature information with the specified scale; and summing, through the addition layer, the first feature information, the second feature information, and the third feature information with the specified scale to form the first feature map.
Further, the base neural network is a lightweight feature extraction network.
Further, the step of performing region-candidate identification on the first feature map to obtain the candidate region information of the target image comprises: inputting the first feature map into a region proposal network; and performing feature extraction on the first feature map through the region proposal network to obtain an intermediate feature map, and performing candidate-region identification on the intermediate feature map to obtain the candidate region information of the target image.
Further, the region proposal network includes a channel-wise (depthwise) convolutional layer and a fourth convolutional layer connected in sequence.
Further, the step of generating the detection result according to the candidate region information and the first feature map comprises: inputting the first feature map and the intermediate feature map into a spatial attention network; fusing the first feature map and the intermediate feature map through the spatial attention network to form a second feature map, wherein the foreground features of the second feature map are stronger than its background features; and generating the detection result according to the candidate region information and the second feature map.
Further, the spatial attention network includes a fifth convolutional layer and an activation function layer connected in sequence, and the output of the activation function layer is connected to a multiplication layer.
Further, a batch normalization layer is connected between the fifth convolutional layer and the activation function layer.
Further, the step of fusing the first feature map and the intermediate feature map through the spatial attention network to form the second feature map comprises: inputting the intermediate feature map into the fifth convolutional layer, and processing the intermediate feature map successively through the fifth convolutional layer, the batch normalization layer, and the activation function layer to obtain the processed intermediate feature map output by the activation function layer, wherein the foreground features of the processed intermediate feature map are stronger than its background features; inputting the first feature map and the processed intermediate feature map into the multiplication layer; and performing multiplication of the first feature map and the processed intermediate feature map through the multiplication layer to generate the second feature map.
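The multiplication step above can be sketched in a few lines. This is a minimal single-channel illustration, not the claimed implementation: a sigmoid stands in for the learned fifth convolutional layer, batch normalization layer, and activation function layer, and toy 2x2 maps stand in for real feature maps.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def spatial_attention(first_map, intermediate_map):
    """Squash the intermediate feature map into a [0, 1] mask (a stand-in
    for the conv + batch-norm + activation layers), then reweight the
    first feature map element-wise, as the multiplication layer does."""
    mask = [[sigmoid(v) for v in row] for row in intermediate_map]
    return [[f * m for f, m in zip(frow, mrow)]
            for frow, mrow in zip(first_map, mask)]

# Toy 2x2 maps: large intermediate values mark foreground positions.
first = [[1.0, 1.0], [1.0, 1.0]]
inter = [[6.0, -6.0], [-6.0, 6.0]]
second = spatial_attention(first, inter)
# Foreground positions stay near 1.0; background positions shrink toward 0.
```

The multiplication leaves foreground feature values nearly unchanged while suppressing background feature values, which is exactly the "foreground stronger than background" property the claim describes.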
Further, the step of generating the detection result according to the candidate region information and the second feature map comprises: inputting the candidate region information and the second feature map into a candidate-region feature extraction layer; extracting, through the candidate-region feature extraction layer and based on the candidate region information, the region features of each candidate region on the second feature map; and performing target detection based on the region features of each candidate region to generate the detection result.
Further, the step of performing target detection based on the region features of each candidate region to generate the detection result comprises: classifying the region features of each candidate region through a classification sub-network to determine the target category in the target image; and/or performing regression processing on the region features of each candidate region through a regression sub-network to obtain the target position in the target image.
Further, the classification sub-network and the regression sub-network are each a fully connected layer.
In a second aspect, an embodiment of the present invention further provides an object detection apparatus, comprising: an image obtaining module, configured to obtain a target image to be detected; a first-feature-map generation module, configured to perform feature extraction on the target image to generate a first feature map, wherein the first feature map includes feature information of different scales; a candidate identification module, configured to perform region-candidate identification on the first feature map to obtain candidate region information of the target image; and a detection module, configured to generate a detection result according to the candidate region information and the first feature map, the detection result including a target category and/or a target position in the target image.
In a third aspect, an embodiment of the present invention provides an object detection system, comprising: an image acquisition device, a processor, and a storage device; the image acquisition device is configured to acquire a target image; a computer program is stored on the storage device, and when run by the processor, the computer program executes the method of any item of the first aspect above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when run by a processor, the computer program executes the steps of the method of any item of the first aspect above.
The embodiments of the present invention provide an object detection method, apparatus, and system that can perform feature extraction on an acquired target image to generate a first feature map including feature information of different scales; region-candidate identification is then performed on the first feature map to obtain candidate region information, and a detection result can in turn be generated according to the candidate region information and the first feature map. The approach provided in these embodiments can perform target detection using feature information of different scales, effectively improving detection performance.
Other features and advantages of the present invention will be set forth in the following description; alternatively, some features and advantages can be deduced from the specification or determined unambiguously from it, or learned by implementing the above techniques of the disclosure.
To make the above objects, features, and advantages of the present invention clearer and more comprehensible, preferred embodiments are particularly cited below and described in detail in conjunction with the appended drawings.
Detailed description of the invention
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
Fig. 1 shows a structural schematic diagram of an electronic device provided by an embodiment of the present invention;
Fig. 2 shows a target detection flow chart provided by an embodiment of the present invention;
Fig. 3 shows a structural schematic diagram of a context enhancement network provided by an embodiment of the present invention;
Fig. 4 shows a structural schematic diagram of a spatial attention network provided by an embodiment of the present invention;
Fig. 5 shows a structural schematic diagram of a target detection model provided by an embodiment of the present invention;
Fig. 6 shows a structural block diagram of an object detection system provided by an embodiment of the present invention;
Fig. 7 shows a schematic diagram of first-feature-map generation provided by an embodiment of the present invention;
Fig. 8 shows a schematic diagram of second-feature-map generation provided by an embodiment of the present invention;
Fig. 9 shows a structural block diagram of an object detection apparatus provided by an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions of the present invention are described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. Every other embodiment obtained by those of ordinary skill in the art, based on the embodiments of the present invention and without creative work, shall fall within the protection scope of the present invention.
In view of the poor performance of target detection in the prior art, and to improve on this problem, the embodiments of the present invention provide an object detection method, apparatus, and system, which can be implemented with corresponding software or hardware. The embodiments of the present invention are described in detail below.
Embodiment one:
First, referring to Fig. 1, an example electronic device 100 for realizing the object detection method, apparatus, and system of the embodiments of the present invention is described.
As shown in the structural schematic diagram of Fig. 1, the electronic device 100 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, and an image acquisition device 110, interconnected by a bus system 112 and/or other forms of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are illustrative rather than restrictive; as needed, the electronic device may also have other components and structures.
The processor 102 can be realized in hardware by at least one of a digital signal processor (DSP), a field programmable gate array (FPGA), and a programmable logic array (PLA); the processor 102 can be a central processing unit (CPU), or a combination of one or more other forms of processing units with data-processing capability and/or instruction-execution capability, and can control other components in the electronic device 100 to execute desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory; the non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions can be stored on the computer-readable storage medium, and the processor 102 can run the program instructions to realize the client functionality (realized by the processor) in the embodiments of the present invention described below and/or other desired functions. Various application programs and various data, such as the data used and/or generated by the application programs, can also be stored in the computer-readable storage medium.
The input device 106 can be a device used by the user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 can output various information (for example, images or sounds) to the outside (for example, to the user), and may include one or more of a display, a loudspeaker, and the like.
The image acquisition device 110 can capture images desired by the user (such as photos, videos, etc.) and store the captured images in the storage device 104 for use by other components.
Illustratively, the example electronic device for realizing the object detection method, apparatus, and system according to the embodiments of the present invention can be implemented as an intelligent terminal such as a smartphone, a tablet computer, or a computer.
Embodiment two:
Referring to the target detection flow chart shown in Fig. 2, the method can be executed by the electronic device provided in the previous embodiment and specifically comprises the following steps:
Step S202: obtain a target image to be detected, wherein the target image contains a target object to be detected. The category of the target object can be set according to actual needs; for example, the target object can be set as a person, a cat, a vehicle, etc.
In one embodiment, an image frame directly acquired by an image acquisition device such as a camera can be used as the target image. In another embodiment, an initial image to be detected can first be obtained by the camera, and the initial image is then preprocessed to obtain the target image; that is, the image frame directly acquired by the camera is preprocessed, and the preprocessed image serves as the target image. The preprocessing may include image processing operations such as a whitening operation. The whitening operation, which can also be called a mean-subtraction operation, mainly proceeds by subtracting from each channel of the initial image the preset mean corresponding to that channel, and then dividing by the variance corresponding to that channel, to obtain the preprocessed image. Preprocessing the initial image yields an image that meets the requirements and can effectively accelerate detection; for example, when the target image is detected with a neural network, a target image obtained through such processing helps speed up the convergence of the neural network.
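The whitening operation described above can be sketched as follows. This is a minimal illustration on nested lists rather than a real image tensor; the per-channel means and variances are assumed values, since the text only says they are preset.

```python
def whiten(image, means, variances):
    """Per-channel whitening: subtract the channel's preset mean,
    then divide by the channel's variance, as described above.
    `image` is rows x columns x channels as nested lists."""
    return [[[(p - means[c]) / variances[c] for c, p in enumerate(px)]
             for px in row]
            for row in image]

# Toy 1x2 RGB image with assumed per-channel statistics.
img = [[[104.0, 120.0, 130.0], [100.0, 110.0, 120.0]]]
out = whiten(img, means=[102.0, 115.0, 125.0], variances=[2.0, 5.0, 10.0])
# First pixel becomes [1.0, 1.0, 0.5]
```

Centering and scaling each channel this way puts all inputs on a comparable numeric range, which is the stated reason it helps the detection network converge faster.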
Step S204: perform feature extraction on the target image to generate a first feature map, wherein the first feature map includes feature information of different scales.
In a specific implementation, the first feature map can be generated as follows: input the target image into a base neural network; perform multi-stage feature extraction on the target image through the base neural network to obtain feature information of different scales, wherein the feature information extracted at each stage differs in scale; and finally fuse the feature information corresponding to multiple specified stages to form the first feature map.
The above base neural network is a backbone network that can extract image features; its main function is to extract image features and generate feature maps. To improve detection efficiency, a lightweight feature extraction network such as Xception or ShuffleNet can be chosen as the base neural network in this embodiment. A lightweight feature extraction network is characterized by a simple network structure, low memory requirements, a small amount of computation, and high detection efficiency.
Step S206: perform region-candidate identification on the first feature map to obtain candidate region information of the target image. The candidate region information may include the location information and confidence of multiple candidate regions, etc. A candidate region is a region of the target image that may contain a target object.
For example, the first feature map is input into a region proposal network (RPN); feature extraction is performed on the first feature map through the region proposal network to obtain an intermediate feature map, and candidate-region identification is performed on the intermediate feature map to obtain the candidate region information of the target image. Step S206 shows that the object detection method provided by the embodiment of the present invention specifically adopts the basic principle of a two-stage detection algorithm: first predict candidate regions (also called candidate boxes) in the image, and then, based on the candidate regions, predict the target regions (also called detection boxes) that contain target objects. Compared with one-stage detection algorithms, which predict target regions directly, two-stage detection algorithms achieve better detection accuracy. A two-stage detection algorithm can be realized with network models such as Faster R-CNN and R-FCN, in which the region proposal network is the component commonly used by various two-stage detection models to generate multiple candidate regions.
Step S208: generate a detection result according to the candidate region information and the first feature map; the detection result includes a target category and/or a target position in the target image.
The object detection method provided by the embodiment of the present invention can perform feature extraction on an acquired target image to generate a first feature map including feature information of different scales; region-candidate identification is then performed on the first feature map to obtain candidate region information, and a detection result can in turn be generated according to the candidate region information and the first feature map. The approach provided in this embodiment can perform target detection using feature information of different scales, effectively improving detection performance.
When performing feature extraction on the target image with the base neural network, it can be understood that the feature extraction process of a neural network generally includes multiple stages, and each stage can further extract features from the feature information obtained at the previous stage (the feature information can be embodied in the form of a feature map) to obtain the feature information corresponding to that stage; the scales of the feature information obtained at different stages are different. In order to incorporate semantic information and contextual information of different scales during target detection, this embodiment chooses several stages from the multiple stages of the base neural network as specified stages, and fuses the feature information corresponding to the specified stages to form the first feature map.
In one embodiment, the specified stages can be the last two stages of the base neural network; for example, if the base neural network has four stages in total, the third stage and the fourth stage of the feature extraction process are selected as the specified stages. When fusing the feature information corresponding to the multiple specified stages into the first feature map, the following steps can be referred to:
Step 1: obtain the first feature information extracted at the penultimate stage of the base neural network;
Step 2: obtain the second feature information extracted at the last stage of the base neural network;
Step 3: perform a global pooling operation on the second feature information to obtain third feature information;
Step 4: fuse the first feature information, the second feature information, and the third feature information through a context enhancement network to form the first feature map.
In the above manner, the target image can be effectively represented at multiple scales. It can be understood that, when extracting image features, a fixed-size feature detection approach would yield detection results biased toward that scale and miss features at other scales. Based on this, the embodiment of the present invention performs multi-stage feature extraction on the target image through the base neural network, so that the image can be detected and matched at multiple scales, making the feature information contained in the resulting first feature map more accurate.
In one embodiment, referring to the structural schematic diagram of a context enhancement network shown in Fig. 3, the context enhancement network may include a first convolutional layer, a second convolutional layer, and a third convolutional layer arranged in parallel; the output of the second convolutional layer is further connected to an upsampling layer, and the output of the third convolutional layer is further connected to a broadcast layer. The outputs of the first convolutional layer, the upsampling layer, and the broadcast layer are jointly connected to an addition layer.
The parameter of first convolutional layer, the second convolutional layer and third convolutional layer can be identical or different, and such as, all selection includes There is the convolutional layer that 245 sizes are the convolution kernel of 1*1 to realize, so that the characteristic information received to be passed through to the convolution kernel pressure of 1*1 It is condensed to 245 channels.
When fusing the first feature information, the second feature information, and the third feature information into the first feature map through the context enhancement network, the following steps can be referred to:
(1) by fisrt feature information input to the first convolutional layer, by second feature information input to the second convolutional layer, and By third feature information input to third convolutional layer;That is, the characteristic information that different phase is obtained be separately input to it is corresponding In convolutional layer;
(2) convolution operation is carried out to fisrt feature information by the first convolutional layer, it is special obtains first with specified scale Reference breath;Such as, fisrt feature Informational Expression is the characteristic pattern that size is 20*20, passes through the 1*1 convolution operation of the first convolutional layer Afterwards, the characteristic pattern that specified size is 20*20 is obtained.
(3) convolution operation and up-sampling are successively carried out to second feature information by the second convolutional layer and up-sampling operation layer Operation obtains the second feature information with specified scale;Such as, second feature Informational Expression is the feature that size is 10*10 Figure is obtained after the 1*1 convolution operation of the second convolutional layer and twice of up-sampling (2*Upsample) of up-sampling operation layer The characteristic pattern that specified size is 20*20.
(4) convolution operation and setting-up exercises to music are successively carried out to third feature information by third convolutional layer and broadcast operation layer Make, obtains the third feature information with specified scale;Such as, third feature Informational Expression is the characteristic pattern of 1*1, passes through third After the 1*1 convolution operation of convolutional layer and the broadcast operation of broadcast operation layer, the characteristic pattern that specified size is 20*20 is obtained.
(5) by add operation layer to fisrt feature information, the second feature with specified scale with specified scale Information and third feature information with specified scale sum up, and form fisrt feature figure.
Same scale (size) is converted to by the characteristic information for obtaining different phase, so that different phase is obtained Characteristic information sums up, and obtains finally comprising there are many characteristic patterns of semantic information and contextual information.
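The fusion steps above can be sketched with plain numpy, using the example sizes from the text (245 channels, a 20*20 target scale). The 1*1 convolutions themselves are omitted and the three inputs are taken as already compressed to 245 channels, so this only illustrates the upsample, broadcast and add mechanics; nearest-neighbour interpolation is an assumption, since the patent does not fix the upsampling method:

```python
import numpy as np

# Hypothetical feature maps after the 1*1 convolutions; the channel count
# (245) and spatial sizes (20*20, 10*10, 1*1) follow the example in the text.
first_info = np.random.rand(245, 20, 20)    # first feature information
second_info = np.random.rand(245, 10, 10)   # second feature information
third_info = np.random.rand(245, 1, 1)      # third feature information (globally pooled)

# 2x upsampling of the 10*10 map to the specified 20*20 scale (nearest-neighbour)
second_up = second_info.repeat(2, axis=1).repeat(2, axis=2)

# The broadcast operation stretches the 1*1 map to 20*20; numpy broadcasting
# performs it implicitly inside the addition operation layer's sum
first_feature_map = first_info + second_up + third_info

print(first_feature_map.shape)  # (245, 20, 20)
```

All three branches end at the same 20*20 scale, which is what makes the element-wise addition well defined.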
To further improve target detection speed, compared with a conventional region proposal network, this embodiment provides a region proposal network with a simplified structure, which includes a sequentially connected channel-wise (depthwise) convolutional layer and a fourth convolutional layer. For example, the channel-wise convolutional layer may contain one convolution kernel of size 5*5 per channel, and the fourth convolutional layer may contain 256 convolution kernels of size 1*1, which may be described as a 1*1 standard convolution with 256 channels. Through this region proposal network, the candidate regions in the feature map can be conveniently identified; for example, up to 200 candidate regions may be generated.
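A rough parameter count (an illustration, not a figure from the patent) shows why the depthwise 5*5 plus 1*1 design is lighter than a single standard 5*5 convolution; the 245 input channels follow the context enhancement example above:

```python
in_ch, out_ch, k = 245, 256, 5

# Simplified region proposal head: one 5*5 kernel per input channel
# (channel-wise/depthwise), then a 1*1 standard convolution to 256 channels
depthwise = in_ch * k * k            # 245 * 25 = 6125 weights
pointwise = in_ch * out_ch           # 245 * 256 = 62720 weights
light_head = depthwise + pointwise   # 68845 weights

# A single standard 5*5 convolution with the same in/out channels
standard = in_ch * out_ch * k * k    # 1568000 weights

print(light_head, standard)  # 68845 1568000 (roughly a 22x reduction)
```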
To further improve target detection precision, one implementation of generating the detection result according to the candidate region information and the first feature map may be as follows: the intermediate feature map obtained by the region proposal network in the process of generating the candidate region information from the first feature map is input to a spatial attention network; the first feature map and the intermediate feature map are fused by the spatial attention network to form a second feature map, in which the foreground features are stronger than the background features, that is, the feature values of the foreground region of the second feature map (foreground feature values for short) are higher than the feature values of the background region (background feature values for short); the detection result is then generated according to the candidate region information and the second feature map. It can be understood that the intermediate feature map obtained by the region proposal network while generating the candidate region information latently contains foreground information and background information, where the foreground information can be understood as information of the region where the target object is located, and the background information as information of regions not containing the target object. Based on the intermediate feature map, the spatial attention network can enhance the features of the foreground region in the first feature map (for example, increase the foreground feature values) and weaken the features of the background region (for example, decrease the background feature values), thereby obtaining a second feature map whose foreground features are stronger than its background features. For ease of understanding, a simple example: suppose the foreground feature value in the first feature map is 0.5 and the background feature value is 0.4, so the discrimination between the foreground and background regions is small; after processing by the spatial attention network, the foreground feature value may be raised to 0.6 and the background feature value reduced to 0.1, so that the foreground features are significantly stronger than the background features, the discrimination between the foreground and background regions is increased, and the foreground region is effectively highlighted. It should be noted that the above is only an exemplary illustration for the case where the foreground feature value in the first feature map is slightly higher than the background feature value but the two are close; the spatial attention network can still further enhance the foreground feature value and further weaken the background feature value to increase the discrimination between the foreground and background regions. Of course, a first feature map may also occur in which the foreground feature value is lower than the background feature value; in that case the spatial attention network can greatly raise the foreground feature value and lower the background feature value, so that the foreground feature value becomes higher than the background feature value, which is not repeated here. Enhancing foreground features and weakening background features through the spatial attention network in this way helps strengthen the features of the candidate regions, i.e., makes the features of the candidate regions subsequently extracted directly on the second feature map more prominent, so that the detection result is more accurate.
In one embodiment, the spatial attention network includes a sequentially connected fifth convolutional layer and an activation function layer; the output end of the activation function layer is connected with a multiplication operation layer. For example, the fifth convolutional layer may include 245 convolution kernels of size 1*1, and the activation function layer may be implemented with the Sigmoid activation function. In another embodiment, a batch normalization layer (BatchNorm) is further connected between the fifth convolutional layer and the activation function layer.
With reference to the structural schematic diagram of the spatial attention network shown in Fig. 4, a specific implementation of generating the second feature map through the spatial attention network is further described: the intermediate feature map is input to the fifth convolutional layer and processed successively by the fifth convolutional layer, the batch normalization layer and the activation function layer to obtain the processed intermediate feature map output by the activation function layer, in which the foreground features are stronger than the background features; the first feature map and the processed intermediate feature map are then input to the multiplication operation layer, which performs element-wise multiplication on them to generate the second feature map. The processed intermediate feature map embodies the feature weight of each region, and the above spatial attention network can re-weight the features in the first feature map based on these feature weights; in the second feature map obtained after the weighting, the foreground features are stronger than the background features, which helps highlight the region where the target is located and improves the accuracy of the target detection result.
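The re-weighting step can be illustrated with a toy numpy example (hypothetical values; the real maps are 245-channel, and the 1*1 convolution and batch normalization are folded away here): a sigmoid of the intermediate map yields per-position weights in (0, 1), and element-wise multiplication raises the foreground/background contrast of the first feature map:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy single-channel 2*2 maps: left column = foreground, right = background
cem_fm = np.array([[0.5, 0.4],
                   [0.5, 0.4]])   # first feature map (values from the text's example)
rpn_fm = np.array([[2.0, -2.0],
                   [2.0, -2.0]])  # intermediate map: high where objects were proposed

weights = sigmoid(rpn_fm)         # feature weights in (0, 1)
sam_fm = cem_fm * weights         # element-wise multiplication layer

# The foreground/background ratio grows from 1.25 to about 9.2
print(sam_fm[0, 0] / sam_fm[0, 1] > cem_fm[0, 0] / cem_fm[0, 1])  # True
```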
After the second feature map is generated, the candidate region information and the second feature map may be input to a candidate region feature extraction layer, which extracts the region features of each candidate region on the second feature map based on the candidate region information; target detection is then performed based on the region features of each candidate region to generate the detection result. When extracting the region features, the candidate region feature extraction layer may perform one of the following operations on the candidate regions: an RoI pooling (Region of Interest pooling) operation, a PSRoI pooling (Position-Sensitive Region of Interest pooling) operation, an RoI align (Region of Interest align) operation, or a PSRoI align (Position-Sensitive Region of Interest align) operation, etc.
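As one illustration of these operations, a minimal RoI pooling sketch in numpy (bin layout and coordinate handling simplified; the PSRoI and align variants differ in how bins map to channels and how coordinates are sampled):

```python
import numpy as np

def roi_pool(feat, box, out=7):
    """Max-pool the boxed region of feat into an out*out grid.
    feat: (C, H, W); box: (x0, y0, x1, y1) in feature-map coordinates."""
    x0, y0, x1, y1 = box
    region = feat[:, y0:y1, x0:x1]
    c, h, w = region.shape
    pooled = np.zeros((c, out, out))
    for i in range(out):
        for j in range(out):
            ys = slice(i * h // out, max((i + 1) * h // out, i * h // out + 1))
            xs = slice(j * w // out, max((j + 1) * w // out, j * w // out + 1))
            pooled[:, i, j] = region[:, ys, xs].max(axis=(1, 2))
    return pooled

# A 5-channel 20*20 map, matching the 7*7*5 region feature size in the text
feat = np.random.rand(5, 20, 20)
print(roi_pool(feat, (2, 2, 16, 16)).shape)  # (5, 7, 7)
```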
After the region features of each candidate region are extracted by the candidate region feature extraction layer, target detection can be performed based on the region features of each candidate region. Specifically, classification processing may be performed on the region features of each candidate region through a classification sub-network to determine the target category in the target image; and/or regression processing may be performed on the region features of each candidate region through a regression sub-network to obtain the target position in the target image.
To further improve target detection efficiency and shorten detection time, the classification sub-network and the regression sub-network used in this embodiment may each be a single fully connected layer, where the channel number of the fully connected layer serving as the classification sub-network may equal the number of categories, and the channel number of the fully connected layer serving as the regression sub-network may be 4. In addition, another fully connected layer may be connected before the classification sub-network and the regression sub-network to further extract features from the region features of each candidate region, so that the classification sub-network and the regression sub-network can better perform classification and bounding-box regression on the further extracted region features. In one embodiment, the fully connected layer before the classification sub-network and the regression sub-network may have 1024 channels.
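The head described above reduces to three matrix multiplications. The shapes below follow the text (7*7*5 region features, a 1024-channel shared fully connected layer, 4 regression channels), while the class count of 81 is an assumed example value, not from the patent:

```python
import numpy as np

num_rois, num_classes = 200, 81            # 81 classes is an assumed example value
roi_feat = 7 * 7 * 5                       # flattened region feature, 245 values

x = np.random.rand(num_rois, roi_feat)     # region features of all candidate regions

w_shared = np.random.rand(roi_feat, 1024)  # shared 1024-channel fully connected layer
w_cls = np.random.rand(1024, num_classes)  # classification sub-network
w_box = np.random.rand(1024, 4)            # regression sub-network: 4 box coordinates

h = x @ w_shared                           # further feature extraction
scores = h @ w_cls                         # per-class scores
boxes = h @ w_box                          # bounding-box regression outputs

print(scores.shape, boxes.shape)  # (200, 81) (200, 4)
```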
For ease of understanding, reference may be made to the structural schematic diagram of a target detection model shown in Fig. 5. The target detection model can be used to implement the above target detection method, and specifically illustrates the connection relationship among the base neural network, the context enhancement network, the spatial attention network, the region proposal network, the candidate region feature extraction layer, the classification sub-network and the regression sub-network; the specific function of each network is not repeated here. It should be noted that the target detection model shown in Fig. 5 is only an example; in practical applications, other network structures may be adaptively added to, or part of the network structure deleted from, the target detection model shown in Fig. 5.
In conclusion the above-mentioned object detection method provided through this embodiment, can be had using context enhancing network Effect combines the semantic information and contextual information of different scale, so that characteristic pattern includes the characteristic information there are many scale;Using Spatial attention network can the feature to candidate region carry out enhancing processing, so as to preferably based on the time with Enhanced feature Favored area carries out target detection, obtains more accurate testing result.Moreover, the base neural network that the present embodiment uses is light Quantative feature extracts network, context enhancing network, spatial attention network, classification sub-network and the recurrence that the present embodiment proposes Sub-network structure is simplified, and operand is smaller, effectively improves detection efficiency.In conclusion the above-mentioned target that the present embodiment proposes Detection method can effectively promote target detection precision and target detection speed.
Embodiment three:
This embodiment provides a specific example of applying the foregoing target detection method, and specifically illustrates a target detection system (which may also be described as a target detection model) based on a deep neural network. The system mainly improves on current lightweight two-stage target detection algorithms (such as Light-Head R-CNN) to achieve efficient, high-precision target detection.
Generally speaking, the target detection system provided by the embodiment of the present invention mainly includes the following three modules: an image preprocessing module, a region proposal (Region Proposal) extraction module and a region proposal identification module. The image preprocessing module is responsible for preprocessing the input image (i.e., the aforementioned original image); the region proposal extraction module mainly uses a convolutional neural network to generate potential target regions (i.e., the aforementioned candidate regions); and the region proposal identification module mainly uses a neural network to identify the region proposals extracted by the region proposal extraction module to obtain the final detection result. In practical applications, the target detection system may also omit the image preprocessing module and directly input the original image to the region proposal extraction module; the main function of the image preprocessing module is to accelerate target detection.
Specifically, this embodiment provides a structural block diagram of a target detection system as shown in Fig. 6. Fig. 6 is a specific implementation of Fig. 5 and embodies Fig. 5 more concretely. The lightweight two-stage detection method provided by this embodiment is further described below in conjunction with Fig. 6:
Step 1: Image preprocessing
A whitening operation is performed on the image to be detected to obtain a target image that can be input to the neural network, and the target image is scaled to 320*320 pixels. Specifically, the input (Input) shown in Fig. 6 is the target image, whose size is 320*320*3.
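A minimal preprocessing sketch under one common reading of "whitening" (zero mean, unit variance per image; the patent does not fix the exact formula), with a nearest-neighbour rescale to the 320*320 input size:

```python
import numpy as np

img = np.random.randint(0, 256, (480, 640, 3)).astype(np.float32)  # hypothetical input

# Whitening: subtract the image mean and divide by the standard deviation
white = (img - img.mean()) / max(float(img.std()), 1e-8)

# Nearest-neighbour scaling to 320*320 (an image library would normally do this)
rows = np.arange(320) * img.shape[0] // 320
cols = np.arange(320) * img.shape[1] // 320
target = white[rows][:, cols]

print(target.shape)  # (320, 320, 3)
```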
Step 2: Candidate region extraction
The above target image is input to the base neural network (which may also be described as a backbone network), and the features of the target region are extracted by the base neural network. To improve detection efficiency, lightweight backbone networks such as Xception or ShuffleNet may be used.
To enhance the feature representation capability of the target detection system, the target detection system shown in Fig. 6 also illustrates the context enhancement network (which may also be described as a Context Enhancement Module, CEM) for fusing semantic information and context information of different scales. With reference to the structural schematic diagram of the context enhancement network shown in Fig. 3 provided in the previous embodiment, Fig. 7 shows a schematic diagram of generating the first feature map. It makes use of the feature map C4 generated at the third stage of the base neural network (scale 20*20), the feature map C5 generated at the fourth stage (scale 10*10), and the feature map Cglb obtained from C5 via global average pooling (global avg pooling, scale 1*1). C4 passes through the first convolutional layer in the context enhancement network, which performs a 1*1 convolution and compresses it to 245 channels, yielding C4_lat with scale 20*20; C5 passes through the second convolutional layer for a 1*1 convolution and compression to 245 channels, then through the upsampling operation layer for a 2x upsampling operation (2*Upsample), yielding C5_lat with scale 20*20; Cglb passes through the third convolutional layer for a 1*1 convolution and compression to 245 channels, then through the broadcast operation layer for a broadcast operation (Broadcast), yielding Cglb_lat with scale 20*20. Finally, C4_lat, C5_lat and Cglb_lat are added by the addition operation layer to obtain the first feature map CEM_fm (size 20*20*245).
Afterwards, the first feature map CEM_fm is input to the region proposal network RPN, and potential target boxes (bounding boxes), i.e., the aforementioned candidate regions, are generated by the RPN. To improve computational efficiency, the region proposal network only includes a 5x5 channel-wise convolution (a depthwise convolution) and a 1x1 standard convolution with 256 channels. In a specific implementation, up to 200 candidate regions may be generated for each picture by the region proposal network. Specifically, Fig. 6 illustrates that the region proposal network generates an intermediate feature map RPN_fm (size 20*20*256) based on the first feature map CEM_fm, and then generates RoIs (Regions of Interest) from RPN_fm, i.e., obtains the candidate region information.
Step 3: Candidate region identification
To further enhance the feature representation capability of the target detection system, the target detection system shown in Fig. 6 also illustrates the spatial attention network (which may also be described as a Spatial Attention Module, SAM), which is used to re-weight the features in the first feature map generated by the context enhancement network. Specifically, with reference to the structural schematic diagram of the spatial attention network shown in Fig. 4, this embodiment further illustrates a schematic diagram of generating the second feature map, as shown in Fig. 8: the intermediate feature map RPN_fm output by the RPN successively passes through a 1*1 convolutional layer, a BatchNorm normalization layer and a Sigmoid activation layer, and is then multiplied element-wise with the first feature map CEM_fm to obtain the second feature map SAM_fm. As shown in Fig. 6, the size of the second feature map SAM_fm produced by the spatial attention network is 20*20*245.
Afterwards, an operation such as RoI pooling, PSRoI pooling, RoI align or PSRoI align may be performed on the second feature map SAM_fm to extract region features. As shown in Fig. 6, a PSRoI Align operation is performed on the second feature map SAM_fm based on the RoIs (for simplicity, the candidate region feature extraction layer performing the PSRoI Align operation is not illustrated in Fig. 6) to obtain the region feature RoI_fm of each candidate region (size 7*7*5), and each candidate region is identified using an R-CNN sub-network. The identification includes two tasks, classification and bounding-box regression, finally yielding the classification result and the regression result. In practical applications, the R-CNN sub-network may first include a fully connected layer (FC, fully-connected layer) with 1024 channels, followed in parallel by two fully connected layers: one for classification, whose channel number equals the number of categories, and the other for bounding-box regression, i.e., computing the coordinates of the target box, with 4 channels. For simplicity, Fig. 6 symbolically illustrates one fully connected layer FC with 1024 channels, which can be used to re-extract features from the region features of the candidate regions; classification and bounding-box regression are then performed separately on the re-extracted features to obtain the classification result and the regression result. To verify the performance of the lightweight two-stage target detection method provided by the embodiment of the present invention, the target detection method provided by the embodiment of the present invention was compared with existing lightweight target detection methods on the MS COCO dataset; the results are shown in Table 1.
Table 1
AP (Average Precision) in Table 1 represents the average detection precision of each target detection method, and MFLOPs represents the computation amount of each target detection method when obtaining the detection result. It can be seen from Table 1 that the target detection method provided by the embodiment of the present invention (Mobile Light-Head R-CNN, shown in the last three rows) can achieve the same or even better detection precision with less than half the computation; and under a similar computation amount, the target detection method provided by the embodiment of the present invention achieves noticeably better detection precision. That is, the target detection method provided by the embodiment of the present invention effectively improves both target detection speed and target detection precision.
Embodiment four:
For the target detection method provided in embodiment two, an embodiment of the present invention provides a target detection apparatus; referring to the structural block diagram of a target detection apparatus shown in Fig. 9, it comprises:
an image acquisition module 902, configured to acquire a target image to be detected;

a first feature map generation module 904, configured to perform feature extraction on the target image to generate a first feature map, wherein the first feature map contains feature information of different scales;

a candidate identification module 906, configured to perform region proposal identification on the first feature map to obtain candidate region information of the target image;

a detection module 908, configured to generate a detection result according to the candidate region information and the first feature map, the detection result containing the target category and/or target position in the target image.
The above target detection apparatus provided by the embodiment of the present invention can perform feature extraction on the acquired target image to generate a first feature map containing feature information of different scales; then perform region proposal identification on the first feature map to obtain candidate region information; and then generate a detection result according to the candidate region information and the first feature map. The above approach provided by this embodiment can perform target detection using feature information of different scales, so that the detection effect is effectively improved.
In one embodiment, the image acquisition module 902 is configured to: acquire an initial image to be detected; and preprocess the initial image to obtain the target image, wherein the preprocessing includes a whitening operation.

In one embodiment, the first feature map generation module 904 is configured to: input the target image to a base neural network; perform multi-stage feature extraction on the target image through the base neural network to obtain feature information of different scales, wherein the scale of the feature information extracted at each stage differs; and fuse the feature information corresponding to a plurality of specified stages to form the first feature map.

In one embodiment, the first feature map generation module 904 is further configured to: obtain first feature information extracted at the penultimate stage of the base neural network; obtain second feature information extracted at the last stage of the base neural network; perform a global pooling operation on the second feature information to obtain third feature information; and fuse the first feature information, the second feature information and the third feature information through a context enhancement network to form the first feature map.

In one embodiment, the context enhancement network includes a first convolutional layer, a second convolutional layer and a third convolutional layer arranged in parallel, wherein the output end of the second convolutional layer is further connected with an upsampling operation layer, and the output end of the third convolutional layer is further connected with a broadcast operation layer; the output end of the first convolutional layer, the output end of the upsampling operation layer and the output end of the broadcast operation layer are jointly connected with an addition operation layer.

In one embodiment, the first feature map generation module 904 is further configured to: input the first feature information to the first convolutional layer, the second feature information to the second convolutional layer, and the third feature information to the third convolutional layer; perform a convolution operation on the first feature information through the first convolutional layer to obtain first feature information with a specified scale; successively perform a convolution operation and an upsampling operation on the second feature information through the second convolutional layer and the upsampling operation layer to obtain second feature information with the specified scale; successively perform a convolution operation and a broadcast operation on the third feature information through the third convolutional layer and the broadcast operation layer to obtain third feature information with the specified scale; and sum the first feature information, the second feature information and the third feature information with the specified scale through the addition operation layer to form the first feature map.

In one embodiment, the base neural network is a lightweight feature extraction network.
In one embodiment, the candidate identification module 906 is configured to: input the first feature map to a region proposal network; perform feature extraction on the first feature map through the region proposal network to obtain an intermediate feature map; and perform candidate region identification on the intermediate feature map to obtain the candidate region information of the target image.

In one embodiment, the region proposal network includes a sequentially connected channel-wise convolutional layer and a fourth convolutional layer.

In one embodiment, the detection module 908 is configured to: input the first feature map and the intermediate feature map to a spatial attention network; fuse the first feature map and the intermediate feature map through the spatial attention network to form a second feature map, wherein the foreground features of the second feature map are stronger than the background features; and generate the detection result according to the candidate region information and the second feature map.

In one embodiment, the spatial attention network includes a sequentially connected fifth convolutional layer and an activation function layer; the output end of the activation function layer is connected with a multiplication operation layer.

In one embodiment, a batch normalization layer is further connected between the fifth convolutional layer and the activation function layer.

In one embodiment, the detection module 908 is further configured to: input the intermediate feature map to the fifth convolutional layer, and process it successively through the fifth convolutional layer, the batch normalization layer and the activation function layer to obtain the processed intermediate feature map output by the activation function layer, wherein the foreground features of the processed intermediate feature map are stronger than the background features; input the first feature map and the processed intermediate feature map to the multiplication operation layer; and perform a multiplication operation on the first feature map and the processed intermediate feature map through the multiplication operation layer to generate the second feature map.

In one embodiment, the detection module 908 is further configured to: input the candidate region information and the second feature map to a candidate region feature extraction layer; extract the region features of each candidate region on the second feature map through the candidate region feature extraction layer based on the candidate region information; and perform target detection based on the region features of each candidate region to generate the detection result.

In one embodiment, the detection module 908 is further configured to: perform classification processing on the region features of each candidate region through a classification sub-network to determine the target category in the target image; and/or perform regression processing on the region features of each candidate region through a regression sub-network to obtain the target position in the target image.

In one embodiment, the classification sub-network and the regression sub-network are each a single fully connected layer.
The implementation principle and technical effects of the apparatus provided by this embodiment are the same as those of the foregoing embodiments; for brevity, where this apparatus embodiment does not mention a point, reference may be made to the corresponding content in the foregoing method embodiments.
In addition, this embodiment provides a target detection system comprising an image acquisition device, a processor and a storage device; the image acquisition device is configured to acquire an image to be detected; a computer program is stored on the storage device, and when run by the processor, the computer program executes the foregoing target detection method.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing embodiments and is not repeated here.

Further, this embodiment provides a computer-readable storage medium on which a computer program is stored; when run by a processor, the computer program executes the steps of the method provided by the above embodiment two.

The computer program product of the target detection method, apparatus and system provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the method described in the foregoing method embodiments, and for the specific implementation, reference may be made to the method embodiments, which is not repeated here.

If the function is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk.
In the description of the present invention, it should be noted that term " center ", "upper", "lower", "left", "right", "vertical", The orientation or positional relationship of the instructions such as "horizontal", "inner", "outside" be based on the orientation or positional relationship shown in the drawings, merely to Convenient for description the present invention and simplify description, rather than the device or element of indication or suggestion meaning must have a particular orientation, It is constructed and operated in a specific orientation, therefore is not considered as limiting the invention.In addition, term " first ", " second ", " third " is used for descriptive purposes only and cannot be understood as indicating or suggesting relative importance.
Finally, it should be noted that the embodiments described above are merely specific embodiments of the present invention, intended to illustrate rather than limit its technical solutions, and the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with this technical field can still, within the technical scope disclosed by the present invention, modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the appended claims.

Claims (19)

1. An object detection method, characterized by comprising:
obtaining a target image to be detected;
performing feature extraction on the target image to generate a first feature map; wherein the first feature map includes feature information of different scales;
performing region candidate identification on the first feature map to obtain candidate region information of the target image;
generating a detection result according to the candidate region information and the first feature map; the detection result includes a target category and/or a target position in the target image.
2. The method according to claim 1, wherein the step of obtaining the target image to be detected comprises:
obtaining an initial image to be detected;
preprocessing the initial image to obtain the target image; wherein the preprocessing includes a whitening operation.
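The claims do not further specify the whitening operation of claim 2; a common form is per-channel standardization of pixel values. Below is a minimal NumPy sketch under that assumption (subtracting each channel's mean and dividing by its standard deviation) — the exact normalization used by the patent is not stated:

```python
import numpy as np

def whiten(image: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Per-channel whitening: zero mean, unit variance over H and W.

    image: float array of shape (H, W, C).
    """
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / (std + eps)

# Example: a random 8x8 RGB image becomes zero-mean, unit-variance per channel.
img = np.random.default_rng(0).uniform(0, 255, size=(8, 8, 3))
out = whiten(img)
```

The `eps` guard avoids division by zero on constant channels; resizing to a fixed input resolution would typically precede this step but is omitted here.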
3. The method according to claim 1, wherein the step of performing feature extraction on the target image to generate the first feature map comprises:
inputting the target image into a base neural network;
performing multi-stage feature extraction on the target image through the base neural network to obtain feature information of different scales; wherein the scale of the feature information extracted at each stage is different;
fusing the feature information corresponding to a plurality of specified stages to form the first feature map.
4. The method according to claim 3, wherein the step of fusing the feature information corresponding to the plurality of specified stages to form the first feature map comprises:
obtaining first feature information extracted at the penultimate stage of the base neural network;
obtaining second feature information extracted at the last stage of the base neural network;
performing a global pooling operation on the second feature information to obtain third feature information;
fusing the first feature information, the second feature information and the third feature information through a context enhancement network to form the first feature map.
5. The method according to claim 4, wherein the context enhancement network includes a first convolutional layer, a second convolutional layer and a third convolutional layer arranged in parallel; wherein the output end of the second convolutional layer is further connected to an up-sampling operation layer, and the output end of the third convolutional layer is further connected to a broadcast operation layer;
the output end of the first convolutional layer, the output end of the up-sampling operation layer and the output end of the broadcast operation layer are jointly connected to an addition operation layer.
6. The method according to claim 5, wherein the step of fusing the first feature information, the second feature information and the third feature information through the context enhancement network to form the first feature map comprises:
inputting the first feature information into the first convolutional layer, inputting the second feature information into the second convolutional layer, and inputting the third feature information into the third convolutional layer;
performing a convolution operation on the first feature information through the first convolutional layer to obtain first feature information with a specified scale;
sequentially performing a convolution operation and an up-sampling operation on the second feature information through the second convolutional layer and the up-sampling operation layer to obtain second feature information with the specified scale;
sequentially performing a convolution operation and a broadcast operation on the third feature information through the third convolutional layer and the broadcast operation layer to obtain third feature information with the specified scale;
summing, through the addition operation layer, the first feature information with the specified scale, the second feature information with the specified scale and the third feature information with the specified scale to form the first feature map.
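The fusion in claim 6 brings three feature tensors to a common spatial scale and sums them: the penultimate-stage features pass through a convolution, the last-stage features are convolved and up-sampled, and the globally pooled vector is convolved and broadcast back over the spatial grid. A minimal NumPy sketch of this data flow follows; the 1x1 convolutions, nearest-neighbor up-sampling, and tensor sizes are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def upsample2x(x):
    """Nearest-neighbor 2x spatial up-sampling of (C, H, W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def context_enhance(f1, f2, w1, w2, w3):
    """Fuse features as in claim 6.

    f1: penultimate-stage features, (C1, H, W)    -> first conv branch
    f2: last-stage features, (C2, H/2, W/2)       -> conv + up-sample branch
    f2 is also globally pooled into third feature
    information                                   -> conv + broadcast branch
    """
    f3 = f2.mean(axis=(1, 2))            # global pooling -> (C2,)
    a = conv1x1(f1, w1)                  # (C, H, W) at the specified scale
    b = upsample2x(conv1x1(f2, w2))      # (C, H, W) after up-sampling
    c = (w3 @ f3)[:, None, None]         # (C, 1, 1), broadcast over H and W
    return a + b + c                     # addition layer forms the feature map

rng = np.random.default_rng(0)
f1 = rng.standard_normal((8, 16, 16))    # penultimate stage
f2 = rng.standard_normal((16, 8, 8))     # last stage, half resolution
fused = context_enhance(f1, f2,
                        rng.standard_normal((4, 8)),
                        rng.standard_normal((4, 16)),
                        rng.standard_normal((4, 16)))
```

The broadcast branch injects the same global-context vector at every spatial position, which is what allows a pooled (channel-only) feature to be summed with spatial feature maps.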
7. The method according to any one of claims 3 to 6, wherein the base neural network is a lightweight feature extraction network.
8. The method according to claim 1, wherein the step of performing region candidate identification on the first feature map to obtain the candidate region information of the target image comprises:
inputting the first feature map into a region candidate generation network;
performing feature extraction on the first feature map through the region candidate generation network to obtain an intermediate feature map, and performing candidate region identification on the intermediate feature map to obtain the candidate region information of the target image.
9. The method according to claim 8, wherein the region candidate generation network includes a channel convolutional layer and a fourth convolutional layer connected in sequence.
10. The method according to claim 8, wherein the step of generating the detection result according to the candidate region information and the first feature map comprises:
inputting the first feature map and the intermediate feature map into a spatial attention network;
fusing the first feature map and the intermediate feature map through the spatial attention network to form a second feature map; wherein the foreground features of the second feature map are stronger than its background features;
generating the detection result according to the candidate region information and the second feature map.
11. The method according to claim 10, wherein the spatial attention network includes a fifth convolutional layer and an activation function layer connected in sequence; the output end of the activation function layer is connected to a multiplication operation layer.
12. The method according to claim 11, wherein a batch normalization layer is further connected between the fifth convolutional layer and the activation function layer.
13. The method according to claim 12, wherein the step of fusing the first feature map and the intermediate feature map through the spatial attention network to form the second feature map comprises:
inputting the intermediate feature map into the fifth convolutional layer, and processing the intermediate feature map sequentially through the fifth convolutional layer, the batch normalization layer and the activation function layer to obtain the processed intermediate feature map output by the activation function layer; wherein the foreground features of the processed intermediate feature map are stronger than its background features;
inputting the first feature map and the processed intermediate feature map into the multiplication operation layer;
performing a multiplication operation on the first feature map and the processed intermediate feature map through the multiplication operation layer to generate the second feature map.
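Claims 11 to 13 describe the spatial attention network as convolution, batch normalization, then an activation whose output gates the first feature map element-wise. A minimal NumPy sketch follows; the sigmoid activation and the single 1x1 convolution are assumptions (the claims name only an "activation function layer" and a "fifth convolutional layer"):

```python
import numpy as np

def spatial_attention(first_map, inter_map, w, eps=1e-5):
    """Gate first_map with an attention mask computed from inter_map.

    first_map, inter_map: (C, H, W); w: (1, C) weights of a 1x1 conv.
    """
    x = np.tensordot(w, inter_map, axes=([1], [0]))  # fifth convolutional layer
    x = (x - x.mean()) / np.sqrt(x.var() + eps)      # batch normalization
    mask = 1.0 / (1.0 + np.exp(-x))                  # activation, values in (0, 1)
    return first_map * mask                          # multiplication operation layer

rng = np.random.default_rng(1)
first_map = rng.standard_normal((8, 4, 4))
inter_map = rng.standard_normal((8, 4, 4))
second_map = spatial_attention(first_map, inter_map, rng.standard_normal((1, 8)))
```

Because the mask lies in (0, 1), positions the mask scores low (background) are attenuated while high-scoring positions (foreground) pass through almost unchanged, which matches the claim that foreground features of the second feature map are stronger than its background features.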
14. The method according to claim 10, wherein the step of generating the detection result according to the candidate region information and the second feature map comprises:
inputting the candidate region information and the second feature map into a candidate region feature extraction layer;
extracting, through the candidate region feature extraction layer and based on the candidate region information, the region features of each candidate region on the second feature map;
performing target detection based on the region features of each candidate region to generate the detection result.
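The candidate region feature extraction layer of claim 14 pools a fixed-size feature from the second feature map for each candidate box; RoI max-pooling is one standard realization, sketched below in NumPy. The box format (x0, y0, x1, y1) in feature-map coordinates, the pooled output size, and max pooling itself are illustrative assumptions — the patent does not fix these details in the claims:

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=2):
    """Max-pool the region of a (C, H, W) feature map given by box into
    a fixed-size (C, out_size, out_size) region feature."""
    x0, y0, x1, y1 = box
    region = feature_map[:, y0:y1, x0:x1]
    c, h, w = region.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)  # row bin boundaries
    xs = np.linspace(0, w, out_size + 1).astype(int)  # column bin boundaries
    out = np.empty((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            cell = region[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            out[:, i, j] = cell.max(axis=(1, 2))      # max over each bin
    return out

rng = np.random.default_rng(2)
fmap = rng.standard_normal((4, 8, 8))
feat = roi_max_pool(fmap, (1, 1, 7, 5))  # one candidate region
```

The fixed output size is what lets region features from boxes of different shapes feed the same downstream classification and regression sub-networks.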
15. The method according to claim 14, wherein the step of performing target detection based on the region features of each candidate region to generate the detection result comprises:
performing classification processing on the region features of each candidate region through a classification sub-network to determine the target category in the target image; and/or
performing regression processing on the region features of each candidate region through a regression sub-network to obtain the target position in the target image.
16. The method according to claim 15, wherein the classification sub-network and the regression sub-network are the same fully connected layer.
17. An object detection apparatus, characterized by comprising:
an image acquisition module, configured to obtain a target image to be detected;
a first feature map generation module, configured to perform feature extraction on the target image to generate a first feature map; wherein the first feature map includes feature information of different scales;
a candidate identification module, configured to perform region candidate identification on the first feature map to obtain candidate region information of the target image;
a detection module, configured to generate a detection result according to a plurality of candidate regions and the first feature map; the detection result includes a target category and/or a target position in the target image.
18. An object detection system, characterized in that the system comprises: an image acquisition device, a processor and a storage device;
the image acquisition device is configured to acquire a target image;
the storage device stores a computer program which, when run by the processor, executes the method according to any one of claims 1 to 16.
19. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is run by a processor, the steps of the method according to any one of claims 1 to 16 are executed.
CN201811049034.4A 2018-09-07 2018-09-07 Target detection method, device and system Active CN109255352B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811049034.4A CN109255352B (en) 2018-09-07 2018-09-07 Target detection method, device and system

Publications (2)

Publication Number Publication Date
CN109255352A true CN109255352A (en) 2019-01-22
CN109255352B CN109255352B (en) 2021-06-22

Family

ID=65048187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811049034.4A Active CN109255352B (en) 2018-09-07 2018-09-07 Target detection method, device and system

Country Status (1)

Country Link
CN (1) CN109255352B (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120219210A1 (en) * 2011-02-28 2012-08-30 Yuanyuan Ding Multi-Scale, Perspective Context, and Cascade Features for Object Detection
CN106845499A (en) * 2017-01-19 2017-06-13 清华大学 A kind of image object detection method semantic based on natural language
CN106910185A (en) * 2017-01-13 2017-06-30 陕西师范大学 A kind of DBCC disaggregated models and construction method based on CNN deep learnings
US20170206431A1 (en) * 2016-01-20 2017-07-20 Microsoft Technology Licensing, Llc Object detection and classification in images
CN107093189A (en) * 2017-04-18 2017-08-25 山东大学 Method for tracking target and system based on adaptive color feature and space-time context
CN107563290A (en) * 2017-08-01 2018-01-09 中国农业大学 A kind of pedestrian detection method and device based on image
CN107945153A (en) * 2017-11-07 2018-04-20 广东广业开元科技有限公司 A kind of road surface crack detection method based on deep learning
CN108038409A (en) * 2017-10-27 2018-05-15 江西高创保安服务技术有限公司 A kind of pedestrian detection method
CN108460403A (en) * 2018-01-23 2018-08-28 上海交通大学 The object detection method and system of multi-scale feature fusion in a kind of image


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SEAN BELL ET AL.: "Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
XIAOWEI ZHANG ET AL.: "Scale-aware hierarchical loss: A multipath RPN for multi-scale pedestrian detection", 2017 IEEE Visual Communications and Image Processing (VCIP) *
WANG FEI: "Region-based Convolutional Neural Networks and Their Application in Static Object Detection", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886155A (en) * 2019-01-30 2019-06-14 华南理工大学 Man power single stem rice detection localization method, system, equipment and medium based on deep learning
CN109886155B (en) * 2019-01-30 2021-08-10 华南理工大学 Single-plant rice detection and positioning method, system, equipment and medium based on deep learning
CN109815770A (en) * 2019-01-31 2019-05-28 北京旷视科技有限公司 Two-dimensional code detection method, apparatus and system
CN109816036A (en) * 2019-01-31 2019-05-28 北京字节跳动网络技术有限公司 Image processing method and device
CN109871890A (en) * 2019-01-31 2019-06-11 北京字节跳动网络技术有限公司 Image processing method and device
CN109816036B (en) * 2019-01-31 2021-08-27 北京字节跳动网络技术有限公司 Image processing method and device
CN109886230A (en) * 2019-02-28 2019-06-14 中南大学 A kind of image object detection method and device
CN111666958A (en) * 2019-03-05 2020-09-15 中科院微电子研究所昆山分所 Method, device, equipment and medium for detecting equipment state based on image recognition
CN109948611A (en) * 2019-03-14 2019-06-28 腾讯科技(深圳)有限公司 A kind of method and device that method, the information of information area determination are shown
CN109948611B (en) * 2019-03-14 2022-07-08 腾讯科技(深圳)有限公司 Information area determination method, information display method and device
CN110008951A (en) * 2019-03-14 2019-07-12 深兰科技(上海)有限公司 A kind of object detection method and device
CN110008951B (en) * 2019-03-14 2020-12-15 深兰科技(上海)有限公司 Target detection method and device
CN110111299A (en) * 2019-03-18 2019-08-09 国网浙江省电力有限公司信息通信分公司 Rust staining recognition methods and device
CN110096960B (en) * 2019-04-03 2021-06-08 罗克佳华科技集团股份有限公司 Target detection method and device
CN111797657A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Vehicle peripheral obstacle detection method, device, storage medium, and electronic apparatus
CN110135307A (en) * 2019-04-30 2019-08-16 北京邮电大学 Method for traffic sign detection and device based on attention mechanism
CN111914600A (en) * 2019-05-08 2020-11-10 四川大学 Group emotion recognition method based on space attention model
CN110298821A (en) * 2019-05-28 2019-10-01 昆明理工大学 A kind of reinforcing bar detection method based on Faster R-CNN
CN110287836A (en) * 2019-06-14 2019-09-27 北京迈格威科技有限公司 Image classification method, device, computer equipment and storage medium
CN110287836B (en) * 2019-06-14 2021-10-15 北京迈格威科技有限公司 Image classification method and device, computer equipment and storage medium
CN110135406A (en) * 2019-07-09 2019-08-16 北京旷视科技有限公司 Image-recognizing method, device, computer equipment and storage medium
CN112241669A (en) * 2019-07-18 2021-01-19 杭州海康威视数字技术股份有限公司 Target identification method, device, system and equipment, and storage medium
CN110427898A (en) * 2019-08-07 2019-11-08 广东工业大学 Wrap up safety check recognition methods, system, device and computer readable storage medium
CN110532955A (en) * 2019-08-30 2019-12-03 中国科学院宁波材料技术与工程研究所 Example dividing method and device based on feature attention and son up-sampling
CN110532955B (en) * 2019-08-30 2022-03-08 中国科学院宁波材料技术与工程研究所 Example segmentation method and device based on feature attention and sub-upsampling
CN110533119A (en) * 2019-09-04 2019-12-03 北京迈格威科技有限公司 The training method of index identification method and its model, device and electronic system
CN110674886B (en) * 2019-10-08 2022-11-25 中兴飞流信息科技有限公司 Video target detection method fusing multi-level features
CN110674886A (en) * 2019-10-08 2020-01-10 中兴飞流信息科技有限公司 Video target detection method fusing multi-level features
CN111008555A (en) * 2019-10-21 2020-04-14 武汉大学 Unmanned aerial vehicle image small and weak target enhancement extraction method
CN110837789A (en) * 2019-10-31 2020-02-25 北京奇艺世纪科技有限公司 Method and device for detecting object, electronic equipment and medium
CN110837789B (en) * 2019-10-31 2023-01-20 北京奇艺世纪科技有限公司 Method and device for detecting object, electronic equipment and medium
WO2021083126A1 (en) * 2019-10-31 2021-05-06 北京市商汤科技开发有限公司 Target detection and intelligent driving methods and apparatuses, device, and storage medium
JP2022535473A (en) * 2019-10-31 2022-08-09 ベイジン センスタイム テクノロジー デベロップメント シーオー.,エルティーディー Target detection, intelligent driving methods, devices, equipment and storage media
CN111144238A (en) * 2019-12-11 2020-05-12 重庆邮电大学 Article detection method and system based on Faster R-CNN
CN111163294A (en) * 2020-01-03 2020-05-15 重庆特斯联智慧科技股份有限公司 Building safety channel monitoring system and method for artificial intelligence target recognition
CN111340092A (en) * 2020-02-21 2020-06-26 浙江大华技术股份有限公司 Target association processing method and device
CN111340092B (en) * 2020-02-21 2023-09-22 浙江大华技术股份有限公司 Target association processing method and device
WO2021218037A1 (en) * 2020-04-29 2021-11-04 北京迈格威科技有限公司 Target detection method and apparatus, computer device and storage medium
CN111598882B (en) * 2020-05-19 2023-11-24 联想(北京)有限公司 Organ detection method, organ detection device and computer equipment
CN111598882A (en) * 2020-05-19 2020-08-28 联想(北京)有限公司 Organ detection method and device and computer equipment
CN111797737A (en) * 2020-06-22 2020-10-20 重庆高新区飞马创新研究院 Remote sensing target detection method and device
CN111914997A (en) * 2020-06-30 2020-11-10 华为技术有限公司 Method for training neural network, image processing method and device
CN111914997B (en) * 2020-06-30 2024-04-02 华为技术有限公司 Method for training neural network, image processing method and device
CN112036400B (en) * 2020-07-09 2022-04-05 北京航空航天大学 Method for constructing network for target detection and target detection method and system
CN112036400A (en) * 2020-07-09 2020-12-04 北京航空航天大学 Method for constructing network for target detection and target detection method and system
CN112016569A (en) * 2020-07-24 2020-12-01 驭势科技(南京)有限公司 Target detection method, network, device and storage medium based on attention mechanism
CN111860413A (en) * 2020-07-29 2020-10-30 Oppo广东移动通信有限公司 Target object detection method and device, electronic equipment and storage medium
CN112016511A (en) * 2020-09-08 2020-12-01 重庆市地理信息和遥感应用中心 Remote sensing image blue top room detection method based on large-scale depth convolution neural network
CN116580027A (en) * 2023-07-12 2023-08-11 中国科学技术大学 Real-time polyp detection system and method for colorectal endoscope video
CN116580027B (en) * 2023-07-12 2023-11-28 中国科学技术大学 Real-time polyp detection system and method for colorectal endoscope video

Also Published As

Publication number Publication date
CN109255352B (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN109255352A (en) Object detection method, apparatus and system
Shi et al. A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection
US11321868B1 (en) System for estimating a pose of one or more persons in a scene
Yu et al. Semantic segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module
Raza et al. Appearance based pedestrians’ head pose and body orientation estimation using deep learning
Li et al. A densely attentive refinement network for change detection based on very-high-resolution bitemporal remote sensing images
Sirmacek et al. Urban-area and building detection using SIFT keypoints and graph theory
Zhao et al. Saliency detection by multi-context deep learning
CN109492638A (en) Method for text detection, device and electronic equipment
CN109117879A (en) Image classification method, apparatus and system
CN109815770A (en) Two-dimensional code detection method, apparatus and system
Xu et al. Effective face detector based on yolov5 and superresolution reconstruction
Lu et al. Co-bootstrapping saliency
CN108280455A (en) Human body critical point detection method and apparatus, electronic equipment, program and medium
Zhang et al. Feature pyramid network for diffusion-based image inpainting detection
CN109711416A (en) Target identification method, device, computer equipment and storage medium
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
CN109492576A (en) Image-recognizing method, device and electronic equipment
CN108875456A (en) Object detection method, object detecting device and computer readable storage medium
CN109522970A (en) Image classification method, apparatus and system
EP3642764A1 (en) Learning unified embedding
CN110210480A (en) Character recognition method, device, electronic equipment and computer readable storage medium
Luo et al. Infrared and visible image fusion based on Multi-State contextual hidden Markov Model
Ouadiay et al. Simultaneous object detection and localization using convolutional neural networks
Liu et al. Double Mask R‐CNN for Pedestrian Detection in a Crowd

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant