CN109670517A - Object detection method, device, electronic equipment and target detection model

Object detection method, device, electronic equipment and target detection model

Info

Publication number
CN109670517A
Authority
CN
China
Prior art keywords
network
target detection
feature extraction
image
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811587447.8A
Other languages
Chinese (zh)
Inventor
马宁宁 (Ma Ningning)
张祥雨 (Zhang Xiangyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd
Priority to CN201811587447.8A
Publication of CN109670517A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an object detection method, an object detection device, an electronic device and a target detection model, belonging to the technical field of image detection. The object detection method includes: extracting a feature map of an image to be detected through a feature extraction network, and performing object detection based on the feature map. The feature extraction network includes multiple convolutional layers, at least one of which includes one or more structural units; each structural unit includes at least two parallel channel branches, and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches. Using channel branches improves the execution speed of the network, and the channel shuffle enables information exchange between the channel branches, which preserves the detection precision and accuracy of the network. Under an equal computation budget, the above feature extraction network has the best feature extraction speed. Therefore, the object detection method provided by the embodiments of the present invention can improve detection speed and save time while maintaining detection accuracy.

Description

Object detection method, device, electronic equipment and target detection model
Technical field
The present invention belongs to the technical field of image detection, and more particularly relates to an object detection method, an object detection device, an electronic device and a target detection model.
Background technique
As electronic devices become increasingly intelligent, object detection is widely applied in many fields; it can detect whether a target object exists in an image and where the target object is located. To improve detection precision, the neural network models currently used for object detection are mostly large-scale models, such as ResNet and GoogLeNet. Because the computation cost of these networks is very large, their execution speed is slow and they take a substantial amount of time.
Summary of the invention
In view of this, the purpose of the present invention is to provide an object detection method, device, electronic device and target detection model, which can improve the speed of object detection and save time.
To achieve the above goal, the technical solutions adopted by the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides an object detection method, comprising:
performing feature extraction on an image to be detected through a feature extraction network to obtain a feature map of the image to be detected, wherein the feature extraction network includes multiple convolutional layers, at least one of the multiple convolutional layers includes one or more structural units, and each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches; and
inputting the feature map into a target detection network for object detection.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein at least one channel branch in each structural unit includes multiple convolution units, and the multiple convolution units include at least one depthwise convolution unit whose convolution kernel has a preset size.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the convolution kernel of the preset size is a 3*3 convolution kernel.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the feature extraction network includes a convolutional layer with a stride of 1, and each structural unit in the convolutional layer with a stride of 1 includes a channel split unit connected to the head ends of the at least two channel branches.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the target detection network includes a classification sub-network and/or a regression sub-network; the classification sub-network is configured to determine, based on the feature map of the image to be detected, whether the image to be detected contains a target object; and the regression sub-network is configured to determine, based on the feature map of the image to be detected, the position of the target object in the image to be detected.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein the target detection network includes a large separable convolutional layer, a pooling layer and a fully connected layer connected in sequence, and a classification sub-network and a regression sub-network connected in parallel to the fully connected layer; the step of inputting the feature map into the target detection network for object detection comprises:
passing the feature map sequentially through the large separable convolutional layer, the pooling layer and the fully connected layer to obtain feature data output by the fully connected layer;
inputting the feature data into the classification sub-network and the regression sub-network respectively, to obtain a classification result output by the classification sub-network and a regression result output by the regression sub-network; and
combining the classification result and the regression result to output an object detection result.
In a second aspect, an embodiment of the present invention further provides a target detection model, including a feature extraction network and a target detection network connected to the feature extraction network; the feature extraction network includes multiple convolutional layers, at least one of the multiple convolutional layers includes one or more structural units, and each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the target detection network includes a classification sub-network and/or a regression sub-network.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, wherein the target detection network includes a large separable convolutional layer, a pooling layer and a fully connected layer connected in sequence, and a classification sub-network and a regression sub-network connected in parallel to the fully connected layer.
In a third aspect, an embodiment of the present invention provides an object detection device, comprising:
a feature extraction module, configured to perform feature extraction on an image to be detected through a feature extraction network to obtain a feature map of the image to be detected, wherein the feature extraction network includes multiple convolutional layers, at least one of the multiple convolutional layers includes one or more structural units, and each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches; and
an object detection module, configured to input the feature map into a target detection network for object detection.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including an image acquisition device, a memory and a processor;
the image acquisition device is configured to acquire image data;
the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the steps of the method according to any one of the above first aspect are implemented.
In a fifth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the method according to any one of the above first aspect are executed.
According to the object detection method, device, electronic device and target detection model provided by the embodiments of the present invention, a feature map of an image to be detected is extracted through a feature extraction network, and object detection is performed based on the feature map. The feature extraction network includes multiple convolutional layers, at least one of which includes one or more structural units; each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches. Using channel branches improves the execution speed of the network, and the channel shuffle enables information exchange between the channel branches, which preserves the detection precision and accuracy of the network. Under an equal computation budget, the feature extraction network provided by the embodiments of the present invention has the best feature extraction speed. Therefore, the object detection method provided by the embodiments of the present invention can improve detection speed and save time while maintaining detection accuracy.
Other features and advantages of the present disclosure will be described in the following description; alternatively, some features and advantages can be inferred or unambiguously determined from the description, or can be learned by implementing the above techniques of the present disclosure.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 shows a structural schematic diagram of an electronic device provided by an embodiment of the present invention;
Fig. 2 shows a flow chart of an object detection method provided by an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the internal structure of a feature extraction network provided by an embodiment of the present invention;
Fig. 4 shows a schematic diagram of the internal structure of another feature extraction network provided by an embodiment of the present invention;
Fig. 5 shows a structural schematic diagram of a target detection network provided by an embodiment of the present invention;
Fig. 6 shows a structural block diagram of an object detection device provided by an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
To solve the problem in the prior art that the neural network models used for object detection involve a large amount of computation and therefore run slowly, the embodiments of the present invention provide an object detection method, device, electronic device and target detection model. The object detection method, device, electronic device and target detection model provided by the embodiments of the present invention are described in detail below with reference to the drawings and specific embodiments.
Embodiment one:
First, an exemplary electronic device 100 for implementing the object detection method of the embodiments of the present invention is described with reference to Fig. 1. The exemplary electronic device 100 may be a mobile terminal such as a smartphone, tablet computer or camera; it may also be other equipment, such as an identity verification device (for example, an attendance recorder or an ID verification all-in-one machine), a monitor, or a server of a monitoring center.
As shown in Fig. 1, the electronic device 100 includes one or more processors 102, one or more memories 104, an input device 106 and an output device 108, and may further include an image acquisition device 110; these components are interconnected through a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are only exemplary and not restrictive; the electronic device may have other components and structures as needed.
The processor 102 may be a central processing unit (CPU), a graphics processing unit (GPU), or another form of processing unit with data processing capability, image processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The memory 104 may include one or more computer program products, and the computer program product may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the image processing functions (implemented by the processor) in the embodiments of the present invention described below and/or other desired functions. Various application programs and various data, such as the various images used and/or generated by the application programs, may also be stored on the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, etc.
The output device 108 may output various information (for example, images or sounds) to the outside (for example, a user), and may include one or more of a display, a speaker, etc.
The image acquisition device 110 may capture images desired by the user (such as photos and videos) and store the captured images in the memory 104 for use by other components.
One or more fill lights may also be provided on the electronic device 100; a fill light is arranged to correspond to the image acquisition device and is used to provide supplementary lighting for the image acquisition device when the ambient light is insufficient and affects the image acquisition effect. The fill light may be an infrared fill light, such as a near-infrared LED lamp or a laser infrared lamp. The infrared fill light emits invisible infrared light and provides supplementary lighting for the image acquisition device in low-light environments.
Embodiment two:
This embodiment provides an object detection method that can improve the speed of object detection and save time. Fig. 2 shows the flow chart of the object detection method. It should be noted that the steps shown in the flow chart of Fig. 2 can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flow chart, in some cases the steps shown or described may be performed in an order different from the one herein. This embodiment is described in detail below.
As shown in Fig. 2, the object detection method provided by this embodiment includes the following steps:
Step S202: performing feature extraction on an image to be detected through a feature extraction network to obtain a feature map of the image to be detected.
The image to be detected may be an image acquired by the image acquisition device in real time, or a pre-stored image. In addition, the image to be detected may be an image in a picture format or an image frame in a video; the embodiments of the present invention are not limited in this respect. The object detection method provided by this embodiment can detect whether the image to be detected contains a target object, and can also detect the position of the target object. The target object includes, but is not limited to, a face, a pedestrian, a vehicle, an animal or a plant, etc. The target object may also be a part of an animal or a part of a plant.
The network structure of the feature extraction network may be as follows: the feature extraction network may include multiple convolutional layers, and the convolutional layers are used to extract feature maps from the image to be detected. To improve the speed of the convolution computation, at least one of the multiple convolutional layers may include the structural unit shown in Fig. 3 or Fig. 4. The structural unit includes at least two parallel channel branches, and a concatenation unit (concat) and a channel shuffle unit (channel shuffle) connected to the tail ends of the channel branches. For example, in some embodiments, a part of the convolutional layers of the feature extraction network include at least one structural unit; in other embodiments, all convolutional layers of the feature extraction network include at least one structural unit. It should be noted that Fig. 3 and Fig. 4 only show structural units with two channel branches; in some embodiments, a structural unit may include three, four or even more parallel channel branches, and the tail end of each channel branch is connected to the concatenation unit.
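As an illustrative aid only (not part of the patent disclosure), the channel shuffle operation mentioned above can be sketched roughly as follows. The sketch assumes a PyTorch-style tensor layout of (N, C, H, W) and a `groups` parameter equal to the number of channel branches; these names are assumptions of the sketch.

```python
# Illustrative sketch only; assumes PyTorch and an (N, C, H, W) tensor layout.
import torch

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Interleave channels across `groups` branches so that information
    can be exchanged between the concatenated channel branches."""
    n, c, h, w = x.size()
    assert c % groups == 0, "channel count must be divisible by the number of branches"
    # (N, C, H, W) -> (N, groups, C // groups, H, W) -> swap -> flatten back
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)
```

After the concatenation unit joins the branch outputs along the channel dimension, applying this reordering mixes channels that originated in different branches, which is what allows the branches to exchange information without any extra convolution cost.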
Step S204: inputting the feature map into a target detection network for object detection.
One possible implementation of this step is as follows: the feature map is input into the target detection network to obtain the object detection result output by the target detection network. The detection result may include whether the image to be detected contains a target object, and may also include the position of the target object in the image to be detected. The target detection network may include a classification sub-network and/or a regression sub-network; the classification sub-network is used to determine whether the image to be detected contains a target object, and the regression sub-network is used to determine the position of the target object in the image to be detected. When the target detection network includes both a classification sub-network and a regression sub-network, the classification sub-network and the regression sub-network are arranged in parallel.
In the object detection method provided by the embodiments of the present invention, a feature map of an image to be detected is extracted through a feature extraction network, and object detection is performed based on the feature map. The feature extraction network includes multiple convolutional layers, at least one of which includes one or more structural units; each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches. Using channel branches improves the execution speed of the network, and the channel shuffle enables information exchange between the channel branches, which preserves the detection precision and accuracy of the network. Under an equal computation budget, the feature extraction network provided by the embodiments of the present invention has the best feature extraction speed. Therefore, the object detection method provided by the embodiments of the present invention can improve detection speed and save time while maintaining detection accuracy.
In an alternative embodiment, the above feature extraction network can be a ShuffleNetV2 network (second-generation channel shuffle network). The ShuffleNetV2 network is a lightweight convolutional neural network model; compared with existing large-scale neural network models (such as ResNet and GoogLeNet), the ShuffleNetV2 network has the best feature extraction speed currently available under an equal computation budget.
To further improve the accuracy of object detection, the ShuffleNetV2 network can be improved by adding convolution kernels of a preset size to the network, thereby enlarging the receptive field of the feature extraction network. For example, in some embodiments, the ShuffleNetV2 network may include convolutional layers with a stride of 1 (stride=1); in other embodiments, the ShuffleNetV2 network may include convolutional layers with a stride of 2 (stride=2). In still other embodiments, the ShuffleNetV2 network may include convolutional layers with different strides, for example both convolutional layers with a stride of 1 and convolutional layers with a stride of 2.
A convolutional layer with a stride of 1 may include one or more structural units as shown in Fig. 3. The structural unit in this convolutional layer includes two parallel channel branches, and a channel split unit (channel split) is connected to the head ends of the two channel branches; the channel split unit divides the input channels into two branches. For example, if the number of channels of the input is c, then after the channel split unit the first branch information input to one channel branch has c1 channels, and the second branch information input to the other channel branch has c-c1 channels. Generally, if the input image to be detected is an RGB image, the number of channels is 3, namely the R channel, the G channel and the B channel.
Of the two channel branches, one channel branch may include multiple convolution units, and the multiple convolution units include at least one depthwise convolution unit whose convolution kernel has a preset size. The convolution kernel of the preset size may be a 5*5 convolution kernel, or a 3*3 convolution kernel may be used to reduce computation. The number of depthwise convolution units with the preset-size convolution kernel can be set according to the actual requirement on the receptive field. As shown in Fig. 3, the channel branch on the right includes four sequentially connected convolution units whose kernel sizes are 1*1, 3*3, 3*3 and 1*1 respectively, including two depthwise convolution units with the preset-size (3*3) convolution kernel; the two convolution units with 1*1 kernels use ReLU as the activation function. The first branch information input to the left channel branch remains unchanged, and the second branch information input to the right channel branch passes through the multiple convolutions to produce the second branch output; the number of channels of the second branch output is the same as that of the input second branch information. The concatenation unit joins the first branch information and the second branch output, so that the number of output channels of the structural unit is the same as its number of input channels, keeping the channel count unchanged. The output of the concatenation unit then passes through the channel shuffle unit for channel shuffling, so that information can be exchanged between the two channel branches, avoiding a situation where the expressive power of the network is impaired because information exchange between the channel branches is blocked.
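For illustration only, a minimal sketch of the stride-1 structural unit of Fig. 3 is given below (channel split, an identity branch, a 1*1, 3*3 depthwise, 3*3 depthwise, 1*1 branch, concatenation, channel shuffle). The batch normalization placement, the even channel split ratio and the exact activation positions are assumptions of the sketch rather than details fixed by the patent; `channel_shuffle` refers to the helper sketched earlier.

```python
import torch
import torch.nn as nn
# channel_shuffle is the helper function from the earlier sketch.

class StructuralUnitStride1(nn.Module):
    """Sketch of the Fig. 3 unit: channel split -> (identity branch || conv branch)
    -> concat -> channel shuffle. Output channels equal input channels."""

    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0
        c = channels // 2  # assumed even channel split
        self.branch = nn.Sequential(
            nn.Conv2d(c, c, kernel_size=1, bias=False),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
            # two depthwise convolutions with the preset 3*3 kernel
            nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c, bias=False),
            nn.BatchNorm2d(c),
            nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c, bias=False),
            nn.BatchNorm2d(c),
            nn.Conv2d(c, c, kernel_size=1, bias=False),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        first, second = x.chunk(2, dim=1)                      # channel split unit
        out = torch.cat((first, self.branch(second)), dim=1)   # concatenation unit
        return channel_shuffle(out, groups=2)                  # channel shuffle unit
```

Because one branch is left untouched and the other applies two depthwise 3*3 convolutions, the unit enlarges the receptive field while the output channel count stays equal to the input channel count, matching the behaviour described for the stride-1 layer.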
A convolutional layer with a stride of 2 may include one or more structural units as shown in Fig. 4. The structural unit in this convolutional layer includes two parallel channel branches, each of which includes multiple convolution units, and the multiple convolution units include at least one depthwise convolution unit whose convolution kernel has a preset size. Likewise, the convolution kernel of the preset size may be a 5*5 convolution kernel or a 3*3 convolution kernel, and the number of depthwise convolution units with the preset-size convolution kernel can be set according to the actual requirement on the receptive field. As shown in Fig. 4, the channel branch on the right includes four sequentially connected convolution units whose kernel sizes are 1*1, 3*3, 3*3 and 1*1 respectively, including two depthwise convolution units with the preset-size (3*3) convolution kernel. The channel branch on the left includes two sequentially connected convolution units whose kernel sizes are 3*3 and 1*1 respectively, including one depthwise convolution unit with the preset-size (3*3) convolution kernel.
The first branch information input to the left channel branch is convolved to obtain the first branch output, and the second branch information input to the right channel branch passes through multiple convolutions to obtain the second branch output. The concatenation unit joins the first branch output and the second branch output; since the stride of this convolutional layer is 2, the number of channels output after concatenation is twice the number of channels input to the structural unit. The output of the concatenation unit then passes through the channel shuffle unit for channel shuffling, so that information can be exchanged between the two channel branches, avoiding a situation where the expressive power of the network is impaired because information exchange between the channel branches is blocked.
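A similar illustrative sketch of the Fig. 4 stride-2 unit follows, under the same assumptions as the previous sketch. There is no channel split here; both branches process the full input. Which of the two 3*3 depthwise convolutions in the right branch carries the stride-2 downsampling is an assumption of the sketch (the first one), as are the batch normalization and activation placements.

```python
class StructuralUnitStride2(nn.Module):
    """Sketch of the Fig. 4 unit: both branches downsample the full input,
    so the concatenated output has twice the input channels."""

    def __init__(self, channels: int):
        super().__init__()
        c = channels
        # left branch: 3*3 depthwise (stride 2) -> 1*1 pointwise
        self.left = nn.Sequential(
            nn.Conv2d(c, c, kernel_size=3, stride=2, padding=1, groups=c, bias=False),
            nn.BatchNorm2d(c),
            nn.Conv2d(c, c, kernel_size=1, bias=False),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
        )
        # right branch: 1*1 -> 3*3 depthwise (stride 2, assumed) -> 3*3 depthwise -> 1*1
        self.right = nn.Sequential(
            nn.Conv2d(c, c, kernel_size=1, bias=False),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
            nn.Conv2d(c, c, kernel_size=3, stride=2, padding=1, groups=c, bias=False),
            nn.BatchNorm2d(c),
            nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c, bias=False),
            nn.BatchNorm2d(c),
            nn.Conv2d(c, c, kernel_size=1, bias=False),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat((self.left(x), self.right(x)), dim=1)  # 2x channels after concat
        return channel_shuffle(out, groups=2)                  # channel shuffle unit
```

In this sketch the spatial resolution is halved while the channel count doubles, which is exactly the behaviour the description attributes to the stride-2 structural unit.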
To further improve the speed of object detection, in the embodiments of the present invention the target detection network may be a Light-Head R-CNN network. The specific structure of the target detection network may be as shown in Fig. 5, including a large separable convolutional layer (large separable convolution), a pooling layer and a fully connected layer (FC) connected in sequence, and a classification sub-network (classification subnet) and a regression sub-network (location subnet) connected in parallel to the fully connected layer. The feature map output by the feature extraction network is passed sequentially through the large separable convolutional layer, the pooling layer and the fully connected layer to obtain feature data output by the fully connected layer; the feature data are input into the classification sub-network and the regression sub-network respectively, to obtain a classification result output by the classification sub-network and a regression result output by the regression sub-network; and the classification result and the regression result are combined to output the object detection result.
Specifically, the feature map passes through the large separable convolutional layer to obtain a thin feature map (thinner feature map), which contains multiple candidate regions of different sizes, or position-sensitive candidate regions. The pooling layer may be a position-sensitive candidate-region pooling layer (PSROI pooling) or a candidate-region pooling layer (ROI pooling); its function is to adjust the multiple candidate regions of different sizes to a fixed size. The thin feature map is input into the pooling layer, and the output of the pooling layer is passed through the fully connected layer to obtain the feature data. The feature data are input into the classification sub-network and the regression sub-network respectively to obtain and output the object detection result. In this target detection network, because the large separable convolutional layer is used, it is no longer necessary to compute a score matrix over all categories for every region when determining candidate regions; a heavy head structure and a heavy tail structure are therefore avoided, the computational complexity can be greatly reduced, and the execution speed is improved.
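As a rough illustration only, the detection head described above (large separable convolution producing a thin feature map, candidate-region pooling to a fixed size, a fully connected layer, then parallel classification and regression sub-networks) might be sketched along the following lines. The kernel size k, the intermediate and thin channel widths, the 2048-dimensional fully connected layer, and the use of torchvision's `roi_pool` as a stand-in for the PSROI pooling mentioned in the text are all assumptions of the sketch, not figures taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool  # stand-in for the PSROI pooling of the patent


class LargeSeparableConv(nn.Module):
    """Large-kernel separable convolution: a (k,1)->(1,k) path plus a
    (1,k)->(k,1) path, summed, producing a thin feature map."""

    def __init__(self, in_ch: int, mid_ch: int, out_ch: int, k: int = 15):
        super().__init__()
        p = k // 2
        self.path_a = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(mid_ch, out_ch, (1, k), padding=(0, p)),
        )
        self.path_b = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, (1, k), padding=(0, p)),
            nn.Conv2d(mid_ch, out_ch, (k, 1), padding=(p, 0)),
        )

    def forward(self, x):
        return self.path_a(x) + self.path_b(x)


class DetectionHead(nn.Module):
    """Thin feature map -> ROI pooling -> FC -> parallel classification and
    regression sub-networks, whose outputs are combined downstream."""

    def __init__(self, in_ch: int, num_classes: int, pool_size: int = 7):
        super().__init__()
        thin_ch = 10 * pool_size * pool_size          # assumed "thin" width
        self.thin = LargeSeparableConv(in_ch, 256, thin_ch)
        self.pool_size = pool_size
        self.fc = nn.Linear(thin_ch * pool_size * pool_size, 2048)
        self.cls_subnet = nn.Linear(2048, num_classes)      # classification result
        self.reg_subnet = nn.Linear(2048, 4 * num_classes)  # regression (box) result

    def forward(self, feature_map, rois, spatial_scale: float):
        thin = self.thin(feature_map)
        # rois: Tensor[K, 5] of (batch_index, x1, y1, x2, y2) candidate regions
        pooled = roi_pool(thin, rois, output_size=self.pool_size,
                          spatial_scale=spatial_scale)
        feat = self.fc(pooled.flatten(start_dim=1))
        return self.cls_subnet(feat), self.reg_subnet(feat)
```

The point of the sketch is the data flow rather than the exact widths: the expensive per-class score maps of a heavy head are replaced by one thin feature map shared by all candidate regions, and only a small fully connected layer sits in front of the two parallel sub-networks.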
The classification sub-network includes multiple convolutional layers and is mainly used for target classification. The feature map is input into the classification sub-network, which judges whether a target object appears in the input feature map and outputs the probability that the target object appears, that is, the probability that the target object appears in the image to be detected. For example, in a face detection task, the classification sub-network can output the detection result "whether a face exists".
The regression sub-network also includes multiple convolutional layers and is mainly used for target localization; the target localization task can also be regarded as a regression task. The feature map is input into the regression sub-network, which can determine the position of the target object in the input feature map, that is, the position of the target object in the image to be detected. The regression sub-network can output a rectangular bounding box marking the position of the target object. For example, in a face detection task, the regression sub-network can output "the regression box coordinates of the face"; the regression box is the rectangular bounding box of the face predicted by the regression sub-network and characterizes the specific location of the face.
In summary, since the target detection network uses a Light-Head R-CNN network, the speed of object detection can be further improved without losing detection precision.
In order that the feature extraction network and the target detection network can be directly applied to perform object detection on the image to be detected and output more accurate and reliable results, the feature extraction network and the target detection network need to be trained in advance. The training process of the feature extraction network and the target detection network is described in detail below.
A training image sample set is obtained; the training image sample set includes multiple training images. The feature extraction network and the target detection network are trained using the training sample set.
Optionally, a training image is randomly selected from the training image sample set; the training image is input into the feature extraction network to obtain a feature map of the training image; and the feature map of the training image is input into the target detection network to obtain a detection result of the training image. The detection result of the training image is compared with the manually annotated label, and a loss value is calculated using a preset loss function. The loss value measures how close the actual output is to the desired output; the smaller the loss value, the closer the actual output is to the desired output. A back-propagation algorithm can be used to adjust the parameters of the feature extraction network and the target detection network according to the loss value; when the loss value converges to a preset expected value, the training of the feature extraction network and the target detection network is completed, and the current parameters are taken as the parameters of the feature extraction network and the target detection network.
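A minimal training-loop sketch of the procedure above (random sampling, a forward pass through the two networks, a loss against the manual label, back-propagation, stopping when the loss reaches an expected value) is given below for illustration. The names `feature_net`, `detection_net`, `dataset` and `detection_loss` are placeholders assumed for the sketch; the patent does not fix a particular loss function, optimizer or stopping threshold.

```python
import random
import torch

def train(feature_net, detection_net, dataset, detection_loss,
          expected_loss: float = 0.05, max_steps: int = 100000, lr: float = 0.01):
    """Jointly train the feature extraction network and the target detection
    network until the loss value converges to a preset expected value."""
    params = list(feature_net.parameters()) + list(detection_net.parameters())
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9)

    for step in range(max_steps):
        image, label = random.choice(dataset)      # randomly selected training image (batched tensor assumed)
        feature_map = feature_net(image)           # feature map of the training image
        prediction = detection_net(feature_map)    # detection result of the training image
        loss = detection_loss(prediction, label)   # compare with the manually annotated label

        optimizer.zero_grad()
        loss.backward()                            # back-propagation
        optimizer.step()                           # adjust parameters of both networks

        if loss.item() <= expected_loss:           # loss converged to the expected value
            break
    return feature_net, detection_net
```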
Embodiment three:
Corresponding to the above object detection method, this embodiment provides a target detection model, which includes a feature extraction network and a target detection network connected to the feature extraction network.
The feature extraction network is used to extract the feature map of the image to be detected. The feature extraction network may include multiple convolutional layers, and at least one of the multiple convolutional layers includes one or more structural units; each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches. In an alternative embodiment, as shown in Fig. 3, the feature extraction network includes a convolutional layer with a stride of 1; each structural unit in the convolutional layer with a stride of 1 includes a channel split unit connected to the head ends of the at least two channel branches. One of the at least two channel branches includes multiple convolution units, and the multiple convolution units include at least one depthwise convolution unit whose convolution kernel has a preset size; the convolution kernel of the preset size may be a 3*3 convolution kernel. In another alternative embodiment, as shown in Fig. 4, the feature extraction network includes a convolutional layer with a stride of 2; each structural unit in the convolutional layer with a stride of 2 includes at least two channel branches, each channel branch includes multiple convolution units, and the multiple convolution units include at least one depthwise convolution unit whose convolution kernel has a preset size; the convolution kernel of the preset size may be a 3*3 convolution kernel.
As shown in Fig. 5, the target detection network includes a large separable convolutional layer, a pooling layer and a fully connected layer connected in sequence, and a classification sub-network and a regression sub-network connected in parallel to the fully connected layer. The feature map of the image to be detected is passed sequentially through the large separable convolutional layer, the pooling layer and the fully connected layer to obtain feature data output by the fully connected layer. The classification sub-network is used to classify the feature data, determine whether the feature map contains a target object, and output a classification result. The regression sub-network is used to perform regression on the feature data, determine the position of the target object, and output a regression result; the object detection result can be obtained by combining the classification result and the regression result.
Example IV:
Corresponding to the above method embodiment, this embodiment provides an object detection device. Fig. 6 shows a structural schematic diagram of the object detection device, which includes:
a feature extraction module 61, configured to perform feature extraction on an image to be detected through a feature extraction network to obtain a feature map of the image to be detected, wherein the feature extraction network includes multiple convolutional layers, at least one of the multiple convolutional layers includes one or more structural units, and each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches; and
an object detection module 62, configured to input the feature map into a target detection network for object detection.
At least one channel branch in each structural unit includes multiple convolution units, and the multiple convolution units include at least one depthwise convolution unit whose convolution kernel has a preset size. The convolution kernel of the preset size is a 3*3 convolution kernel.
In an alternative embodiment, the feature extraction network includes a convolutional layer with a stride of 1; each structural unit in the convolutional layer with a stride of 1 includes a channel split unit connected to the head ends of the at least two channel branches. One of the at least two channel branches includes multiple convolution units, and the multiple convolution units include at least one depthwise convolution unit whose convolution kernel has a preset size.
In another alternative embodiment, the feature extraction network includes a convolutional layer with a stride of 2; each structural unit in the convolutional layer with a stride of 2 includes at least two channel branches, each channel branch includes multiple convolution units, and the multiple convolution units include at least one depthwise convolution unit whose convolution kernel has a preset size.
In some embodiments, the target detection network includes a classification sub-network and/or a regression sub-network; the classification sub-network is configured to determine, based on the feature map of the image to be detected, whether the image to be detected contains a target object; and the regression sub-network is configured to determine, based on the feature map of the image to be detected, the position of the target object in the image to be detected.
In other embodiments, the target detection network includes a large separable convolutional layer, a pooling layer and a fully connected layer connected in sequence, and a classification sub-network and a regression sub-network connected in parallel to the fully connected layer. The object detection module 62 may be further configured to: pass the feature map sequentially through the large separable convolutional layer, the pooling layer and the fully connected layer to obtain the feature data output by the fully connected layer; input the feature data into the classification sub-network and the regression sub-network respectively to obtain the classification result output by the classification sub-network and the regression result output by the regression sub-network; and combine the classification result and the regression result to output the object detection result.
In an alternative embodiment, the above object detection device may further include a training module connected to the feature extraction module 61, which is configured to obtain a training image sample set, the training image sample set including multiple training images, and to train the feature extraction network and the target detection network using the training sample set.
The training module may be further configured to: randomly select a training image from the training image sample set; input the training image into the feature extraction network to obtain a feature map of the training image; input the feature map of the training image into the target detection network to obtain a detection result of the training image; compare the detection result of the training image with the manually annotated label, and calculate a loss value using a preset loss function, where the loss value measures how close the actual output is to the desired output and a smaller loss value indicates that the actual output is closer to the desired output; and use a back-propagation algorithm to adjust the parameters of the feature extraction network and the target detection network according to the loss value, complete the training of the feature extraction network and the target detection network when the loss value converges to a preset expected value, and take the current parameters as the parameters of the feature extraction network and the target detection network.
The embodiments of the present invention provide an object detection device that extracts a feature map of an image to be detected through a feature extraction network and performs object detection based on the feature map. The feature extraction network includes multiple convolutional layers, at least one of which includes one or more structural units; each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches. Using channel branches improves the execution speed of the network, and the channel shuffle enables information exchange between the channel branches, which preserves the detection precision and accuracy of the network. Therefore, the object detection device provided by the embodiments of the present invention can improve detection speed and save time while maintaining detection accuracy.
The implementation principle and technical effect of the device provided by this embodiment are the same as those of the foregoing embodiments. For brevity, for parts not mentioned in the device embodiment, reference may be made to the corresponding content in the foregoing method embodiments.
An embodiment of the present invention further provides an electronic device including an image acquisition device, a memory and a processor. The image acquisition device is configured to acquire image data; the memory stores a computer program that can run on the processor; and when the processor executes the computer program, the method described in the foregoing method embodiments is implemented.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the electronic device described above may refer to the corresponding process in the foregoing method embodiments, and details are not repeated here.
Further, this embodiment also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the method provided by the foregoing method embodiments are executed. For specific implementation, reference may be made to the method embodiments, and details are not repeated here.
If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate the technical solution of the present invention rather than to limit it, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still, within the technical scope disclosed by the present invention, modify the technical solutions described in the foregoing embodiments, or easily conceive of variations, or replace some of the technical features with equivalents; and these modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention.

Claims (12)

1. An object detection method, characterized by comprising:
performing feature extraction on an image to be detected through a feature extraction network to obtain a feature map of the image to be detected, wherein the feature extraction network includes multiple convolutional layers, at least one of the multiple convolutional layers includes one or more structural units, and each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches; and
inputting the feature map into a target detection network for object detection.
2. The method according to claim 1, wherein at least one channel branch in each structural unit includes multiple convolution units, and the multiple convolution units include at least one depthwise convolution unit whose convolution kernel has a preset size.
3. The method according to claim 2, wherein the convolution kernel of the preset size is a 3*3 convolution kernel.
4. The method according to claim 1, wherein the feature extraction network includes a convolutional layer with a stride of 1; each structural unit in the convolutional layer with a stride of 1 includes a channel split unit connected to the head ends of the at least two channel branches.
5. The method according to claim 1, wherein the target detection network includes a classification sub-network and/or a regression sub-network; the classification sub-network is configured to determine, based on the feature map of the image to be detected, whether the image to be detected contains a target object; and the regression sub-network is configured to determine, based on the feature map of the image to be detected, the position of the target object in the image to be detected.
6. The method according to claim 1, wherein the target detection network includes a large separable convolutional layer, a pooling layer and a fully connected layer connected in sequence, and a classification sub-network and a regression sub-network connected in parallel to the fully connected layer; and the step of inputting the feature map into the target detection network for object detection comprises:
passing the feature map sequentially through the large separable convolutional layer, the pooling layer and the fully connected layer to obtain feature data output by the fully connected layer;
inputting the feature data into the classification sub-network and the regression sub-network respectively, to obtain a classification result output by the classification sub-network and a regression result output by the regression sub-network; and
combining the classification result and the regression result to output an object detection result.
7. A target detection model, characterized by comprising a feature extraction network and a target detection network connected to the feature extraction network, wherein the feature extraction network includes multiple convolutional layers, at least one of the multiple convolutional layers includes one or more structural units, and each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches.
8. The target detection model according to claim 7, wherein the target detection network includes a classification sub-network and/or a regression sub-network.
9. The target detection model according to claim 7, wherein the target detection network includes a large separable convolutional layer, a pooling layer and a fully connected layer connected in sequence, and a classification sub-network and a regression sub-network connected in parallel to the fully connected layer.
10. An object detection device, characterized by comprising:
a feature extraction module, configured to perform feature extraction on an image to be detected through a feature extraction network to obtain a feature map of the image to be detected, wherein the feature extraction network includes multiple convolutional layers, at least one of the multiple convolutional layers includes one or more structural units, and each structural unit includes at least two parallel channel branches and a concatenation unit and a channel shuffle unit connected to the tail ends of the channel branches; and
an object detection module, configured to input the feature map into a target detection network for object detection.
11. An electronic device, characterized by comprising an image acquisition device, a memory and a processor, wherein:
the image acquisition device is configured to acquire image data; and
the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the steps of the method according to any one of claims 1 to 6 are implemented.
12. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is run by a processor, the steps of the method according to any one of claims 1 to 6 are executed.
CN201811587447.8A 2018-12-24 2018-12-24 Object detection method, device, electronic equipment and target detection model Pending CN109670517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811587447.8A CN109670517A (en) 2018-12-24 2018-12-24 Object detection method, device, electronic equipment and target detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811587447.8A CN109670517A (en) 2018-12-24 2018-12-24 Object detection method, device, electronic equipment and target detection model

Publications (1)

Publication Number Publication Date
CN109670517A true CN109670517A (en) 2019-04-23

Family

ID=66146013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811587447.8A Pending CN109670517A (en) 2018-12-24 2018-12-24 Object detection method, device, electronic equipment and target detection model

Country Status (1)

Country Link
CN (1) CN109670517A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784555A (en) * 2020-06-16 2020-10-16 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment
CN112115789A (en) * 2020-08-18 2020-12-22 北京嘀嘀无限科技发展有限公司 Face detection model determining method and device and electronic equipment
CN112116032A (en) * 2019-06-21 2020-12-22 富士通株式会社 Object detection device and method and terminal equipment
CN113469146A (en) * 2021-09-02 2021-10-01 深圳市海清视讯科技有限公司 Target detection method and device
CN113591840A (en) * 2021-06-30 2021-11-02 北京旷视科技有限公司 Target detection method, device, equipment and storage medium
CN117576488A (en) * 2024-01-17 2024-02-20 海豚乐智科技(成都)有限责任公司 Infrared dim target detection method based on target image reconstruction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107134144A (en) * 2017-04-27 2017-09-05 武汉理工大学 A kind of vehicle checking method for traffic monitoring
CN108694401A (en) * 2018-05-09 2018-10-23 北京旷视科技有限公司 Object detection method, apparatus and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107134144A (en) * 2017-04-27 2017-09-05 武汉理工大学 A kind of vehicle checking method for traffic monitoring
CN108694401A (en) * 2018-05-09 2018-10-23 北京旷视科技有限公司 Object detection method, apparatus and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun: "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design", Lecture Notes in Computer Science *
Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun: "Light-Head R-CNN: In Defense of Two-Stage Object Detector", https://arxiv.org/abs/1711.07264 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116032A (en) * 2019-06-21 2020-12-22 富士通株式会社 Object detection device and method and terminal equipment
JP2021002333A (en) * 2019-06-21 2021-01-07 富士通株式会社 Object detection device, object detection method, and terminal equipment
JP7428075B2 (en) 2019-06-21 2024-02-06 富士通株式会社 Object detection device, object detection method and terminal equipment
CN111784555A (en) * 2020-06-16 2020-10-16 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment
CN111784555B (en) * 2020-06-16 2023-08-25 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment
CN112115789A (en) * 2020-08-18 2020-12-22 北京嘀嘀无限科技发展有限公司 Face detection model determining method and device and electronic equipment
CN113591840A (en) * 2021-06-30 2021-11-02 北京旷视科技有限公司 Target detection method, device, equipment and storage medium
CN113469146A (en) * 2021-09-02 2021-10-01 深圳市海清视讯科技有限公司 Target detection method and device
CN117576488A (en) * 2024-01-17 2024-02-20 海豚乐智科技(成都)有限责任公司 Infrared dim target detection method based on target image reconstruction
CN117576488B (en) * 2024-01-17 2024-04-05 海豚乐智科技(成都)有限责任公司 Infrared dim target detection method based on target image reconstruction

Similar Documents

Publication Publication Date Title
Wu et al. Real-time vehicle and distance detection based on improved yolo v5 network
CN109670517A (en) Object detection method, device, electronic equipment and target detection model
Wang et al. Deep networks for saliency detection via local estimation and global search
CN109492638A (en) Method for text detection, device and electronic equipment
CN109815868A (en) A kind of image object detection method, device and storage medium
CN111178183B (en) Face detection method and related device
US8792722B2 (en) Hand gesture detection
US8750573B2 (en) Hand gesture detection
CN109670452A (en) Method for detecting human face, device, electronic equipment and Face datection model
CN103295016B (en) Behavior recognition method based on depth and RGB information and multi-scale and multidirectional rank and level characteristics
CN110738101A (en) Behavior recognition method and device and computer readable storage medium
CN107180226A (en) A kind of dynamic gesture identification method based on combination neural net
CN109376667A (en) Object detection method, device and electronic equipment
CN109657533A (en) Pedestrian recognition methods and Related product again
CN107273836A (en) A kind of pedestrian detection recognition methods, device, model and medium
CN106874826A (en) Face key point-tracking method and device
CN109214366A (en) Localized target recognition methods, apparatus and system again
CN110555481A (en) Portrait style identification method and device and computer readable storage medium
CN109886951A (en) Method for processing video frequency, device and electronic equipment
CN109543632A (en) A kind of deep layer network pedestrian detection method based on the guidance of shallow-layer Fusion Features
CN109376631A (en) A kind of winding detection method and device neural network based
CN109117760A (en) Image processing method, device, electronic equipment and computer-readable medium
CN107808126A (en) Vehicle retrieval method and device
CN110135476A (en) A kind of detection method of personal safety equipment, device, equipment and system
CN105303163B (en) A kind of method and detection device of target detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190423