CN108875763A - Object detection method and object detection device - Google Patents

Object detection method and object detection device

Info

Publication number
CN108875763A
CN108875763A (application CN201710348872.0A)
Authority
CN
China
Prior art keywords
frame image
target
pixel
neural network
feature information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710348872.0A
Other languages
Chinese (zh)
Inventor
张弛 (Zhang Chi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Beijing Maigewei Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Beijing Maigewei Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd, Beijing Maigewei Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201710348872.0A priority Critical patent/CN108875763A/en
Publication of CN108875763A publication Critical patent/CN108875763A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The present disclosure provides a neural-network-based object detection method, object detection device, and computer-readable storage medium. The object detection method includes: obtaining consecutive frame images to be detected that contain a target; using a first feedforward neural network to obtain first feature information of each frame image in the consecutive frame images; using a second bidirectional recurrent neural network to obtain second feature information of each frame image based on the first feature information of each frame image; and using at least one classifier, based on the second feature information, to obtain, for each pixel of the consecutive frame images, the attribute corresponding to the at least one classifier.

Description

Object detection method and object detection device
Technical field
The present disclosure relates to the field of image processing in artificial intelligence (AI); more specifically, to a neural-network-based object detection method, object detection device, and computer-readable storage medium.
Background technique
Object detection is a fundamental research topic in computer vision, with wide practical applications in face recognition, security monitoring, dynamic tracking, and many other areas. Within object detection, the structuring of video containing targets such as pedestrians and vehicles is indispensable in many security applications. A neural network is a large-scale, multi-parameter optimization tool. Given a large amount of training data, a neural network can learn hidden features in the data that are difficult to summarize by hand, and thereby accomplish many complex tasks such as face detection, image classification, object detection, motion tracking, and natural language translation. Neural networks are widely applied in the artificial intelligence community. At present, the most widely used networks in object detection tasks such as pedestrian detection are convolutional neural networks.
In existing object detection methods, target (pedestrian and vehicle) detection, target tracking, and video structuring are usually performed as three independent steps. In the detection step, for each frame image, the pedestrians or vehicles serving as targets are found, and their positions and sizes are represented by bounding boxes. Then, the targets detected in each frame are associated according to spatial position, appearance similarity, and other correlates, to perform the tracking step. Finally, the attribute information of the pedestrian or vehicle in each frame of a tracking trajectory is analyzed, achieving the goal of structuring. Each of the above three steps may introduce additional error, causing errors to propagate and grow. Especially in the detection step, under crowded conditions a bounding box cannot indicate the position of a target well. For example, in a dense crowd many pedestrians occlude one another, so their bounding boxes also overlap. If the attributes of a pedestrian are analyzed from a bounding box, information is easily lost, or erroneous information introduced, because of occlusion by other people.
Summary of the invention
In view of the above problems, the present invention provides a neural-network-based object detection method, an object detection device, and a computer-readable storage medium.
According to one embodiment of the disclosure, an object detection method is provided, including: obtaining consecutive frame images to be detected that contain a target; using a first feedforward neural network to obtain the first feature information of each frame image in the consecutive frame images; using a second bidirectional recurrent neural network to obtain the second feature information of each frame image based on the first feature information of each frame image; and using at least one classifier, based on the second feature information, to obtain, for each pixel of the consecutive frame images, the attribute corresponding to the at least one classifier.
Furthermore, in the object detection method according to an embodiment of the disclosure, the at least one classifier includes a first target detection classifier, a second part-segmentation classifier, and a third attribute classifier.
Furthermore, in the object detection method according to an embodiment of the disclosure, using at least one classifier, based on the second feature information, to obtain for each pixel of the consecutive frame images the attribute corresponding to the at least one classifier includes: based on the second feature information, using the first target detection classifier to determine the category attribute of the target class to which each pixel belongs, and clustering pixels with the same category attribute to determine the targets in the consecutive frame images; based on the second feature information, using the second part-segmentation classifier to determine the parts of the targets in the consecutive frame images; and based on the second feature information, using the third attribute classifier to determine the attributes of the parts of the targets in the consecutive frame images.
Furthermore, in the object detection method according to an embodiment of the disclosure, clustering pixels with the same category attribute to determine the targets in the consecutive frame images includes: determining the displacement of each pixel with the same category attribute to the center point of the target to which it belongs, and determining the pixels that belong to the same target by clustering with respect to that center point.
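The center-point clustering described above can be sketched as follows. This is a minimal illustration under the assumption that each pixel's output already includes a predicted (dx, dy) displacement to its target's center; the function name, tolerance, and grid layout are hypothetical, not specified by the patent:

```python
# Each detected pixel votes for a target center: pixel position + predicted offset.
# Pixels whose votes land near the same center are grouped as one target.

def cluster_by_center(pixels, tol=1.0):
    """pixels: list of ((x, y), (dx, dy)) -> list of pixel groups (one per target)."""
    clusters = []  # list of (center_vote, [member pixel positions])
    for (x, y), (dx, dy) in pixels:
        vote = (x + dx, y + dy)  # where this pixel believes its target center is
        for c in clusters:
            cx, cy = c[0]
            if abs(vote[0] - cx) <= tol and abs(vote[1] - cy) <= tol:
                c[1].append((x, y))
                break
        else:
            clusters.append((vote, [(x, y)]))
    return [members for _, members in clusters]

# Two pixels voting for center (5, 5), one voting for center (20, 20):
groups = cluster_by_center([((4, 5), (1, 0)), ((6, 5), (-1, 0)), ((20, 21), (0, -1))])
print(len(groups))  # -> 2
```

In this sketch the first two pixels are merged into one target because their center votes coincide, while the third pixel forms its own target.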
Furthermore, the object detection method according to an embodiment of the disclosure further includes: determining the displacement of a pixel with the same category attribute to the center point of the target in a predetermined number of preceding and following frames, thereby tracking the target.
Furthermore, in the object detection method according to an embodiment of the disclosure, determining the attributes of the parts of the targets in the consecutive frame images includes: assigning a weight to each part of a target in the consecutive frame images and, based on the weights, computing a weighted average of the attributes of the pixels of each part to determine the attribute of the target in the consecutive frame images.
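The part-weighted averaging step can be sketched as below. The part names, weights, and scores are illustrative assumptions (the patent does not specify how weights are chosen); the sketch only shows the weighted-average arithmetic:

```python
# Weighted average of per-pixel attribute scores over a target's parts.
# Example: a "wears hat" attribute might weight head pixels above leg pixels.

def target_attribute(parts, weights):
    """parts: {part_name: [per-pixel attribute scores]}; weights: {part_name: w}."""
    num = 0.0
    den = 0.0
    for name, scores in parts.items():
        w = weights[name]
        for s in scores:
            num += w * s
            den += w
    return num / den

parts = {"head": [0.9, 0.8], "torso": [0.5], "legs": [0.1]}
weights = {"head": 3.0, "torso": 1.0, "legs": 0.5}
print(round(target_attribute(parts, weights), 3))  # -> 0.753
```

Because head pixels carry the largest weight, the target-level score stays close to the head scores even though the leg pixel (possibly occluded) disagrees.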
Furthermore, in the object detection method according to an embodiment of the disclosure, the first feedforward neural network is a convolutional feedforward neural network, the second bidirectional recurrent neural network is a bidirectional recurrent convolutional neural network, and the first feedforward neural network and the second bidirectional recurrent neural network each include one or more layers of convolutional neural networks.
Furthermore, in the object detection method according to an embodiment of the disclosure, using the second bidirectional recurrent neural network to obtain the second feature information of each frame image based on the first feature information of each frame image includes: using the forward recurrent neural network in the second bidirectional recurrent neural network to obtain the forward feature information of each frame image; using the backward recurrent neural network in the second bidirectional recurrent neural network to obtain the backward feature information of each frame image; and combining the forward feature information and the backward feature information to obtain the second feature information, wherein the forward feature information reflects the features of the current frame and a predetermined number of preceding frames, and the backward feature information reflects the features of the current frame and a predetermined number of following frames.
According to another embodiment of the present disclosure, an object detection device is provided, including: a processor; and a memory storing computer-readable program instructions, wherein the following steps are executed when the computer-readable program instructions are run by the processor: obtaining consecutive frame images to be detected that contain a target; using a first feedforward neural network to obtain the first feature information of each frame image in the consecutive frame images; using a second bidirectional recurrent neural network to obtain the second feature information of each frame image based on the first feature information of each frame image; and using at least one classifier, based on the second feature information, to obtain, for each pixel of the consecutive frame images, the attribute corresponding to the at least one classifier.
Furthermore, in the object detection device according to another embodiment of the present disclosure, the at least one classifier includes a first target detection classifier, a second part-segmentation classifier, and a third attribute classifier.
Furthermore, in the object detection device according to another embodiment of the present disclosure, when the computer-readable program instructions are run by the processor, using at least one classifier, based on the second feature information, to obtain for each pixel of the consecutive frame images the attribute corresponding to the at least one classifier includes: based on the second feature information, using the first target detection classifier to determine the category attribute of the target class to which each pixel belongs, and clustering pixels with the same category attribute to determine the targets in the consecutive frame images; based on the second feature information, using the second part-segmentation classifier to determine the parts of the targets in the consecutive frame images; and based on the second feature information, using the third attribute classifier to determine the attributes of the parts of the targets in the consecutive frame images.
Furthermore, in the object detection device according to another embodiment of the present disclosure, when the computer-readable program instructions are run by the processor, clustering pixels with the same category attribute to determine the targets in the consecutive frame images includes: determining the displacement of each pixel with the same category attribute to the center point of the target to which it belongs, and determining the pixels that belong to the same target by clustering with respect to that center point.
Furthermore, in the object detection device according to another embodiment of the present disclosure, when the computer-readable program instructions are run by the processor, the following is also executed: determining the displacement of a pixel with the same category attribute to the center point of the target in a predetermined number of preceding and following frames, thereby tracking the target.
Furthermore, in the object detection device according to another embodiment of the present disclosure, when the computer-readable program instructions are run by the processor, determining the attributes of the parts of the targets in the consecutive frame images includes: assigning a weight to each part of a target in the consecutive frame images and, based on the weights, computing a weighted average of the attributes of the pixels of each part to determine the attribute of the target in the consecutive frame images.
Furthermore, in the object detection device according to another embodiment of the present disclosure, the first feedforward neural network is a convolutional feedforward neural network, the second bidirectional recurrent neural network is a bidirectional recurrent convolutional neural network, and the first feedforward neural network and the second bidirectional recurrent neural network each include one or more layers of convolutional neural networks.
Furthermore, in the object detection device according to another embodiment of the present disclosure, when the computer-readable program instructions are run by the processor, using the second bidirectional recurrent neural network to obtain the second feature information of each frame image based on the first feature information of each frame image includes: using the forward recurrent neural network in the second bidirectional recurrent neural network to obtain the forward feature information of each frame image; using the backward recurrent neural network in the second bidirectional recurrent neural network to obtain the backward feature information of each frame image; and combining the forward feature information and the backward feature information to obtain the second feature information, wherein the forward feature information reflects the features of the current frame and a predetermined number of preceding frames, and the backward feature information reflects the features of the current frame and a predetermined number of following frames.
According to yet another embodiment of the disclosure, an object detection device is provided, including: an image acquisition module for obtaining consecutive frame images to be detected that contain a target; a first feature information acquisition module for obtaining, using a first feedforward neural network, the first feature information of each frame image in the consecutive frame images; a second feature information acquisition module for obtaining, using a second bidirectional recurrent neural network, the second feature information of each frame image based on the first feature information of each frame image; and a pixel attribute acquisition module for obtaining, using at least one classifier and based on the second feature information, the attribute corresponding to the at least one classifier for each pixel of the consecutive frame images.
Furthermore, in the object detection device according to yet another embodiment of the disclosure, the at least one classifier includes a first target detection classifier, a second part-segmentation classifier, and a third attribute classifier.
Furthermore, in the object detection device according to yet another embodiment of the disclosure, the pixel attribute acquisition module, based on the second feature information, uses the first target detection classifier to determine the category attribute of the target class to which each pixel belongs, and clusters pixels with the same category attribute to determine the targets in the consecutive frame images; based on the second feature information, uses the second part-segmentation classifier to determine the parts of the targets in the consecutive frame images; and based on the second feature information, uses the third attribute classifier to determine the attributes of the parts of the targets in the consecutive frame images.
Furthermore, in the object detection device according to yet another embodiment of the disclosure, the pixel attribute acquisition module determines the displacement of each pixel with the same category attribute to the center point of the target to which it belongs, and determines the pixels that belong to the same target by clustering with respect to that center point.
Furthermore, the object detection device according to yet another embodiment of the disclosure further includes: a target tracking module for determining the displacement of a pixel with the same category attribute to the center point of the target in a predetermined number of preceding and following frames, thereby tracking the target.
Furthermore, in the object detection device according to yet another embodiment of the disclosure, the pixel attribute acquisition module assigns a weight to each part of a target in the consecutive frame images and, based on the weights, computes a weighted average of the attributes of the pixels of each part to determine the attribute of the target in the consecutive frame images.
Furthermore, in the object detection device according to yet another embodiment of the disclosure, the second feature information acquisition module obtains the forward feature information of each frame image using the forward recurrent neural network in the second bidirectional recurrent neural network, obtains the backward feature information of each frame image using the backward recurrent neural network in the second bidirectional recurrent neural network, and combines the forward feature information and the backward feature information to obtain the second feature information, wherein the forward feature information reflects the features of the current frame and a predetermined number of preceding frames, and the backward feature information reflects the features of the current frame and a predetermined number of following frames.
According to a further embodiment of the disclosure, a computer-readable storage medium is provided, on which computer-readable program instructions are stored; when the computer-readable program instructions are run by a processor, an object detection method including the following steps is executed: obtaining consecutive frame images to be detected that contain a target; using a first feedforward neural network to obtain the first feature information of each frame image in the consecutive frame images; using a second bidirectional recurrent neural network to obtain the second feature information of each frame image based on the first feature information of each frame image; and using at least one classifier, based on the second feature information, to obtain, for each pixel of the consecutive frame images, the attribute corresponding to the at least one classifier.
According to the neural-network-based object detection method, object detection device, and computer-readable storage medium of the embodiments of the present disclosure, through the combined use of convolutional neural networks and recurrent neural networks, pixel-level detection, tracking, and attribute information acquisition are carried out, which improves the efficiency of object detection, avoids introducing unnecessary errors, and improves the precision of detection.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, and are intended to provide further explanation of the claimed technology.
Detailed description of the invention
The above and other objects, features, and advantages of the present invention will become more apparent by describing embodiments of the present invention in more detail with reference to the accompanying drawings. The accompanying drawings are provided for a further understanding of the embodiments of the invention, constitute a part of the specification, serve together with the embodiments to explain the invention, and are not to be construed as limiting the invention. In the drawings, the same reference numerals generally denote the same components or steps.
Fig. 1 is a flowchart illustrating the object detection method according to an embodiment of the present disclosure.
Fig. 2 is a schematic structural diagram illustrating the neural network for object detection according to an embodiment of the present disclosure.
Fig. 3 is a flowchart further illustrating the second feature information acquisition process in the object detection method according to an embodiment of the present disclosure.
Fig. 4 is a detailed flowchart further illustrating the object detection method according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram further illustrating the pixel-level classification processing of the object detection method according to an embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram illustrating the object detection device according to an embodiment of the present disclosure.
Fig. 7 is a functional block diagram illustrating the object detection device according to an embodiment of the present disclosure.
Fig. 8 is a schematic diagram illustrating the computer-readable storage medium according to an embodiment of the present disclosure.
Specific embodiment
In order to make the objects, technical solutions, and advantages of the present disclosure more apparent, example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the example embodiments described herein. All other embodiments obtained by those skilled in the art based on the embodiments described in the present disclosure without creative effort shall fall within the protection scope of the present disclosure.
The present disclosure relates to a neural-network-based object detection method, object detection device, and computer-readable storage medium that combine convolutional neural networks and recurrent neural networks. Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
Fig. 1 is the flow chart for illustrating the object detection method according to the embodiment of the present disclosure.Fig. 2 is diagram according to disclosure reality Apply the structural schematic diagram of the neural network for target detection of example.The structural representation of neural network as shown in connection with fig. 2 herein The process of figure description object detection method.As shown in Figure 1, being included the following steps according to the target detection side of the embodiment of the present disclosure.
In step S101, consecutive frame images to be detected that contain a target are obtained. In one embodiment of the present disclosure, a surveillance camera capable of capturing image data of a monitored scene may be configured in the monitored scene as the image acquisition module. Obtaining the images to be detected that contain the target includes, but is not limited to, receiving, in a wired or wireless manner, the video data sent by a physically separate image acquisition module after it has captured the image data. Alternatively, the image acquisition module may be physically co-located with, or even inside the same housing as, the other modules or components in the object detection device; the other modules or components in the object detection device then receive, via an internal bus, the video data sent by the image acquisition module. Alternatively, the image acquisition module may directly receive video data for object detection transferred to the object detection device from outside. In one embodiment of the present disclosure, the image to be detected may be an original image captured by the image acquisition module, or an image obtained after preprocessing the original image. As schematically shown in Fig. 2, consecutive frame images F(t-1), F(t), and F(t+1) containing targets are obtained. It is easily understood that Fig. 2 only schematically shows three consecutive frames, but the scope of the present disclosure is not limited thereto.
Thereafter, the processing proceeds to step S102.
In step S102, the first feature information of each frame image in the consecutive frame images is obtained using the first feedforward neural network. As will be described in detail below, in one embodiment of the present disclosure, the first feedforward neural network includes one or more layers of convolutional neural networks (CNN). In the case of multi-layer convolutional neural networks, a convolution unit in each layer of the convolutional neural network responds to the surrounding units within part of its coverage. The parameters of each convolution unit can be optimized by a back-propagation algorithm.
In one embodiment of the present disclosure, the purpose of the convolution operation is to extract different features of the input. For example, the first-layer convolutional neural network may only extract some low-level features such as edges, lines, and corners; subsequent layers of convolutional neural networks can then iteratively extract more complex features from the low-level features. In one embodiment of the present disclosure, for one image (that is, one frame image in the video data), the first feature information extracted by the first feedforward neural network is a three-dimensional tensor X, whose three dimensions respectively represent the horizontal direction, the vertical direction, and the channel. In one embodiment of the present disclosure, there is no need to define image features by hand; the three-dimensional tensor X is automatically extracted by the first feedforward neural network (the convolutional neural network). The parameters of the convolutional neural network may be randomly initialized, or initialized with a previously trained network (such as VGG, ResNet, etc.). For these trained networks, certain parts may be chosen as part of the first feedforward neural network of the present disclosure, and some of the parameters may be fixed so as not to participate in training. It should be noted that, in one embodiment of the present disclosure, in order to realize pixel-level operation, after convolution and pooling, methods such as interpolation and cropping are needed to keep the length and width of the three-dimensional tensor equal to the length and width of the input image.
As schematically shown in Fig. 2, the consecutive frame images F(t-1), F(t), and F(t+1) are input into the first feedforward neural network 201. The first feedforward neural network 201 is schematically shown as including two layers of convolutional neural networks (CNN). It is easily understood that the scope of the present disclosure is not limited thereto; the first feedforward neural network may include one or more layers of convolutional neural networks. As shown in Fig. 2, the first-layer convolutional neural network may only extract some low-level features such as edges, lines, and corners; the second-layer convolutional neural network can then iteratively extract more complex features from the low-level features.
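The shape bookkeeping described above — convolution shrinking the feature map, then interpolation restoring it to the input's length and width — can be sketched as follows. A single fixed averaging filter and nearest-neighbor upsampling stand in for the trained CNN layers; all names and sizes here are illustrative assumptions:

```python
import numpy as np

def conv_feature(img, kernel):
    """'Valid' 2-D convolution: shrinks the spatial size by kernel_size - 1."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def resize_nearest(feat, h, w):
    """Nearest-neighbor interpolation back to the input's height and width."""
    rows = np.arange(h) * feat.shape[0] // h
    cols = np.arange(w) * feat.shape[1] // w
    return feat[np.ix_(rows, cols)]

img = np.random.rand(8, 8)
feat = conv_feature(img, np.ones((3, 3)) / 9.0)  # (6, 6): smaller than the input
feat = resize_nearest(feat, *img.shape)          # (8, 8): pixel-aligned again
print(feat.shape)  # -> (8, 8)
```

Restoring the spatial size is what allows the later classifiers to assign an attribute to every input pixel rather than to a coarser grid.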
Thereafter, the processing proceeds to step S103.
In step S103, the second feature information of each frame image is obtained using the second bidirectional recurrent neural network, based on the first feature information of each frame image. As will be described in detail below, in one embodiment of the present disclosure, the first feature information extracted by the first feedforward neural network for each frame image in the consecutive frame images (that is, the three-dimensional tensor X) is input into the second bidirectional recurrent neural network. In one embodiment of the present disclosure, the second bidirectional recurrent neural network includes one or more layers of convolutional recurrent neural networks (RNN). The connections between the neurons of a recurrent neural network form a directed graph; by circulating state within its own network, a recurrent neural network can accept input with a wider temporal sequence structure. That is, the second feature information of each frame image integrates feature information from a predetermined number of preceding and following frames of that frame image. The second feature information is also a three-dimensional tensor, which combines the information in the preceding and following frames as the new feature of each frame image. In addition, in one embodiment of the present disclosure, if the output feature sequence of one layer of the recurrent neural network is used as the input to the next layer of the recurrent neural network, a multi-layer bidirectional recurrent neural network is formed.
As schematically shown in Fig. 2, the first feature information 203 extracted by the first feedforward neural network 201 is input to the second bidirectional recurrent neural network 202. The second bidirectional recurrent neural network 202 is schematically shown as including two layers of recurrent neural networks (RNN). It is easily understood that the scope of the present disclosure is not limited thereto; the second bidirectional recurrent neural network may include one or more layers of recurrent neural networks. The connections between the neurons of the recurrent neural network form a directed graph, and the second feature information 204 of each frame image integrates feature information from a predetermined number of preceding and following frames of that frame image. Hereafter, the second feature information acquisition process will be described in detail in conjunction with Fig. 3.
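The bidirectional combination can be sketched with a toy recurrence over per-frame feature vectors. The exponential-decay state update and the mixing constant `alpha` are hypothetical stand-ins for the convolutional RNN layers; only the forward/backward/concatenate structure is taken from the text:

```python
import numpy as np

def bidirectional_features(x, alpha=0.5):
    """x: (T, d) per-frame first features -> (T, 2d) second features.

    The forward state accumulates past frames and the backward state
    accumulates future frames; concatenating both gives each frame
    temporal context in both directions.
    """
    T, d = x.shape
    h_fwd = np.zeros((T, d))
    h_bwd = np.zeros((T, d))
    state = np.zeros(d)
    for t in range(T):              # forward pass: past -> present
        state = alpha * state + (1 - alpha) * x[t]
        h_fwd[t] = state
    state = np.zeros(d)
    for t in reversed(range(T)):    # backward pass: future -> present
        state = alpha * state + (1 - alpha) * x[t]
        h_bwd[t] = state
    return np.concatenate([h_fwd, h_bwd], axis=1)

x = np.arange(6, dtype=float).reshape(3, 2)  # three frames, 2-dim features each
out = bidirectional_features(x)
print(out.shape)  # -> (3, 4)
```

Stacking such layers — feeding one layer's output sequence into the next — gives the multi-layer bidirectional network mentioned above.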
Thereafter, the processing proceeds to step S104.
In step S104, the attribute corresponding to at least one classifier is obtained for each pixel of the consecutive frame images, using the at least one classifier and based on the second feature information. As will be described in detail below, the second feature information is separately input into three classifiers of different levels, so as to obtain attribute information of different levels corresponding to each pixel of the input image. Hereinafter, the process of separately inputting the second feature information into the three classifiers of different levels will be described with reference to Fig. 5.
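The three per-pixel heads can be sketched as independent linear classifiers applied to the same second feature vector at every pixel. The random weights, head sizes, and class counts below are illustrative assumptions standing in for the trained detection, part-segmentation, and attribute classifiers:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def per_pixel_heads(feat, w_det, w_part, w_attr):
    """feat: (H, W, C) second feature tensor -> three (H, W, K_i) score maps."""
    det = softmax(feat @ w_det)     # target-category scores per pixel
    part = softmax(feat @ w_part)   # part scores per pixel
    attr = 1.0 / (1.0 + np.exp(-(feat @ w_attr)))  # independent attribute probs
    return det, part, attr

feat = rng.standard_normal((4, 4, 8))  # toy 4x4 feature map, 8 channels
det, part, attr = per_pixel_heads(
    feat,
    rng.standard_normal((8, 3)),  # e.g. background / pedestrian / vehicle
    rng.standard_normal((8, 5)),  # e.g. 5 body or vehicle parts
    rng.standard_normal((8, 2)),  # e.g. 2 binary attributes
)
print(det.shape, part.shape, attr.shape)  # -> (4, 4, 3) (4, 4, 5) (4, 4, 2)
```

Each head reads the same shared feature tensor, which is what makes the pixel-level detection, segmentation, and attribute outputs mutually consistent.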
Additionally, it should be appreciated that the neural network structure shown with reference to Fig. 2 is only exemplary, and the present disclosure is not limited thereto. The neural network for implementing the object detection method according to the embodiment of the present disclosure is first trained in advance on a large amount of sample data, and the parameters of the convolutional network (CNN) and the convolutional feedback network (RNN) are obtained using, for example, the back-propagation algorithm. When performing target detection and tracking, the parameters of the neural network are known, and each convolutional network and convolutional feedback network outputs extracted and synthesized target features.
Above, the object detection method according to the embodiment of the present disclosure has been outlined through the flow chart of Fig. 1 and the structural schematic diagram of the neural network of Fig. 2. As described above, through the combined use of convolutional neural networks and feedback neural networks, the object detection method of the embodiment of the present disclosure synthesizes the multi-frame information in the video to be detected, and simultaneously performs detection, tracking and attribute acquisition for each pixel of the video images to be detected.
Fig. 3 is a flow chart further illustrating the second feature information acquisition process in the object detection method according to the embodiment of the present disclosure. After step S102 described with reference to Fig. 1, the object detection method according to the embodiment of the present disclosure enters the second feature information acquisition process.
As shown in Fig. 3, in step S301, the forward feature information of each frame image is obtained using the forward feedback neural network in the second Two-way Feedback neural network. In one embodiment of the present disclosure, the forward feature information of each frame image obtained by the forward feedback neural network can be expressed as:
Y_t = W*X_t + V*Y_{t-1} + b    Expression (1)
wherein W, V and b are parameters of the feedback neural network, and Y_t is the output result of frame t.
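As a minimal illustration, the recurrence of Expression (1) can be sketched in a few lines of numpy; the dimensions, the random parameters and the zero initial state below are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, T = 3, 2, 4                 # assumed feature sizes and frame count
W = rng.standard_normal((d_out, d_in))   # input weight
V = rng.standard_normal((d_out, d_out))  # recurrent weight
b = rng.standard_normal(d_out)           # bias
X = rng.standard_normal((T, d_in))       # one feature vector per frame

Y = np.zeros((T, d_out))
prev = np.zeros(d_out)                   # Y_{-1} taken as zero
for t in range(T):
    Y[t] = W @ X[t] + V @ prev + b       # Expression (1): Y_t = W*X_t + V*Y_{t-1} + b
    prev = Y[t]
```

Each output Y_t thus depends on the current input X_t and, through the recurrent term V*Y_{t-1}, on all earlier frames.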
In the present disclosure, a convolutional feedback neural network is used, and the above expression can be expressed as:

Y_t = W ⊛ X_t + V ⊛ Y_{t-1} + b    Expression (2)

where ⊛ denotes the convolution operation.
That is, the multiplications in the general feedback neural network are replaced by convolutions. In this way, when synthesizing the information in each frame, a network unit in the feedback neural network only responds to the surrounding units within a limited receptive field, which greatly reduces the number of parameters of the network. In the above expression of the convolutional feedback neural network, the output Y_t of frame t is a three-dimensional tensor. Thereafter, processing enters step S302.
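A single-channel numpy sketch of the convolutional recurrence of Expression (2) follows; the kernel sizes, the "same" zero padding, the random parameters and the use of deep-learning-style cross-correlation for the convolution are all illustrative assumptions:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2-D 'same' convolution with zero padding
    (cross-correlation, as is conventional in deep learning)."""
    kh, kw = k.shape
    p = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(p[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(1)
T, H, W = 3, 5, 5                   # assumed frame count and spatial size
Wk = rng.standard_normal((3, 3))    # input-to-hidden kernel (plays the role of W)
Vk = rng.standard_normal((3, 3))    # hidden-to-hidden kernel (plays the role of V)
b = 0.1                             # bias

X = rng.standard_normal((T, H, W))  # one feature map per frame
Y = np.zeros((T, H, W))
prev = np.zeros((H, W))             # Y_{-1} taken as zero
for t in range(T):
    # Expression (2): the multiplications of Expression (1) become convolutions
    Y[t] = conv2d_same(X[t], Wk) + conv2d_same(prev, Vk) + b
    prev = Y[t]
```

Because each kernel only covers a 3x3 neighbourhood, every output unit responds only to nearby units, which is exactly what keeps the parameter count small.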
In step S302, the backward feature information of each frame image is obtained using the backward feedback neural network in the second Two-way Feedback neural network. Similar to the above Expression (2), the backward feature information of each frame image obtained by the backward feedback neural network can be expressed as:

Z_t = W' ⊛ X_t + V' ⊛ Z_{t+1} + b'    Expression (3)

wherein W', V' and b' are the parameters of the backward feedback neural network, and Z_t is its output result of frame t.
Hereafter, processing enters step S303.
In step S303, the forward feature information and the backward feature information are synthesized to obtain the second feature information. In one embodiment of the present disclosure, in order that each frame in the video can synthesize not only the information of the frames before it but also the information of the frames after it, a Two-way Feedback neural network is used (for example, as shown in Fig. 2). The second feature information obtained by synthesizing the forward feature information and the backward feature information can be expressed as:
H_t = concat(Y_t, Z_t)    Expression (4)
wherein Y_t is the output result of the forward feedback neural network at frame t, Z_t is the output result of the backward feedback neural network at frame t, and H_t merges Y_t and Z_t, i.e.,
H_t(x, y, c) = Y_t(x, y, c)      if c <= C
H_t(x, y, c) = Z_t(x, y, c-C)    if c > C    Expression (5)
wherein C is the number of channels of Y_t, and H_t serves as the output of the whole network at frame t. The H_t given by Expression (5) is a three-dimensional tensor which combines the information of the frames before and after frame t, and serves as the new second feature information of frame t. Thereafter, processing can proceed to step S104 described with reference to Fig. 1, to further perform the pixel-level classification processing based on the second feature information.
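The channel-wise merging of Expressions (4) and (5) amounts to a single concatenation along the channel axis, as the following numpy sketch (with assumed shapes and constant fill values) shows:

```python
import numpy as np

H, W, C = 4, 4, 8
Y_t = np.full((H, W, C), 1.0)  # forward RNN output at frame t
Z_t = np.full((H, W, C), 2.0)  # backward RNN output at frame t

# Expression (4)/(5): H_t(x, y, c) = Y_t(x, y, c) for c <= C,
#                     H_t(x, y, c) = Z_t(x, y, c - C) for c > C
H_t = np.concatenate([Y_t, Z_t], axis=-1)
```

The first C channels of H_t carry the forward (past-aware) features and the last C channels carry the backward (future-aware) features of the same frame.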
Hereinafter, the object detection method according to the embodiment of the present disclosure, and in particular its pixel-level classification processing, will be further described in detail with reference to Fig. 4 and Fig. 5.
Fig. 4 is a detailed flow chart further illustrating the object detection method according to the embodiment of the present disclosure. Fig. 5 is a schematic diagram further illustrating the pixel-level classification processing of the object detection method according to the embodiment of the present disclosure.
Steps S401 to S403 shown in Fig. 4 are respectively identical to steps S101 to S103 described with reference to Fig. 1, and their repeated description will be omitted here.
After the second feature information of each frame image is obtained in step S403, processing enters step S404.
In step S404, based on the second feature information, the first target detection classifier is used to determine the category attribute of the target category to which each pixel belongs, and the pixels having the same category attribute are clustered to determine the targets in the sequential frame images. As schematically shown in Fig. 5, the second feature information 304 is input to the target detection classifier 501, which determines the target category information 502 to which each pixel belongs. For example, the category attribute of the target category to which a pixel belongs may indicate that the target is a pedestrian, a vehicle, the background, etc. Further, the displacement from a pixel among the pixels having the same category attribute to the central point of the target to which it belongs is determined, and the pixels belonging to the same target are determined by clustering on the central points. When determining the displacements from the pixels having the same category attribute to the central points of the targets to which they belong, either all of the pixels or only some of the pixels may be operated on. For example, the method of clustering can distinguish which specific pedestrian/vehicle the pixels belonging to the pedestrian/vehicle category belong to. Thereafter, processing enters step S405.
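The centre-vote clustering described above can be sketched as follows; the pixel coordinates, the predicted displacements and the greedy radius-based clustering rule are all toy assumptions used only to illustrate the grouping step:

```python
import numpy as np

# Each "pedestrian" pixel predicts a displacement to the centre of its
# instance; pixels voting for (nearly) the same centre form one target.
pixels = np.array([[1, 1], [1, 2], [2, 1],      # instance around centre (1.5, 1.5)
                   [8, 8], [8, 9], [9, 8]])     # instance around centre (8.5, 8.5)
offsets = np.array([[0.5, 0.5], [0.5, -0.5], [-0.5, 0.5],
                    [0.5, 0.5], [0.5, -0.5], [-0.5, 0.5]])
votes = pixels + offsets                        # predicted centre per pixel

def cluster(votes, radius=1.0):
    """Greedy clustering: assign each vote to the first centre within
    `radius`, or open a new cluster for it."""
    labels = -np.ones(len(votes), dtype=int)
    centres = []
    for i, v in enumerate(votes):
        for k, c in enumerate(centres):
            if np.linalg.norm(v - c) <= radius:
                labels[i] = k
                break
        else:
            centres.append(v)
            labels[i] = len(centres) - 1
    return labels

labels = cluster(votes)  # pixels sharing a centre share a target label
```

The six pixels collapse into two clusters, one per target, even though all of them carry the same "pedestrian" category attribute.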
Thereafter, in step S405, the displacement from a pixel among the pixels having the same category attribute to the central point of the target in a predetermined number of previous and later frames is determined, so as to track the target. By tracking according to the displacements of the central point in the predetermined number of previous and later frames, the position of the target in previous frames can be traced back and the possible position of the target in later frames can be predicted, and reliable target tracking can be achieved even in the case of occlusion or partial occlusion.
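The cross-frame displacement idea can be sketched with toy numbers: each pixel of a target at frame t predicts where that target's centre was at frame t-1 and will be at frame t+1, and averaging these votes recovers those centres. All coordinates and displacements below are invented for illustration:

```python
import numpy as np

pixels_t = np.array([[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])        # target pixels at frame t
to_prev = np.array([[-1.5, -1.0], [-1.5, -2.0], [-2.5, -1.0]])   # votes for centre at t-1
to_next = np.array([[0.5, 1.0], [0.5, 0.0], [-0.5, 1.0]])        # votes for centre at t+1

# averaging the per-pixel votes yields the target centre in the
# previous and the next frame, linking detections across frames
centre_prev = (pixels_t + to_prev).mean(axis=0)
centre_next = (pixels_t + to_next).mean(axis=0)
```

Because many pixels vote for each centre, a consistent estimate survives even when part of the target is occluded in a neighbouring frame.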
In step S406, based on the second feature information, the second part segmentation classifier is used to determine the various parts of the targets in the sequential frame images. As schematically shown in Fig. 5, the second feature information 304 is input to the part segmentation classifier 503, which determines the part category information 504 to which each pixel belongs. For example, parts such as the head, upper body, lower body, shoes, backpack, bag, trolley case and umbrella of a pedestrian, and parts such as the body, windows and license plate of a vehicle, can be distinguished. Thereafter, processing enters step S407.
In step S408, the mesh in sequential frame image is determined using third attributive classification device based on second feature information The attribute of target various pieces.As shown in Figure 5 schematically, second feature information 304 inputs pixel property classifier 505, Determine pixel property information 506 belonging to each pixel.Determine the category of the various pieces of the target in the sequential frame image Property includes:Weight is distributed for the various pieces of the target in the sequential frame image, the weight is based on, to the various pieces Pixel attribute weight it is average, with the attribute of the target in the determination sequential frame image.Specifically, attribute and pixel institute Position have a close correlation, such as the color of jacket is only related with the pixel for belonging to upper body, and gender and belongs to head The pixel relationship in portion is most close, but also related with the pixel for belonging to upper body and the lower part of the body.Therefore each position corresponds to every The weight of one attribute is weighted and averaged all properties and the position belonging to them in pedestrian/vehicle pixel is belonged to, It can be obtained by the attribute information of entire pedestrian/vehicle.
Fig. 6 is a schematic diagram illustrating the object detecting device according to the embodiment of the present disclosure.
As shown in Fig. 6, the object detecting device 600 according to the embodiment of the present disclosure includes one or more processors 602, a memory 604, an image collecting device 606 and an output device 608, which are interconnected through a bus system 610 and/or a connection mechanism of another form (not shown). It should be noted that the components and structure of the object detecting device 600 shown in Fig. 6 are only exemplary and not restrictive; as needed, the object detecting device 600 may also have other components and structures.
The processor 602 may be a central processing unit (CPU) or a processing unit of another form having data processing capability and/or instruction execution capability, and may control the other components in the object detecting device 600 to execute desired functions.
The memory 604 may include one or more computer program products, and the computer program products may include various forms of computer readable storage media, such as volatile memory and/or nonvolatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The nonvolatile memory may include, for example, read-only memory (ROM), a hard disk, a flash memory, etc. One or more computer program instructions may be stored on the computer readable storage medium, and the processor 602 may run the program instructions to realize the following steps: obtaining the sequential frame images to be detected that contain targets; obtaining the first feature information of each frame image in the sequential frame images using the first feedforward neural network; obtaining the second feature information of each frame image using the second Two-way Feedback neural network based on the first feature information of each frame image; and, using at least one classifier and based on the second feature information, obtaining the attribute corresponding to the at least one classifier for each pixel of the sequential frame images. In addition, when run by the processor 602, the one or more computer program instructions stored on the computer readable storage medium may also execute all the steps of the object detection method according to the embodiment of the present disclosure described above with reference to the drawings. Various application programs and various data, such as the input training images, the loss functions, and the prediction confidence and true confidence of each pixel, may also be stored in the computer readable storage medium.
The image collecting device 606 may be used to collect the training images with targets for training and the video images to be detected for target detection, and to store the captured images in the memory 604 for use by the other components. Of course, other image capture equipment may also be used to collect the training images and the images to be detected and to send the collected images to the object detecting device 600.
Output device 608 can export various information to external (such as user), for example, image information, training result and Object detection results.The output device 608 may include one or more in display, loudspeaker, projector, network interface card etc. It is a.
Fig. 7 is a functional block diagram illustrating the object detecting device according to the embodiment of the present disclosure. The object detecting device 700 according to the embodiment of the present disclosure as shown in Fig. 7 can be used to execute the object detection method according to the embodiment of the present disclosure as shown in Fig. 1 and Fig. 4. As shown in Fig. 7, the object detecting device 700 according to the embodiment of the present disclosure includes an image acquisition module 701, a first feature information acquisition module 702, a second feature information acquisition module 703, a pixel attribute acquisition module 704 and a target tracking module 705.
Specifically, the image acquisition module 701 is used for obtaining the sequential frame images containing targets. In one embodiment of the present disclosure, the image acquisition module 701 may be a monitoring camera configured in a monitoring scene and capable of acquiring image data of the monitoring scene. The image acquisition module 701 may be physically separated from the subsequent modules, and send the image data from the image acquisition module 701 to the subsequent modules in a wired or wireless manner. Alternatively, the image acquisition module 701 may be physically located at the same position, or even inside the same casing, as the other modules or components in the object detecting device 700, and the other modules or components in the object detecting device 700 receive the image data sent from the image acquisition module 701 via an internal bus. Alternatively, the image acquisition module 701 may also receive video data for target detection transferred to the object detecting device from the outside.
Thereafter, the first feature information acquisition module 702, the second feature information acquisition module 703 and the pixel attribute acquisition module 704 may be configured by a central processing unit (CPU) or a general or dedicated processing unit of another form having data processing capability and/or instruction execution capability. The first feature information acquisition module 702 is used for obtaining the first feature information of each frame image in the sequential frame images using the first feedforward neural network. The second feature information acquisition module 703 is used for obtaining the second feature information of each frame image using the second Two-way Feedback neural network based on the first feature information of each frame image. The pixel attribute acquisition module 704 is used for obtaining, using at least one classifier and based on the second feature information, the attribute corresponding to the at least one classifier for each pixel of the sequential frame images.
More specifically, the second feature information acquisition module 703 obtains the forward feature information of each frame image using the forward feedback neural network in the second Two-way Feedback neural network, obtains the backward feature information of each frame image using the backward feedback neural network in the second Two-way Feedback neural network, and synthesizes the forward feature information and the backward feature information to obtain the second feature information, wherein the forward feature information reflects the features of the current frame and a predetermined number of frames before it, and the backward feature information reflects the features of the current frame and a predetermined number of frames after it.
The pixel attribute acquisition module 704, based on the second feature information, uses the first target detection classifier to determine the category attribute of the target category to which each pixel belongs, and clusters the pixels having the same category attribute to determine the targets in the sequential frame images; based on the second feature information, uses the second part segmentation classifier to determine the various parts of the targets in the sequential frame images; and, based on the second feature information, uses the third attribute classifier to determine the attributes of the various parts of the targets in the sequential frame images. The pixel attribute acquisition module 704 determines the displacement from a pixel among the pixels having the same category attribute to the central point of the target to which it belongs, and determines the pixels belonging to the same target by clustering on the central points. Also, the pixel attribute acquisition module 704 assigns weights to the various parts of the targets in the sequential frame images and, based on the weights, takes a weighted average of the attributes of the pixels of the various parts to determine the attributes of the targets in the sequential frame images.
The target tracking module 705 is used for determining the displacement from a pixel among the pixels having the same category attribute to the central point of the target in a predetermined number of previous and later frames, so as to track the target.
Fig. 8 is a schematic diagram illustrating the computer readable storage medium according to the embodiment of the present disclosure. As shown in Fig. 8, the computer readable storage medium 800 according to the embodiment of the present disclosure stores computer-readable program instructions 801 thereon. When the computer-readable program instructions 801 are run by a processor, the object detection method according to the embodiment of the present disclosure described with reference to the figures above is executed.
Above, the neural-network-based object detection method, the object detecting device and the computer readable storage medium according to the embodiment of the present disclosure have been described. In the object detection method according to the embodiment of the present disclosure, the pixel-level detection, tracking and attribute extraction processes are merged, which avoids introducing unnecessary errors and eliminates the interference of the background with person attributes. In addition, the object detection method according to the embodiment of the present disclosure can effectively handle the occlusion problem in the case of dense crowds, heavy traffic flows and serious occlusion. For example, when a certain target is occluded in one or more frames, since the target has appeared in previous frames, the position of the target in the current frame can be estimated according to the prediction information. Further, the object detection method according to the embodiment of the present disclosure can reduce detection errors. For example, when a target is missed only in a certain frame, the information of the previous and later frames can predict the position of the target in the frame where it was missed. Similarly, if a target is falsely detected in a certain frame, it can be judged from the information of the previous and later frames that the detected target is not credible.
The basic principle of the present disclosure has been described above in conjunction with specific embodiments. However, it should be pointed out that the merits, advantages, effects, etc. mentioned in the present disclosure are only exemplary and not restrictive, and it must not be assumed that these merits, advantages, effects, etc. are prerequisites for each embodiment of the present disclosure. In addition, the specific details disclosed above are only for the purpose of illustration and ease of understanding, rather than limitation; the above details do not limit the present disclosure to being implemented using the above specific details.
The block diagrams of the devices, apparatuses, equipment and systems involved in the present disclosure are only illustrative examples and are not intended to require or imply that they must be connected, arranged or configured in the manner shown in the blocks. As those skilled in the art will appreciate, these devices, apparatuses, equipment and systems can be connected, arranged and configured in any manner. Words such as "include", "comprise" and "have" are open words that mean "including but not limited to" and can be used interchangeably therewith. The words "or" and "and" used herein refer to the word "and/or" and can be used interchangeably therewith, unless the context clearly indicates otherwise. The word "such as" used herein refers to the phrase "such as, but not limited to" and can be used interchangeably therewith.
In addition, as used herein, the "or" used in an enumeration of items beginning with "at least one of" indicates a disjunctive enumeration, so that an enumeration such as "at least one of A, B or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). In addition, the word "exemplary" does not mean that the described example is preferred or better than other examples.
It should also be noted that, in the systems and methods of the present disclosure, the components or the steps can be decomposed and/or recombined. These decompositions and/or recombinations should be regarded as equivalent schemes of the present disclosure.
Various changes, substitutions and alterations can be made to the techniques described herein without departing from the techniques taught by the appended claims. In addition, the scope of the claims of the present disclosure is not limited to the specific aspects of the processes, machines, manufactures, compositions of matter, means, methods and actions described above. Processes, machines, manufactures, compositions of matter, means, methods or actions that currently exist or are to be developed later, and that perform substantially the same functions or achieve substantially the same results as the corresponding aspects described herein, can be utilized. Thus, the appended claims include, within their scope, such processes, machines, manufactures, compositions of matter, means, methods or actions.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects are readily apparent to those skilled in the art, and the general principles defined herein can be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but accords with the widest scope consistent with the principles and novel features disclosed herein.
The above description has been presented for the purposes of illustration and description. In addition, this description is not intended to restrict the embodiments of the present disclosure to the forms disclosed herein. Although a number of exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions and sub-combinations thereof.

Claims (23)

1. An object detection method, comprising:
obtaining sequential frame images to be detected that contain targets;
obtaining first feature information of each frame image in the sequential frame images using a first feedforward neural network;
obtaining second feature information of each frame image using a second Two-way Feedback neural network based on the first feature information of each frame image; and
using at least one classifier and based on the second feature information, obtaining an attribute corresponding to the at least one classifier for each pixel of the sequential frame images.
2. The object detection method according to claim 1, wherein the at least one classifier includes a first target detection classifier, a second part segmentation classifier and a third attribute classifier.
3. The object detection method according to claim 2, wherein using at least one classifier and, based on the second feature information, obtaining the attribute corresponding to the at least one classifier for each pixel of the sequential frame images comprises:
based on the second feature information, using the first target detection classifier to determine a category attribute of a target category to which each pixel belongs, and clustering pixels having the same category attribute to determine the targets in the sequential frame images;
based on the second feature information, using the second part segmentation classifier to determine various parts of the targets in the sequential frame images; and
based on the second feature information, using the third attribute classifier to determine attributes of the various parts of the targets in the sequential frame images.
4. The object detection method according to claim 3, wherein clustering the pixels having the same category attribute to determine the targets in the sequential frame images comprises:
determining a displacement from a pixel among the pixels having the same category attribute to a central point of the target to which it belongs, and determining the pixels belonging to the same target by clustering on the central points.
5. The object detection method according to claim 4, further comprising:
determining a displacement from a pixel among the pixels having the same category attribute to the central point of the target in a predetermined number of previous and later frames, so as to track the target.
6. The object detection method according to claim 3, wherein determining the attributes of the various parts of the targets in the sequential frame images comprises:
assigning weights to the various parts of the targets in the sequential frame images, and, based on the weights, taking a weighted average of the attributes of the pixels of the various parts to determine the attributes of the targets in the sequential frame images.
7. The object detection method according to any one of claims 1 to 6, wherein the first feedforward neural network is a convolutional feedforward neural network, the second Two-way Feedback neural network is a Two-way Feedback convolutional neural network, and the first feedforward neural network and the second Two-way Feedback neural network each include one or more layers of convolutional neural networks.
8. The object detection method according to any one of claims 1 to 6, wherein obtaining the second feature information of each frame image using the second Two-way Feedback neural network based on the first feature information of each frame image comprises:
obtaining forward feature information of each frame image using a forward feedback neural network in the second Two-way Feedback neural network;
obtaining backward feature information of each frame image using a backward feedback neural network in the second Two-way Feedback neural network; and
synthesizing the forward feature information and the backward feature information to obtain the second feature information,
wherein the forward feature information reflects features of a current frame and a predetermined number of frames before it, and the backward feature information reflects features of the current frame and a predetermined number of frames after it.
9. An object detecting device, comprising:
a processor; and
a memory in which computer-readable program instructions are stored,
wherein the following steps are executed when the computer-readable program instructions are run by the processor:
obtaining sequential frame images to be detected that contain targets;
obtaining first feature information of each frame image in the sequential frame images using a first feedforward neural network;
obtaining second feature information of each frame image using a second Two-way Feedback neural network based on the first feature information of each frame image; and
using at least one classifier and based on the second feature information, obtaining an attribute corresponding to the at least one classifier for each pixel of the sequential frame images.
10. The object detecting device according to claim 9, wherein the at least one classifier includes a first target detection classifier, a second part segmentation classifier and a third attribute classifier.
11. The object detecting device according to claim 10, wherein, when the computer-readable program instructions are run by the processor, using at least one classifier and, based on the second feature information, obtaining the attribute corresponding to the at least one classifier for each pixel of the sequential frame images comprises:
based on the second feature information, using the first target detection classifier to determine a category attribute of a target category to which each pixel belongs, and clustering pixels having the same category attribute to determine the targets in the sequential frame images;
based on the second feature information, using the second part segmentation classifier to determine various parts of the targets in the sequential frame images; and
based on the second feature information, using the third attribute classifier to determine attributes of the various parts of the targets in the sequential frame images.
12. The object detecting device according to claim 11, wherein, when the computer-readable program instructions are run by the processor, clustering the pixels having the same category attribute to determine the targets in the sequential frame images comprises:
determining a displacement from a pixel among the pixels having the same category attribute to a central point of the target to which it belongs, and determining the pixels belonging to the same target by clustering on the central points.
13. The object detecting device according to claim 11, wherein, when the computer-readable program instructions are run by the processor, the following is also executed:
determining a displacement from a pixel among the pixels having the same category attribute to the central point of the target in a predetermined number of previous and later frames, so as to track the target.
14. The object detecting device according to claim 11, wherein, when the computer-readable program instructions are run by the processor, determining the attributes of the various parts of the targets in the sequential frame images comprises:
assigning weights to the various parts of the targets in the sequential frame images, and, based on the weights, taking a weighted average of the attributes of the pixels of the various parts to determine the attributes of the targets in the sequential frame images.
15. The object detecting device according to any one of claims 9 to 14, wherein the first feedforward neural network is a convolutional feedforward neural network, the second Two-way Feedback neural network is a Two-way Feedback convolutional neural network, and the first feedforward neural network and the second Two-way Feedback neural network each include one or more layers of convolutional neural networks.
16. The object detecting device as claimed in any one of claims 9 to 14, wherein, when the computer-readable program instructions are run by the processor, obtaining the second feature information of each frame image based on the first feature information of each frame image using the second two-way feedback neural network comprises:
obtaining forward feature information of each frame image using a forward feedback neural network in the second two-way feedback neural network;
obtaining backward feature information of each frame image using a backward feedback neural network in the second two-way feedback neural network; and
combining the forward feature information and the backward feature information to obtain the second feature information,
wherein the forward feature information reflects features of the current frame and a predetermined number of frames before it, and the backward feature information reflects features of the current frame and a predetermined number of frames after it.
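The two-pass structure of claim 16 resembles a bidirectional recurrent pass over per-frame feature vectors. The following sketch uses a simple tanh recurrence with illustrative weight matrices `W_in` and `W_rec` (assumptions for demonstration; the patent's networks are convolutional and their parameters are not specified here):

```python
import numpy as np

def bidirectional_features(first_feats, W_in, W_rec):
    """Forward pass accumulates the current frame and earlier frames;
    backward pass accumulates the current frame and later frames;
    concatenating the two gives the second feature information.

    first_feats: (T, D) first feature vector of each of T frames
    W_in:        (D, H) input weights; W_rec: (H, H) recurrent weights
    Returns (T, 2H) second feature information.
    """
    T, _ = first_feats.shape
    H = W_rec.shape[0]
    fwd, bwd = np.zeros((T, H)), np.zeros((T, H))
    h = np.zeros(H)
    for t in range(T):                     # forward feedback pass
        h = np.tanh(first_feats[t] @ W_in + h @ W_rec)
        fwd[t] = h
    h = np.zeros(H)
    for t in reversed(range(T)):           # backward feedback pass
        h = np.tanh(first_feats[t] @ W_in + h @ W_rec)
        bwd[t] = h
    return np.concatenate([fwd, bwd], axis=1)
```

Concatenation is one way to "combine" the two directions; summation or a learned fusion layer would also fit the claim language.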
17. An object detecting device, comprising:
an image acquisition module, configured to acquire sequential frame images to be detected that contain a target;
a first feature information acquisition module, configured to obtain first feature information of each frame image in the sequential frame images using a first feedforward neural network;
a second feature information acquisition module, configured to obtain second feature information of each frame image based on the first feature information of each frame image using a second two-way feedback neural network; and
a pixel attribute acquisition module, configured to obtain, using at least one classifier and based on the second feature information, an attribute of each pixel of the sequential frame images corresponding to the at least one classifier.
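The module decomposition of claim 17 can be sketched as a pipeline of four pluggable components. The callables below are stand-ins for the claimed modules; their signatures are assumptions made only for illustration:

```python
class ObjectDetector:
    """Wiring of the four modules recited in claim 17."""

    def __init__(self, acquire, first_feat, second_feat, classifiers):
        self.acquire = acquire          # image acquisition module: () -> frames
        self.first_feat = first_feat    # first feature module: frames -> features
        self.second_feat = second_feat  # second feature module: features -> features
        self.classifiers = classifiers  # {name: classifier} per-pixel classifiers

    def detect(self):
        frames = self.acquire()
        f1 = self.first_feat(frames)        # first feedforward network
        f2 = self.second_feat(f1)           # second two-way feedback network
        # each classifier maps second feature information to per-pixel attributes
        return {name: clf(f2) for name, clf in self.classifiers.items()}
```

A toy instantiation with trivial functions shows the data flow without any real network.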
18. The object detecting device as claimed in claim 17, wherein the at least one classifier comprises a first target detection classifier, a second part segmentation classifier, and a third attribute classifier.
19. The object detecting device as claimed in claim 18, wherein the pixel attribute acquisition module, based on the second feature information, determines the category attribute of the target category to which each pixel belongs using the first target detection classifier, and clusters the pixels having the same category attribute to determine the target in the sequential frame images;
based on the second feature information, determines the respective parts of the target in the sequential frame images using the second part segmentation classifier; and
based on the second feature information, determines the attributes of the respective parts of the target in the sequential frame images using the third attribute classifier.
20. The object detecting device as claimed in claim 19, wherein the pixel attribute acquisition module determines a displacement from each pixel among the pixels having the same category attribute to the central point to which the pixel belongs, and determines the pixels belonging to the same target by clustering the central points.
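The center-point clustering of claim 20 can be sketched with a simple greedy merge of per-pixel center votes. The merge radius is an illustrative threshold, not a value taken from the patent:

```python
import numpy as np

def cluster_by_center(pixel_xy, offsets, merge_radius=2.0):
    """Each pixel of a category predicts a displacement to its target's
    central point; pixels whose predicted centers fall within
    merge_radius of a cluster representative are grouped as one target.

    pixel_xy: (N, 2) pixel coordinates; offsets: (N, 2) predicted displacements
    Returns an (N,) array of cluster (target) labels.
    """
    centers = pixel_xy + offsets               # per-pixel voted center
    labels = -np.ones(len(centers), dtype=int)
    reps = []                                  # representative center per cluster
    for i, c in enumerate(centers):
        for k, r in enumerate(reps):
            if np.linalg.norm(c - r) <= merge_radius:
                labels[i] = k                  # join an existing cluster
                break
        else:
            labels[i] = len(reps)              # start a new cluster
            reps.append(c)
    return labels
```

Pixels of two distinct targets vote for well-separated centers and therefore receive different labels, while pixels of one target collapse onto a shared center.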
21. The object detecting device as claimed in claim 19, further comprising:
a target tracking module, configured to determine a displacement from a pixel among the pixels having the same category attribute to the central points of the target in a predetermined number of previous and subsequent frames, so as to track the target.
22. The object detecting device as claimed in claim 19, wherein the pixel attribute acquisition module assigns weights to the respective parts of the target in the sequential frame images and, based on the weights, performs a weighted average of the attributes of the pixels of the respective parts, to determine the attribute of the target in the sequential frame images.
23. The object detecting device as claimed in any one of claims 17 to 22, wherein the second feature information acquisition module obtains forward feature information of each frame image using a forward feedback neural network in the second two-way feedback neural network, obtains backward feature information of each frame image using a backward feedback neural network in the second two-way feedback neural network, and combines the forward feature information and the backward feature information to obtain the second feature information,
wherein the forward feature information reflects features of the current frame and a predetermined number of frames before it, and the backward feature information reflects features of the current frame and a predetermined number of frames after it.
CN201710348872.0A 2017-05-17 2017-05-17 Object detection method and object detecting device Pending CN108875763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710348872.0A CN108875763A (en) 2017-05-17 2017-05-17 Object detection method and object detecting device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710348872.0A CN108875763A (en) 2017-05-17 2017-05-17 Object detection method and object detecting device

Publications (1)

Publication Number Publication Date
CN108875763A true CN108875763A (en) 2018-11-23

Family

ID=64320825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710348872.0A Pending CN108875763A (en) 2017-05-17 2017-05-17 Object detection method and object detecting device

Country Status (1)

Country Link
CN (1) CN108875763A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250863A (en) * 2016-08-09 2016-12-21 北京旷视科技有限公司 object tracking method and device
CN106326837A (en) * 2016-08-09 2017-01-11 北京旷视科技有限公司 Object tracking method and apparatus
WO2017004803A1 (en) * 2015-07-08 2017-01-12 Xiaoou Tang An apparatus and a method for semantic image labeling
CN106651973A (en) * 2016-09-28 2017-05-10 北京旷视科技有限公司 Image structuring method and device


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109729070A (en) * 2018-11-28 2019-05-07 甘肃农业大学 A kind of detection method of the concurrent stego-channel of network isomery based on CNN and RNN Fusion Model
CN109729070B (en) * 2018-11-28 2022-03-11 甘肃农业大学 Detection method of network heterogeneous concurrent steganography channel based on CNN and RNN fusion model
CN109858499A (en) * 2019-01-23 2019-06-07 哈尔滨理工大学 A kind of tank armor object detection method based on Faster R-CNN
CN109978043A (en) * 2019-03-19 2019-07-05 新华三技术有限公司 A kind of object detection method and device
CN110458115A (en) * 2019-08-14 2019-11-15 四川大学 A kind of integrated algorithm of target detection of the multiframe based on timing
CN110458115B (en) * 2019-08-14 2021-08-31 四川大学 Multi-frame integrated target detection algorithm based on time sequence
CN111860456A (en) * 2020-08-04 2020-10-30 广州市微智联科技有限公司 Mask face recognition method
CN111860456B (en) * 2020-08-04 2024-02-02 广州市微智联科技有限公司 Face recognition method

Similar Documents

Publication Publication Date Title
Fan et al. Video anomaly detection and localization via gaussian mixture fully convolutional variational autoencoder
Saponara et al. Implementing a real-time, AI-based, people detection and social distancing measuring system for Covid-19
Nayak et al. A comprehensive review on deep learning-based methods for video anomaly detection
Sun et al. Abnormal event detection for video surveillance using deep one-class learning
Hazirbas et al. Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture
CN108875763A (en) Object detection method and object detecting device
CN111291809B (en) Processing device, method and storage medium
Marin-Jimenez et al. 3D human pose estimation from depth maps using a deep combination of poses
CN107943837A (en) A kind of video abstraction generating method of foreground target key frame
Ji et al. Graph model-based salient object detection using objectness and multiple saliency cues
CN108875456A (en) Object detection method, object detecting device and computer readable storage medium
Vu et al. Energy-based localized anomaly detection in video surveillance
Khaire et al. A semi-supervised deep learning based video anomaly detection framework using RGB-D for surveillance of real-world critical environments
Xu et al. Multi-scale skeleton adaptive weighted GCN for skeleton-based human action recognition in IoT
Ghatak et al. An improved surveillance video synopsis framework: a HSATLBO optimization approach
Vu et al. Energy-based models for video anomaly detection
CN106651973A (en) Image structuring method and device
Fang et al. Multi-channel feature fusion networks with hard coordinate attention mechanism for maize disease identification under complex backgrounds
Bhuiyan et al. Video analytics using deep learning for crowd analysis: a review
Priscila et al. Classification of Satellite Photographs Utilizing the K-Nearest Neighbor Algorithm
Rani et al. An effectual classical dance pose estimation and classification system employing convolution neural network–long shortterm memory (CNN-LSTM) network for video sequences
Nigam et al. Multiview human activity recognition using uniform rotation invariant local binary patterns
Khan et al. Deep learning-based marine big data fusion for ocean environment monitoring: Towards shape optimization and salient objects detection
Aftab et al. A boosting framework for human posture recognition using spatio-temporal features along with radon transform
Li et al. Human motion representation and motion pattern recognition based on complex fuzzy theory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181123