WO2018054329A1 - Object detection method and apparatus, electronic device, computer program and storage medium - Google Patents

Object detection method and apparatus, electronic device, computer program and storage medium

Info

Publication number
WO2018054329A1
WO2018054329A1 (PCT/CN2017/102691, CN2017102691W)
Authority
WO
WIPO (PCT)
Prior art keywords
object candidate
feature
feature vector
candidate frame
feature vectors
Prior art date
Application number
PCT/CN2017/102691
Other languages
English (en)
French (fr)
Inventor
曾星宇 (Xingyu Zeng)
欧阳万里 (Wanli Ouyang)
杨斌 (Bin Yang)
闫俊杰 (Junjie Yan)
王晓刚 (Xiaogang Wang)
Original Assignee
北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SenseTime Technology Development Co., Ltd. (北京市商汤科技开发有限公司)
Publication of WO2018054329A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/255Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2111Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms

Definitions

  • The present application relates to computer vision technology, and more particularly to an object detection method and apparatus, an electronic device, a computer program, and a storage medium.
  • Object detection is one of the most fundamental problems in the field of computer vision. It underlies the most basic semantic analysis of pictures and videos, and a great deal of research has been invested in it.
  • The goal of an object detection system is to frame a target object in a picture or video using a box (called an object candidate box). For a long time, the object detection problem has been treated as a classification problem: for each object candidate box, the object detection system automatically determines which object is within that box.
  • However, owing to varied shooting angles, object shapes, illumination changes, and complex, changing backgrounds, object detection has always been a complex and challenging problem.
  • the embodiment of the present application provides a technical solution for performing object detection.
  • According to one aspect of the embodiments of the present application, an object detection method includes:
  • performing object positioning on an image to be detected to obtain L object candidate frames, where L is an integer greater than 0;
  • taking each object candidate frame in the L object candidate frames in turn as a current object candidate frame, extracting M+1 feature vectors corresponding to M+1 object candidate frames from at least one feature map of the image to be detected, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame having the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0;
  • associating the M+1 feature vectors to generate one final feature vector; and
  • performing object detection according to the final feature vector to obtain an object detection result of the current object candidate frame.
  • According to another aspect of the embodiments of the present application, an object detection apparatus includes:
  • an object positioning unit configured to perform object positioning on an image to be detected to obtain L object candidate frames, where L is an integer greater than 0;
  • a feature extraction unit configured to take each object candidate frame in the L object candidate frames in turn as a current object candidate frame and extract, from at least one feature map of the image to be detected, M+1 feature vectors corresponding to M+1 object candidate frames, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame having the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0;
  • a feature association unit configured to associate the M+1 feature vectors to generate one final feature vector; and
  • an object detection unit configured to perform object detection according to the final feature vector and obtain an object detection result of the current object candidate frame.
  • According to yet another aspect of the embodiments of the present application, an electronic device includes the object detection apparatus according to any of the above embodiments of the present application.
  • According to still another aspect, another electronic device includes:
  • a processor and the object detection apparatus according to any of the above embodiments of the present application;
  • when the processor runs the object detection apparatus, the units in the object detection apparatus according to any of the above embodiments of the present application are run.
  • According to still another aspect, yet another electronic device includes a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
  • the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the object detection method described in any of the above embodiments of the present application.
  • According to still another aspect, a computer program includes computer readable code; when the computer readable code is run on a device, a processor in the device executes instructions for implementing the steps of the object detection method described in any of the above embodiments of the present application.
  • According to yet another aspect, a computer readable storage medium stores computer readable instructions which, when executed, implement the operations of the steps of the object detection method according to any of the above embodiments of the present application.
  • The object detection method and apparatus and the electronic device provided by the above embodiments of the present application acquire, for each object candidate frame, M associated object candidate frames that share the center point of the current object candidate frame but have different heights and widths, extract the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames from the feature map of the image to be detected, associate the total of M+1 feature vectors to generate one final feature vector, and then perform object detection according to the final feature vector to obtain the object detection result of the current object candidate frame.
  • Because the M associated object candidate frames share the center point of the current object candidate frame but differ in height and width, the M+1 object candidate frames cover different regions and resolutions, so the embodiments of the present application implement a multi-region/multi-resolution image input mode. Object detection is performed on the final feature vector obtained by associating the M+1 feature vectors extracted for the M+1 object candidate frames; because visual information of different regions/resolutions is used, the problems that arise when a single-input mode leaves the coverage of an object candidate frame incorrect, such as ignored details in the input image, insufficient visual content, and inaccurate overlap-ratio judgments, can be avoided, which helps improve the accuracy of object detection.
  • FIG. 1 is a schematic diagram of a picture to be detected.
  • FIG. 2 is a schematic diagram of another picture to be detected.
  • FIG. 3 is a flow chart of an embodiment of an object detecting method of the present application.
  • FIG. 4 is a schematic diagram of an application example of an associated object candidate frame acquired in the embodiment of the present application.
  • FIG. 5 is a flow chart of another embodiment of the object detecting method of the present application.
  • FIG. 6 is a flow chart of still another embodiment of the object detecting method of the present application.
  • FIG. 7 is a diagram showing a specific application example of a bidirectional conductive structure network in the embodiment of the present application.
  • FIG. 8 is a diagram showing a specific application example of a gate control structure network in an embodiment of the present application.
  • FIG. 9 is a schematic structural view of an embodiment of an object detecting device of the present application.
  • FIG. 10 is a schematic structural view of another embodiment of the object detecting device of the present application.
  • FIG. 11 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
  • FIG. 12 is a schematic structural diagram of another embodiment of an electronic device according to the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • In implementing the present application, the inventors found through research that in an object detection system, the criterion for judging whether any given object candidate frame has detected the target object is whether the overlap ratio between that object candidate frame and the bounding box of the target object is greater than a certain threshold. When the coverage of an object candidate box is incorrect, the following three potential problems occur:
  • First, when an object candidate box covers only a part of the target object, the visual content within the box is insufficient to judge what the target object is; this often happens when two classes of objects are partially similar. For example, picture (a) in FIG. 1 is a picture of a rabbit and picture (b) is a picture of a hamster; the bodies of a rabbit and a hamster are similar. Object candidate boxes 102 and 104 are correct object candidate boxes, but if object candidate boxes 106 and 108 are placed on the animals' bodies, it is impossible to determine from the regions marked by boxes 106 and 108 whether the target object in those regions is a rabbit or a hamster.
  • Second, when an object candidate frame covers only a part of the target object and the overlap ratio is calculated, the severity of occlusion of the target object by other objects must be considered. As shown in FIG. 2, object candidate boxes 202 and 204 are correct object candidate boxes. Boxes 202 and 208 both cover the rabbit's head, yet box 202 is a correct object candidate box (true positive) while box 208 is a wrong object candidate box (false positive), because the rabbit in FIG. 2(b) is not occluded and the range covered by box 208 is therefore insufficient.
  • Finally, because the object detection system scales the content covered by an object candidate frame to a fixed size, if an object candidate frame (for example, object candidate frame 206 in FIG. 2) is too large, the target object in the figure becomes smaller after scaling to the fixed size and many details of the rabbit are blurred. Therefore, if an object candidate box is too large, the object detection system ignores the small details within it, even though those details strongly guide target detection.
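  • For reference, the overlap ratio mentioned above is commonly computed as intersection-over-union (IoU). The patent does not fix a particular formula or threshold, so the following is only a sketch of the criterion (plain Python; the example boxes and the 0.5 threshold are illustrative assumptions):

```python
def iou(box_a, box_b):
    """Overlap ratio (intersection-over-union) of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A candidate counts as having detected the target only if its overlap
# ratio with the target's box exceeds a threshold (0.5 is a common choice).
candidate, ground_truth = (50, 40, 180, 160), (60, 50, 200, 170)
is_true_positive = iou(candidate, ground_truth) > 0.5
```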
  • FIG. 3 is a flow chart of an embodiment of the object detection method of the present application. As shown in FIG. 3, the object detection method of this embodiment includes:
  • 302: performing object positioning on the image to be detected to obtain L object candidate frames. Here L is an integer greater than 0, and the image to be detected may be received, read from storage, or acquired by capture or the like.
  • 304: taking each object candidate frame in the L object candidate frames in turn as the current object candidate frame, extracting M+1 feature vectors corresponding to M+1 object candidate frames from at least one feature map of the image to be detected. The M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames; an associated object candidate frame has the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0.
  • 306: associating the M+1 feature vectors to generate one final feature vector.
  • 308: performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame. Optionally, in operation 308, object detection may be performed according to the final feature vector to obtain the probability value that the current object candidate frame includes the target object, or to obtain the object category corresponding to the current object candidate frame.
  • Based on the object detection method provided by the above embodiment of the present application, M associated object candidate frames sharing the center point of the current object candidate frame but differing in height and width are acquired for each object candidate frame, the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames are extracted from the feature map of the image to be detected, the total of M+1 feature vectors are associated to generate one final feature vector, and object detection is then performed according to the final feature vector to obtain the object detection result of the current object candidate frame.
  • Because the M+1 object candidate frames cover different regions and resolutions, this embodiment implements a multi-region/multi-resolution image input mode; performing object detection on the final feature vector obtained by associating the M+1 extracted feature vectors uses visual information of different regions/resolutions and thus avoids the problems caused by a single-input mode when the coverage of an object candidate frame is incorrect, such as ignored details in the input image, insufficient visual content, and inaccurate overlap-ratio judgments, which helps improve the accuracy of object detection.
  • In another embodiment of the object detection method of the present application, the method may further include: generating a feature map of the image to be detected through a convolutional neural network (CNN).
  • In another embodiment of the object detection method of the present application, the method may further include: acquiring the M associated object candidate frames of the current object candidate frame according to the current object candidate frame.
  • In an optional example, the M associated object candidate frames may be obtained as follows: taking the center coordinates of the current object candidate frame as the center point, M different values are assigned to the parameter in a preset width acquisition formula and a preset height acquisition formula, yielding the widths and heights of the M associated object candidate frames. For example, using the preset width and height acquisition formulas b_p = [x_o, y_o, (1+p)w_o, (1+p)h_o], assigning the parameter p M different values yields the M associated object candidate frames.
  • Here b_p denotes an associated object candidate frame; x_o and y_o denote the abscissa and ordinate of the center point of the current object candidate frame; w_o denotes the width of the current object candidate frame; h_o denotes its height; (1+p)w_o denotes the width of the associated object candidate frame; and (1+p)h_o denotes its height. The values of p and M can be preset and adjusted according to actual needs.
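  • As an illustration only, the box-generation rule can be written directly in code; a minimal sketch (plain Python; the helper name and the default p values, taken from the FIG. 4 example below, are our assumptions):

```python
def associated_boxes(x_o, y_o, w_o, h_o, p_values=(-0.2, 0.2, 0.8, 1.7)):
    """Generate the associated candidate boxes b_p = [x_o, y_o, (1+p)w_o, (1+p)h_o],
    each sharing the center point (x_o, y_o) of the current candidate box."""
    return [(x_o, y_o, (1 + p) * w_o, (1 + p) * h_o) for p in p_values]

# The current box plus its associated boxes form the M+1 inputs.
current = (100, 100, 80, 60)                  # (x_o, y_o, w_o, h_o)
boxes = [current] + associated_boxes(*current)
```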
  • For an image to be detected, the CNN can generate a feature map of that image, which may be, for example, a multi-dimensional matrix. The M+1 feature vectors can then be obtained through a region-of-interest pooling (ROI-Pooling) network: the regions corresponding to the current object candidate frame and its M associated object candidate frames are located in the multi-dimensional matrix of the feature map, the matrix values of those regions are extracted, and M+1 feature vectors of a specific size are generated.
  • Based on the object detection method of the embodiments of the present application, M+1 object candidate frames can be obtained from each object candidate frame obtained by object positioning; these frames share the same center point but have different heights and/or widths, and each generates a corresponding feature vector through the ROI-Pooling operation. Each object candidate frame obtained by object positioning is thus differentiated into M+1 feature vectors, each differentiated object candidate frame covering a different region and generating a feature vector of a different resolution, thereby realizing multi-region/multi-resolution image input.
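  • A sketch of this extraction step, assuming torchvision's roi_align as a stand-in for the patent's ROI-Pooling network (box coordinates are given directly on the feature-map scale, and the 7×7 output size is an assumption, not the patent's configuration):

```python
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 64, 64)     # CNN feature map (multi-dimensional matrix)
# M+1 regions as (batch_index, x1, y1, x2, y2) rows on the feature-map scale:
# the current candidate box followed by its associated boxes.
rois = torch.tensor([[0., 10., 10., 40., 34.],
                     [0., 13., 12., 37., 32.],
                     [0.,  7.,  8., 43., 36.]])
# Each region is pooled to a fixed 7x7 grid and flattened, so every box
# yields a feature vector of the same size regardless of its area.
vectors = roi_align(feature_map, rois, output_size=(7, 7)).flatten(1)  # (M+1, 256*49)
```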
  • FIG. 4 is an application example diagram of the associated object candidate frames acquired in the embodiment of the present application. In this example, 402 is an object candidate frame obtained by performing object positioning on the image to be detected and serves as the current object candidate frame; M has a value of 4, i.e., p is assigned the four values -0.2, 0.2, 0.8, and 1.7, respectively, yielding four associated object candidate frames b_{-0.2}, b_{0.2}, b_{0.8}, and b_{1.7} of different region sizes and resolutions, whose corresponding feature vectors are denoted (f_{-0.2}, f_{0.2}, f_{0.8}, f_{1.7}).
  • In one optional example of the embodiments of the object detection method of the present application, there is one feature map of the image to be detected, i.e., the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames are extracted from a single feature map of the image to be detected.
  • In another optional example, the feature maps of the image to be detected may include multiple feature maps of the image to be detected generated by multiple CNNs of different depths. In this case, extracting the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames may include extracting them from the multiple feature maps, where the number of feature vectors extracted from each of the multiple feature maps may be the same or different. For example, four associated object candidate frames may be generated from one object candidate frame, with the first and second associated object candidate frames derived from one feature map of the image to be detected and the third and fourth derived from another feature map of that image.
  • Compared with the scheme of acquiring the multiple associated object candidate frames from multiple feature maps, the scheme of acquiring them from the same feature map does not need deep neural networks of different depths to produce multiple feature maps; the network structure is relatively simple and relatively easy to implement.
  • In yet another optional example, the M+1 feature vectors may be associated based on a pre-trained bidirectional gate control structure network to generate the final feature vector. The bidirectional gate control structure network may include two parts: a gate control structure network and a bidirectional conduction structure network.
  • FIG. 5 is a flow chart of another embodiment of the object detection method of the present application. As shown in FIG. 5, the object detection method of this embodiment includes:
  • 504: performing object positioning on the image to be detected to obtain L object candidate frames, and generating a feature map of the image to be detected through a CNN, where L is an integer greater than 0.
  • 506: taking each object candidate frame in the L object candidate frames in turn as the current object candidate frame, acquiring the M associated object candidate frames of the current object candidate frame according to the current object candidate frame. An associated object candidate frame has the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0.
  • 508: sorting the M+1 feature vectors according to the sizes of the corresponding object candidate frames. Optionally, the M+1 feature vectors may be sorted by object candidate frame size from large to small or from small to large without affecting the object detection result of the embodiment of the present application.
  • 510: extracting, from at least one feature map of the image to be detected, the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames.
  • 512: selecting an intermediate feature vector from the M+1 feature vectors, obtaining, through the pre-trained gate control structure network, the weight values of the feature vectors other than the intermediate feature vector, and controlling the inputs of those other feature vectors through the weight values to obtain the valid input information of the other feature vectors. The intermediate feature vector is a feature vector other than the first and last feature vectors of the sorted M+1 feature vectors.
  • 514: generating the final feature vector from the intermediate feature vector and the valid input information of the other feature vectors through the pre-trained bidirectional conduction structure network.
  • 516: performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame.
  • The embodiment of the present application realizes multi-region and/or multi-resolution image input through the bidirectional conduction structure network, transfers valid visual information between the inputs of different regions, and uses the gate control structure network to establish the credibility of the information transfer, which helps improve the accuracy of object detection.
  • Optionally, the gate control structure network may be a function that maps a feature vector to [0, 1], i.e., a weight-value generation function such as the sigmoid function or the hyperbolic tangent (tanh) function.
  • Further, in yet another embodiment of the object detection method of the present application, the method may further include: training an initial gate control structure network in advance on multiple sample images and adjusting its network parameters to obtain the gate control structure network.
  • FIG. 6 is a flow chart of still another embodiment of the object detection method of the present application. As shown in FIG. 6, compared with the embodiment shown in FIG. 5, the object detection method of this embodiment further includes:
  • 602: acquiring the responses of the M+1 feature vectors through the bidirectional conduction structure network.
  • Accordingly, in this embodiment, operation 512 may exemplarily be implemented by the following operation:
  • 604: selecting an intermediate feature vector from the M+1 feature vectors, obtaining, through the pre-trained gate control structure network, the weight values of the responses of the feature vectors other than the intermediate feature vector, and controlling the responses of those other feature vectors through the weight values to obtain the valid input information of the other feature vectors.
  • Operation 514 may exemplarily be implemented by the following operation:
  • 606: generating the final feature vector from the response of the intermediate feature vector and the valid input information of the other feature vectors through the pre-trained bidirectional conduction structure network.
  • In one optional instance, the bidirectional conduction structure network includes M+1 network layers, and operation 606 may be implemented as follows: through each of the first M network layers of the bidirectional conduction structure network, an intermediate result vector is generated from the response of the intermediate feature vector and the valid input information of each of the other feature vectors; through the (M+1)-th network layer, all intermediate result vectors are concatenated and summed to obtain the final feature vector.
  • In another optional instance, the bidirectional conduction structure network includes M+1 network layers, and operation 606 may also be implemented as follows: through the first M network layers of the bidirectional conduction structure network, an intermediate result vector is generated from the response of the intermediate feature vector and the valid input information of all other feature vectors; through the (M+1)-th network layer, all intermediate result vectors are concatenated and summed to obtain the final feature vector.
  • The input of the bidirectional conduction structure network is the M+1 feature vectors of a detection box. FIG. 7 is an exemplary application diagram of the bidirectional conduction structure network in the embodiment of the present application. For simplicity of illustration, FIG. 7 takes M = 2, i.e., the M+1 feature vectors are three feature vectors, denoted (h_{i-1}^0, h_i^0, h_{i+1}^0); the actual input is, for example, the five feature vectors output through FIG. 4. The other variables in FIG. 7 are neural network node variables in a deep neural network (DNN), and the arrows indicate convolution operations. The subscript i indicates the ordering number of the feature vector among the M+1 feature vectors, and the superscript 0 indicates that the feature vector is one extracted from the feature map.
  • For an intermediate feature vector h_i^0, new responses h_i^1 and h_i^2 are generated through convolution operations, where h_i^1 is the sum of two parts: the response of h_i^0 through a convolution operation and the response of h_{i-1}^1 through a convolution operation; likewise, h_i^2 is the sum of the response of h_i^0 through a convolution operation and the response of h_{i+1}^2 through a convolution operation. The physical meaning of the bidirectional conduction structure network derives from the definitions of h_i^1 and h_i^2: the value of h_i^1 is derived from the feature vectors h_i^0 and h_{i-1}^1, the value of h_i^2 is derived from h_i^0 and h_{i+1}^2, and the feature vector h_i^3 finally output by the bidirectional conduction structure network is derived from h_i^1 and h_i^2. In one optional example, the final output of the bidirectional conduction structure network is the response h_i^3 produced by a convolution operation over h_i^1 and h_i^2, with the following formulas:

  $$h_i^1 = \sigma(w_i^1 \otimes h_i^0 + b_i^1) + \sigma(w_{i-1,i}^1 \otimes h_{i-1}^1 + b_{i-1,i}^1)$$
  $$h_i^2 = \sigma(w_i^2 \otimes h_i^0 + b_i^2) + \sigma(w_{i+1,i}^2 \otimes h_{i+1}^2 + b_{i+1,i}^2)$$
  $$h_i^3 = \sigma(w_i^3 \otimes \mathrm{cat}(h_i^1, h_i^2) + b_i^3)$$

  • Here σ() denotes the nonlinear rectified linear unit (ReLU) operation in the convolutional network; cat() denotes a concatenation operation, i.e., the feature vectors input in the parentheses are concatenated; ⊗ denotes a convolution operation, e.g., a ⊗ b denotes a convolution between a and b; w and b both denote parameters in the convolutional network, w being a convolution kernel and b a bias, with superscripts and subscripts used to distinguish the parameters of different network layers in the bidirectional conduction structure network; h denotes the response of each neural network node in the convolutional network, which is also a feature vector.
  • In another optional example, h_i^1 or h_i^2 may also be derived from the response of the intermediate feature vector together with all other feature vectors; for example, h_i^1 derived from h_i^0, h_{i-1}^1, and h_{i-2}^1 can be expressed by the formula $h_i^1 = \sigma(w_i^1 \otimes h_i^0 + b_i^1) + \sigma(w_{i-1,i}^1 \otimes h_{i-1}^1 + b_{i-1,i}^1) + \sigma(w_{i-2,i}^1 \otimes h_{i-2}^1 + b_{i-2,i}^1)$. In yet another optional example, h_i^3 may also be derived directly from h_{i-1}^0, h_i^0, and h_{i+1}^0, while h_i^1 and h_i^2 are ignored.
  • According to the meaning of the bidirectional conduction structure, information transfer exists between h_{i-1}^1 and h_i^1, and likewise between h_i^2 and h_{i+1}^2. The value of h_i^1 is affected by h_{i-1}^1, which is a lateral transfer from the input of feature vector i-1 to the input of feature vector i; conversely, the value of h_i^2 is affected by h_{i+1}^2, which is a lateral transfer from feature vector i+1 to feature vector i. Finally, the input of feature vector i combines the influences from feature vector i+1 and feature vector i-1 to become h_i^3.
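  • A minimal sketch of the two conduction passes and the final fusion under the formulas above (PyTorch; the channel count, kernel sizes, and the sharing of one convolution set across positions i are our simplifications, whereas the formulas index parameters per network layer):

```python
import torch
import torch.nn as nn

class BidirectionalConduction(nn.Module):
    """h_i^1 accumulates forward transfer (i-1 -> i), h_i^2 backward transfer
    (i+1 -> i), and h_i^3 = relu(conv(cat(h_i^1, h_i^2))) fuses both."""
    def __init__(self, c=256):
        super().__init__()
        self.w1_self = nn.Conv2d(c, c, 3, padding=1)   # h_i^0     -> h_i^1
        self.w1_prev = nn.Conv2d(c, c, 3, padding=1)   # h_{i-1}^1 -> h_i^1
        self.w2_self = nn.Conv2d(c, c, 3, padding=1)   # h_i^0     -> h_i^2
        self.w2_next = nn.Conv2d(c, c, 3, padding=1)   # h_{i+1}^2 -> h_i^2
        self.w3 = nn.Conv2d(2 * c, c, 1)               # cat(h_i^1, h_i^2) -> h_i^3
        self.relu = nn.ReLU()

    def forward(self, h0):                  # h0: list of M+1 tensors (N, C, H, W)
        n = len(h0)
        h1, h2 = [None] * n, [None] * n
        for i in range(n):                  # forward pass, i-1 -> i
            h1[i] = self.relu(self.w1_self(h0[i]))
            if i > 0:
                h1[i] = h1[i] + self.relu(self.w1_prev(h1[i - 1]))
        for i in reversed(range(n)):        # backward pass, i+1 -> i
            h2[i] = self.relu(self.w2_self(h0[i]))
            if i < n - 1:
                h2[i] = h2[i] + self.relu(self.w2_next(h2[i + 1]))
        return [self.relu(self.w3(torch.cat([h1[i], h2[i]], dim=1))) for i in range(n)]

# Three multi-region inputs (M = 2), e.g. pooled to 7x7 by ROI pooling.
h0 = [torch.randn(1, 256, 7, 7) for _ in range(3)]
h3 = BidirectionalConduction()(h0)
```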
  • The gate control structure network can control the transfer of information through a weight-value function: if the transferred information is considered trustworthy, its weight value is large; conversely, if the transferred information is considered untrustworthy, its weight value is small. In the example shown in FIG. 7, not all of the information that h_{i-1}^1 and h_{i+1}^2 pass to h_i^1 and h_i^2 is valid, so the weight values generated by the gate control structure constrain the transfer of the h_{i-1}^1 and h_{i+1}^2 information.
  • FIG. 8 is a diagram of an optional application example of the gate control structure network in the embodiment of the present application. The gate control structure network is used to control the valid inputs that the intermediate feature vector selected from the M+1 feature vectors obtains from the other feature vectors, for example the information passed from the i+1 input to the i input, or from the i-1 input to the i input, in FIG. 7. As shown in FIG. 8, the embodiment of the present application adds a gate control structure network to the bidirectional conduction structure network, which can weight the information values passed from the i+1 input to the i input, or from the i-1 input to the i input, through a weight-value generation function, namely the sigmoid function. In the bidirectional gate structure, the value that h_{i-1}^1 passes to h_i^1 is the product of two parts: one is the convolution output of h_{i-1}^1, and the other is the convolution of h_{i-1}^0 weighted by the sigmoid function; the same applies between h_i^2 and h_{i+1}^2. After the gate control structure is added to the bidirectional conduction structure, the formulas for h_i^1 and h_i^2 become:

  $$h_i^1 = \sigma(w_i^1 \otimes h_i^0 + b_i^1) + G(w_{i-1,i}^g \otimes h_{i-1}^0 + b_{i-1,i}^g) \cdot \sigma(w_{i-1,i}^1 \otimes h_{i-1}^1 + b_{i-1,i}^1)$$
  $$h_i^2 = \sigma(w_i^2 \otimes h_i^0 + b_i^2) + G(w_{i+1,i}^g \otimes h_{i+1}^0 + b_{i+1,i}^g) \cdot \sigma(w_{i+1,i}^2 \otimes h_{i+1}^2 + b_{i+1,i}^2)$$
  $$G(x) = \frac{1}{1 + \exp(-x)}$$

  • Here · denotes the product of corresponding matrix elements; exp() denotes the exponential function; w and b both denote parameters in the convolutional network, with superscripts and subscripts used to distinguish different parameters and the superscript g indicating that the parameter belongs to the gate control structure network; x denotes the input to the gate function, derived from the current intermediate feature vector, e.g., h_i^0 in FIG. 7.
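  • A sketch of one gated message under the formulas above (PyTorch; the shapes and 3×3 kernels are assumptions):

```python
import torch
import torch.nn as nn

class GatedMessage(nn.Module):
    """Compute the gated term G(w^g (*) h_prev0 + b^g) * sigma(w^1 (*) h_prev1 + b^1):
    the neighbour's message is kept where the gate is near 1 and suppressed
    where the gate is near 0."""
    def __init__(self, c=256):
        super().__init__()
        self.msg = nn.Conv2d(c, c, 3, padding=1)    # plays the role of w_{i-1,i}^1
        self.gate = nn.Conv2d(c, c, 3, padding=1)   # plays the role of w_{i-1,i}^g

    def forward(self, h_prev1, h_prev0):
        g = torch.sigmoid(self.gate(h_prev0))       # weight values in [0, 1]
        return g * torch.relu(self.msg(h_prev1))    # element-wise product

gated = GatedMessage()(torch.randn(1, 256, 7, 7), torch.randn(1, 256, 7, 7))
```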
  • Any object detection method provided by the embodiments of the present application may be performed by any suitable device having data processing capability, including but not limited to a terminal device, a server, and the like. Alternatively, any object detection method provided by the embodiments of the present application may be executed by a processor; for example, the processor performs any object detection method mentioned in the embodiments of the present application by calling corresponding instructions stored in a memory. Details are not repeated below.
  • FIG. 9 is a schematic structural view of an embodiment of the object detection apparatus of the present application. The object detection apparatus of this embodiment can be used to implement the above embodiments of the object detection methods of the present application. As shown in FIG. 9, the object detection apparatus of this embodiment includes an object positioning unit, a feature extraction unit, a feature association unit, and an object detection unit. Among these:
  • the object positioning unit is configured to perform object positioning on the image to be detected and obtain L object candidate frames, where L is an integer greater than 0.
  • The feature extraction unit is configured to take each of the L object candidate frames in turn as the current object candidate frame and extract, from at least one feature map of the image to be detected, the M+1 feature vectors corresponding to M+1 object candidate frames, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame having the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0.
  • a feature association unit is configured to associate M+1 feature vectors to generate a final feature vector.
  • the object detecting unit is configured to perform object detection according to the final feature vector, and obtain an object detection result of the current object candidate frame.
  • Exemplarily, the object detection result may include: the probability value that the current object candidate frame includes the target object, or the object category corresponding to the current object candidate frame.
  • Based on the object detection apparatus provided by the above embodiment of the present application, M associated object candidate frames sharing the center point of the current object candidate frame but differing in height and width are acquired for each object candidate frame, the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames are extracted from the feature map of the image to be detected, the total of M+1 feature vectors are associated to generate one final feature vector, and object detection is then performed according to the final feature vector to obtain the object detection result of the current object candidate frame.
  • Because the M+1 object candidate frames cover different regions and resolutions, this embodiment likewise implements a multi-region/multi-resolution image input mode, uses visual information of different regions/resolutions, and avoids the problems caused by a single-input mode when the coverage of an object candidate frame is incorrect, such as ignored details in the input image, insufficient visual content, and inaccurate overlap-ratio judgments, thereby helping improve the accuracy of object detection.
  • FIG. 10 is a schematic structural view of another embodiment of the object detection apparatus of the present application. As shown in FIG. 10, compared with the embodiment shown in FIG. 9, this embodiment further includes a feature generation unit configured to generate the feature map of the image to be detected.
  • In yet another embodiment of the object detection apparatus of the present application, the feature extraction unit may be further configured to acquire the M associated object candidate frames of the current object candidate frame according to the current object candidate frame.
  • Exemplarily, when acquiring the M associated object candidate frames, the feature extraction unit takes the center coordinates of the current object candidate frame as the center point and assigns M different values to the parameter in the preset width acquisition formula and the preset height acquisition formula to obtain the widths and heights of the M associated object candidate frames, thereby acquiring the M associated object candidate frames. For example, the feature extraction unit may use the preset width and height acquisition formulas b_p = [x_o, y_o, (1+p)w_o, (1+p)h_o] and assign p M different values to obtain the M associated object candidate frames.
  • Here b_p denotes an associated object candidate frame; x_o and y_o denote the abscissa and ordinate of the center point of the current object candidate frame; w_o denotes the width and h_o the height of the current object candidate frame; (1+p)w_o and (1+p)h_o denote the width and height of the associated object candidate frame.
  • In one optional example of the embodiments of the object detection apparatus of the present application, there is one feature map of the image to be detected. In this case, the feature extraction unit is configured to extract, from that single feature map of the image to be detected, the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames.
  • In another optional example, the feature maps of the image to be detected include multiple feature maps of the image to be detected generated by multiple CNNs of different depths. In this case, when extracting the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames, the feature extraction unit extracts them from the multiple feature maps, where the number of feature vectors extracted from each of the multiple feature maps may be the same or different.
  • In yet another optional example, the feature association unit is configured to associate the M+1 feature vectors based on the bidirectional gate control structure network to generate one final feature vector.
  • In still another optional example, the bidirectional gate control structure network includes a gate control structure network subunit and a bidirectional conduction structure network subunit, and the feature association unit may include a sorting subunit, the gate control structure network subunit, and the bidirectional conduction structure network subunit. Among these:
  • the sorting subunit is configured to sort the M+1 feature vectors according to the sizes of the corresponding object candidate frames;
  • the gate control structure network subunit is configured to select an intermediate feature vector from the M+1 feature vectors, obtain the weight values of the feature vectors other than the intermediate feature vector, and control the inputs of those other feature vectors through the weight values to obtain the valid input information of the other feature vectors, where the intermediate feature vector is a feature vector of the sorted M+1 feature vectors other than the first and last feature vectors;
  • the bidirectional conduction structure network subunit is configured to generate the final feature vector from the intermediate feature vector and the valid input information of the other feature vectors.
  • the gate control structure network includes a function that maps feature vectors to [0, 1].
  • In addition, yet another embodiment of the object detection apparatus of the present application may further include a network training unit configured to train the initial gate control structure network on multiple sample images and adjust its network parameters to obtain the gate control structure network.
  • In yet another embodiment of the object detection apparatus of the present application, the bidirectional conduction structure network subunit may be further configured to acquire the responses of the M+1 feature vectors. Accordingly, in this embodiment, the gate control structure network subunit is configured to obtain the weight values of the responses of the other feature vectors and control the responses of the corresponding other feature vectors through the weight values, and the bidirectional conduction structure network subunit is configured to acquire the responses of the M+1 feature vectors and generate the final feature vector from the response of the intermediate feature vector and the valid input information of the other feature vectors.
  • In one optional example, the bidirectional conduction structure network subunit includes M+1 network layers, where:
  • each of the first M network layers of the M+1 network layers is used to generate an intermediate result vector from the response of the intermediate feature vector and the valid input information of each of the other feature vectors; and
  • the (M+1)-th network layer of the M+1 network layers is used to concatenate and sum all intermediate result vectors to obtain the final feature vector.
  • In another optional example, the bidirectional conduction structure network subunit includes M+1 network layers, where:
  • the first M network layers of the M+1 network layers are used to generate an intermediate result vector from the response of the intermediate feature vector and the valid input information of all other feature vectors; and
  • the (M+1)-th network layer of the M+1 network layers is used to concatenate and sum all intermediate result vectors to obtain the final feature vector.
  • The embodiments of the present application further provide a data processing apparatus including the object detection apparatus provided by any of the above embodiments of the present application. The data processing apparatus may be any device having a data processing function and may include, but is not limited to, an advanced RISC machine (ARM), a central processing unit (CPU), a graphics processing unit (GPU), and the like.
  • The embodiments of the present application further provide an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, or a server, which is provided with the data processing apparatus of any of the above embodiments of the present application.
  • The embodiments of the present application further provide another electronic device including the object detection apparatus provided by any of the foregoing embodiments of the present application.
  • The embodiments of the present application further provide yet another electronic device including a processor and the object detection apparatus provided by any of the above embodiments of the present application; when the processor runs the object detection apparatus, the units in the object detection apparatus are run.
  • The embodiments of the present application further provide still another electronic device including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus, and the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the object detection method of any of the above embodiments of the present application.
  • The electronic device may include a processor, a communication interface, a memory, and a communication bus. Among these:
  • the processor, the communication interface, and the memory communicate with one another through the communication bus;
  • the communication interface is used for communicating with network elements of other devices, such as other clients or servers;
  • the processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present application, or a graphics processing unit (GPU). The one or more processors included in the terminal device may be processors of the same type, such as one or more CPUs or one or more GPUs, or processors of different types, such as one or more CPUs and one or more GPUs;
  • the memory may include a high-speed random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory.
  • FIG. 12 is a schematic structural diagram of another embodiment of an electronic device according to the present application. As shown in FIG. 12, an electronic device for implementing an embodiment of the present application includes a central processing unit (CPU) or a graphics processing unit (GPU), which can operate according to executable instructions stored in a read-only memory (ROM) or loaded from a storage portion into a random access memory (RAM).
  • The central processing unit or the graphics processing unit can communicate with the read-only memory and/or the random access memory to execute the executable instructions and thereby complete the operations corresponding to the object detection method provided by the embodiments of the present application, for example: performing object positioning on the received image to be detected to obtain L object candidate frames, where L is an integer greater than 0; taking each object candidate frame in the L object candidate frames in turn as the current object candidate frame, extracting M+1 feature vectors corresponding to M+1 object candidate frames from at least one feature map of the image to be detected, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame having the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0; associating the M+1 feature vectors to generate one final feature vector; and performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame.
  • The CPU, the GPU, the ROM, and the RAM are connected to one another through a bus. An input/output (I/O) interface is also connected to the bus.
  • The following components are connected to the I/O interface: an input portion including a keyboard, a mouse, and the like; an output portion including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion including a hard disk and the like; and a communication portion including a network interface card such as a LAN card or a modem. The communication portion performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive as needed, so that a computer program read from it is installed into the storage portion as needed.
  • An embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine readable medium; the computer program comprises program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the steps of any object detection method provided by the embodiments of the present application, for example: an instruction for performing object positioning on the received image to be detected to obtain L object candidate frames, where L is an integer greater than 0; an instruction for taking each object candidate frame in the L object candidate frames in turn as the current object candidate frame and extracting, from at least one feature map of the image to be detected, M+1 feature vectors corresponding to M+1 object candidate frames, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame having the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0; an instruction for associating the M+1 feature vectors to generate one final feature vector; and an instruction for performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame.
  • The computer program can be downloaded and installed from a network via the communication portion and/or installed from a removable medium. When the computer program is executed by the central processing unit (CPU) or the graphics processing unit (GPU), the above functions defined in the methods of the present application are performed.
  • The embodiments of the present application further provide a computer program including computer readable code; when the computer readable code is run on a device, a processor in the device executes instructions for implementing the steps of the object detection method of any of the above embodiments of the present application.
  • The embodiments of the present application further provide a computer readable storage medium for storing computer readable instructions which, when executed, implement the operations of the steps of the object detection method of any of the above embodiments of the present application. Optionally, the instructions include: an instruction for performing object positioning on the received image to be detected to obtain L object candidate frames, where L is an integer greater than 0; an instruction for taking each object candidate frame in the L object candidate frames in turn as the current object candidate frame and extracting, from the feature map of the image to be detected, M+1 feature vectors corresponding to the M+1 object candidate frames, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame having the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0; an instruction for associating the M+1 feature vectors to generate one final feature vector; and an instruction for performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame.
  • The methods, systems, apparatuses, and devices of the present application may be implemented in many ways, for example in software, hardware, firmware, or any combination of software, hardware, and firmware. The present application can also be implemented as programs recorded in a recording medium, the programs including machine readable instructions for implementing the methods according to the present application; accordingly, the present application also covers a recording medium storing a program for executing the methods according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Physiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose an object detection method and apparatus, an electronic device, a computer program, and a storage medium. The method includes: performing object positioning on an image to be detected to obtain L object candidate frames; taking each object candidate frame in the L object candidate frames in turn as a current object candidate frame, extracting M+1 feature vectors corresponding to M+1 object candidate frames from at least one feature map of the image to be detected, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame having the same center point as the current object candidate frame but a different height and/or width; associating the M+1 feature vectors to generate one final feature vector; and performing object detection according to the final feature vector to obtain an object detection result of the current object candidate frame. The embodiments of the present application effectively solve the problems that arise in the prior art when the single-input mode leaves the coverage of an object candidate frame incorrect, improving the accuracy of object detection.

Description

Object detection method and apparatus, electronic device, computer program and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on September 23, 2016 under Application No. CN201610848961.7 and entitled "Object detection method and apparatus, data processing apparatus and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to computer vision technology, and in particular to an object detection method and apparatus, an electronic device, a computer program, and a storage medium.
Background
Object detection is one of the most fundamental problems in the field of computer vision. It involves the most basic semantic understanding and analysis of pictures and videos and has long attracted a great deal of research. The goal of an object detection system is to frame the target object in a picture or video using a box (called an object candidate box). For a long time, the object detection problem has been treated as a classification problem: for each object candidate box, the object detection system automatically judges what kind of object is within the box. However, owing to factors such as varied shooting angles, object shapes, illumination changes, and complex, changing backgrounds, object detection has always been a complex and challenging problem.
Summary
The embodiments of the present application provide a technical solution for performing object detection.
According to one aspect of the embodiments of the present application, an object detection method is provided, including:
performing object positioning on an image to be detected to obtain L object candidate frames, where L is an integer greater than 0;
taking each object candidate frame in the L object candidate frames in turn as a current object candidate frame, extracting M+1 feature vectors corresponding to M+1 object candidate frames from at least one feature map of the image to be detected, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame having the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0;
associating the M+1 feature vectors to generate one final feature vector; and
performing object detection according to the final feature vector to obtain an object detection result of the current object candidate frame.
According to another aspect of the embodiments of the present application, an object detection apparatus is provided, including:
an object positioning unit configured to perform object positioning on an image to be detected to obtain L object candidate frames, where L is an integer greater than 0;
a feature extraction unit configured to take each object candidate frame in the L object candidate frames in turn as a current object candidate frame and extract, from at least one feature map of the image to be detected, M+1 feature vectors corresponding to M+1 object candidate frames, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame having the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0;
a feature association unit configured to associate the M+1 feature vectors to generate one final feature vector; and
an object detection unit configured to perform object detection according to the final feature vector and obtain an object detection result of the current object candidate frame.
According to yet another aspect of the embodiments of the present application, an electronic device is provided, including the object detection apparatus according to any of the above embodiments of the present application.
According to still another aspect of the embodiments of the present application, another electronic device is provided, including:
a processor and the object detection apparatus according to any of the above embodiments of the present application;
when the processor runs the object detection apparatus, the units in the object detection apparatus according to any of the above embodiments of the present application are run.
According to still another aspect of the embodiments of the present application, yet another electronic device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the object detection method described in any of the above embodiments of the present application.
According to still another aspect of the embodiments of the present application, a computer program is provided, including computer readable code; when the computer readable code is run on a device, a processor in the device executes instructions for implementing the steps of the object detection method described in any of the above embodiments of the present application.
According to yet another aspect of the embodiments of the present application, a computer readable storage medium is further provided for storing computer readable instructions which, when executed, implement the operations of the steps of the object detection method described in any of the above embodiments of the present application.
Based on the object detection method and apparatus and the electronic device provided by the above embodiments of the present application, for each object candidate frame, M associated object candidate frames sharing the center point of the current object candidate frame but having different heights and widths are acquired, the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames are extracted from the feature map of the image to be detected, the total of M+1 feature vectors are associated to generate one final feature vector, and object detection is then performed according to the final feature vector to obtain the object detection result of the current object candidate frame. Because the M associated object candidate frames share the center point of the current object candidate frame but differ in height and width, the M+1 object candidate frames cover different regions and resolutions, so the embodiments of the present application implement a multi-region/multi-resolution image input mode. Performing object detection on the final feature vector obtained by associating the M+1 feature vectors extracted for the M+1 object candidate frames uses visual information of different regions/resolutions and can therefore avoid the problems that arise when a single-input mode leaves the coverage of an object candidate frame incorrect, such as ignored details in the input image, insufficient visual content, and inaccurate overlap-ratio judgments, which helps improve the accuracy of object detection.
The technical solution of the present application is described in further detail below through the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which form a part of the specification, describe embodiments of the present application and, together with the description, serve to explain the principles of the present application.
With reference to the accompanying drawings, the present application can be understood more clearly from the following detailed description, in which:
FIG. 1 is a schematic diagram of a picture to be detected.
FIG. 2 is a schematic diagram of another picture to be detected.
FIG. 3 is a flow chart of an embodiment of the object detection method of the present application.
FIG. 4 is an application example diagram of associated object candidate frames acquired in an embodiment of the present application.
FIG. 5 is a flow chart of another embodiment of the object detection method of the present application.
FIG. 6 is a flow chart of still another embodiment of the object detection method of the present application.
FIG. 7 is a specific application example diagram of the bidirectional conduction structure network in an embodiment of the present application.
FIG. 8 is a specific application example diagram of the gate control structure network in an embodiment of the present application.
FIG. 9 is a schematic structural view of an embodiment of the object detection apparatus of the present application.
FIG. 10 is a schematic structural view of another embodiment of the object detection apparatus of the present application.
FIG. 11 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
FIG. 12 is a schematic structural diagram of another embodiment of an electronic device according to the present application.
具体实施方式
现在将参照附图来详细描述本申请的各种示例性实施例。应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本申请 的范围。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本申请及其应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
本申请实施例可以应用于终端设备、计算机系统、服务器等电子设备,其可与众多其它通用或专用计算系统环境或配置一起操作。适于与终端设备、计算机系统、服务器等电子设备一起使用的众所周知的终端设备、计算系统、环境和/或配置的例子包括但不限于:个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统﹑大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
终端设备、计算机系统、服务器等电子设备可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
在实现本申请的过程中,发明人通过研究发现,在物体检测系统中,对于任何一个物体候选框,其是否能被判定已经检测到目标物体的准则,是看该物体候选框和目标物体方框的交叠率是否大于某个阈值。当一个物体候选框的覆盖范围不正确时,就会出现如下三个个潜在问题:
首先,当一个物体候选框只覆盖了目标物体某部分的时候,该物体候选框内的视觉内容不足以判断该目标物体是什么物体,这种情况常出现于两类物体存在部分类似的状况,如图1中的图片(a)是一只兔子的图片,图片(b)是一只仓鼠的图片,兔子(rabbit)和仓鼠(hasmter)的身体比较相似,物体候选框102和104为正确的物体候选框,如果物体候选框106和108放在了他们身体上,从物体候选框106和108所标定的区域无法判断该区域内的目标物体是一只兔子还是一只仓鼠;
再者,当一个物体候选框只覆盖了目标物体某部分,计算交叠率的时候,需要考虑目标物体被其他物体遮挡的严重程度,如图2所示,物体候选框202和204为正确的物体候选框,两个物体候选框202和208均覆盖在兔子的头部位置,但是物体候选框202为正确的物体候选框(True positive),而物体候选框208为错误的物体候选框(False positive),因为图2(b)中兔子没有被遮挡,物体候选框208覆盖的范围不够;
最后,由于物体检测系统会将物体候选框涵盖的内容缩放到一个固定大小,如果一个物体候选框(例如图2中物体候选框206)过大,在缩放到固定大小后,图中的目标物体会变得较小,兔子的很多细节信息将变模糊,因此,如果一个物体候选框过大,物体检测系统会忽略物体候选框内小部分细节,而该部分细节对于目标检测有较强的指导作用。
FIG. 3 is a flow chart of an embodiment of the object detection method of the present application. As shown in FIG. 3, the object detection method of this embodiment includes:
302: performing object positioning on the image to be detected to obtain L object candidate frames.
Here L is an integer greater than 0, and the image to be detected may be received, read from storage, or acquired by capture or the like.
304: taking each object candidate frame in the L object candidate frames in turn as the current object candidate frame, extracting M+1 feature vectors corresponding to M+1 object candidate frames from at least one feature map of the image to be detected.
The M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames; an associated object candidate frame has the same center point as the current object candidate frame but a different height and/or width, and M is an integer greater than 0.
306: associating the M+1 feature vectors to generate one final feature vector.
308: performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame.
Optionally, in operation 308, object detection may be performed according to the final feature vector to obtain the probability value that the current object candidate frame includes the target object; or object detection may be performed according to the final feature vector to obtain the object category corresponding to the current object candidate frame.
Based on the object detection method provided by the above embodiment of the present application, for each object candidate frame, M associated object candidate frames sharing the center point of the current object candidate frame but having different heights and widths are acquired, the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames are extracted from the feature map of the image to be detected, the total of M+1 feature vectors are associated to generate one final feature vector, and object detection is then performed according to the final feature vector to obtain the object detection result of the current object candidate frame. Because the M+1 object candidate frames cover different regions and resolutions, this embodiment implements a multi-region/multi-resolution image input mode; performing object detection on the final feature vector obtained by associating the M+1 extracted feature vectors uses visual information of different regions/resolutions and can therefore avoid the problems caused by a single-input mode when the coverage of an object candidate frame is incorrect, such as ignored details in the input image, insufficient visual content, and inaccurate overlap-ratio judgments, which helps improve the accuracy of object detection.
In another embodiment of the object detection method of the present application, the method may further include: generating a feature map of the image to be detected through a convolutional neural network (CNN).
In another embodiment of the object detection method of the present application, the method may further include: acquiring the M associated object candidate frames of the current object candidate frame according to the current object candidate frame.
In an optional example of the embodiments of the present application, the M associated object candidate frames of the current object candidate frame may be acquired according to the current object candidate frame as follows:
taking the center coordinates of the current object candidate frame as the center point, M different values are assigned to the parameter in a preset width acquisition formula and a preset height acquisition formula to obtain the widths and heights of the M associated object candidate frames, thereby acquiring the M associated object candidate frames.
For example, the M associated object candidate frames may be acquired as follows: using the preset width and height acquisition formulas b_p = [x_o, y_o, (1+p)w_o, (1+p)h_o], the parameter p is assigned M different values to obtain the M associated object candidate frames.
Here b_p denotes an associated object candidate frame; x_o and y_o denote the abscissa and ordinate of the center point of the current object candidate frame; w_o denotes the width of the current object candidate frame; h_o denotes the height of the current object candidate frame; (1+p)w_o denotes the width of the associated object candidate frame; and (1+p)h_o denotes the height of the associated object candidate frame. The values of p and M can be preset and adjusted according to actual needs.
For an image to be detected, the CNN can generate a feature map of that image, which may be, for example, a multi-dimensional matrix. Acquiring the M+1 feature vectors can be done through a region-of-interest pooling (ROI-Pooling) network, which locates the regions corresponding to the current object candidate frame and its M associated object candidate frames in the multi-dimensional matrix of the feature map, extracts the matrix values of the corresponding regions, and generates M+1 feature vectors of a specific size.
Based on the object detection method of the embodiments of the present application, M+1 object candidate frames can be obtained from each object candidate frame obtained by object positioning; these object candidate frames share the same center point but have different heights and/or widths, and each object candidate frame generates a corresponding feature vector through the ROI-Pooling operation, so that each object candidate frame obtained by object positioning can finally be differentiated into M+1 feature vectors, each differentiated object candidate frame covering a different region and generating a feature vector of a different resolution, so as to realize multi-region/multi-resolution image input.
FIG. 4 shows an application example of the associated object candidate frames acquired in the embodiment of the present application. In this application example, 402 is an object candidate frame obtained by performing object positioning on the image to be detected and serves as the current object candidate frame; M has a value of 4, i.e., p is assigned the four values -0.2, 0.2, 0.8, and 1.7 respectively, and according to the position of the current object candidate frame 402, four associated object candidate frames b_{-0.2}, b_{0.2}, b_{0.8}, and b_{1.7} with different region sizes and different resolutions are obtained, whose corresponding four feature vectors are denoted (f_{-0.2}, f_{0.2}, f_{0.8}, f_{1.7}) respectively.
In one optional example of the embodiments of the object detection method of the present application, there is one feature map of the image to be detected, i.e., the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames are extracted from a single feature map of the image to be detected.
In another optional example of the embodiments of the object detection method of the present application, the feature maps of the image to be detected may include multiple feature maps of the image to be detected generated by multiple CNNs of different depths. In this case, extracting the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames from the feature maps of the image to be detected may include extracting them from the multiple feature maps, where the number of feature vectors extracted from each of the multiple feature maps may be the same or arbitrarily different. For example, four associated object candidate frames may be generated from one object candidate frame, with the first and second associated object candidate frames derived from one feature map of the image to be detected and the third and fourth derived from another feature map of the image to be detected.
Compared with the scheme of acquiring multiple associated object candidate frames from multiple feature maps, the scheme of acquiring multiple associated object candidate frames from the same feature map does not need deep neural networks of different depths to acquire multiple feature maps; the network structure is relatively simple and relatively easy to implement.
In yet another optional example of the embodiments of the object detection method of the present application, the M+1 feature vectors may be associated based on a pre-trained bidirectional gate control structure network to generate the final feature vector. The bidirectional gate control structure network may include two parts: a gate control structure network and a bidirectional conduction structure network.
FIG. 5 is a flow chart of another embodiment of the object detection method of the present application. As shown in FIG. 5, the object detection method of this embodiment includes:
504: performing object positioning on the image to be detected to obtain L object candidate frames, and generating a feature map of the image to be detected through a CNN.
Here the value of L is an integer greater than 0.
506: taking each object candidate frame in the L object candidate frames in turn as the current object candidate frame, acquiring the M associated object candidate frames of the current object candidate frame according to the current object candidate frame.
508: sorting the M+1 feature vectors according to the sizes of the corresponding object candidate frames.
Optionally, the M+1 feature vectors may be sorted by object candidate frame size from large to small or from small to large without affecting the object detection result of the embodiment of the present application.
510: extracting, from at least one feature map of the image to be detected, the M+1 feature vectors corresponding to the M+1 object candidate frames, namely the current object candidate frame and its M associated object candidate frames.
An associated object candidate frame has the same center point as the current object candidate frame but a different height and/or width, and the value of M is an integer greater than 0.
512: selecting an intermediate feature vector from the M+1 feature vectors, obtaining, through the pre-trained gate control structure network, the weight values of the feature vectors other than the intermediate feature vector among the M+1 feature vectors, and controlling the inputs of the corresponding other feature vectors through the weight values to obtain the valid input information of the other feature vectors.
The intermediate feature vector is a feature vector of the sorted M+1 feature vectors other than the first and last feature vectors.
514: generating the final feature vector from the intermediate feature vector and the valid input information of the other feature vectors through the pre-trained bidirectional conduction structure network.
516: performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame; a high-level sketch of this flow is given after this passage.
The embodiment of the present application realizes multi-region and/or multi-resolution image input through the bidirectional conduction structure network, transfers valid visual information between the inputs of different regions, and uses the gate control structure network to establish the credibility of the information transfer, which helps improve the accuracy of object detection.
Optionally, in still another optional example of the embodiments of the object detection method of the present application, the gate control structure network may be a function that maps a feature vector to [0, 1], i.e., a weight-value generation function such as the sigmoid function or the hyperbolic tangent (tanh) function.
Further, in yet another embodiment of the object detection method of the present application, the method may further include: training an initial gate control structure network in advance on multiple sample images and adjusting the network parameters of the initial gate control structure network to obtain the gate control structure network.
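Purely as an orientation aid, operations 504-516 for a single candidate box can be strung together as follows (plain Python; backbone_pool, gbd, and classifier are hypothetical callables standing for the ROI-Pooling step, the bidirectional gate control structure network, and the detection head, and the p values follow the FIG. 4 example):

```python
def detect_one_candidate(feature_map, box, backbone_pool, gbd, classifier,
                         p_values=(-0.2, 0.2, 0.8, 1.7)):
    x_o, y_o, w_o, h_o = box
    # 506: derive the M associated boxes sharing the candidate's center point
    boxes = [box] + [(x_o, y_o, (1 + p) * w_o, (1 + p) * h_o) for p in p_values]
    # 508: sort the boxes (and hence their feature vectors) by box size
    boxes.sort(key=lambda b: b[2] * b[3])
    # 510: extract one fixed-size feature vector per box (ROI-Pooling)
    feats = [backbone_pool(feature_map, b) for b in boxes]
    # 512/514: gated bidirectional association into one final feature vector
    final_vector = gbd(feats)
    # 516: detection result (probability value or object category)
    return classifier(final_vector)
```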
FIG. 6 is a flow chart of still another embodiment of the object detection method of the present application. As shown in FIG. 6, compared with the embodiment shown in FIG. 5, the object detection method of this embodiment further includes:
602: acquiring the responses of the M+1 feature vectors through the bidirectional conduction structure network.
Accordingly, in this embodiment, operation 512 may exemplarily be implemented by the following operation:
604: selecting an intermediate feature vector from the M+1 feature vectors, obtaining, through the pre-trained gate control structure network, the weight values of the responses of the feature vectors other than the intermediate feature vector among the M+1 feature vectors, and controlling the responses of the corresponding other feature vectors through the weight values to obtain the valid input information of the other feature vectors.
Operation 514 may exemplarily be implemented by the following operation:
606: generating the final feature vector from the response of the intermediate feature vector and the valid input information of the other feature vectors through the pre-trained bidirectional conduction structure network.
In one optional instance of the embodiment shown in FIG. 6, the bidirectional conduction structure network includes M+1 network layers, and operation 606 may exemplarily be implemented as follows:
through each of the first M network layers of the bidirectional conduction structure network, an intermediate result vector is generated from the response of the intermediate feature vector and the valid input information of each of the other feature vectors;
through the (M+1)-th network layer of the bidirectional conduction structure network, all intermediate result vectors are concatenated and summed to obtain the final feature vector.
In another optional instance of the embodiment shown in FIG. 6, the bidirectional conduction structure network includes M+1 network layers, and operation 606 may also be implemented as follows:
through the first M network layers of the bidirectional conduction structure network, an intermediate result vector is generated from the response of the intermediate feature vector and the valid input information of all other feature vectors;
through the (M+1)-th network layer of the bidirectional conduction structure network, all intermediate result vectors are concatenated and summed to obtain the final feature vector.
The input of the bidirectional conduction structure network is the M+1 feature vectors of a detection box. FIG. 7 shows an optional application example of the bidirectional conduction structure network in the embodiment of the present application. For simplicity of illustration, FIG. 7 takes the value of M as 2, i.e., the M+1 feature vectors are described as three feature vectors; for convenience, the three feature vectors are denoted in the embodiment of the present application as (h_{i-1}^0, h_i^0, h_{i+1}^0), while the actual input is, for example, the five feature vectors output through FIG. 4. The other variables in FIG. 7 are neural network node variables in a deep neural network (DNN), and the arrows indicate convolution operations. The subscript i indicates the ordering number of the feature vector among the M+1 feature vectors, and the superscript 0 indicates that the feature vector is extracted from the feature map.
For an intermediate feature vector h_i^0 among the M+1 feature vectors, new responses h_i^1 and h_i^2 are generated through convolution operations, where h_i^1 is the sum of two parts: the response of h_i^0 through a convolution operation and the response of h_{i-1}^1 through a convolution operation; h_i^2 is likewise the sum of two parts: the response of h_i^0 through a convolution operation and the response of h_{i+1}^2 through a convolution operation. The physical meaning of the bidirectional conduction structure network derives from the definitions of h_i^1 and h_i^2. The value of h_i^1 is derived from the feature vectors h_i^0 and h_{i-1}^1, the value of h_i^2 is derived from the feature vectors h_i^0 and h_{i+1}^2, and the feature vector h_i^3 finally output by the bidirectional conduction structure network is derived from h_i^1 and h_i^2. In one optional example, the final output of the bidirectional conduction structure network is the response h_i^3 produced by a convolution operation over h_i^1 and h_i^2, with optional formulas as follows:

$$h_i^1 = \sigma(w_i^1 \otimes h_i^0 + b_i^1) + \sigma(w_{i-1,i}^1 \otimes h_{i-1}^1 + b_{i-1,i}^1)$$
$$h_i^2 = \sigma(w_i^2 \otimes h_i^0 + b_i^2) + \sigma(w_{i+1,i}^2 \otimes h_{i+1}^2 + b_{i+1,i}^2)$$
$$h_i^3 = \sigma(w_i^3 \otimes \mathrm{cat}(h_i^1, h_i^2) + b_i^3)$$

Here σ() denotes the nonlinear rectified linear unit (ReLU) operation in the convolutional network; cat() denotes a concatenation operation, i.e., the feature vectors input in the parentheses are concatenated; ⊗ denotes a convolution operation, e.g., a ⊗ b denotes a convolution between a and b; w and b both denote parameters in the convolutional network, w being a convolution kernel and b a bias, with superscripts and subscripts used to distinguish the parameters of different network layers in the bidirectional conduction structure network; h denotes the response of each neural network node in the convolutional network, and this response is also a feature vector.
In another optional example, h_i^1 or h_i^2 may also be derived from the response of the intermediate feature vector together with all other feature vectors; for example, h_i^1 derived from h_i^0, h_{i-1}^1, and h_{i-2}^1 can be expressed by the formula

$$h_i^1 = \sigma(w_i^1 \otimes h_i^0 + b_i^1) + \sigma(w_{i-1,i}^1 \otimes h_{i-1}^1 + b_{i-1,i}^1) + \sigma(w_{i-2,i}^1 \otimes h_{i-2}^1 + b_{i-2,i}^1)$$

In yet another optional example, h_i^3 may also be derived directly from h_{i-1}^0, h_i^0, and h_{i+1}^0, while h_i^1 and h_i^2 are ignored.
According to the meaning of the bidirectional conduction structure, information transfer exists between h_{i-1}^1 and h_i^1, and likewise between h_i^2 and h_{i+1}^2. For example, according to the formula for h_i^1 above, the value of h_i^1 is affected by h_{i-1}^1, which is a lateral transfer from the input of feature vector i-1 to the input of feature vector i. Conversely, according to the formula for h_i^2, the value of h_i^2 is affected by h_{i+1}^2, which is a lateral transfer from feature vector i+1 to feature vector i. Finally, the input of feature vector i integrates the influences from feature vector i+1 and feature vector i-1 to become h_i^3.
门控制结构网络,可通过一个权重值函数控制信息的传递,如果传递的信息被认为是可信的,其权重值较大,反之,如传递的信息被认为不可信,则其权重值较小。在图7所示的示例中,hi-1 1、hi+1 2传递给hi 1的信息中,并不是所有的信息都有效,故而通过门控制结构产生的权重值对hi-1 1、hi+1 2信息的传递加以约束。
FIG. 8 shows an optional application example of the gate control structure network in the embodiments of the present application. The gate control structure network is used to control the effective input of the feature vectors other than the intermediate feature vector selected from the M+1 feature vectors, for example the information passed from input i+1 to input i, or from input i-1 to input i, in FIG. 7. As shown in FIG. 8, the embodiments of the present application add the gate control structure network to the bidirectional conduction structure network; it optionally weights the information values passed from input i+1 to input i, or from input i-1 to input i, through a weight-value generating function, the sigmoid function. In the bidirectional gated structure, the value passed from h_{i-1}^1 to h_i^1 is the product of two parts: one part is the convolution output of h_{i-1}^1, and the other part is the output of the convolution of h_{i-1}^0 after weighting by the sigmoid function; the same applies between h_i^2 and h_{i+1}^2. After the gate control structure is added to the bidirectional conduction structure, the changed optional formulas for h_i^1 and h_i^2 are as follows:
h_i^1 = σ(h_i^0 ⊗ w_i^1 + G(h_{i-1}^0, w_{i-1,i}^g, b_{i-1,i}^g) · σ(h_{i-1}^1 ⊗ w_{i-1,i}^1) + b_i^1)
h_i^2 = σ(h_i^0 ⊗ w_i^2 + G(h_{i+1}^0, w_{i+1,i}^g, b_{i+1,i}^g) · σ(h_{i+1}^2 ⊗ w_{i+1,i}^2) + b_i^2)
G(x, w, b) = sigm(x ⊗ w + b), where sigm(x) = 1/(1 + exp(-x))
where · denotes the element-wise product of matrices; exp() denotes the exponential function; w and b both denote parameters of the convolutional network, with superscripts and subscripts distinguishing different parameters and the superscript g indicating that a parameter belongs to the gate control structure network; the physical meanings of the other symbols are unchanged and can be found in the introduction of the previous formulas; and x denotes the current intermediate feature vector, for example h_i^0 in FIG. 7.
The formula for h_i^1 contains two terms: the first term, h_i^0 ⊗ w_i^1, comes from the input of the i-th feature vector, and the second term, G(h_{i-1}^0, w_{i-1,i}^g, b_{i-1,i}^g) · σ(h_{i-1}^1 ⊗ w_{i-1,i}^1), comes from the input of the (i-1)-th feature vector. Multiple pieces of input information can assist one another in object detection, but not all input information should be effective. For example, a rabbit ear can sometimes serve as valid evidence for detecting the target object as a rabbit, but sometimes it cannot: the rabbit ears on a toy rabbit cannot serve as valid evidence for detecting the target object as a rabbit. A gate control function is therefore used to control the weight value of a given piece of input information. In an optional example, a weight control term G(h_{i-1}^0, w_{i-1,i}^g, b_{i-1,i}^g) is added to the (i-1)-th feature vector to control the influence of the input of the (i-1)-th feature vector. When the object detection system considers the input from the (i-1)-th feature vector unreliable, the value of this weight can approach 0; conversely, if the object detection system considers the input from the (i-1)-th feature vector reliable, the weight value can be 1.
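Continuing the illustrative sketch above (again under assumed names and shared kernels, not the original implementation), a single gated lateral transfer can be written as:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Sketch of one gated lateral transfer, following
    #   h_i^1 = relu(conv(h_i^0) + G(h_{i-1}^0) * relu(conv(h_{i-1}^1)))
    # where G(x) = sigmoid(conv_g(x)) produces element-wise weights in (0, 1):
    # values near 0 suppress an unreliable neighboring input, values near 1
    # pass a reliable one through.
    class GatedTransfer(nn.Module):
        def __init__(self, c):
            super().__init__()
            self.w_self = nn.Conv2d(c, c, 3, padding=1)  # h_i^0 contribution
            self.w_prev = nn.Conv2d(c, c, 3, padding=1)  # h_{i-1}^1 contribution
            self.w_gate = nn.Conv2d(c, c, 3, padding=1)  # gate parameters w^g, b^g

        def forward(self, h0_i, h0_prev, h1_prev):
            gate = torch.sigmoid(self.w_gate(h0_prev))   # weight values in (0, 1)
            return F.relu(self.w_self(h0_i) + gate * F.relu(self.w_prev(h1_prev)))

Replacing the ungated transfers in the earlier BidirectionalConduction sketch with this module yields the bidirectional gated structure described above.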
Any object detection method provided in the embodiments of the present application may be executed by any appropriate device having data processing capability, including but not limited to terminal devices and servers. Alternatively, any object detection method provided in the embodiments of the present application may be executed by a processor, for example by the processor invoking corresponding instructions stored in a memory to execute any object detection method mentioned in the embodiments of the present application. This is not repeated below.
Those of ordinary skill in the art can understand that all or part of the steps implementing the above object detection method embodiments may be accomplished by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The foregoing storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
FIG. 9 is a schematic structural diagram of an embodiment of the object detection apparatus of the present application. The object detection apparatus of this embodiment can be used to implement the above object detection method embodiments of the present application. As shown in FIG. 9, the object detection apparatus of this embodiment includes an object localization unit, a feature extraction unit, a feature association unit, and an object detection unit. Among them:
The object localization unit is configured to perform object localization on the image to be detected to obtain L object candidate frames, where L is an integer greater than 0.
The feature extraction unit is configured to take each of the L object candidate frames in turn as the current object candidate frame and extract, from at least one feature map of the image to be detected, M+1 feature vectors corresponding to M+1 object candidate frames. The M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames; an associated object candidate frame has the same center point as the current object candidate frame and a different height and/or width; M is an integer greater than 0.
The feature association unit is configured to associate the M+1 feature vectors to generate one final feature vector.
The object detection unit is configured to perform object detection according to the final feature vector to obtain the object detection result of the current object candidate frame.
Exemplarily, the object detection result may include: a probability value that the current object candidate frame includes the target object, or the object category corresponding to the current object candidate frame.
Based on the object detection apparatus provided by the above embodiments of the present application, for each object candidate frame, M associated object candidate frames having the same center point as the current object candidate frame and different heights and widths are obtained; M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames are extracted from the feature map of the image to be detected; the total of M+1 feature vectors are associated to generate one final feature vector; and object detection is then performed according to the final feature vector to obtain the object detection result of the current object candidate frame. Since the M associated object candidate frames have the same center point as the current object candidate frame but different heights and widths, the M+1 object candidate frames cover different regions and resolutions, so the embodiments of the present application realize a multi-region/multi-resolution image input mode. Performing object detection on the final feature vector obtained by associating the M+1 feature vectors extracted for these M+1 object candidate frames makes use of visual information of different regions/resolutions, which avoids the problems that arise in a single-input mode when the coverage of the object candidate frame is incorrect, such as details of the input image being ignored, insufficient visual content, and inaccurate overlap-ratio judgment, and thus helps improve the accuracy of object detection.
FIG. 10 is a schematic structural diagram of another embodiment of the object detection apparatus of the present application. As shown in FIG. 10, compared with the embodiment shown in FIG. 9, this embodiment further includes a feature generation unit configured to generate the feature map of the image to be detected.
In yet another embodiment of the object detection apparatus of the present application, the feature extraction unit may be further configured to obtain the M associated object candidate frames of the current object candidate frame according to the current object candidate frame.
Exemplarily, when obtaining the M associated object candidate frames, the feature extraction unit takes the center coordinates of the current object candidate frame as the center point and assigns M different values to the parameter in a preset width acquisition formula and a preset height acquisition formula, obtaining the widths and heights of the M associated object candidate frames and thereby the M associated object candidate frames themselves.
For example, the feature extraction unit may use the preset width and height acquisition formulas b_p = [x_o, y_o, (1+p)w_o, (1+p)h_o] and assign M different values to p to obtain the M associated object candidate frames.
Here, b_p denotes an associated object candidate frame; x_o and y_o denote the abscissa and ordinate of the center point of the current object candidate frame, respectively; w_o denotes the width and h_o the height of the current object candidate frame; and (1+p)w_o and (1+p)h_o denote the width and height of the associated object candidate frame, respectively.
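As a minimal Python sketch of this formula (the function name and the example values of p below are illustrative assumptions, not part of the original disclosure):

    # Sketch of b_p = [x_o, y_o, (1 + p) * w_o, (1 + p) * h_o]: every
    # associated frame keeps the center point (x_o, y_o) of the current
    # object candidate frame and rescales its width and height by (1 + p).
    def associated_boxes(x_o, y_o, w_o, h_o, p_values=(-0.2, 0.2, 0.8, 1.7)):
        return [(x_o, y_o, (1 + p) * w_o, (1 + p) * h_o) for p in p_values]

Here M = len(p_values); assigning M different values to p yields the M associated object candidate frames.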
In an optional example of the object detection apparatus embodiments of the present application, there is one feature map of the image to be detected. In this case, the feature extraction unit is configured to extract, from the one feature map of the image to be detected, the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames.
In another optional example of the object detection apparatus embodiments of the present application, the feature maps of the image to be detected include multiple feature maps of the image to be detected generated respectively by multiple CNNs of different depths. In this case, when extracting the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames, the feature extraction unit is configured to extract the M+1 feature vectors from the multiple feature maps. The numbers of feature vectors extracted from the respective feature maps may be the same or arbitrarily different.
In yet another optional example of the object detection apparatus embodiments of the present application, the feature association unit is configured to associate the M+1 feature vectors based on a bidirectional gated structure network to generate one final feature vector.
In still another optional example of the object detection apparatus embodiments of the present application, the bidirectional gated structure network includes a gate control structure network subunit and a bidirectional conduction structure network subunit, and the feature association unit may include: a sorting subunit, the gate control structure network subunit, and the bidirectional conduction structure network subunit. Among them:
The sorting subunit is configured to sort the M+1 feature vectors by the size of the corresponding object candidate frames.
The gate control structure network subunit is configured to select an intermediate feature vector from the M+1 feature vectors, obtain weight values of the other feature vectors among the M+1 feature vectors other than the intermediate feature vector, and control the input of the corresponding other feature vectors by the weight values to obtain effective input information of the other feature vectors. The intermediate feature vector is a feature vector among the sorted M+1 feature vectors other than the first feature vector and the last feature vector.
The bidirectional conduction structure network subunit is configured to generate the final feature vector from the intermediate feature vector and the effective input information of the other feature vectors.
Optionally, in still another optional example of the object detection apparatus embodiments of the present application, the gate control structure network includes a function that maps a feature vector to [0, 1].
In addition, yet another embodiment of the object detection apparatus of the present application may further include: a network training unit configured to train an initial gate control structure network with multiple sample images and adjust the network parameters of the initial gate control structure network to obtain the gate control structure network.
In addition, in yet another embodiment of the object detection apparatus of the present application, the bidirectional conduction structure network subunit may be further configured to obtain the responses of the M+1 feature vectors respectively. Accordingly, in this embodiment, the gate control structure network subunit is configured to obtain weight values of the responses of the other feature vectors and control the responses of the corresponding other feature vectors by the weight values, and the bidirectional conduction structure network subunit is configured to obtain the responses of the M+1 feature vectors respectively and to generate the final feature vector from the response of the intermediate feature vector and the effective input information of the other feature vectors.
In one optional example, the bidirectional conduction structure network subunit includes M+1 network layers. Among them:
The first M of the M+1 network layers are configured to generate an intermediate result vector from the response of the intermediate feature vector and the effective input information of each of the other feature vectors.
The (M+1)-th of the M+1 network layers is configured to perform concatenation and summation of all intermediate result vectors to obtain the final feature vector.
In another optional example, the bidirectional conduction structure network subunit includes M+1 network layers. Among them:
The first M of the M+1 network layers are configured to generate an intermediate result vector from the response of the intermediate feature vector and the effective input information of all of the other feature vectors.
The (M+1)-th of the M+1 network layers is configured to perform concatenation and summation of all intermediate result vectors to obtain the final feature vector.
An embodiment of the present invention further provides a data processing apparatus, including the object detection apparatus provided by any of the above embodiments of the present invention.
Optionally, the data processing apparatus of the embodiments of the present invention may be any apparatus having a data processing function, including but not limited to: an Advanced RISC Machine (ARM), a central processing unit (CPU), or a graphics processing unit (GPU).
In addition, an embodiment of the present application further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server; the electronic device is provided with the data processing apparatus of any of the above embodiments of the present application.
In addition, an embodiment of the present application further provides another electronic device, including the object detection apparatus provided by any of the above embodiments of the present application.
In addition, an embodiment of the present application further provides yet another electronic device, including:
a processor and the object detection apparatus provided by any of the above embodiments of the present application;
wherein, when the processor runs the object detection apparatus, the units in the object detection apparatus provided by any of the above embodiments of the present application are run.
In addition, an embodiment of the present application further provides still another electronic device, including: a processor, a memory, a communications interface, and a communication bus, wherein the processor, the memory, and the communications interface communicate with one another through the communication bus; and
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the object detection method of any of the above embodiments of the present application.
Referring to FIG. 11, a schematic structural diagram of an embodiment of an electronic device according to the present application is shown; the specific embodiments of the present application do not limit the specific implementation of the electronic device. As shown in FIG. 11, the electronic device may include a processor, a communications interface, a memory, and a communication bus. Among them:
The processor, the communications interface, and the memory communicate with one another through the communication bus.
The communications interface is configured to communicate with network elements of other devices, such as other clients or servers.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement the embodiments of the present application, or a graphics processing unit (GPU). The one or more processors included in the terminal device may be of the same type, such as one or more CPUs or one or more GPUs, or of different types, such as one or more CPUs together with one or more GPUs.
The memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the object detection method of any of the above embodiments of the present application. The memory may include a high-speed random access memory (RAM) and may further include a non-volatile memory, such as at least one disk memory.
FIG. 12 is a schematic structural diagram of another embodiment of the electronic device of the present application. As shown in FIG. 12, the electronic device for implementing the embodiments of the present application includes a central processing unit (CPU) or a graphics processing unit (GPU), which can perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) or loaded from a storage section into a random access memory (RAM). The central processing unit or the graphics processing unit may communicate with the read-only memory and/or the random access memory to execute the executable instructions so as to perform the operations corresponding to the object detection method provided by the embodiments of the present application, for example: performing object localization on a received image to be detected to obtain L object candidate frames, where L is an integer greater than 0; taking each of the L object candidate frames in turn as the current object candidate frame, extracting, from at least one feature map of the image to be detected, M+1 feature vectors corresponding to M+1 object candidate frames, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame has the same center point as the current object candidate frame and a different height and/or width, and M is an integer greater than 0; associating the M+1 feature vectors to generate one final feature vector; and performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame.
In addition, the RAM may also store various programs and data required for system operation. The CPU, the GPU, the ROM, and the RAM are connected to one another through a bus, and an input/output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a cathode-ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive as needed so that a computer program read therefrom is installed into the storage section as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for performing the method shown in the flowchart, and the program code may include instructions corresponding to the steps of any object detection method provided by the embodiments of the present application, for example: an instruction for performing object localization on a received image to be detected to obtain L object candidate frames, where L is an integer greater than 0; an instruction for taking each of the L object candidate frames in turn as the current object candidate frame and extracting, from at least one feature map of the image to be detected, M+1 feature vectors corresponding to M+1 object candidate frames, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame has the same center point as the current object candidate frame and a different height and/or width, and M is an integer greater than 0; an instruction for associating the M+1 feature vectors to generate one final feature vector; and an instruction for performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame. The computer program may be downloaded and installed from a network through the communication section and/or installed from the removable medium. When executed by the central processing unit (CPU) or the graphics processing unit (GPU), the computer program performs the above functions defined in the method of the present application.
In addition, an embodiment of the present application further provides a computer program, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the object detection method of any of the above embodiments of the present application.
In addition, an embodiment of the present application further provides a computer-readable storage medium for storing computer-readable instructions which, when executed, implement the operations of the steps of the object detection method of any of the above embodiments of the present application.
For example, in an optional example, the instructions include: an instruction for performing object localization on a received image to be detected to obtain L object candidate frames, where L is an integer greater than 0; an instruction for taking each of the L object candidate frames in turn as the current object candidate frame and extracting, from the feature map of the image to be detected, M+1 feature vectors corresponding to the M+1 object candidate frames, where the M+1 object candidate frames include the current object candidate frame and its M associated object candidate frames, an associated object candidate frame has the same center point as the current object candidate frame and a different height and/or width, and M is an integer greater than 0; an instruction for associating the M+1 feature vectors to generate one final feature vector; and an instruction for performing object detection according to the final feature vector to obtain the object detection result of the current object candidate frame.
The various embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the system, apparatus, and device embodiments substantially correspond to the method embodiments, their description is relatively brief, and the relevant parts may refer to the description of the method embodiments.
The methods, systems, apparatuses, and devices of the present application may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the order described above unless otherwise specifically stated. Furthermore, in some embodiments, the present application may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the method according to the present application; thus, the present application also covers a recording medium storing a program for executing the method according to the present application.
The description of the present application is given for the sake of example and description; it is not exhaustive and does not limit the present application to the disclosed form. Many modifications and variations will be obvious to those of ordinary skill in the art. The embodiments were chosen and described to better illustrate the principles and practical applications of the present application and to enable those of ordinary skill in the art to understand the present application and design various embodiments, with various modifications, suited to particular uses.

Claims (33)

  1. An object detection method, characterized by comprising:
    performing object localization on an image to be detected to obtain L object candidate frames, wherein L is an integer greater than 0;
    taking each of the L object candidate frames in turn as a current object candidate frame, extracting, from at least one feature map of the image to be detected, M+1 feature vectors corresponding to M+1 object candidate frames, wherein the M+1 object candidate frames comprise the current object candidate frame and its M associated object candidate frames, an associated object candidate frame has the same center point as the current object candidate frame and a different height and/or width, and M is an integer greater than 0;
    associating the M+1 feature vectors to generate one final feature vector; and
    performing object detection according to the final feature vector to obtain an object detection result of the current object candidate frame.
  2. The method according to claim 1, characterized by further comprising:
    generating the feature map of the image to be detected through a convolutional neural network (CNN).
  3. The method according to claim 1 or 2, characterized by further comprising:
    obtaining the M associated object candidate frames of the current object candidate frame according to the current object candidate frame.
  4. The method according to claim 3, characterized in that obtaining the M associated object candidate frames of the current object candidate frame according to the current object candidate frame comprises:
    taking the center coordinates of the current object candidate frame as a center point and assigning M different values to the parameter in a preset width acquisition formula and a preset height acquisition formula to obtain the widths and heights of the M associated object candidate frames, thereby obtaining the M associated object candidate frames.
  5. The method according to claim 3 or 4, characterized in that there is one feature map of the image to be detected.
  6. The method according to claim 3 or 4, characterized in that the feature maps of the image to be detected comprise multiple feature maps of the image to be detected generated respectively by multiple CNNs of different depths; and
    extracting, from at least one feature map of the image to be detected, M+1 feature vectors corresponding to M+1 object candidate frames comprises:
    extracting, from the multiple feature maps, the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames.
  7. The method according to any one of claims 1 to 6, characterized in that associating the M+1 feature vectors comprises:
    associating the M+1 feature vectors based on a bidirectional gated structure network.
  8. The method according to claim 7, characterized in that associating the M+1 feature vectors based on the bidirectional gated structure network to generate one final feature vector comprises:
    sorting the M+1 feature vectors by the size of the corresponding object candidate frames;
    selecting an intermediate feature vector from the M+1 feature vectors, obtaining, through a gate control structure network in the bidirectional gated structure network, weight values of the other feature vectors among the M+1 feature vectors other than the intermediate feature vector, and controlling the input of the corresponding other feature vectors by the weight values to obtain effective input information of the other feature vectors, wherein the intermediate feature vector is a feature vector among the sorted M+1 feature vectors other than the first feature vector and the last feature vector; and
    generating the final feature vector from the intermediate feature vector and the effective input information of the other feature vectors through a bidirectional conduction structure network in the bidirectional gated structure network.
  9. The method according to claim 8, characterized in that the gate control structure network comprises a function that maps a feature vector to [0, 1].
  10. The method according to claim 9, characterized by further comprising:
    training an initial gate control structure network with multiple sample images and adjusting network parameters of the initial gate control structure network to obtain the gate control structure network.
  11. The method according to any one of claims 8 to 10, characterized by further comprising:
    obtaining responses of the M+1 feature vectors respectively through the bidirectional conduction structure network;
    wherein obtaining the weight values of the other feature vectors among the M+1 feature vectors other than the intermediate feature vector and controlling the input of the corresponding other feature vectors by the weight values comprises: obtaining weight values of the responses of the other feature vectors and controlling the responses of the corresponding other feature vectors by the weight values; and
    generating the final feature vector from the intermediate feature vector and the effective input information of the other feature vectors comprises: generating the final feature vector from the response of the intermediate feature vector and the effective input information of the other feature vectors.
  12. The method according to claim 11, characterized in that generating the final feature vector from the response of the intermediate feature vector and the effective input information of the other feature vectors comprises:
    generating an intermediate result vector from the response of the intermediate feature vector and the effective input information of each of the other feature vectors, respectively through the first M network layers of the bidirectional conduction structure network; and
    performing concatenation and summation of all intermediate result vectors through the (M+1)-th network layer of the bidirectional conduction structure network to obtain the final feature vector.
  13. The method according to claim 11, characterized in that generating the final feature vector from the response of the intermediate feature vector and the effective input information of the other feature vectors comprises:
    generating an intermediate result vector from the response of the intermediate feature vector and the effective input information of all of the other feature vectors, respectively through the first M network layers of the bidirectional conduction structure network; and
    performing concatenation and summation of all intermediate result vectors through the (M+1)-th network layer of the bidirectional conduction structure network to obtain the final feature vector.
  14. The method according to any one of claims 1 to 13, characterized in that the object detection result comprises: a probability value that the current object candidate frame includes a target object, or an object category corresponding to the current object candidate frame.
  15. An object detection apparatus, characterized by comprising:
    an object localization unit configured to perform object localization on an image to be detected to obtain L object candidate frames, wherein L is an integer greater than 0;
    a feature extraction unit configured to take each of the L object candidate frames in turn as a current object candidate frame and extract, from at least one feature map of the image to be detected, M+1 feature vectors corresponding to M+1 object candidate frames, wherein the M+1 object candidate frames comprise the current object candidate frame and its M associated object candidate frames, an associated object candidate frame has the same center point as the current object candidate frame and a different height and/or width, and M is an integer greater than 0;
    a feature association unit configured to associate the M+1 feature vectors to generate one final feature vector; and
    an object detection unit configured to perform object detection according to the final feature vector to obtain an object detection result of the current object candidate frame.
  16. The apparatus according to claim 15, characterized by further comprising:
    a feature generation unit configured to generate the feature map of the image to be detected.
  17. The apparatus according to claim 15 or 16, characterized in that the feature extraction unit is further configured to obtain the M associated object candidate frames of the current object candidate frame according to the current object candidate frame.
  18. The apparatus according to claim 17, characterized in that the feature extraction unit is configured to take the center coordinates of the current object candidate frame as a center point and assign M different values to the parameter in a preset width acquisition formula and a preset height acquisition formula to obtain the widths and heights of the M associated object candidate frames, thereby obtaining the M associated object candidate frames.
  19. The apparatus according to claim 17 or 18, characterized in that there is one feature map of the image to be detected.
  20. The apparatus according to claim 17 or 18, characterized in that the feature maps of the image to be detected comprise multiple feature maps of the image to be detected generated respectively by multiple CNNs of different depths; and
    the feature extraction unit is configured to extract, from the multiple feature maps, the M+1 feature vectors corresponding to the current object candidate frame and its M associated object candidate frames.
  21. The apparatus according to any one of claims 15 to 20, characterized in that the feature association unit is configured to associate the M+1 feature vectors based on a bidirectional gated structure network to generate one final feature vector.
  22. The apparatus according to claim 21, characterized in that the bidirectional gated structure network comprises a gate control structure network subunit and a bidirectional conduction structure network subunit; and
    the feature association unit comprises:
    a sorting subunit configured to sort the M+1 feature vectors by the size of the corresponding object candidate frames;
    the gate control structure network subunit, configured to select an intermediate feature vector from the M+1 feature vectors, obtain weight values of the other feature vectors among the M+1 feature vectors other than the intermediate feature vector, and control the input of the corresponding other feature vectors by the weight values to obtain effective input information of the other feature vectors, wherein the intermediate feature vector is a feature vector among the sorted M+1 feature vectors other than the first feature vector and the last feature vector; and
    the bidirectional conduction structure network subunit, configured to generate the final feature vector from the intermediate feature vector and the effective input information of the other feature vectors.
  23. The apparatus according to claim 22, characterized in that the gate control structure network comprises a function that maps a feature vector to [0, 1].
  24. The apparatus according to claim 23, characterized by further comprising:
    a network training unit configured to train an initial gate control structure network with multiple sample images and adjust network parameters of the initial gate control structure network to obtain the gate control structure network.
  25. The apparatus according to any one of claims 22 to 24, characterized in that:
    the gate control structure network subunit is configured to obtain weight values of the responses of the other feature vectors and control the responses of the corresponding other feature vectors by the weight values; and
    the bidirectional conduction structure network subunit is configured to obtain the responses of the M+1 feature vectors respectively, and to generate the final feature vector from the response of the intermediate feature vector and the effective input information of the other feature vectors.
  26. The apparatus according to claim 25, characterized in that the bidirectional conduction structure network subunit comprises M+1 network layers;
    the first M of the M+1 network layers are configured to generate an intermediate result vector from the response of the intermediate feature vector and the effective input information of each of the other feature vectors; and
    the (M+1)-th of the M+1 network layers is configured to perform concatenation and summation of all intermediate result vectors to obtain the final feature vector.
  27. The apparatus according to claim 25, characterized in that the bidirectional conduction structure network subunit comprises M+1 network layers;
    the first M of the M+1 network layers are configured to generate an intermediate result vector from the response of the intermediate feature vector and the effective input information of all of the other feature vectors; and
    the (M+1)-th of the M+1 network layers is configured to perform concatenation and summation of all intermediate result vectors to obtain the final feature vector.
  28. The apparatus according to any one of claims 15 to 27, characterized in that the object detection result comprises: a probability value that the current object candidate frame includes a target object, or an object category corresponding to the current object candidate frame.
  29. An electronic device, characterized by comprising the object detection apparatus according to any one of claims 15 to 28.
  30. An electronic device, characterized by comprising:
    a processor and the object detection apparatus according to any one of claims 15 to 28;
    wherein, when the processor runs the object detection apparatus, the units in the object detection apparatus according to any one of claims 15 to 28 are run.
  31. An electronic device, comprising: a processor, a memory, a communications interface, and a communication bus, wherein the processor, the memory, and the communications interface communicate with one another through the communication bus; and
    the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the object detection method according to any one of claims 1 to 14.
  32. A computer program, comprising computer-readable code, characterized in that, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the object detection method according to any one of claims 1 to 14.
  33. A computer-readable storage medium for storing computer-readable instructions, characterized in that the instructions, when executed, implement the operations of the steps of the object detection method according to any one of claims 1 to 14.
PCT/CN2017/102691 2016-09-23 2017-09-21 Object detection method and apparatus, electronic device, computer program and storage medium WO2018054329A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610848961.7A CN106529527A (zh) 2016-09-23 2016-09-23 Object detection method and apparatus, data processing apparatus and electronic device
CN201610848961.7 2016-09-23

Publications (1)

Publication Number Publication Date
WO2018054329A1 (zh) 2018-03-29

Family

ID=58344293

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/102691 WO2018054329A1 (zh) 2016-09-23 2017-09-21 Object detection method and apparatus, electronic device, computer program and storage medium

Country Status (2)

Country Link
CN (1) CN106529527A (zh)
WO (1) WO2018054329A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529527A (zh) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Object detection method and apparatus, data processing apparatus and electronic device
CN107292306A (zh) * 2017-07-07 2017-10-24 北京小米移动软件有限公司 Target detection method and apparatus
CN108229307B (zh) * 2017-11-22 2022-01-04 北京市商汤科技开发有限公司 Method, apparatus and device for object detection
CN109447943B (zh) * 2018-09-21 2020-08-14 中国科学院深圳先进技术研究院 Target detection method, system and terminal device
TWI697846B (zh) 2018-11-26 2020-07-01 財團法人工業技術研究院 Object recognition method and apparatus therefor
CN109934214A (zh) * 2019-02-22 2019-06-25 深兰科技(上海)有限公司 Training and detection method and apparatus for object categories
CN109886208B (zh) * 2019-02-25 2020-12-18 北京达佳互联信息技术有限公司 Object detection method and apparatus, computer device and storage medium
CN109948497B (zh) * 2019-03-12 2022-01-28 北京旷视科技有限公司 Object detection method and apparatus, and electronic device
CN110082821B (zh) * 2019-03-26 2020-10-02 长江大学 Method and apparatus for detecting microseismic signals without labeled boxes
CN109977963B (zh) * 2019-04-10 2021-10-15 京东方科技集团股份有限公司 Image processing method, device, apparatus and computer-readable medium
CN110210474B (zh) 2019-04-30 2021-06-01 北京市商汤科技开发有限公司 Target detection method and apparatus, device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779267B (zh) * 2011-05-12 2015-08-12 株式会社理光 Method and device for detecting a specific object region in an image
CN105512685B (zh) * 2015-12-10 2019-12-03 小米科技有限责任公司 Object recognition method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101872477A (zh) * 2009-04-24 2010-10-27 索尼株式会社 Method and apparatus for detecting an object in an image, and system including the apparatus
CN104680190A (zh) * 2013-11-29 2015-06-03 华为技术有限公司 Target detection method and apparatus
CN105740892A (zh) * 2016-01-27 2016-07-06 北京工业大学 High-accuracy method for recognizing multiple human body parts based on a convolutional neural network
CN106529527A (zh) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Object detection method and apparatus, data processing apparatus and electronic device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348453A (zh) * 2018-04-04 2019-10-18 中国科学院上海高等研究院 Cascade-based object detection method and system, storage medium and terminal
CN110348453B (zh) * 2018-04-04 2022-10-04 中国科学院上海高等研究院 Cascade-based object detection method and system, storage medium and terminal
CN110427915A (zh) * 2019-08-14 2019-11-08 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN110427915B (zh) * 2019-08-14 2022-09-27 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN111308456A (zh) * 2020-04-08 2020-06-19 加特兰微电子科技(上海)有限公司 Target object position determination method and apparatus, millimeter-wave radar and storage medium
CN111308456B (zh) * 2020-04-08 2023-05-23 加特兰微电子科技(上海)有限公司 Target object position determination method and apparatus, millimeter-wave radar and storage medium
CN111860136A (zh) * 2020-06-08 2020-10-30 北京阿丘机器人科技有限公司 Package positioning method, apparatus, device and computer-readable storage medium
CN111860136B (zh) * 2020-06-08 2024-03-29 北京阿丘机器人科技有限公司 Package positioning method, apparatus, device and computer-readable storage medium

Also Published As

Publication number Publication date
CN106529527A (zh) 2017-03-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17852397; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17852397; Country of ref document: EP; Kind code of ref document: A1)