US20230368033A1 - Information processing device, control method, and program - Google Patents
- Publication number
- US20230368033A1, US18/227,699
- Authority
- US
- United States
- Prior art keywords
- distribution
- partial
- likelihood
- image data
- target object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Definitions
- the present invention relates to a technology of detecting an object from an image.
- Patent Document 1 discloses a technology of performing object detection by use of a deep neural network.
- a system in Patent Document 1 generates a feature map of image data by use of a convolutional neural network and, by inputting the generated feature map to a neural network called a region proposal network (RPN), outputs many proposals of rectangular regions (region proposals) each of which includes an object.
- the system further estimates a class of an object included in a region proposal by performing classification in a layer called a box-classification layer.
- the system also adjusts a position and a size of a region proposal by performing regression in a layer called a box-regression convolutional layer.
- Non Patent Document 1 generates a plurality of feature maps by use of a convolutional neural network and outputs many object proposals from each feature map.
- each object proposal includes rectangular coordinates and a likelihood of an object class.
- in Patent Document 1 and Non Patent Document 1, a case of significantly overlapping objects is eliminated as erroneous detection, and therefore a case of significant overlap is conversely not considered; it is conceivable that a plurality of overlapping objects are erroneously detected as a single object in such a case.
- the present invention has been made in view of the aforementioned problem and provides a technology capable of distinctively detecting objects even when the objects overlap each other in image data.
- An information processing apparatus includes: 1) a generation unit configured to acquire image data and generate likelihood data representing a likelihood of existence of a target object with respect to a position and a size for each of a plurality of partial regions included in the image data; 2) an extraction unit configured to compute a distribution of a likelihood of existence of a target object with respect to a position and a size by computing a total sum of likelihood data each piece of which is generated for each partial region and extract, from the computed distribution, one or more partial distributions each of which relates to one target object; and 3) an output unit configured to, for each extracted partial distribution, output a position and a size of a target object relating to the partial distribution, based on a statistic of the partial distribution.
- a control method is executed by a computer.
- the control method includes: 1) a generation step of acquiring image data and generating likelihood data representing a likelihood of existence of a target object with respect to a position and a size for each of a plurality of partial regions included in the image data; 2) an extraction step of computing a distribution of a likelihood of existence of a target object with respect to a position and a size by computing a total sum of likelihood data each piece of which is generated for each partial region and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and 3) an output step of, for each extracted partial distribution, outputting a position and a size of a target object relating to the partial distribution, based on a statistic of the partial distribution.
- a program according to the present invention causes a computer to execute each step included in the control method according to the present invention.
- the present invention provides a technology capable of distinctively detecting objects even when the objects overlap each other in image data.
- FIG. 1 is a diagram conceptually illustrating processing performed by an information processing apparatus according to the example embodiment 1.
- FIG. 2 is a diagram illustrating image data including target objects significantly overlapping each other.
- FIG. 3 is a diagram illustrating a functional configuration of the information processing apparatus according to the example embodiment 1.
- FIG. 4 is a diagram illustrating a computer for providing the information processing apparatus.
- FIG. 5 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
- FIG. 6 is a diagram illustrating a method of extracting a partial region by use of a sliding window.
- FIG. 7 is a diagram illustrating a neural network used for generation of likelihood data.
- FIG. 8 is a diagram conceptually illustrating likelihood data generated based on a likelihood Li.
- FIG. 9 is a diagram illustrating a neural network outputting parameters of a normal distribution indicated by likelihood data.
- FIG. 10 is a flowchart illustrating a flow of processing of extracting a partial distribution on the basis of the maximum value of a PHD.
- FIG. 11 is a diagram illustrating a position and a size of a target object determined based on a partial distribution.
- FIG. 12 is a block diagram illustrating an information processing apparatus having a function of learning by a neural network.
- FIG. 13 is a diagram illustrating an ideal PHD.
- FIG. 1 is a diagram conceptually illustrating processing performed by an information processing apparatus 2000 according to the present example embodiment.
- the information processing apparatus 2000 acquires image data 10 and detects a target object from the image data 10 .
- Detection of a target object means determination of a position and a size of an image region (such as a circumscribed rectangle) including the target object from the image data 10 .
- Any object may be handled as a target object, or only a specific type of object (such as only a human) may be handled as a target object.
- the information processing apparatus 2000 detects an object by a method described below. First, the information processing apparatus 2000 generates parameters representing likelihood data for each of a plurality of partial regions 12 in the image data 10 .
- the likelihood data are data being associated with a position and a size on the image data 10 and indicating a distribution of a likelihood that a target object exists in an image region at the position with the size. Specifically, denoting a predetermined probability density function the integral of which is 1 as f and a generated parameter as L, likelihood data is expressed by L × f.
- a normal distribution the position and the variance of which vary for each partial region may be used as the probability density function f, or a δ function may be used for expressing existence at a specific position only, or another probability density function may be adopted.
- a δ function represents a function taking infinity only at a specific value, taking 0 at the other values, and having an integral value of 1.
- the integral value of the likelihood data L × f matches the value of the generated parameter L.
- the likelihood data in FIG. 1 indicate such a distribution. Further details of the likelihood data will be described later.
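The relation between the parameter L and the integral of the likelihood data can be checked numerically. The following is a minimal sketch, assuming a one-dimensional position axis and a normal density for f; the helper names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D normal distribution; its integral over the real line is 1."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def likelihood_data(x, L, mean, std):
    """Likelihood data L x f evaluated at x, with f assumed to be a normal density."""
    return L * normal_pdf(x, mean, std)

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule numerical integral of func over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * dx) for k in range(steps)) * dx

# Hypothetical values: L = 0.8, centered at 50 with standard deviation 4.
L = 0.8
total = integrate(lambda x: likelihood_data(x, L, mean=50.0, std=4.0), 0.0, 100.0)
print(total)  # numerically very close to L
```

Because f integrates to 1, scaling it by L yields a distribution whose integral is L, which is the property the later summation into a PHD relies on.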
- the information processing apparatus 2000 computes a distribution of an existence likelihood of a target object with respect to a position and a size by computing the total sum of likelihood data each piece of which is generated for each partial region 12 .
- the distribution is a so-called probability hypothesis density (PHD).
- PHD is a distribution function having a characteristic that the integrated value matches the number of existing objects.
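As a rough illustration of this property, the sketch below builds a PHD on a one-dimensional grid by summing likelihood data and then integrates it. The one-dimensional axis, the normal densities, and all numeric values are assumptions made for the example and are not taken from the source.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def phd(x, components):
    """Sum of likelihood data L_i x f_i at x; components are (L, mean, std) tuples."""
    return sum(L * normal_pdf(x, mean, std) for L, mean, std in components)

# Hypothetical likelihood data for six partial regions (made-up values).
components = [
    (0.6, 20.0, 2.0), (0.4, 21.0, 2.0),  # two regions responding to object A
    (0.7, 50.0, 2.0), (0.3, 49.0, 2.0),  # two regions responding to object B
    (0.5, 80.0, 2.0), (0.5, 81.0, 2.0),  # two regions responding to object C
]

steps = 10000
dx = 100.0 / steps
integral = sum(phd(k * dx, components) for k in range(steps + 1)) * dx
print(integral)  # close to 3, the sum of the six L values
```

In this example the six likelihood values sum to 3, so the integral of the PHD is approximately 3, matching three objects.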
- the information processing apparatus 2000 extracts, from the PHD, partial distributions each of which relates to one target object (hereinafter referred to as partial distributions). Ideally, each of the partial distributions is extracted in such a way that the integral value thereof is 1, and each partial distribution relates to one target object.
- Three partial distributions are extracted from a PHD in FIG. 1 .
- the integrated value of the PHD is 3, and partial distributions are extracted in such a way that the integral of each partial distribution is 1.
- the partial distributions may be extracted in such a way as to overlap each other.
- for example, since each integral value becomes 1 when the shape of a partial distribution is limited to a normal distribution, the partial distributions may be determined in such a way as to minimize the error between the sum of the partial distributions and the PHD.
- alternatively, each partial distribution may be limited to a normal distribution × a weight.
- the integral value matches the weight in the case of this limitation, and therefore the partial distributions may be determined in such a way as to minimize the total sum of the error between the sum of the partial distributions and the PHD, and the error between the weight value and 1.
- a distribution other than a normal distribution may be adopted as a limited distribution shape.
- for each extracted partial distribution, the information processing apparatus 2000 outputs a position and a size of a target object represented by the partial distribution, based on a statistic such as the mean of the partial distribution.
- a position of a target object is represented by coordinates of a predetermined position (such as an upper-left corner) of a circumscribed rectangle representing the target object.
- a size of a target object can be represented by a width and a height of a rectangular region representing the target object.
- although each distribution illustrated in FIG. 1 is depicted two-dimensionally (horizontal axis: position/size, vertical axis: likelihood) for convenience of illustration, the distribution is actually a distribution on a three-or-more-dimensional space.
- for example, assume that a position of an image region is represented by coordinates, the shape of the image region is a rectangle, and the size of the rectangle is represented by a width and a height.
- in this case, each distribution illustrated in FIG. 1 is expressed on a five-dimensional (X coordinate, Y coordinate, width, height, likelihood) space.
- the information processing apparatus 2000 detects a target object by a method of computing a PHD by adding up likelihood data each piece of which is computed for each partial region, and extracting a partial distribution representing one target object.
- the method enables highly precise distinction even between significantly overlapping target objects and detection of the target objects as separate target objects. The reason will be described below with reference to FIG. 2 .
- FIG. 2 is a diagram illustrating image data 10 including significantly overlapping target objects.
- the image data 10 is a captured image of a scene in which two persons pass each other. When persons are correctly detected from the image data 10 , two persons are detected. However, it is difficult to distinctively detect persons being significantly overlapping objects by existing techniques, and the probability of the two persons being collectively detected as one person is high.
- the information processing apparatus 2000 generates a PHD acquired by adding up likelihood data each piece of which is generated for each partial region 12 .
- the integrated value in any section of the PHD represents the number of target objects in the section.
- information about the number of target objects is included in a PHD being information acquired by integrating information acquired from each partial region 12 .
- a partial distribution the integral value of which is 1 is extracted from a PHD.
- This enables separation of significantly overlapping target objects and acquisition of a probability distribution of a position and a size of an image region relating to each target object.
- a shaded partial distribution and a dotted partial distribution are extracted from a PHD in FIG. 2 . Then, by determining a position and a size of a target object for each extracted partial distribution, each target object can be detected.
- the above description with reference to FIG. 1 and FIG. 2 is an exemplification for ease of understanding of the information processing apparatus 2000 and does not limit the functions of the information processing apparatus 2000 .
- the information processing apparatus 2000 according to the present example embodiment will be described in more detail below.
- FIG. 3 is a diagram illustrating a functional configuration of the information processing apparatus 2000 according to the example embodiment 1.
- the information processing apparatus 2000 includes a generation unit 2020 , an extraction unit 2040 , and an output unit 2060 .
- the generation unit 2020 acquires image data 10 and generates likelihood data for each of a plurality of partial regions 12 included in the image data 10 .
- the extraction unit 2040 computes a PHD by computing the total sum of likelihood data each piece of which is generated for each partial region 12 .
- the extraction unit 2040 extracts, from the computed PHD, one or more partial distributions each of which relates to one target object.
- the output unit 2060 outputs a position and a size of a target object represented by the partial distribution, based on a statistic of the partial distribution.
- Each functional configuration unit in the information processing apparatus 2000 may be provided by hardware (such as a hardwired electronic circuit) providing each functional configuration unit or may be provided by a combination of hardware and software (such as a combination of an electronic circuit and a program controlling the circuit).
- FIG. 4 is a diagram illustrating a computer 1000 for providing the information processing apparatus 2000 .
- the computer 1000 may be any computer. Examples of the computer 1000 include a personal computer (PC) and a server machine.
- the computer 1000 may be a dedicated computer designed for providing the information processing apparatus 2000 or may be a general-purpose computer.
- the computer 1000 includes a bus 1020 , a processor 1040 , a memory 1060 , a storage device 1080 , an input-output interface 1100 , and a network interface 1120 .
- the bus 1020 is a data transmission channel for the processor 1040 , the memory 1060 , the storage device 1080 , the input-output interface 1100 , and the network interface 1120 to mutually transmit and receive data.
- a method of connecting the processor 1040 and the like to each other is not limited to the bus connection.
- the processor 1040 includes various processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a field-programmable gate array (FPGA).
- the memory 1060 is a main storage provided by use of a random access memory (RAM) and/or the like.
- the storage device 1080 is an auxiliary storage provided by use of a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), and/or the like.
- the input-output interface 1100 is an interface for connecting the computer 1000 to an input/output device.
- the input-output interface 1100 is connected to an input apparatus such as a keyboard and an output apparatus such as a display apparatus.
- the network interface 1120 is an interface for connecting the computer 1000 to a communication network. Examples of the communication network include a local area network (LAN) and a wide area network (WAN).
- a method of connecting the network interface 1120 to the communication network may be a wireless connection or a wired connection.
- the storage device 1080 stores a program module providing each functional configuration unit in the information processing apparatus 2000 .
- the processor 1040 provides a function relating to each program module by reading the program module into the memory 1060 and executing the program module.
- FIG. 5 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1.
- the generation unit 2020 acquires image data 10 (S 102 ).
- the generation unit 2020 generates likelihood data for each of a plurality of partial regions 12 included in the image data 10 (S 104 ).
- the extraction unit 2040 computes a PHD by adding up likelihoods represented by the likelihood data (S 106 ).
- the extraction unit 2040 extracts one or more partial distributions from the PHD (S 108 ). For each partial distribution, the output unit 2060 outputs a position and a size of a target object relating to the partial distribution (S 110 ).
- the information processing apparatus 2000 may execute a series of processes illustrated in FIG. 5 in response to any trigger.
- the information processing apparatus 2000 executes the aforementioned series of processes in response to input of the image data 10 .
- the information processing apparatus 2000 may execute the aforementioned series of processes in response to a predetermined input operation by a user.
- the generation unit 2020 acquires image data 10 (S 102 ). Any image data may be used as the image data 10 .
- the image data 10 are a captured image generated by a camera.
- the camera may be a still camera or a video camera.
- a captured image generated by a camera may be a captured image generated by a camera itself or an image acquired by applying some processing on a captured image generated by a camera.
- the information processing apparatus 2000 may be provided inside a camera generating the image data 10 .
- for example, when the information processing apparatus 2000 is provided inside a surveillance camera, an object can be detected in real time from a surveillance video generated by the surveillance camera.
- types of camera called an intelligent camera, an Internet Protocol (IP) camera, and a network camera can be used as a camera incorporating the function of the information processing apparatus 2000 .
- the generation unit 2020 may acquire image data 10 by any method.
- the generation unit 2020 acquires image data 10 from a storage storing the image data 10 .
- the storage storing the image data 10 may be provided inside the information processing apparatus 2000 or may be provided outside.
- the information processing apparatus 2000 acquires image data 10 input by an input operation by a user.
- the generation unit 2020 acquires image data 10 by receiving the image data 10 transmitted by another apparatus.
- a partial region 12 is a partial image region included in the image data 10 .
- a partial region 12 is different from another partial region 12 with respect to at least either one of a position and a size.
- the generation unit 2020 extracts each partial region 12 included in the image data 10 and, by analyzing the extracted partial region 12 , generates likelihood data for the partial region 12 .
- a partial region 12 can be extracted by use of a sliding window.
- FIG. 6 is a diagram illustrating a method of extracting a partial region 12 by use of a sliding window.
- the information processing apparatus 2000 moves a sliding window with a predetermined size (width: Ws, height: Hs) at a predetermined stride d.
- a plurality of image regions with different sizes are extracted from the sliding window at various positions and each image region is handled as a partial region 12 .
- partial regions 12 with varying positions and sizes can be extracted.
- a technique using an anchor box disclosed in Patent Document 1 can be used to extract partial regions 12 in this way.
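The sliding-window extraction can be sketched as follows. This is only an illustration: the function name, the use of scale factors to obtain the differently sized regions, and the choice to center each region in the window are assumptions, since the source states only that regions with different sizes are extracted at each window position.

```python
def sliding_window_regions(img_w, img_h, win_w, win_h, stride, scales):
    """Return a list of (x, y, w, h) partial regions.

    A window of size (win_w, win_h) moves at the given stride. At each window
    position, one region per scale factor is emitted, centered in the window
    (the centering is an assumption of this sketch).
    """
    regions = []
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            cx, cy = left + win_w / 2.0, top + win_h / 2.0
            for s in scales:
                w, h = win_w * s, win_h * s
                regions.append((cx - w / 2.0, cy - h / 2.0, w, h))
    return regions

# Hypothetical 64 x 48 image, 16 x 16 window (Ws = Hs = 16), stride d = 8.
regions = sliding_window_regions(64, 48, win_w=16, win_h=16, stride=8, scales=(0.5, 1.0))
print(len(regions), regions[:2])
```

Each returned tuple differs from every other in position or size, which is the condition the text places on partial regions 12.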
- a partial region 12 may be extracted from a feature map generated from the image data instead of being directly extracted from the image data 10 .
- a neural network 20 to be described later is constituted of a layer for extracting a feature map from the image data 10 (such as a convolutional layer in a convolutional neural network) and a layer for extracting a partial region 12 from a feature map output from the layer and generating likelihood data.
- a shape of a partial region 12 is not necessarily limited to a rectangle.
- for example, when the shape of a partial region 12 is a circle, the partial region 12 can be represented by center coordinates and a length of a radius.
- a polygon in any shape can be handled as a partial region 12 . In this case, both a position and a size of the partial region 12 are determined by a set of vertices of the partial region 12 .
- the generation unit 2020 generates parameters representing likelihood data for each of a plurality of partial regions 12 included in the image data 10 and generates likelihood data (S 104 ).
- parameters representing likelihood data are generated by use of a neural network.
- FIG. 7 is a diagram illustrating a neural network used for generation of parameters representing likelihood data.
- a neural network 20 outputs, for each partial region 12 included in the image data 10 , a likelihood Li that a target object exists in an image region with the position and the size of the partial region 12 .
- Li is a likelihood output for an i-th partial region 12 .
- the generation unit 2020 sets a distribution determined based on a likelihood Li as likelihood data.
- FIG. 8 is a diagram conceptually illustrating likelihood data generated based on a likelihood Li.
- likelihood data in the upper part of FIG. 8 represent a distribution having a variance of 0 and being generated based on a likelihood Li.
- the distribution is expressed as Li × δ function by use of a δ function.
- likelihood data in the lower part of FIG. 8 represent a distribution with a nonzero variance.
- a distribution conforming to a predetermined model such as a normal distribution is predetermined as a distribution as a reference (hereinafter referred to as a reference distribution).
- a reference distribution may be determined as a distribution having 1 as the integral value, the position and the size of the partial region 12 as the mean, and a predetermined value as the variance. Any value may be set to the variance.
- the generation unit 2020 generates likelihood data by multiplying a reference distribution by a likelihood Li.
- a reference distribution model is a normal distribution. Then, based on the position (xi, yi) of the partial region 12 and the size (wi, hi) of the partial region 12 , the mean of the reference distribution is (xi, yi, wi, hi). Further, the variance of the reference distribution is vi. From the above, the reference distribution is N[(xi, yi, wi, hi), vi]. Furthermore, a likelihood output from the neural network 20 is Li. Then, the generation unit 2020 generates a distribution indicating the likelihood data by multiplying the reference distribution by Li. The integral value of a distribution of the acquired likelihood data is Li.
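The construction of likelihood data from a reference distribution can be sketched as below. The sketch assumes that the variance vi is a single scalar shared by all four dimensions (an isotropic normal distribution); the source does not specify the form of the variance, and the function names and numeric values are hypothetical.

```python
import math

def isotropic_normal_pdf(point, mean, variance):
    """Density of a normal distribution with covariance variance * I (assumed form)."""
    dim = len(mean)
    sq_dist = sum((p - m) ** 2 for p, m in zip(point, mean))
    norm = (2.0 * math.pi * variance) ** (dim / 2.0)
    return math.exp(-0.5 * sq_dist / variance) / norm

def make_likelihood_data(region, L, variance):
    """Return a function evaluating L_i x N[(x_i, y_i, w_i, h_i), v_i] at a point."""
    mean = tuple(float(v) for v in region)
    return lambda point: L * isotropic_normal_pdf(point, mean, variance)

# Hypothetical i-th partial region (x, y, w, h), likelihood L_i, and variance v_i.
data_i = make_likelihood_data(region=(120, 80, 40, 90), L=0.9, variance=25.0)
peak = data_i((120, 80, 40, 90))   # value at the mean of the reference distribution
off = data_i((130, 80, 40, 90))    # value 10 pixels away along the X axis
print(peak, off)
```

Since the reference distribution integrates to 1, the returned function integrates to Li over the four-dimensional (x, y, w, h) space.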
- a reference distribution conforming to a distribution model may not be predetermined, and parameters of a distribution model may be output from the neural network 20 .
- parameters of a distribution model are the aforementioned mean and variance. Then, the neural network 20 outputs a mean and a variance for each partial region 12 .
- FIG. 9 is a diagram illustrating the neural network 20 outputting parameters of a normal distribution indicated by likelihood data.
- a likelihood Li, (xiu, yiu, wiu, hiu) representing the mean of a normal distribution, and the variance vi of the normal distribution are output for each partial region 12 .
- the generation unit 2020 generates a distribution indicated by the likelihood data.
- the position (xi, yi) output from the neural network 20 may be different from the original position of a relating i-th partial region 12 .
- the size (wi, hi) output from the neural network 20 may be different from the original size of the relating i-th partial region 12 .
- the neural network 20 adjusts and outputs the position and the size of the partial region 12 in such a way as to increase a likelihood that a target object is included in the partial region 12 by causing the neural network 20 to perform learning in such a way as to output an ideal PHD.
- the neural network 20 does not necessarily output all parameters of the distribution model and may output only part of the parameters.
- the mean of the normal distribution is output from the neural network 20 , and a predetermined value is used as the variance.
- any structure may be used as an internal structure (such as the number and an order of layers, a type of each layer, and a connection relation between the layers) of the neural network.
- the same structure as that of the region proposal network (RPN) described in Patent Document 1 may be adopted as the structure of the neural network 20 .
- the network described in Non Patent Document 1 may be used.
- generation of likelihood data does not necessarily need to be performed by use of a neural network, and another existing technique of, for each of a plurality of partial regions in image data, computing a likelihood that a target object is included in the partial region may be used.
- the extraction unit 2040 extracts one or more partial distributions from the PHD.
- a partial distribution is a probability distribution representing, with respect to a partial region including one target object, an existence probability of a target object with respect to the position and the size of the partial region.
- a partial distribution is a probability distribution, and the integral value thereof is 1.
- the extraction unit 2040 computes the number of target objects included in the image data 10 , based on the PHD. Specifically, the extraction unit 2040 computes the integral value of the PHD and determines the computed integral value to be the number of target objects included in the image data 10 . However, it is conceivable that the integral value of the PHD does not completely match the number of target objects due to an error or the like and is not a natural number. Then, in this case, the extraction unit 2040 handles an approximate value (such as a value acquired by dropping the fractional portion) of the integral value of the PHD as the number of target objects.
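Because each piece of likelihood data integrates to its own Li, the integral of the PHD equals the sum of the Li values, so the count can be sketched without numerical integration. The function name and the "nearest" option are assumptions of this sketch; the source names only dropping the fractional portion as one example of an approximate value.

```python
import math

def estimate_object_count(likelihoods, rounding="floor"):
    """Return (integral of the PHD, approximate whole-number object count)."""
    integral = sum(likelihoods)  # equals the PHD integral when each f_i integrates to 1
    if rounding == "floor":
        return integral, math.floor(integral)        # drop the fractional portion
    if rounding == "nearest":
        return integral, math.floor(integral + 0.5)  # alternative, assumed here
    raise ValueError("unknown rounding mode: " + rounding)

# Made-up likelihoods L_i for six partial regions.
likelihoods = [0.62, 0.41, 0.73, 0.30, 0.55, 0.47]
integral, count = estimate_object_count(likelihoods)
print(integral, count)
```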
- the extraction unit 2040 extracts the computed number of partial distributions from the PHD. For example, the extraction unit 2040 extracts partial distributions from the PHD on the basis of the maximum value of the PHD.
- FIG. 10 is a flowchart illustrating a flow of processing of extracting partial distributions on the basis of the maximum value of the PHD. Loop processing illustrated in the flowchart in FIG. 10 is repeatedly executed while a counter i is less than the integral value S of the PHD. The counter i is initialized to 0 at first and is incremented by 1 every time the loop processing is executed. In this case, the number of partial distributions is a maximum integer equal to or less than S.
- the extraction unit 2040 determines whether the counter i is less than S. When i is less than S, the processing in FIG. 10 advances to S 204 . On the other hand, when i is equal to or greater than S, the processing in FIG. 10 ends.
- the extraction unit 2040 determines a position and a size relating to the maximum value of the PHD (S 204 ).
- the extraction unit 2040 extracts a partial distribution being centered on the position and the size and having the integral value of 1 from the PHD (removes the partial distribution from the PHD) (S 206 ). Since S 208 is the end of the loop processing, the processing returns to S 202 .
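The loop in FIG. 10 can be sketched on a one-dimensional grid as follows. Several details are assumptions of this sketch and not taken from the source: the PHD is discretized, the partial distribution is formed by growing a contiguous window around the peak until a mass of 1 has been collected, and the loop runs for the largest integer not exceeding S, in line with the stated number of partial distributions.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def extract_partials(phd, dx):
    """Greedily extract partial distributions with integral 1 from a gridded PHD."""
    remaining = list(phd)
    n = len(remaining)
    count = math.floor(sum(remaining) * dx)  # largest integer <= S
    parts = []
    for _ in range(count):
        peak = max(range(n), key=lambda k: remaining[k])  # S204: maximum of the PHD
        part = [0.0] * n
        need = 1.0                  # mass still to be collected for this part
        lo = hi = idx = peak
        while need > 1e-12:         # S206: remove mass around the peak
            take = min(max(remaining[idx], 0.0) * dx, need)
            part[idx] += take / dx
            remaining[idx] = max(remaining[idx] - take / dx, 0.0)
            need -= take
            can_left, can_right = lo > 0, hi < n - 1
            if not (can_left or can_right):
                break               # grid exhausted
            # Grow the window toward the neighbour holding more density.
            if can_left and (not can_right or remaining[lo - 1] >= remaining[hi + 1]):
                lo -= 1
                idx = lo
            else:
                hi += 1
                idx = hi
        parts.append(part)
    return parts, remaining

# Hypothetical PHD with integral of about 2.1 (two objects plus a small excess).
dx = 0.05
xs = [k * dx for k in range(2001)]  # grid over [0, 100]
phd = [1.05 * normal_pdf(x, 30.0, 3.0) + 1.05 * normal_pdf(x, 60.0, 3.0) for x in xs]
parts, rest = extract_partials(phd, dx)
means = sorted(sum(x * p for x, p in zip(xs, part)) * dx for part in parts)
print(len(parts), means)
```

Other ways of shaping the extracted distribution, such as fitting a normal distribution at the peak, would fit the same loop structure.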
- any space clustering technique may also be used as a method of extracting partial distributions from a PHD.
- a PHD can be written as the total sum Σi(Li×fi) of the output results.
- Hierarchical clustering of computing a distance between positions represented by all output results Li, adding output results at a short distance from each other, and decreasing the total number down to a predetermined number may be adopted.
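- The hierarchical clustering mentioned above can be sketched as follows. The sketch assumes that each output result is a two-dimensional position with a positive likelihood Li, that distance is Euclidean, and that two merged outputs are replaced by their likelihood-weighted mean position with the likelihoods added; the text does not specify these details, and the name `merge_outputs` is hypothetical.

```python
def merge_outputs(outputs, target_count):
    """outputs: list of (x, y, L) with L > 0. Merge closest pairs until target_count remain."""
    clusters = [tuple(o) for o in outputs]
    while len(clusters) > target_count:
        best = None
        # Find the pair of outputs at the shortest distance from each other.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                xa, ya, _ = clusters[a]
                xb, yb, _ = clusters[b]
                d = (xa - xb) ** 2 + (ya - yb) ** 2
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        xa, ya, la = clusters[a]
        xb, yb, lb = clusters[b]
        total = la + lb  # likelihoods are added
        merged = ((xa * la + xb * lb) / total, (ya * la + yb * lb) / total, total)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters

print(merge_outputs([(0, 0, 0.5), (1, 0, 0.5), (10, 10, 0.4), (11, 10, 0.6)], 2))
```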
- For each extracted partial distribution, the output unit 2060 outputs a position and a size of a target object represented by the partial distribution (S 110 ). Specifically, the output unit 2060 determines the position and the size of the target object, based on a statistic of the partial distribution. For example, the output unit 2060 determines the mean of the partial distribution to be the position and the size of the target object. In addition, for example, the output unit 2060 may determine a position and a size relating to the maximum value of the partial distribution to be the position and the size of the target object. Then, the output unit 2060 outputs the determined position and size for each partial distribution.
- FIG. 11 is a diagram illustrating a position and a size of a target object determined based on a partial distribution.
- two partial distributions D 1 and D 2 are extracted from a PHD.
- the output unit 2060 determines a position (x1, y1) and a size (w1, h1) of a target object, based on the partial distribution D 1 .
- the output unit 2060 determines a position (x2, y2) and a size (w2, h2) of a target object, based on the partial distribution D 2 .
- each of an image region at the position (x1, y1) with a width w1 and a height h1, and an image region at the position (x2, y2) with a width w2 and a height h2 represents a target object.
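- The two statistics mentioned above (the mean and the position of the maximum value) can be sketched as follows. The sketch assumes that a partial distribution is given as a list of sample points (x, y, w, h) with masses; this representation and the names `box_from_mean` and `box_from_peak` are hypothetical.

```python
def box_from_mean(part):
    """part: list of ((x, y, w, h), mass). Return the mass-weighted mean box."""
    total = sum(m for _, m in part)
    return tuple(sum(p[k] * m for p, m in part) / total for k in range(4))

def box_from_peak(part):
    """Return the (x, y, w, h) at which the partial distribution is largest."""
    return max(part, key=lambda pm: pm[1])[0]

d1 = [((10, 20, 30, 40), 0.5), ((12, 20, 30, 44), 0.25), ((8, 20, 30, 36), 0.25)]
print(box_from_mean(d1), box_from_peak(d1))
```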
- the output unit 2060 outputs a position and a size of a target object in various forms.
- the output unit 2060 stores, into a storage, data (such as a list) indicating, for each target object, a combination of “an identifier assigned to the target object, the position of the target object, and the size of the target object” in association with the image data 10 .
- any method may be used as a method of assigning an identifier to an object detected from image data.
- the output unit 2060 may output a display (such as a frame) indicating a position and a size of a determined target object, the display being superposed on the image data 10 , as illustrated in FIG. 11 .
- the display may be output to any destination and may be output to, for example, a storage and/or a display apparatus.
- the output unit 2060 may further output the number of target objects.
- a computation method of the number of target objects is as described above.
- FIG. 12 is a block diagram illustrating the information processing apparatus 2000 having a function of performing learning by the neural network 20 .
- the learning by the information processing apparatus 2000 is executed by a learning unit 2080 .
- the learning unit 2080 computes a predicted loss between a PHD based on an actual output of the neural network 20 and an ideal PHD.
- the ideal PHD may be expressed as the sum of normal distributions each of which has a previously specified variance and is centered on a position of a rectangle representing an object being a correct answer. Alternatively, the ideal PHD may be handled as a δ function the variance of which is 0, or another function may be used.
- learning by the neural network 20 is performed based on the predicted loss. More specifically, the learning unit 2080 performs learning by the neural network 20 by updating parameters (a weight value and a bias value) of the neural network 20 by propagating the computed predicted loss in inverse order (back propagating) from an output node in the neural network 20 .
- Various existing methods such as a gradient descent method may be used as a method of performing learning by a neural network by back propagation based on a predicted loss.
- a determination method and a computation method of a predicted loss used in learning by the neural network 20 will be described below.
- the learning unit 2080 computes a PHD relating to an actual output by use of the actual output acquired by inputting image data for learning (hereinafter referred to as learning image data) to the neural network 20 .
- the learning unit 2080 further computes a predicted loss between the PHD relating to the actual output and an ideal PHD predetermined based on the learning image data. For example, the square error between the PHDs may be used as the predicted loss.
- Since a PHD divided by its integral value can be handled as a probability density function the integral value of which is 1, any technique capable of handling a loss as an error between probability density functions may be used.
- the minus value of the product of an ideal probability density function and a probability density function relating to the actual output may be determined as a loss.
- an error of the integral value may be handled as a loss, or several of the losses may be combined.
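- The loss candidates listed above can be sketched as follows. The sketch assumes that both PHDs are sampled on the same grid with cell width `cell`, and it interprets "the minus value of the product" as the negated integral of the product of the two normalized densities; that interpretation, the function names, and the weights in `combined_loss` are assumptions for illustration.

```python
def squared_error(pred, ideal, cell=1.0):
    """Integrated square error between two PHDs on a common grid."""
    return sum((p - q) ** 2 for p, q in zip(pred, ideal)) * cell

def negative_overlap(pred, ideal, cell=1.0):
    """Negated integral of the product of the PHDs after normalizing each to integral 1."""
    zp = sum(pred) * cell
    zq = sum(ideal) * cell
    return -sum((p / zp) * (q / zq) for p, q in zip(pred, ideal)) * cell

def count_error(pred, ideal, cell=1.0):
    """Square error between the integral values (estimated object counts)."""
    return (sum(pred) * cell - sum(ideal) * cell) ** 2

def combined_loss(pred, ideal, weights=(1.0, 1.0, 1.0), cell=1.0):
    """Weighted combination of the three losses (weights are hypothetical)."""
    a, b, c = weights
    return (a * squared_error(pred, ideal, cell)
            + b * negative_overlap(pred, ideal, cell)
            + c * count_error(pred, ideal, cell))

ideal = [0.0, 1.0, 0.0, 1.0]
shifted = [1.0, 0.0, 1.0, 0.0]
print(combined_loss(ideal, ideal), combined_loss(shifted, ideal))
```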
- a PHD relating to an actual output can be written as Σi(Li×fi).
- an ideal PHD can be written as Σj(gj).
- Nj denotes the number of the assigned outputs.
- an error between Li for assigned i and (1/Nj) such as the square of (Li−1/Nj) may be minimized. This is a technique for learning Li in such a way that the integral values match.
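- The error between Li and 1/Nj can be sketched as follows. The sketch assumes that an assignment from output index i to ideal component index j is already available as a dictionary; how that assignment is made is not covered here, and the name `likelihood_loss` is hypothetical. Unassigned outputs are simply left out of this sum.

```python
from collections import Counter

def likelihood_loss(L, assignment):
    """L: list of likelihoods Li. assignment: {i: j} for outputs assigned to ideal component j."""
    n = Counter(assignment.values())  # Nj: number of outputs assigned to each j
    return sum((L[i] - 1.0 / n[j]) ** 2 for i, j in assignment.items())

# Outputs 0 and 1 share ideal component 0 (N0 = 2); output 2 alone covers component 1 (N1 = 1).
print(likelihood_loss([0.5, 0.5, 1.0, 0.3], {0: 0, 1: 0, 2: 1}))
```

- When every assigned Li equals 1/Nj, the likelihoods assigned to each ideal component sum to 1, which is why minimizing this error drives the integral values to match.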
- an ideal PHD indicates a distribution (δ function) having a likelihood of 1 at the point given by the position and the size of the image region and having a variance of 0.
- FIG. 13 is a diagram illustrating an ideal PHD.
- target objects are included in two image regions 40 - 1 and 40 - 2 .
- the position and the size of the image region 40 - 1 are (x1, y1) and (w1, h1), respectively. Therefore, an ideal PHD indicates a δ function with a peak at (x1, y1, w1, h1).
- the position and the size of the image region 40 - 2 are (x2, y2) and (w2, h2), respectively. Therefore, an ideal PHD indicates a δ function with a peak at (x2, y2, w2, h2).
- an ideal PHD relating to learning image data is previously generated by hand and is stored in a storage in association with the learning image data.
- the learning unit 2080 performs learning by the neural network 20 by use of one or more of thus prepared combinations of learning image data and an ideal PHD.
- An information processing apparatus 2000 according to an example embodiment 2 distinctively handles a plurality of types of target objects. To do so, the generation unit 2020 according to the example embodiment 2 generates likelihood data for each of mutually different types of target objects. Therefore, likelihood data are generated for each type of target object for one partial region 12 .
- an extraction unit 2040 generates a PHD for each type of target object. This is achieved by adding up likelihood data for each type of target object. Then, the extraction unit 2040 extracts a partial distribution from each PHD.
- An output unit 2060 outputs a position and a size of a target object relating to each partial distribution. Each partial distribution relates to one type of target object. Then, the output unit 2060 outputs a position and a size of a target object relating to a partial distribution along with the type of the target object.
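- Generating one PHD per type of target object can be sketched as follows. The sketch assumes δ-function likelihood data, so that each PHD can be stored as a mapping from a point (x, y, w, h) to accumulated likelihood; the type labels and the name `phd_per_type` are hypothetical.

```python
from collections import defaultdict

def phd_per_type(likelihood_data):
    """likelihood_data: iterable of (object_type, (x, y, w, h), L). Sum L per type and point."""
    phds = defaultdict(lambda: defaultdict(float))
    for obj_type, box, L in likelihood_data:
        phds[obj_type][box] += L
    return {t: dict(p) for t, p in phds.items()}

data = [
    ("human", (0, 0, 10, 20), 0.5),
    ("human", (0, 0, 10, 20), 0.5),
    ("car", (5, 5, 40, 20), 1.0),
]
for obj_type, phd in phd_per_type(data).items():
    # The sum of each PHD approximates the number of objects of that type.
    print(obj_type, sum(phd.values()))
```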
- the information processing apparatus 2000 includes a neural network 20 for each type of target object.
- Each neural network 20 previously performs learning in such a way as to detect a relating type of target object.
- an ideal PHD is set to indicate a likelihood of 1 for a position and a size of an image region representing a human in learning image data and indicate a likelihood of 0 for a position and a size of another image region (an image region in which an object does not exist or an object other than a human exists).
- a learning unit 2080 causes a neural network 20 for detecting a certain type of target object to perform learning by use of a combination of “learning image data and an ideal PHD for the type of target object.”
- a hardware configuration of a computer providing the information processing apparatus 2000 according to the example embodiment 2 is illustrated by FIG. 4 , similarly to the example embodiment 1.
- a storage device 1080 in a computer 1000 providing the information processing apparatus 2000 according to the present example embodiment further stores a program module providing the function of the information processing apparatus 2000 according to the present example embodiment.
- the information processing apparatus 2000 can detect a target object for each type thereof. Accordingly, positions of mutually different types of target objects can be recognized including the types thereof.
Abstract
An information processing apparatus (2000) generates likelihood data for each of a plurality of partial regions (12) in image data (10). The likelihood data are data being associated with a position and a size on the image data (10) and indicating a likelihood that a target object exists in an image region at the position with the size. The information processing apparatus (2000) computes a distribution (probability hypothesis density: PHD) of an existence likelihood of a target object with respect to a position and a size by computing the total sum of likelihood data each piece of which is generated for each partial region (12). The information processing apparatus (2000) extracts, from the PHD, partial distributions each of which relates to one target object. For each extracted partial distribution, the information processing apparatus (2000) outputs a position and a size of a target object represented by the partial distribution, based on a statistic of the partial distribution.
Description
- The present application is a continuation application of U.S. patent application Ser. No. 17/059,678 filed on Nov. 30, 2020, which is a National Stage Entry of PCT/JP2018/021207 filed on Jun. 1, 2018, the contents of all of which are incorporated herein by reference, in their entirety.
- The present invention relates to a technology of detecting an object from an image.
- Technologies of detecting an object from image data have been developed. For example, Patent Document 1 discloses a technology of performing object detection by use of a deep neural network. A system in Patent Document 1 generates a feature map of image data by use of a convolutional neural network and, by inputting the generated feature map to a neural network called a region proposal network (RPN), outputs many proposals of rectangular regions (region proposals) each of which includes an object. The system further estimates a class of an object included in a region proposal by performing classification in a layer called a box-classification layer. The system also adjusts a position and a size of a region proposal by performing regression in a layer called a box-regression convolutional layer.
- Further, a system in Non Patent Document 1 generates a plurality of feature maps by use of a convolutional neural network and outputs many object proposals from each feature map. Each object proposal includes rectangular coordinates and a likelihood of an object class.
- Many erroneous outputs not being correct answers are included in the aforementioned outputs in both the technique in Patent Document 1 and the technique in Non Patent Document 1. Therefore, a detection result to be finally output is acquired out of many object proposals by performing processing of reducing neighboring and significantly overlapping region proposals, the processing being called non-maximum suppression.
- [Patent Document 1] United States Patent Application Publication No. 2017/0206431, Specification
-
- [Non Patent Document 1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg, “Single Shot MultiBox Detector,” ECCV 2016
- In Patent Document 1 and Non Patent Document 1, a case of significantly overlapping objects is eliminated as erroneous detection, and therefore a case of significant overlap is conversely not considered; and it is conceivable that a plurality of overlapping objects are erroneously detected as a single object in such a case.
- The present invention has been made in view of the aforementioned problem and provides a technology capable of distinctively detecting objects even when the objects overlap each other in image data.
- An information processing apparatus according to the present invention includes: 1) a generation unit configured to acquire image data and generate likelihood data representing a likelihood of existence of a target object with respect to a position and a size for each of a plurality of partial regions included in the image data; 2) an extraction unit configured to compute a distribution of a likelihood of existence of a target object with respect to a position and a size by computing a total sum of likelihood data each piece of which is generated for each partial region and extract, from the computed distribution, one or more partial distributions each of which relates to one target object; and 3) an output unit configured to, for each extracted partial distribution, output a position and a size of a target object relating to the partial distribution, based on a statistic of the partial distribution.
- A control method according to the present invention is executed by a computer. The control method includes: 1) a generation step of acquiring image data and generating likelihood data representing a likelihood of existence of a target object with respect to a position and a size for each of a plurality of partial regions included in the image data; 2) an extraction step of computing a distribution of a likelihood of existence of a target object with respect to a position and a size by computing a total sum of likelihood data each piece of which is generated for each partial region and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and 3) an output step of, for each extracted partial distribution, outputting a position and a size of a target object relating to the partial distribution, based on a statistic of the partial distribution.
- A program according to the present invention causes a computer to execute each step included in the control method according to the present invention.
- The present invention provides a technology capable of distinctively detecting objects even when the objects overlap each other in image data.
- The aforementioned object, other objects, features and advantages will become more apparent by use of the following preferred example embodiments and accompanying drawings.
- FIG. 1 is a diagram conceptually illustrating processing performed by an information processing apparatus according to the example embodiment 1.
- FIG. 2 is a diagram illustrating image data including target objects significantly overlapping each other.
- FIG. 3 is a diagram illustrating a functional configuration of the information processing apparatus according to the example embodiment 1.
- FIG. 4 is a diagram illustrating a computer for providing the information processing apparatus.
- FIG. 5 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
- FIG. 6 is a diagram illustrating a method of extracting a partial region by use of a sliding window.
- FIG. 7 is a diagram illustrating a neural network used for generation of likelihood data.
- FIG. 8 is a diagram conceptually illustrating likelihood data generated based on a likelihood Li.
- FIG. 9 is a diagram illustrating a neural network outputting parameters of a normal distribution indicated by likelihood data.
- FIG. 10 is a flowchart illustrating a flow of processing of extracting a partial distribution on the basis of the maximum value of a PHD.
- FIG. 11 is a diagram illustrating a position and a size of a target object determined based on a partial distribution.
- FIG. 12 is a block diagram illustrating an information processing apparatus having a function of learning by a neural network.
- FIG. 13 is a diagram illustrating an ideal PHD.
- Example embodiments of the present invention will be described below by use of drawings. Note that, in all drawings, a similar sign is given to similar components, and description thereof is omitted as appropriate. Further, each block in each block diagram represents a function-based configuration rather than a hardware-based configuration unless otherwise described.
- FIG. 1 is a diagram conceptually illustrating processing performed by an information processing apparatus 2000 according to the present example embodiment. The information processing apparatus 2000 acquires image data 10 and detects a target object from the image data 10. Detection of a target object means determination of a position and a size of an image region (such as a circumscribed rectangle) including the target object from the image data 10. Any object may be handled as a target object, or only a specific type of object (such as only a human) may be handled as a target object.
- The information processing apparatus 2000 detects an object by a method described below. First, the information processing apparatus 2000 generates parameters representing likelihood data for each of a plurality of partial regions 12 in the image data 10. The likelihood data are data being associated with a position and a size on the image data 10 and indicating a distribution of a likelihood that a target object exists in an image region at the position with the size. Specifically, denoting a predetermined probability density function the integral of which is 1 as f and a generated parameter as L, likelihood data is expressed by L×f.
- For example, a normal distribution the position and the variance of which vary for each partial region may be used as the probability density function f, or a δ function may be used for expressing existence at a specific position only, or another probability density function may be adopted. Note that a δ function represents a function taking infinity only at a specific value, taking 0 at the other values, and having an integral value of 1.
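- The form L×f can be sketched numerically as follows. The sketch assumes a one-dimensional normal distribution as f and a midpoint-rule integration; the names `normal_pdf`, `likelihood_data`, and `integrate` and the example values are hypothetical. Because f integrates to 1, the integral of L×f comes out as approximately L.

```python
import math

def normal_pdf(x, mean, var):
    """Probability density function f of a 1-D normal distribution (integral 1)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def likelihood_data(L, mean, var):
    """Return the likelihood data L x f as a function of x."""
    return lambda x: L * normal_pdf(x, mean, var)

def integrate(fn, lo, hi, steps=10000):
    """Midpoint-rule numerical integration of fn over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(fn(lo + (k + 0.5) * width) for k in range(steps)) * width

data = likelihood_data(L=0.7, mean=5.0, var=1.0)
print(round(integrate(data, -5.0, 15.0), 6))
```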
- The integral value of the likelihood data L×f matches the value of the generated parameter L. The likelihood data in FIG. 1 indicate such a distribution. Further details of the likelihood data will be described later.
- The information processing apparatus 2000 computes a distribution of an existence likelihood of a target object with respect to a position and a size by computing the total sum of likelihood data each piece of which is generated for each partial region 12. The distribution is a so-called probability hypothesis density (PHD). The PHD is a distribution function having a characteristic that the integrated value matches the number of existing objects. The information processing apparatus 2000 extracts, from the PHD, partial distributions each of which relates to one target object (hereinafter referred to as partial distributions). Ideally, each of the partial distributions is extracted in such a way that the integral value thereof is 1, and each partial distribution relates to one target object.
- Three partial distributions are extracted from a PHD in FIG. 1. The integrated value of the PHD is 3, and partial distributions are extracted in such a way that the integral of each partial distribution is 1. Note that while the three partial distributions are extracted in such a way as not to overlap each other in FIG. 1, the partial distributions may be extracted in such a way as to overlap each other. For example, while each integral value becomes 1 when a shape of a partial distribution is limited to a normal distribution, the partial distributions may be determined in such a way as to minimize the error between the sum of the partial distributions and the PHD. Alternatively, each partial distribution may be limited to a normal distribution×a weight. The integral value matches the weight in the case of the limitation, and therefore the partial distributions may be determined in such a way as to minimize the total sum of the error between the sum of the partial distributions and the PHD, and the error between the weight value and 1. Alternatively, a distribution other than a normal distribution may be adopted as a limited distribution shape.
- For each extracted partial distribution, the information processing apparatus 2000 outputs a position and a size of a target object represented by the partial distribution, based on a statistic such as the mean of the partial distribution. For example, a position of a target object is represented by coordinates of a predetermined position (such as an upper-left corner) of a circumscribed rectangle representing the target object. For example, a size of a target object can be represented by a width and a height of a rectangular region representing the target object.
- Note that while each distribution illustrated in FIG. 1 is depicted two-dimensionally (horizontal axis: position/size×vertical axis: likelihood) for convenience of illustration, the distribution is actually a distribution on a three-or-more-dimensional space. For example, it is assumed that a position of an image region is represented by coordinates, the shape of the image region is a rectangle, and the size of the rectangle is represented by a width and a height. In this case, each distribution illustrated in FIG. 1 is expressed on a five-dimensional (X coordinate, Y coordinate, width, height×likelihood) space.
- As described above, the information processing apparatus 2000 according to the present example embodiment detects a target object by a method of computing a PHD by adding up likelihood data each piece of which is computed for each partial region, and extracting a partial distribution representing one target object. The method enables highly precise distinction even between significantly overlapping target objects and detection of the target objects as separate target objects. The reason will be described below with reference to FIG. 2.
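- The property that the integral of the PHD matches the number of objects can be sketched as follows. The sketch assumes one-dimensional normal distributions and made-up output values in which two partial regions each report a likelihood of 0.5 for one object and a third reports 1.0 for a nearby, overlapping object; all names and numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Probability density function of a 1-D normal distribution (integral 1)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def phd(x, outputs):
    """PHD at x: total sum of likelihood data Li x fi over all partial regions."""
    return sum(L * normal_pdf(x, mean, var) for L, mean, var in outputs)

def integrate(fn, lo, hi, steps=20000):
    """Midpoint-rule numerical integration of fn over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(fn(lo + (k + 0.5) * width) for k in range(steps)) * width

# (Li, mean, variance) per partial region; the distributions overlap heavily.
outputs = [(0.5, 10.0, 1.0), (0.5, 10.5, 1.0), (1.0, 12.0, 1.0)]
print(round(integrate(lambda x: phd(x, outputs), -10.0, 30.0), 6))
```

- Even though the three distributions overlap, the integral is approximately 2, which is the information the extraction step relies on to separate the two objects.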
- FIG. 2 is a diagram illustrating image data 10 including significantly overlapping target objects. The image data 10 is a captured image of a scene in which two persons pass each other. When persons are correctly detected from the image data 10, two persons are detected. However, it is difficult to distinctively detect persons being significantly overlapping objects by existing techniques, and the probability of the two persons being collectively detected as one person is high.
- With regard to this point, the information processing apparatus 2000 according to the present example embodiment generates a PHD acquired by adding up likelihood data each piece of which is generated for each partial region 12. The integrated value in any section of the PHD represents the number of target objects in the section. Thus, in the information processing apparatus 2000, information about the number of target objects is included in a PHD being information acquired by integrating information acquired from each partial region 12. By thus checking an integral value of a PHD including information about the number of target objects, each target object can be precisely detected even from image data including significantly overlapping target objects.
- Specifically, a partial distribution the integral value of which is 1 is extracted from a PHD. This enables separation of significantly overlapping target objects and acquisition of a probability distribution of a position and a size of an image region relating to each target object. For example, a shaded partial distribution and a dotted partial distribution are extracted from a PHD in FIG. 2. Then, by determining a position and a size of a target object for each extracted partial distribution, each target object can be detected.
- Note that the aforementioned description with reference to FIG. 1 and FIG. 2 is an exemplification for ease of understanding of the information processing apparatus 2000 and does not limit the functions of the information processing apparatus 2000. The information processing apparatus 2000 according to the present example embodiment will be described in more detail below.
- Example of Functional Configuration of Information Processing Apparatus 2000
- FIG. 3 is a diagram illustrating a functional configuration of the information processing apparatus 2000 according to the example embodiment 1. The information processing apparatus 2000 includes a generation unit 2020, an extraction unit 2040, and an output unit 2060. The generation unit 2020 acquires image data 10 and generates likelihood data for each of a plurality of partial regions 12 included in the image data 10. The extraction unit 2040 computes a PHD by computing the total sum of likelihood data each piece of which is generated for each partial region 12. The extraction unit 2040 extracts, from the computed PHD, one or more partial distributions each of which relates to one target object. For each extracted partial distribution, the output unit 2060 outputs a position and a size of a target object represented by the partial distribution, based on a statistic of the partial distribution.
- Each functional configuration unit in the information processing apparatus 2000 may be provided by hardware (such as a hardwired electronic circuit) providing each functional configuration unit or may be provided by a combination of hardware and software (such as a combination of an electronic circuit and a program controlling the circuit). The case of each functional configuration unit in the information processing apparatus 2000 being provided by a combination of hardware and software will be further described below.
- FIG. 4 is a diagram illustrating a computer 1000 for providing the information processing apparatus 2000. The computer 1000 may be any computer. Examples of the computer 1000 include a personal computer (PC) and a server machine. The computer 1000 may be a dedicated computer designed for providing the information processing apparatus 2000 or may be a general-purpose computer.
- The computer 1000 includes a bus 1020, a processor 1040, a memory 1060, a storage device 1080, an input-output interface 1100, and a network interface 1120. The bus 1020 is a data transmission channel for the processor 1040, the memory 1060, the storage device 1080, the input-output interface 1100, and the network interface 1120 to mutually transmit and receive data. However, a method of connecting the processor 1040 and the like to one another is not limited to the bus connection.
- The processor 1040 includes various processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a field-programmable gate array (FPGA). The memory 1060 is a main storage provided by use of a random access memory (RAM) and/or the like. The storage device 1080 is an auxiliary storage provided by use of a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), and/or the like.
- The input-output interface 1100 is an interface for connecting the computer 1000 to an input/output device. For example, the input-output interface 1100 is connected to an input apparatus such as a keyboard and an output apparatus such as a display apparatus. The network interface 1120 is an interface for connecting the computer 1000 to a communication network. Examples of the communication network include a local area network (LAN) and a wide area network (WAN). A method of connecting the network interface 1120 to the communication network may be a wireless connection or a wired connection.
- The storage device 1080 stores a program module providing each functional configuration unit in the information processing apparatus 2000. The processor 1040 provides a function relating to each program module by reading the program module into the memory 1060 and executing the program module.
FIG. 5 is a flowchart illustrating a flow of processing executed by theinformation processing apparatus 2000 according to theexample embodiment 1. Thegeneration unit 2020 acquires image data 10 (S102). Thegeneration unit 2020 generates likelihood data for each of a plurality ofpartial regions 12 included in the image data 10 (S104). Theextraction unit 2040 computes a PHD by adding up likelihoods represented by the likelihood data (S106). Theextraction unit 2040 extracts one or more partial distributions from the PHD (S108). For each partial distribution, theoutput unit 2060 outputs a position and a size of a target object relating to the partial distribution (S110). - The
information processing apparatus 2000 may execute a series of processes illustrated inFIG. 5 in response to any trigger. For example, theinformation processing apparatus 2000 executes the aforementioned series of processes in response to input of theimage data 10. In addition, for example, theinformation processing apparatus 2000 may execute the aforementioned series of processes in response to a predetermined input operation by a user. - The
generation unit 2020 acquires image data 10 (S102). Any image data may be used as theimage data 10. For example, theimage data 10 are a captured image generated by a camera. The camera may be a still camera or a video camera. Note that “a captured image generated by a camera” may be a captured image generated by a camera itself or an image acquired by applying some processing on a captured image generated by a camera. - When a captured image is used as the
image data 10, theinformation processing apparatus 2000 may be provided inside a camera generating theimage data 10. For example, by providing theinformation processing apparatus 2000 inside a surveillance camera, an object can be detected in real time from a surveillance video generated by the surveillance camera. For example, types of camera called an intelligent camera, an Internet Protocol (IP) camera, and a network camera can be used as a camera incorporating the function of theinformation processing apparatus 2000. - The
generation unit 2020 may acquire the image data 10 by any method. For example, the generation unit 2020 acquires the image data 10 from a storage that stores it. The storage may be provided inside or outside the information processing apparatus 2000. Alternatively, the information processing apparatus 2000 may acquire image data 10 input through an input operation by a user, or the generation unit 2020 may acquire the image data 10 by receiving it from another apparatus.
A partial region 12 is a partial image region included in the image data 10. Each partial region 12 differs from every other partial region 12 in at least one of position and size.
The generation unit 2020 extracts each partial region 12 included in the image data 10 and generates likelihood data for the partial region 12 by analyzing it. For example, a partial region 12 can be extracted by use of a sliding window. FIG. 6 is a diagram illustrating a method of extracting partial regions 12 by use of a sliding window. The information processing apparatus 2000 moves a sliding window with a predetermined size (width: Ws, height: Hs) at a predetermined stride d. A plurality of image regions with different sizes are extracted from the sliding window at each position, and each image region is handled as a partial region 12. Thus, partial regions 12 with varying positions and sizes can be extracted. Note that, for example, the technique using anchor boxes disclosed in Patent Document 1 can be used to extract partial regions 12 in this manner.
A partial region 12 may be extracted from a feature map generated from the image data 10 instead of being extracted directly from the image data 10. In this case, for example, a neural network 20 to be described later is constituted of a layer for extracting a feature map from the image data 10 (such as a convolutional layer in a convolutional neural network) and a layer for extracting partial regions 12 from the feature map output from that layer and generating likelihood data.
The shape of a partial region 12 is not necessarily limited to a rectangle. For example, when the shape of a partial region 12 is a perfect circle, the partial region 12 can be represented by center coordinates and a radius. Further, when a partial region 12 is represented by a set of vertices, a polygon of any shape can be handled as a partial region 12. In this case, both the position and the size of the partial region 12 are determined by the set of vertices.
The generation unit 2020 generates parameters representing likelihood data for each of a plurality of partial regions 12 included in the image data 10 and generates the likelihood data (S104). For example, the parameters representing likelihood data are generated by use of a neural network. FIG. 7 is a diagram illustrating a neural network used for generation of parameters representing likelihood data. In response to input of the image data 10, a neural network 20 outputs, for each partial region 12 included in the image data 10, a likelihood Li that a target object exists in an image region with the position and the size of the partial region 12. Li is the likelihood output for the i-th partial region 12.
For example, the generation unit 2020 sets a distribution determined based on the likelihood Li as the likelihood data.
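The sliding-window extraction described with reference to FIG. 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the apparatus: the function name `sliding_window_regions`, the `scales` parameter, and the choice of regions concentric with the window are hypothetical. The description only states that several regions of different sizes are taken at each window position.

```python
def sliding_window_regions(img_w, img_h, win_w, win_h, stride, scales=(0.5, 1.0)):
    """Enumerate partial regions as (cx, cy, w, h) tuples.

    A window of size (win_w, win_h) is moved over the image at the given
    stride. At every window position, one region per entry in `scales`
    is emitted, centred on the window (an assumption of this sketch).
    """
    regions = []
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            cx = left + win_w / 2  # window centre, x
            cy = top + win_h / 2   # window centre, y
            for s in scales:
                regions.append((cx, cy, win_w * s, win_h * s))
    return regions


if __name__ == "__main__":
    for region in sliding_window_regions(8, 8, 4, 4, 2)[:4]:
        print(region)
```

Every pair of regions produced this way differs in position, in size, or in both, which matches the definition of a partial region 12 given above.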
FIG. 8 is a diagram conceptually illustrating likelihood data generated based on a likelihood Li. In the upper part of FIG. 8, the likelihood data represent a distribution that has a variance of 0 and is generated based on the likelihood Li. This distribution is expressed as Li×δ by use of a δ function.
On the other hand, the likelihood data in the lower part of FIG. 8 represent a distribution with a nonzero variance. For example, a distribution conforming to a predetermined model such as a normal distribution is determined in advance as a reference (hereinafter referred to as a reference distribution). When a normal distribution is used, for example, the reference distribution may be determined as a distribution having an integral value of 1, the position and the size of the partial region 12 as the mean, and a predetermined value as the variance. Any value may be set as the variance.
The generation unit 2020 generates likelihood data by multiplying the reference distribution by the likelihood Li. For example, in the lower part of FIG. 8, the reference distribution model is a normal distribution. Based on the position (xi, yi) and the size (wi, hi) of the partial region 12, the mean of the reference distribution is (xi, yi, wi, hi). Further, the variance of the reference distribution is vi. The reference distribution is therefore N[(xi, yi, wi, hi), vi]. The likelihood output from the neural network 20 is Li, so the generation unit 2020 generates the distribution indicated by the likelihood data by multiplying the reference distribution by Li. The integral value of the resulting distribution is Li.
A reference distribution conforming to a distribution model need not be predetermined; instead, parameters of the distribution model may be output from the neural network 20. For example, when a normal distribution is used, the parameters of the distribution model are the aforementioned mean and variance. In that case, the neural network 20 outputs a mean and a variance for each partial region 12.
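The construction of likelihood data as a reference distribution scaled by Li can be checked numerically. The sketch below is an illustration only and makes simplifying assumptions: it works in one dimension instead of the four-dimensional (x, y, w, h) space, and the helper names `normal_pdf`, `likelihood_data`, and `integrate` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def likelihood_data(likelihood, mean, var):
    """Return the function x -> L_i * N(x; mean, var)."""
    return lambda x: likelihood * normal_pdf(x, mean, var)


def integrate(f, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * step) for k in range(n)) * step


if __name__ == "__main__":
    f = likelihood_data(0.8, mean=5.0, var=0.25)
    # The integral of the scaled distribution is approximately L_i = 0.8.
    print(round(integrate(f, 0.0, 10.0), 6))
```

Because the reference distribution integrates to 1, scaling it by Li leaves a distribution whose integral is Li, which is the property the later object-count computation relies on.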
FIG. 9 is a diagram illustrating the neural network 20 outputting parameters of a normal distribution indicated by likelihood data. In FIG. 9, "a likelihood Li, (xiu, yiu, wiu, hiu) representing the mean of a normal distribution, and the variance vi of the normal distribution" are output for each partial region 12. Then, by multiplying the normal distribution determined by the mean and the variance output from the neural network 20 by the likelihood Li for each partial region 12, the generation unit 2020 generates the distribution indicated by the likelihood data.
The position (xi, yi) output from the neural network 20 may differ from the original position of the corresponding i-th partial region 12. Similarly, the size (wi, hi) output from the neural network 20 may differ from the original size of the corresponding i-th partial region 12. The reason is that, as will be described later, the neural network 20 is trained to output an ideal PHD and therefore adjusts the position and the size of the partial region 12 so as to increase the likelihood that a target object is included in the partial region 12.
Note that the neural network 20 does not necessarily output all parameters of the distribution model and may output only some of them. For example, the mean of the normal distribution is output from the neural network 20, and a predetermined value is used as the variance.
In order for the neural network 20 to perform the operation described above, the neural network 20 needs to be trained in advance to do so. A learning method of the neural network 20 will be described later. Note that any internal structure (such as the number and order of layers, the type of each layer, and the connections between layers) may be used for the neural network 20. For example, the same structure as that of the region proposal network (RPN) described in Patent Document 1 may be adopted. Alternatively, the network described in Non Patent Document 1 may be used.
Note that generation of likelihood data does not necessarily need to be performed by use of a neural network; another existing technique that computes, for each of a plurality of partial regions in image data, a likelihood that a target object is included in the partial region may be used.
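As a worked summary of the description so far, the likelihood data for the i-th partial region 12 and the distribution obtained by summing the likelihood data (the PHD) can be written as below. The symbol z for a point in position-size space and the isotropic covariance v_i I are notational assumptions introduced here for illustration; the description itself specifies only a mean (xi, yi, wi, hi) and a variance vi.

```latex
\begin{aligned}
z &= (x, y, w, h), \qquad \mu_i = (x_i, y_i, w_i, h_i),\\
\ell_i(z) &= L_i \,\mathcal{N}\!\left(z;\ \mu_i,\ v_i I\right), \qquad \int \ell_i(z)\,dz = L_i,\\
\mathrm{PHD}(z) &= \sum_i \ell_i(z), \qquad \int \mathrm{PHD}(z)\,dz = \sum_i L_i .
\end{aligned}
```

Read this way, the integral of the PHD equals the sum of the likelihoods Li.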
- The
extraction unit 2040 extracts one or more partial distributions from the PHD. A partial distribution is a probability distribution representing, for a partial region including one target object, the existence probability of the target object with respect to the position and the size of the partial region. Because a partial distribution is a probability distribution, its integral value is 1.
First, the extraction unit 2040 computes the number of target objects included in the image data 10, based on the PHD. Specifically, the extraction unit 2040 computes the integral value of the PHD and determines the computed integral value to be the number of target objects included in the image data 10. However, the integral value of the PHD may not exactly match the number of target objects due to an error or the like and may not be a natural number. In this case, the extraction unit 2040 handles an approximate value of the integral value (such as the value acquired by dropping the fractional portion) as the number of target objects.
The extraction unit 2040 extracts the computed number of partial distributions from the PHD. For example, the extraction unit 2040 extracts partial distributions from the PHD on the basis of the maximum value of the PHD. FIG. 10 is a flowchart illustrating a flow of processing of extracting partial distributions on the basis of the maximum value of the PHD. The loop processing illustrated in the flowchart in FIG. 10 is repeatedly executed while a counter i is less than the integral value S of the PHD. The counter i is initialized to 0 and is incremented by 1 every time the loop processing is executed. In this case, the number of partial distributions is the maximum integer equal to or less than S.
In S202, the extraction unit 2040 determines whether the counter i is less than S. When i is less than S, the processing in FIG. 10 advances to S204. On the other hand, when i is equal to or greater than S, the processing in FIG. 10 ends.
The extraction unit 2040 determines the position and the size relating to the maximum value of the PHD (S204). The extraction unit 2040 extracts, from the PHD, a partial distribution that is centered on that position and size and has an integral value of 1 (that is, removes the partial distribution from the PHD) (S206). Since S208 is the end of the loop processing, the processing returns to S202.
In addition to the method illustrated in FIG. 10, any spatial clustering technique may also be used as a method of extracting partial distributions from a PHD. For example, denoting each output result as Li and a preset probability density function as fi, a PHD can be written as the total sum Σi(Li×fi) of the output results. Hierarchical clustering may be adopted in which distances between the positions represented by all output results Li are computed, output results at a short distance from each other are added together, and the total number is decreased to a predetermined number. At this time, since it is desirable that Li be as close to 1 as possible, the following processing may be performed, for example: when adding an output i and an output i′, "the mean square of (1−Li) and (1−Li′)" is compared with "the square of the difference between Li+Li′ and 1", and the addition is not performed when the former is smaller. Alternatively, various clustering techniques may be performed and the result with the minimum sum of squares of (1−Li) may be selected.
For each extracted partial distribution, the output unit 2060 outputs the position and the size of the target object represented by the partial distribution (S110). Specifically, the output unit 2060 determines the position and the size of the target object based on a statistic of the partial distribution. For example, the output unit 2060 determines the mean of the partial distribution to be the position and the size of the target object. Alternatively, the output unit 2060 may determine the position and the size relating to the maximum value of the partial distribution to be the position and the size of the target object. The output unit 2060 then outputs the determined position and size for each partial distribution.
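The counting and peak-based extraction steps above can be sketched as follows. This is a hedged illustration rather than the procedure of FIG. 10 itself: it samples a one-dimensional PHD on a grid instead of the four-dimensional (x, y, w, h) space, it takes the object count as the integral with the fractional portion dropped, and it removes each partial distribution by subtracting a unit-mass normal distribution with an assumed variance. The names `extract_peaks` and `normal_pdf` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def extract_peaks(grid, phd, var):
    """Return (object_count, peak_positions) for a PHD sampled on a uniform grid.

    The count is the integral of the PHD with the fractional portion dropped.
    In each iteration the position of the maximum is recorded and a unit-mass
    normal distribution centred there is removed from the residual PHD.
    """
    step = grid[1] - grid[0]
    residual = list(phd)
    count = math.floor(sum(residual) * step)  # integral S, fraction dropped
    peaks = []
    for _ in range(count):
        k = max(range(len(grid)), key=lambda j: residual[j])  # argmax of the PHD
        peaks.append(grid[k])
        residual = [
            max(0.0, r - normal_pdf(x, grid[k], var))  # remove one unit of mass
            for x, r in zip(grid, residual)
        ]
    return count, peaks


if __name__ == "__main__":
    grid = [k * 0.05 for k in range(201)]
    phd = [1.2 * normal_pdf(x, 3.0, 0.25) + 0.9 * normal_pdf(x, 7.0, 0.25) for x in grid]
    print(extract_peaks(grid, phd, var=0.25))
```

Here each recorded peak plays the role of "the position relating to the maximum value of the partial distribution"; using the mean of each removed distribution instead would be an equally valid reading of the description.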
FIG. 11 is a diagram illustrating a position and a size of a target object determined based on a partial distribution. In FIG. 11, two partial distributions D1 and D2 are extracted from a PHD. The output unit 2060 determines a position (x1, y1) and a size (w1, h1) of a target object based on the partial distribution D1. Similarly, the output unit 2060 determines a position (x2, y2) and a size (w2, h2) of a target object based on the partial distribution D2. Accordingly, the image region at the position (x1, y1) with a width w1 and a height h1 and the image region at the position (x2, y2) with a width w2 and a height h2 each represent a target object.
The output unit 2060 outputs the position and the size of a target object in various forms. For example, the output unit 2060 stores, into a storage, data (such as a list) indicating, for each target object, a combination of "an identifier assigned to the target object, the position of the target object, and the size of the target object" in association with the image data 10. Note that any method may be used to assign an identifier to an object detected from image data.
In addition, for example, the output unit 2060 may output a display (such as a frame) indicating the position and the size of a determined target object, superposed on the image data 10, as illustrated in FIG. 11. The display may be output to any destination, for example, a storage and/or a display apparatus.
Note that the output unit 2060 may further output the number of target objects. The method of computing the number of target objects is as described above.
As described above, learning by the neural network 20 needs to be performed in advance. The learning by the neural network 20 may be performed by the information processing apparatus 2000 or by another apparatus. The description herein assumes that the information processing apparatus 2000 performs the learning. FIG. 12 is a block diagram illustrating the information processing apparatus 2000 having a function of performing learning by the neural network 20. The learning in the information processing apparatus 2000 is executed by a learning unit 2080.
The learning unit 2080 computes a predicted loss between a PHD based on an actual output of the neural network 20 and an ideal PHD. The ideal PHD may be expressed as the sum of normal distributions, each of which has a previously specified variance and is centered on the position of a rectangle representing a correct-answer object. Alternatively, the ideal PHD may be handled as a δ function with a variance of 0, or another function may be used. Next, learning by the neural network 20 is performed based on the predicted loss. More specifically, the learning unit 2080 performs the learning by updating parameters (weight values and bias values) of the neural network 20 by propagating the computed predicted loss in reverse order (back propagation) from the output nodes of the neural network 20. Various existing methods such as gradient descent may be used to train a neural network by back propagation based on a predicted loss. A determination method and a computation method of the predicted loss used in learning by the neural network 20 will be described below.
The learning unit 2080 computes a PHD relating to an actual output by use of the actual output acquired by inputting image data for learning (hereinafter referred to as learning image data) to the neural network 20. The learning unit 2080 further computes a predicted loss between the PHD relating to the actual output and an ideal PHD predetermined based on the learning image data. For example, the square error between the PHDs may be used as the predicted loss. Alternatively, since a PHD divided by its integral value can be handled as a probability density function with an integral value of 1, any technique capable of handling a loss as an error between probability density functions may be used. For example, the negative of the product of the ideal probability density function and the probability density function relating to the actual output may be determined as a loss. Alternatively, an error of the integral value may be handled as a loss, or several of these losses may be combined.
As a more specific example, denoting each output result as Li and a preset probability density function as fi, a PHD relating to an actual output can be written as Σi(Li×fi). Further, denoting the position of the rectangle of each correct-answer object as yj and the distribution used as a basis for computing a PHD as gj, an ideal PHD can be written as Σj(gj). As a technique of minimizing the error between the two, one or a plurality of neighboring outputs i are assigned to each correct answer j in advance. Denoting the number of assigned outputs as Nj, an error between Li for each assigned i and (1/Nj), such as the square of (Li−1/Nj), may be minimized. This is a technique for learning Li in such a way that the integral values match.
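Two of the losses mentioned above can be sketched in a few lines. This is an illustrative sketch only: the function names `squared_error` and `assignment_loss`, the grid-sampled representation of a PHD, and the dictionary used to hold the assignment of outputs i to correct answers j are assumptions made here, not part of the description.

```python
def squared_error(phd_pred, phd_ideal, step):
    """Square error between two PHDs sampled on the same uniform grid."""
    return sum((p - q) ** 2 for p, q in zip(phd_pred, phd_ideal)) * step


def assignment_loss(likelihoods, assignment):
    """Sum of (L_i - 1/N_j)^2 over the outputs i assigned to each correct answer j.

    `assignment` maps a correct-answer index j to the list of output
    indices i assigned to it; N_j is the length of that list.
    """
    loss = 0.0
    for outputs in assignment.values():
        target = 1.0 / len(outputs)  # each assigned output should carry 1/N_j
        loss += sum((likelihoods[i] - target) ** 2 for i in outputs)
    return loss


if __name__ == "__main__":
    # Two outputs share correct answer 0; one output covers correct answer 1.
    print(assignment_loss([0.5, 0.5, 1.0], {0: [0, 1], 1: [2]}))
    print(assignment_loss([0.4, 0.5, 1.0], {0: [0, 1], 1: [2]}))
```

When the second loss reaches zero, the likelihoods assigned to each correct answer sum to 1, so the integral of the predicted PHD matches the number of correct-answer objects.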
With respect to each image region in which a target object exists in the learning image data, an ideal PHD indicates a distribution (δ function) having a likelihood of 1 at the position and the size of the image region and having a variance of 0.
FIG. 13 is a diagram illustrating an ideal PHD. In learning image data 30 in FIG. 13, target objects are included in two image regions 40-1 and 40-2. The position and the size of the image region 40-1 are (x1, y1) and (w1, h1), respectively. Therefore, the ideal PHD indicates a δ function with a peak at (x1, y1, w1, h1). Further, the position and the size of the image region 40-2 are (x2, y2) and (w2, h2), respectively. Therefore, the ideal PHD also indicates a δ function with a peak at (x2, y2, w2, h2).
For example, an ideal PHD relating to learning image data is generated by hand in advance and is stored in a storage in association with the learning image data. The learning unit 2080 performs learning by the neural network 20 by use of one or more of the thus prepared combinations of learning image data and an ideal PHD.
An information processing apparatus 2000 according to an example embodiment 2 handles a plurality of types of target objects distinctively. To do so, the generation unit 2020 according to the example embodiment 2 generates likelihood data for each of mutually different types of target objects. Therefore, for one partial region 12, likelihood data are generated for each type of target object.
Further, an extraction unit 2040 according to the example embodiment 2 generates a PHD for each type of target object. This is achieved by adding up likelihood data for each type of target object. Then, the extraction unit 2040 extracts partial distributions from each PHD.
An output unit 2060 according to the example embodiment 2 outputs the position and the size of the target object relating to each partial distribution. Each partial distribution relates to one type of target object. Accordingly, the output unit 2060 outputs the position and the size of the target object relating to a partial distribution along with the type of the target object.
When the information processing apparatus 2000 is provided by use of a neural network 20, for example, the information processing apparatus 2000 includes a neural network 20 for each type of target object. Each neural network 20 is trained in advance to detect the corresponding type of target object. For example, for a neural network 20 handling a human as a target object, an ideal PHD is set to indicate a likelihood of 1 for the position and the size of an image region representing a human in learning image data and a likelihood of 0 for the position and the size of any other image region (an image region in which no object exists or an object other than a human exists).
Consequently, an ideal PHD is prepared for each type of target object for learning image data. A learning unit 2080 causes a neural network 20 for detecting a certain type of target object to perform learning by use of a combination of "learning image data and an ideal PHD for the type of target object."
For example, a hardware configuration of a computer providing the information processing apparatus 2000 according to the example embodiment 2 is illustrated by FIG. 4, similarly to the example embodiment 1. However, a storage device 1080 in a computer 1000 providing the information processing apparatus 2000 according to the present example embodiment further stores a program module providing the function of the information processing apparatus 2000 according to the present example embodiment.
The information processing apparatus 2000 according to the present example embodiment can detect target objects for each type thereof. Accordingly, the positions of mutually different types of target objects can be recognized together with their types.
While the example embodiments of the present invention have been described above with reference to the drawings, the drawings are exemplifications of the present invention, and various configurations other than the above may be adopted.
Claims (11)
1. An information processing apparatus comprising:
at least one memory configured to store instructions; and
at least one processor configured to execute the instructions to perform:
training a neural network by use of one or more combinations of prepared learning image data and an ideal PHD for each of mutually different types of target objects;
acquiring image data;
generating likelihood data for each of a plurality of partial regions included in the image data by inputting the acquired image data to the trained neural network;
computing a distribution of a likelihood of existence of the target objects with respect to a position and a size by computing a total sum of the likelihood data, and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and
outputting, for each of the one or more partial distribution, a position and a size of the one target object relating to the partial distribution, based on a statistic of the partial distribution.
2. The information processing apparatus according to claim 1 , wherein
the likelihood data is represented by a distribution conforming to a predetermined model, and
for the each partial region, the trained neural network outputs a likelihood that a target object exists in the partial region and a parameter value of the predetermined model.
3. The information processing apparatus according to claim 1 , wherein
the at least one processor is configured to execute the instructions to perform:
computing a number of target objects included in the image data, based on an integral value of the distribution represented by the total sum of the likelihood data, and
extracting as many as the number of the partial distributions from the distribution represented by the total sum of the likelihood data.
4. The information processing apparatus according to claim 1 , wherein
the at least one processor is configured to execute the instructions to perform:
extracting the partial distributions an integral value of each of which is 1 from the distribution represented by the total sum of the likelihood data.
5. The information processing apparatus according to claim 1 , wherein
the at least one processor is configured to execute the instructions to perform:
generating the likelihood data for each of mutually different types of the target objects;
computing, for each of mutually different types of the target objects, a distribution of a likelihood of existence of the target objects and extracting the partial distribution from the distribution; and
outputting a position and a size of a target object relating to the each partial distribution along with a type of the target objects relating to the partial distribution.
6. A control method executed by at least one computer, the control method comprising:
training a neural network by use of one or more combinations of prepared learning image data and an ideal PHD for each of mutually different types of target objects;
acquiring image data;
generating likelihood data for each of a plurality of partial regions included in the image data by inputting the acquired image data to the trained neural network;
computing a distribution of a likelihood of existence of the target objects with respect to a position and a size by computing a total sum of the likelihood data, and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and
outputting, for each of the one or more partial distribution, a position and a size of the one target object relating to the partial distribution, based on a statistic of the partial distribution.
7. The control method according to claim 6 , wherein,
the control method comprises:
the likelihood data is represented by a distribution conforming to a predetermined model, and
for the each partial region, the trained neural network outputs a likelihood that a target object exists in the partial region and a parameter value of the predetermined model.
8. The control method according to claim 6 , wherein
the control method comprises:
computing a number of target objects included in the image data, based on an integral value of the distribution represented by the total sum of the likelihood data; and
extracting as many as the number of the partial distributions from a distribution represented by the total sum of the likelihood data.
9. The control method according to claim 6 , wherein
the control method comprises:
extracting the partial distributions an integral value of each of which is 1 from a distribution represented by the total sum of the likelihood data.
10. The control method according to claim 6 , wherein
the control method comprises:
generating the likelihood data for each of mutually different types of the target objects;
computing, for each of mutually different types of the target objects, a distribution of a likelihood of existence of the target objects and extracting the partial distribution from the distribution; and
outputting a position and a size of a target object relating to the each partial distribution along with a type of the target objects relating to the partial distribution.
11. A non-transitory recording medium storing a program causing at least one computer to execute:
training a neural network by use of one or more combinations of prepared learning image data and an ideal PHD for each of mutually different types of target objects;
acquiring image data;
generating likelihood data for each of a plurality of partial regions included in the image data by inputting the acquired image data to the trained neural network;
computing a distribution of a likelihood of existence of the target objects with respect to a position and a size by computing a total sum of the likelihood data, and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and
outputting, for each of the one or more partial distribution, a position and a size of the one target object relating to the partial distribution, based on a statistic of the partial distribution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/227,699 US20230368033A1 (en) | 2018-06-01 | 2023-07-28 | Information processing device, control method, and program |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2018/021207 WO2019229979A1 (en) | 2018-06-01 | 2018-06-01 | Information processing device, control method, and program |
US202017059678A | 2020-11-30 | 2020-11-30 | |
US18/227,699 US20230368033A1 (en) | 2018-06-01 | 2023-07-28 | Information processing device, control method, and program |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/059,678 Continuation US20210209396A1 (en) | 2018-06-01 | 2018-06-01 | Information processing device, control method, and program |
PCT/JP2018/021207 Continuation WO2019229979A1 (en) | 2018-06-01 | 2018-06-01 | Information processing device, control method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230368033A1 (en) | 2023-11-16 |
Family
ID=68696866
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/059,678 Pending US20210209396A1 (en) | 2018-06-01 | 2018-06-01 | Information processing device, control method, and program |
US18/227,699 Pending US20230368033A1 (en) | 2018-06-01 | 2023-07-28 | Information processing device, control method, and program |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/059,678 Pending US20210209396A1 (en) | 2018-06-01 | 2018-06-01 | Information processing device, control method, and program |
Country Status (3)
Country | Link |
---|---|
US (2) | US20210209396A1 (en) |
JP (1) | JP7006782B2 (en) |
WO (1) | WO2019229979A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020240809A1 (en) * | 2019-05-31 | 2020-12-03 | 楽天株式会社 | Learning device, classification device, learning method, classification method, learning program, and classification program |
JP2021103347A (en) * | 2019-12-24 | 2021-07-15 | キヤノン株式会社 | Information processing device, information processing method and program |
WO2024024048A1 (en) * | 2022-07-28 | 2024-02-01 | 日本電信電話株式会社 | Object detection device, object detection method, and object detection program |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5687082B2 (en) * | 2011-01-31 | 2015-03-18 | セコム株式会社 | Moving object tracking device |
JP5841390B2 (en) * | 2011-09-30 | 2016-01-13 | セコム株式会社 | Moving object tracking device |
US9946935B2 (en) * | 2013-07-17 | 2018-04-17 | Nec Corporation | Object tracking device, object tracking method, and object tracking program |
KR20150051711A (en) * | 2013-11-05 | 2015-05-13 | 한국전자통신연구원 | Apparatus and method for extracting skin area for blocking harmful content image |
US10325160B2 (en) * | 2015-01-14 | 2019-06-18 | Nec Corporation | Movement state estimation device, movement state estimation method and program recording medium |
JP2016162072A (en) * | 2015-02-27 | 2016-09-05 | 株式会社東芝 | Feature quantity extraction apparatus |
US9858496B2 (en) * | 2016-01-20 | 2018-01-02 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
2018
- 2018-06-01 | JP | JP2020522543A | JP7006782B2 | Active
- 2018-06-01 | US | US17/059,678 | US20210209396A1 | Pending
- 2018-06-01 | WO | PCT/JP2018/021207 | WO2019229979A1 | Application Filing
2023
- 2023-07-28 | US | US18/227,699 | US20230368033A1 | Pending
Also Published As
Publication number | Publication date |
---|---|
WO2019229979A1 (en) | 2019-12-05 |
US20210209396A1 (en) | 2021-07-08 |
JPWO2019229979A1 (en) | 2021-05-13 |
JP7006782B2 (en) | 2022-01-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |