US20230368033A1 - Information processing device, control method, and program - Google Patents

Information processing device, control method, and program Download PDF

Info

Publication number
US20230368033A1
Authority
US
United States
Prior art keywords
distribution
partial
likelihood
image data
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/227,699
Inventor
Hiroyoshi Miyano
Tetsuaki Suzuki
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to US18/227,699 priority Critical patent/US20230368033A1/en
Publication of US20230368033A1 publication Critical patent/US20230368033A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • the present invention relates to a technology of detecting an object from an image.
  • Patent Document 1 discloses a technology of performing object detection by use of a deep neural network.
  • a system in Patent Document 1 generates a feature map of image data by use of a convolutional neural network and, by inputting the generated feature map to a neural network called a region proposal network (RPN), outputs many proposals of rectangular regions (region proposals), each of which includes an object.
  • the system further estimates a class of an object included in a region proposal by performing classification in a layer called a box-classification layer.
  • the system also adjusts a position and a size of a region proposal by performing regression in a layer called a box-regression convolutional layer.
  • Non Patent Document 1 generates a plurality of feature maps by use of a convolutional neural network and outputs many object proposals from each feature map.
  • each object proposal includes rectangular coordinates and a likelihood of an object class.
  • In Patent Document 1 and Non Patent Document 1, significantly overlapping detections are eliminated as erroneous detection; the case of objects that genuinely overlap significantly is therefore not considered, and it is conceivable that a plurality of overlapping objects are erroneously detected as a single object in such a case.
  • the present invention has been made in view of the aforementioned problem and provides a technology capable of distinctively detecting objects even when the objects overlap one another in image data.
  • An information processing apparatus includes: 1) a generation unit configured to acquire image data and generate likelihood data representing a likelihood of existence of a target object with respect to a position and a size for each of a plurality of partial regions included in the image data; 2) an extraction unit configured to compute a distribution of a likelihood of existence of a target object with respect to a position and a size by computing a total sum of likelihood data each piece of which is generated for each partial region and extract, from the computed distribution, one or more partial distributions each of which relates to one target object; and 3) an output unit configured to, for each extracted partial distribution, output a position and a size of a target object relating to the partial distribution, based on a statistic of the partial distribution.
  • a control method is executed by a computer.
  • the control method includes: 1) a generation step of acquiring image data and generating likelihood data representing a likelihood of existence of a target object with respect to a position and a size for each of a plurality of partial regions included in the image data; 2) an extraction step of computing a distribution of a likelihood of existence of a target object with respect to a position and a size by computing a total sum of likelihood data each piece of which is generated for each partial region and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and 3) an output step of, for each extracted partial distribution, outputting a position and a size of a target object relating to the partial distribution, based on a statistic of the partial distribution.
  • a program according to the present invention causes a computer to execute each step included in the control method according to the present invention.
  • the present invention provides a technology capable of distinctively detecting objects even when the objects overlap one another in image data.
  • FIG. 1 is a diagram conceptually illustrating processing performed by an information processing apparatus according to the example embodiment 1.
  • FIG. 2 is a diagram illustrating image data including target objects significantly overlapping each other.
  • FIG. 3 is a diagram illustrating a functional configuration of the information processing apparatus according to the example embodiment 1.
  • FIG. 4 is a diagram illustrating a computer for providing the information processing apparatus.
  • FIG. 5 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
  • FIG. 6 is a diagram illustrating a method of extracting a partial region by use of a sliding window.
  • FIG. 7 is a diagram illustrating a neural network used for generation of likelihood data.
  • FIG. 8 is a diagram conceptually illustrating likelihood data generated based on a likelihood Li.
  • FIG. 9 is a diagram illustrating a neural network outputting parameters of a normal distribution indicated by likelihood data.
  • FIG. 10 is a flowchart illustrating a flow of processing of extracting a partial distribution on the basis of the maximum value of a PHD.
  • FIG. 11 is a diagram illustrating a position and a size of a target object determined based on a partial distribution.
  • FIG. 12 is a block diagram illustrating an information processing apparatus having a function of learning by a neural network.
  • FIG. 13 is a diagram illustrating an ideal PHD.
  • FIG. 1 is a diagram conceptually illustrating processing performed by an information processing apparatus 2000 according to the present example embodiment.
  • the information processing apparatus 2000 acquires image data 10 and detects a target object from the image data 10 .
  • Detection of a target object means determination of a position and a size of an image region (such as a circumscribed rectangle) including the target object from the image data 10 .
  • Any object may be handled as a target object, or only a specific type of object (such as only a human) may be handled as a target object.
  • the information processing apparatus 2000 detects an object by a method described below. First, the information processing apparatus 2000 generates parameters representing likelihood data for each of a plurality of partial regions 12 in the image data 10 .
  • the likelihood data are data being associated with a position and a size on the image data 10 and indicating a distribution of a likelihood that a target object exists in an image region at the position with the size. Specifically, denoting a predetermined probability density function the integral of which is 1 as f and a generated parameter as L, likelihood data is expressed by L×f.
  • a normal distribution the position and the variance of which vary for each partial region may be used as the probability density function f, or a δ function may be used for expressing existence at a specific position only, or another probability density function may be adopted.
  • a δ function represents a function taking infinity only at a specific value, taking 0 at the other values, and having an integral value of 1.
  • the integral value of the likelihood data L×f matches the value of the generated parameter L.
  • the likelihood data in FIG. 1 indicate such a distribution. Further details of the likelihood data will be described later.
  • the information processing apparatus 2000 computes a distribution of an existence likelihood of a target object with respect to a position and a size by computing the total sum of likelihood data each piece of which is generated for each partial region 12 .
  • the distribution is a so-called probability hypothesis density (PHD).
  • PHD is a distribution function having a characteristic that the integrated value matches the number of existing objects.
  • the information processing apparatus 2000 extracts, from the PHD, partial distributions each of which relates to one target object (hereinafter referred to as partial distributions). Ideally, each of the partial distributions is extracted in such a way that the integral value thereof is 1, and each partial distribution relates to one target object.
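  • As a concrete illustration of this relationship (not part of the patent text), the following Python sketch represents each piece of likelihood data as a weight Li attached to a unit-integral reference distribution, so that the integral of the PHD reduces to the sum of the weights; all names and numbers are hypothetical.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class LikelihoodData:
    """One piece of likelihood data L*f: a weight L (the likelihood) times a
    unit-integral reference distribution f over (x, y, w, h) box space."""
    weight: float      # the likelihood L generated for one partial region 12
    mean: np.ndarray   # (x, y, w, h) of the partial region
    var: float         # variance of the reference distribution

def phd_integral(components):
    # Each f integrates to 1, so the integral of the PHD sum_i(Li * fi)
    # is simply the sum of the weights Li -- ideally the object count.
    return sum(c.weight for c in components)

# Two partial regions voting for one box and one voting for another:
comps = [
    LikelihoodData(0.7, np.array([10.0, 20.0, 50.0, 100.0]), 4.0),
    LikelihoodData(0.3, np.array([12.0, 21.0, 48.0, 102.0]), 4.0),
    LikelihoodData(1.0, np.array([80.0, 15.0, 40.0, 90.0]), 4.0),
]
print(phd_integral(comps))  # 2.0 -> ideally two target objects
```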
  • Three partial distributions are extracted from a PHD in FIG. 1 .
  • the integrated value of the PHD is 3, and partial distributions are extracted in such a way that the integral of each partial distribution is 1.
  • the partial distributions may be extracted in such a way as to overlap each other.
  • for example, when the shape of each partial distribution is limited to a normal distribution, each integral value becomes 1; the partial distributions may then be determined in such a way as to minimize the error between the sum of the partial distributions and the PHD.
  • each partial distribution may be limited to a normal distribution × a weight.
  • the integral value matches the weight in the case of the limitation, and therefore the partial distributions may be determined in such a way as to minimize the total sum of the error between the sum of the partial distributions and the PHD, and the error between the weight value and 1.
  • a distribution other than a normal distribution may be adopted as a limited distribution shape.
  • For each extracted partial distribution, the information processing apparatus 2000 outputs a position and a size of a target object represented by the partial distribution, based on a statistic such as the mean of the partial distribution.
  • a position of a target object is represented by coordinates of a predetermined position (such as an upper-left corner) of a circumscribed rectangle representing the target object.
  • a size of a target object can be represented by a width and a height of a rectangular region representing the target object.
  • while each distribution illustrated in FIG. 1 is depicted two-dimensionally (horizontal axis: position/size × vertical axis: likelihood) for convenience of illustration, the distribution is actually a distribution on a three-or-more-dimensional space.
  • a position of an image region is represented by coordinates
  • the shape of the image region is a rectangle
  • the size of the rectangle is represented by a width and a height.
  • each distribution illustrated in FIG. 1 is expressed on a five-dimensional (X coordinate, Y coordinate, width, height × likelihood) space.
  • the information processing apparatus 2000 detects a target object by a method of computing a PHD by adding up likelihood data each piece of which is computed for each partial region, and extracting a partial distribution representing one target object.
  • the method enables highly precise distinction even between significantly overlapping target objects and detection of the target objects as separate target objects. The reason will be described below with reference to FIG. 2 .
  • FIG. 2 is a diagram illustrating image data 10 including significantly overlapping target objects.
  • the image data 10 is a captured image of a scene in which two persons pass each other. When persons are correctly detected from the image data 10 , two persons are detected. However, it is difficult for existing techniques to distinctively detect persons that significantly overlap each other, and the probability of the two persons being collectively detected as one person is high.
  • the information processing apparatus 2000 generates a PHD acquired by adding up likelihood data each piece of which is generated for each partial region 12 .
  • the integrated value in any section of the PHD represents the number of target objects in the section.
  • information about the number of target objects is included in a PHD being information acquired by integrating information acquired from each partial region 12 .
  • a partial distribution the integral value of which is 1 is extracted from a PHD.
  • This enables separation of significantly overlapping target objects and acquisition of a probability distribution of a position and a size of an image region relating to each target object.
  • a shaded partial distribution and a dotted partial distribution are extracted from a PHD in FIG. 2 . Then, by determining a position and a size of a target object for each extracted partial distribution, each target object can be detected.
  • FIG. 1 and FIG. 2 is an exemplification for ease of understanding of the information processing apparatus 2000 and does not limit the functions of the information processing apparatus 2000 .
  • the information processing apparatus 2000 according to the present example embodiment will be described in more detail below.
  • FIG. 3 is a diagram illustrating a functional configuration of the information processing apparatus 2000 according to the example embodiment 1.
  • the information processing apparatus 2000 includes a generation unit 2020 , an extraction unit 2040 , and an output unit 2060 .
  • the generation unit 2020 acquires image data 10 and generates likelihood data for each of a plurality of partial regions 12 included in the image data 10 .
  • the extraction unit 2040 computes a PHD by computing the total sum of likelihood data each piece of which is generated for each partial region 12 .
  • the extraction unit 2040 extracts, from the computed PHD, one or more partial distributions each of which relates to one target object.
  • the output unit 2060 outputs a position and a size of a target object represented by the partial distribution, based on a statistic of the partial distribution.
  • Each functional configuration unit in the information processing apparatus 2000 may be provided by hardware (such as a hardwired electronic circuit) providing each functional configuration unit or may be provided by a combination of hardware and software (such as a combination of an electronic circuit and a program controlling the circuit).
  • hardware such as a hardwired electronic circuit
  • software such as a combination of an electronic circuit and a program controlling the circuit.
  • FIG. 4 is a diagram illustrating a computer 1000 for providing the information processing apparatus 2000 .
  • the computer 1000 may be any computer. Examples of the computer 1000 include a personal computer (PC) and a server machine.
  • the computer 1000 may be a dedicated computer designed for providing the information processing apparatus 2000 or may be a general-purpose computer.
  • the computer 1000 includes a bus 1020 , a processor 1040 , a memory 1060 , a storage device 1080 , an input-output interface 1100 , and a network interface 1120 .
  • the bus 1020 is a data transmission channel for the processor 1040 , the memory 1060 , the storage device 1080 , the input-output interface 1100 , and the network interface 1120 to mutually transmit and receive data.
  • a method of connecting the processor 1040 and the like to one another is not limited to the bus connection.
  • the processor 1040 is any of various processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a field-programmable gate array (FPGA).
  • the memory 1060 is a main storage provided by use of a random access memory (RAM) and/or the like.
  • the storage device 1080 is an auxiliary storage provided by use of a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), and/or the like.
  • the input-output interface 1100 is an interface for connecting the computer 1000 to an input/output device.
  • the input-output interface 1100 is connected to an input apparatus such as a keyboard and an output apparatus such as a display apparatus.
  • the network interface 1120 is an interface for connecting the computer 1000 to a communication network. Examples of the communication network include a local area network (LAN) and a wide area network (WAN).
  • a method of connecting the network interface 1120 to the communication network may be a wireless connection or a wired connection.
  • the storage device 1080 stores a program module providing each functional configuration unit in the information processing apparatus 2000 .
  • the processor 1040 provides a function relating to each program module by reading the program module into the memory 1060 and executing the program module.
  • FIG. 5 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1.
  • the generation unit 2020 acquires image data 10 (S 102 ).
  • the generation unit 2020 generates likelihood data for each of a plurality of partial regions 12 included in the image data 10 (S 104 ).
  • the extraction unit 2040 computes a PHD by adding up likelihoods represented by the likelihood data (S 106 ).
  • the extraction unit 2040 extracts one or more partial distributions from the PHD (S 108 ). For each partial distribution, the output unit 2060 outputs a position and a size of a target object relating to the partial distribution (S 110 ).
  • the information processing apparatus 2000 may execute a series of processes illustrated in FIG. 5 in response to any trigger.
  • the information processing apparatus 2000 executes the aforementioned series of processes in response to input of the image data 10 .
  • the information processing apparatus 2000 may execute the aforementioned series of processes in response to a predetermined input operation by a user.
  • the generation unit 2020 acquires image data 10 (S 102 ). Any image data may be used as the image data 10 .
  • the image data 10 are a captured image generated by a camera.
  • the camera may be a still camera or a video camera.
  • the captured image generated by a camera may be the captured image itself as generated by the camera, or may be an image acquired by applying some processing to such a captured image.
  • the information processing apparatus 2000 may be provided inside a camera generating the image data 10 .
  • for example, when the information processing apparatus 2000 is provided inside a surveillance camera, an object can be detected in real time from a surveillance video generated by the surveillance camera.
  • types of camera called an intelligent camera, an Internet Protocol (IP) camera, and a network camera can be used as a camera incorporating the function of the information processing apparatus 2000 .
  • the generation unit 2020 may acquire image data 10 by any method.
  • the generation unit 2020 acquires image data 10 from a storage storing the image data 10 .
  • the storage storing the image data 10 may be provided inside the information processing apparatus 2000 or may be provided outside.
  • the information processing apparatus 2000 acquires image data 10 input by an input operation by a user.
  • the generation unit 2020 acquires image data 10 by receiving the image data 10 transmitted by another apparatus.
  • a partial region 12 is a partial image region included in the image data 10 .
  • a partial region 12 is different from another partial region 12 with respect to at least either one of a position and a size.
  • the generation unit 2020 extracts each partial region 12 included in the image data 10 and, by analyzing the extracted partial region 12 , generates likelihood data for the partial region 12 .
  • a partial region 12 can be extracted by use of a sliding window.
  • FIG. 6 is a diagram illustrating a method of extracting a partial region 12 by use of a sliding window.
  • the information processing apparatus 2000 moves a sliding window with a predetermined size (width: Ws, height: Hs) at a predetermined stride d.
  • a plurality of image regions with different sizes are extracted by the sliding window at various positions, and each image region is handled as a partial region 12 .
  • partial regions 12 with varying positions and sizes can be extracted.
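  • A minimal sketch of such a sliding-window enumeration is shown below; the window sizes and stride are illustrative values, not parameters fixed by the patent.

```python
def sliding_window_regions(img_w, img_h, window_sizes, stride):
    """Enumerate partial regions (x, y, w, h): each window size in
    `window_sizes` is slid over the image at the given stride."""
    for ws, hs in window_sizes:
        for y in range(0, img_h - hs + 1, stride):
            for x in range(0, img_w - ws + 1, stride):
                yield (x, y, ws, hs)

# Partial regions with varying positions and sizes over a 640x480 image:
regions = list(sliding_window_regions(640, 480, [(64, 128), (128, 256)], 32))
```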
  • a technique using an Anchor box disclosed in Patent Document 1 can also be used to extract such a partial region 12 .
  • a partial region 12 may be extracted from a feature map generated from the image data instead of being directly extracted from the image data 10 .
  • a neural network 20 to be described later is constituted of a layer for extracting a feature map from the image data 10 (such as a convolutional layer in a convolutional neural network) and a layer for extracting a partial region 12 from a feature map output from the layer and generating likelihood data.
  • a shape of a partial region 12 is not necessarily limited to a rectangle.
  • the partial region 12 can be represented by center coordinates and a length of a radius.
  • a polygon in any shape can be handled as a partial region 12 . In this case, both a position and a size of the partial region 12 are determined by a set of vertices of the partial region 12 .
  • the generation unit 2020 generates parameters representing likelihood data for each of a plurality of partial regions 12 included in the image data 10 and generates likelihood data (S 104 ).
  • parameters representing likelihood data are generated by use of a neural network.
  • FIG. 7 is a diagram illustrating a neural network used for generation of parameters representing likelihood data.
  • a neural network 20 outputs, for each partial region 12 included in the image data 10 , a likelihood Li that a target object exists in an image region with the position and the size of the partial region 12 .
  • Li is a likelihood output for an i-th partial region 12 .
  • the generation unit 2020 sets a distribution determined based on a likelihood Li as likelihood data.
  • FIG. 8 is a diagram conceptually illustrating likelihood data generated based on a likelihood Li.
  • the likelihood data in the upper part of FIG. 8 represent a distribution having a variance of 0 and being generated based on a likelihood Li.
  • the distribution is expressed as Li×δ by use of a δ function.
  • likelihood data in the lower part of FIG. 8 represent a distribution with a nonzero variance.
  • a distribution conforming to a predetermined model such as a normal distribution is predetermined as a distribution as a reference (hereinafter referred to as a reference distribution).
  • a reference distribution may be determined as a distribution having 1 as the integral value, the position and the size of the partial region 12 as the mean, and a predetermined value as the variance. Any value may be set to the variance.
  • the generation unit 2020 generates likelihood data by multiplying a reference distribution by a likelihood Li.
  • a reference distribution model is a normal distribution. Then, based on the position (xi, yi) of the partial region 12 and the size (wi, hi) of the partial region 12 , the mean of the reference distribution is (xi, yi, wi, hi). Further, the variance of the reference distribution is vi. From the above, the reference distribution is N[(xi, yi, wi, hi), vi]. Furthermore, a likelihood output from the neural network 20 is Li. Then, the generation unit 2020 generates a distribution indicating the likelihood data by multiplying the reference distribution by Li. The integral value of a distribution of the acquired likelihood data is Li.
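  • Under the assumptions of this example (a normal reference distribution with mean (xi, yi, wi, hi) and variance vi), the likelihood data could be constructed as in the following sketch; scipy and the function name are illustrative choices, not part of the patent.

```python
import numpy as np
from scipy.stats import multivariate_normal

def make_likelihood_data(x_i, y_i, w_i, h_i, v_i, L_i):
    """Reference distribution N[(xi, yi, wi, hi), vi] multiplied by the
    likelihood Li; the integral of the returned density equals Li."""
    ref = multivariate_normal(mean=[x_i, y_i, w_i, h_i], cov=v_i * np.eye(4))
    return lambda box: L_i * ref.pdf(box)

density = make_likelihood_data(10, 20, 50, 100, v_i=4.0, L_i=0.7)
print(density([10, 20, 50, 100]))  # peak density, scaled by Li = 0.7
```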
  • a reference distribution conforming to a distribution model may not be predetermined, and parameters of a distribution model may be output from the neural network 20 .
  • parameters of a distribution model are the aforementioned mean and variance. Then, the neural network 20 outputs a mean and a variance for each partial region 12 .
  • FIG. 9 is a diagram illustrating the neural network 20 outputting parameters of a normal distribution indicated by likelihood data.
  • a likelihood Li, a mean (xi^u, yi^u, wi^u, hi^u) of a normal distribution, and the variance vi of the normal distribution are output for each partial region 12 .
  • the generation unit 2020 generates a distribution indicated by the likelihood data.
  • the position (xi, yi) output from the neural network 20 may be different from the original position of a relating i-th partial region 12 .
  • the size (wi, hi) output from the neural network 20 may be different from the original size of the relating i-th partial region 12 .
  • by causing the neural network 20 to perform learning in such a way as to output an ideal PHD, the neural network 20 adjusts and outputs the position and the size of the partial region 12 in such a way as to increase a likelihood that a target object is included in the partial region 12 .
  • the neural network 20 does not necessarily output all parameters of the distribution model and may output only part of the parameters.
  • the mean of the normal distribution is output from the neural network 20 , and a predetermined value is used as the variance.
  • any structure may be used as an internal structure (such as the number and an order of layers, a type of each layer, and a connection relation between the layers) of the neural network.
  • the same structure as that of the region proposal network (RPN) described in Patent Document 1 may be adopted as the structure of the neural network 20 .
  • the network described in Non Patent Document 1 may be used.
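  • Purely as a hypothetical sketch of such a structure, an RPN-style convolutional head that outputs a likelihood Li, a 4-dimensional mean, and a variance per partial region (anchor) might look as follows in PyTorch; none of these layer choices are prescribed by the patent.

```python
import torch
import torch.nn as nn

class LikelihoodHead(nn.Module):
    """Hypothetical head: for each of A anchors (partial regions 12) per
    feature-map cell, output a likelihood Li, a 4-D mean (x, y, w, h),
    and a variance vi of the reference normal distribution."""
    def __init__(self, in_ch: int, num_anchors: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1)
        # per anchor: 1 likelihood + 4 mean components + 1 variance
        self.out = nn.Conv2d(in_ch, num_anchors * 6, kernel_size=1)

    def forward(self, feat):
        h = torch.relu(self.conv(feat))
        o = self.out(h)                          # (N, A*6, H, W)
        n, _, hh, ww = o.shape
        o = o.view(n, -1, 6, hh, ww)
        likelihood = torch.sigmoid(o[:, :, 0])   # Li in [0, 1]
        mean = o[:, :, 1:5]                      # (x, y, w, h)
        var = nn.functional.softplus(o[:, :, 5]) # vi > 0
        return likelihood, mean, var
```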
  • generation of likelihood data does not necessarily need to be performed by use of a neural network, and another existing technique of, for each of a plurality of partial regions in image data, computing a likelihood that a target object is included in the partial region may be used.
  • the extraction unit 2040 extracts one or more partial distributions from the PHD.
  • a partial distribution is a probability distribution representing, with respect to a partial region including one target object, an existence probability of a target object with respect to the position and the size of the partial region.
  • a partial distribution is a probability distribution, and the integral value thereof is 1.
  • the extraction unit 2040 computes the number of target objects included in the image data 10 , based on the PHD. Specifically, the extraction unit 2040 computes the integral value of the PHD and determines the computed integral value to be the number of target objects included in the image data 10 . However, it is conceivable that the integral value of the PHD does not completely match the number of target objects due to an error or the like and is not a natural number. Then, in this case, the extraction unit 2040 handles an approximate value (such as a value acquired by dropping the fractional portion) of the integral value of the PHD as the number of target objects.
  • the extraction unit 2040 extracts the computed number of partial distributions from the PHD. For example, the extraction unit 2040 extracts partial distributions from the PHD on the basis of the maximum value of the PHD.
  • FIG. 10 is a flowchart illustrating a flow of processing of extracting partial distributions on the basis of the maximum value of the PHD. Loop processing illustrated in the flowchart in FIG. 10 is repeatedly executed while a counter i is less than the integral value S of the PHD. The counter i is initialized to 0 at first and is incremented by 1 every time the loop processing is executed. In this case, the number of partial distributions is a maximum integer equal to or less than S.
  • the extraction unit 2040 determines whether the counter i is less than S (S 202 ). When i is less than S, the processing in FIG. 10 advances to S 204 . On the other hand, when i is equal to or greater than S, the processing in FIG. 10 ends.
  • the extraction unit 2040 determines a position and a size relating to the maximum value of the PHD (S 204 ).
  • the extraction unit 2040 extracts a partial distribution being centered on the position and the size and having the integral value of 1 from the PHD (removes the partial distribution from the PHD) (S 206 ). Since S 208 is the end of the loop processing, the processing returns to S 202 .
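  • The following one-dimensional sketch mimics the loop of FIG. 10 on a sampled PHD (in the patent the distribution lives in (x, y, w, h) space); the grid and the mass-peeling strategy used for S 206 are illustrative assumptions.

```python
import numpy as np

def extract_partials(phd, xs):
    """Sketch of FIG. 10 on a PHD sampled over the grid `xs`: the loop
    runs while the counter i is less than the integral S (S202); each
    pass finds the maximum of the PHD (S204) and removes a partial
    distribution with integral 1 centered there (S206)."""
    phd = phd.astype(float).copy()
    dx = float(xs[1] - xs[0])
    S = phd.sum() * dx            # integral of the PHD
    centers = []
    for _ in range(int(S)):       # S202: repeat while i < S
        k = int(np.argmax(phd))   # S204: position of the maximum
        centers.append(xs[k])
        remaining = 1.0 / dx      # S206: peel off total mass 1,
        for j in np.argsort(np.abs(xs - xs[k])):  # nearest samples first
            take = min(phd[j], remaining)
            phd[j] -= take
            remaining -= take
            if remaining <= 0.0:
                break
    return centers

xs = np.linspace(0.0, 10.0, 1001)
phd = (1.3 * np.exp(-(xs - 3.0) ** 2)
       + 0.9 * np.exp(-(xs - 7.0) ** 2)) / np.sqrt(np.pi)
print(extract_partials(phd, xs))  # approximately [3.0, 7.0]
```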
  • any space clustering technique may also be used as a method of extracting partial distributions from a PHD.
  • a PHD can be written as the total sum Σi(Li×fi) of the output results.
  • Hierarchical clustering may also be adopted: a distance between the positions represented by all output results Li is computed, output results at a short distance from each other are added together, and the total number is decreased down to a predetermined number (see the sketch below).
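```python
import numpy as np

def merge_components(weights, means, target_count):
    """Sketch of hierarchical merging over weighted components (the merge
    rule and stopping criterion are illustrative assumptions): repeatedly
    fuse the two components whose means are closest until `target_count`
    remain; weights add up, means combine weight-proportionally."""
    weights = [float(w) for w in weights]
    means = [np.asarray(m, dtype=float) for m in means]
    while len(weights) > target_count:
        best = None
        for a in range(len(means)):
            for b in range(a + 1, len(means)):
                d = np.linalg.norm(means[a] - means[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        w = weights[a] + weights[b]
        m = (weights[a] * means[a] + weights[b] * means[b]) / w
        for idx in (b, a):             # delete higher index first
            del weights[idx]; del means[idx]
        weights.append(w); means.append(m)
    return weights, means
```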
  • the output unit 2060 For each extracted partial distribution, the output unit 2060 outputs a position and a size of a target object represented by the partial distribution (S 110 ). Specifically, the output unit 2060 determines the position and the size of the target object, based on a statistic of the partial distribution. For example, the output unit 2060 determines the mean of the partial distribution to be the position and the size of the target object. In addition, for example, the output unit 2060 may determine a position and a size relating to the maximum value of the partial distribution to be the position and the size of the target object. Then, the output unit 2060 outputs the determined position and size for each partial distribution.
  • FIG. 11 is a diagram illustrating a position and a size of a target object determined based on a partial distribution.
  • two partial distributions D 1 and D 2 are extracted from a PHD.
  • the output unit 2060 determines a position (x1, y1) and a size (w1, h1) of a target object, based on the partial distribution D 1 .
  • the output unit 2060 determines a position (x2, y2) and a size (w2, h2) of a target object, based on the partial distribution D 2 .
  • each of an image region at the position (x1, y1) with a width w1 and a height h1, and an image region at the position (x2, y2) with a width w2 and a height h2 represents a target object.
  • the output unit 2060 outputs a position and a size of a target object in various forms.
  • the output unit 2060 stores, into a storage, data (such as a list) indicating, for each target object, a combination of “an identifier assigned to the target object, the position of the target object, and the size of the target object” in association with the image data 10 .
  • data such as a list
  • any method may be used as a method of assigning an identifier to an object detected from image data.
  • the output unit 2060 may output a display (such as frame) indicating a position and a size of a determined target object, the display being superposed on the image data 10 , as illustrated in FIG. 11 .
  • the display may be output to any destination and may be output to, for example, a storage and/or a display apparatus.
  • the output unit 2060 may further output the number of target objects.
  • a computation method of the number of target objects is as described above.
  • FIG. 12 is a block diagram illustrating the information processing apparatus 2000 having a function of performing learning by the neural network 20 .
  • the learning by the information processing apparatus 2000 is executed by a learning unit 2080 .
  • the learning unit 2080 computes a predicted loss between a PHD based on an actual output of the neural network 20 and an ideal PHD.
  • the ideal PHD may be expressed as the sum of normal distributions, each having a previously specified variance and being centered on a position of a rectangle representing an object being a correct answer. Alternatively, the ideal PHD may be handled as a δ function the variance of which is 0, or another function may be used.
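  • Assuming the normal-distribution form described above, an ideal PHD could be constructed as in the following sketch; the variance value and function names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ideal_phd(gt_boxes, var=4.0):
    """Ideal PHD as the sum of unit-weight normal distributions, each
    centered on a correct-answer box (x, y, w, h) with a previously
    specified variance (here an illustrative isotropic value)."""
    gaussians = [multivariate_normal(mean=np.asarray(b, dtype=float),
                                     cov=var * np.eye(4)) for b in gt_boxes]
    return lambda z: sum(g.pdf(z) for g in gaussians)

ideal = ideal_phd([(10, 20, 50, 100), (80, 15, 40, 90)])
```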
  • learning by the neural network 20 is performed based on the predicted loss. More specifically, the learning unit 2080 performs learning by the neural network 20 by updating parameters (a weight value and a bias value) of the neural network 20 by propagating the computed predicted loss in inverse order (back propagating) from an output node in the neural network 20 .
  • Various existing methods such as a gradient descent method may be used as a method of performing learning by a neural network by back propagation based on a predicted loss.
  • a determination method and a computation method of a predicted loss used in learning by the neural network 20 will be described below.
  • the learning unit 2080 computes a PHD relating to an actual output by use of the actual output acquired by inputting image data for learning (hereinafter referred to as learning image data) to the neural network 20 .
  • the learning unit 2080 further computes a predicted loss between the PHD relating to the actual output and an ideal PHD predetermined based on the learning image data. For example, the square error between the PHDs may be used as the predicted loss.
  • since a PHD divided by the integral value can be handled as a probability density function the integral value of which is 1, any technique capable of handling a loss as an error between probability density functions may be used.
  • the minus value of the product of an ideal probability density function and a probability density function relating to the actual output may be determined as a loss.
  • an error of the integral value may be handled as a loss, or several of the losses may be combined.
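  • With the PHDs represented as callables over box space, as in the earlier sketches, the square error could be approximated at sampled boxes as follows; the sampling scheme is an illustrative choice.

```python
import numpy as np

def phd_squared_error(phd_out, phd_ideal, sample_boxes):
    """Approximate square error between the PHD relating to the actual
    output and the ideal PHD, evaluated at sampled (x, y, w, h) boxes."""
    errors = [(phd_out(z) - phd_ideal(z)) ** 2 for z in sample_boxes]
    return float(np.mean(errors))
```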
  • a PHD relating to an actual output can be written as Σi(Li×fi).
  • an ideal PHD can be written as Σj(gj), where gj denotes a distribution relating to a j-th correct-answer object.
  • each output result i is assigned to a relating gj; the number of the outputs assigned to gj is denoted as Nj.
  • an error between Li for assigned i and (1/Nj), such as the square of (Li - 1/Nj), may be minimized. This is a technique for learning Li in such a way that the integral values match.
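  • A sketch of this integral-matching loss is shown below; the rule for assigning an output to a gj is not detailed here, so nearest-mean assignment is used as an illustrative assumption.

```python
import numpy as np

def integral_matching_loss(L, out_means, gt_means):
    """Assign each output i to a ground-truth distribution gj (here: the
    nearest mean); with Nj outputs assigned to gj, penalize the square of
    (Li - 1/Nj) so the likelihoods per object integrate to about 1."""
    L = np.asarray(L, dtype=float)
    out_means = [np.asarray(m, dtype=float) for m in out_means]
    gt_means = [np.asarray(g, dtype=float) for g in gt_means]
    assign = [int(np.argmin([np.linalg.norm(m - g) for g in gt_means]))
              for m in out_means]
    loss = 0.0
    for j in range(len(gt_means)):
        idx = [i for i, a in enumerate(assign) if a == j]
        if idx:
            loss += float(np.sum((L[idx] - 1.0 / len(idx)) ** 2))
    return loss
```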
  • when a target object is included in an image region, an ideal PHD indicates a distribution (δ function) having a likelihood of 1 at the position and the size of that image region and having a variance of 0.
  • FIG. 13 is a diagram illustrating an ideal PHD.
  • target objects are included in two image regions 40 - 1 and 40 - 2 .
  • the position and the size of the image region 40 - 1 are (x1, y1) and (w1, h1), respectively. Therefore, an ideal PHD indicates a δ function with a peak at (x1, y1, w1, h1).
  • the position and the size of the image region 40 - 2 are (x2, y2) and (w2, h2), respectively. Therefore, an ideal PHD indicates a δ function with a peak at (x2, y2, w2, h2).
  • an ideal PHD relating to learning image data is previously generated by hand and is stored in a storage in association with the learning image data.
  • the learning unit 2080 performs learning by the neural network 20 by use of one or more combinations of learning image data and an ideal PHD prepared as described above.
  • An information processing apparatus 2000 according to an example embodiment 2 distinctively handles a plurality of types of target objects. To do so, the generation unit 2020 according to the example embodiment 2 generates likelihood data for each of mutually different types of target objects. Therefore, likelihood data are generated for each type of target object for one partial region 12 .
  • an extraction unit 2040 generates a PHD for each type of target object. This is achieved by adding up likelihood data for each type of target object. Then, the extraction unit 2040 extracts a partial distribution from each PHD.
  • An output unit 2060 outputs a position and a size of a target object relating to each partial distribution. Each partial distribution relates to one type of target object. Then, the output unit 2060 outputs a position and a size of a target object relating to a partial distribution along with the type of the target object.
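  • A simplified sketch of this per-type flow follows; selecting the heaviest components stands in for the full partial-distribution extraction and is an illustrative shortcut, not the patent's procedure.

```python
def detect_by_type(per_type_components):
    """per_type_components: dict mapping a target-object type to the
    likelihood-data components (Li, mean) produced by that type's
    neural network 20. Returns (type, box) pairs."""
    results = []
    for obj_type, comps in per_type_components.items():
        n = int(sum(w for w, _ in comps))   # integral of this type's PHD
        # the n heaviest components stand in for the extracted partial
        # distributions (a simplification of the full extraction step)
        for w, mean in sorted(comps, key=lambda c: -c[0])[:n]:
            results.append((obj_type, tuple(mean)))
    return results

boxes = detect_by_type({
    "human": [(0.9, (10, 20, 50, 100)), (0.8, (14, 22, 48, 98))],
    "car":   [(1.0, (200, 40, 120, 80))],
})
```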
  • the information processing apparatus 2000 includes a neural network 20 for each type of target object.
  • Each neural network 20 previously performs learning in such a way as to detect a relating type of target object.
  • for example, for a neural network 20 that detects humans, an ideal PHD is set to indicate a likelihood of 1 for a position and a size of an image region representing a human in learning image data and a likelihood of 0 for a position and a size of any other image region (an image region in which no object exists or in which an object other than a human exists).
  • a learning unit 2080 causes a neural network 20 for detecting a certain type of target object to perform learning by use of a combination of “learning image data and an ideal PHD for the type of target object.”
  • A hardware configuration of a computer providing the information processing apparatus 2000 according to the example embodiment 2 is illustrated by FIG. 4 , similarly to the example embodiment 1.
  • a storage device 1080 in a computer 1000 providing the information processing apparatus 2000 according to the present example embodiment further stores a program module providing the function of the information processing apparatus 2000 according to the present example embodiment.
  • the information processing apparatus 2000 can detect a target object for each type thereof. Accordingly, positions of mutually different types of target objects can be recognized including the types thereof.

Abstract

An information processing apparatus (2000) generates likelihood data for each of a plurality of partial regions (12) in image data (10). The likelihood data are data being associated with a position and a size on the image data (10) and indicating a likelihood that a target object exists in an image region at the position with the size. The information processing apparatus (2000) computes a distribution (probability hypothesis density: PHD) of an existence likelihood of a target object with respect to a position and a size by computing the total sum of likelihood data each piece of which is generated for each partial region (12). The information processing apparatus (2000) extracts, from the PHD, partial distributions each of which relates to one target object. For each extracted partial distribution, the information processing apparatus (2000) outputs a position and a size of a target object represented by the partial distribution, based on a statistic of the partial distribution.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • The present application is a continuation application of U.S. patent application Ser. No. 17/059,678 filed on Nov. 30, 2020, which is a National Stage Entry of PCT/JP2018/021207 filed on Jun. 1, 2018, the contents of all of which are incorporated herein by reference, in their entirety.
  • TECHNICAL FIELD
  • The present invention relates to a technology of detecting an object from an image.
  • BACKGROUND ART
  • Technologies of detecting an object from image data have been developed. For example, Patent Document 1 discloses a technology of performing object detection by use of a deep neural network. A system in Patent Document 1 generates a feature map of image data by use of a convolutional neural network and, by inputting the generated feature map to a neural network called a region proposal network (RPN), outputs many proposals of rectangular regions (region proposals), each of which includes an object. The system further estimates a class of an object included in a region proposal by performing classification in a layer called a box-classification layer. The system also adjusts a position and a size of a region proposal by performing regression in a layer called a box-regression convolutional layer.
  • Further, a system in Non Patent Document 1 generates a plurality of feature maps by use of a convolutional neural network and outputs many object proposals from each feature map. Each object proposal includes rectangular coordinates and a likelihood of an object class.
  • Many erroneous outputs not being correct answers are included in the aforementioned outputs in both the technique in Patent Document 1 and the technique in Non Patent Document 1. Therefore, a detection result to be finally output is acquired out of the many object proposals by performing processing, called non-maximum suppression, of reducing neighboring and significantly overlapping region proposals.
  • RELATED DOCUMENT
    Patent Document
    • [Patent Document 1] United States Patent Application Publication No. 2017/0206431, Specification
    Non-Patent Document
    • [Non Patent Document 1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg, “Single Shot MultiBox Detector,” ECCV 2016
    SUMMARY OF THE INVENTION
    Technical Problem
  • In Patent Document 1 and Non Patent Document 1, significantly overlapping detections are eliminated as erroneous detection; the case of objects that genuinely overlap significantly is therefore not considered, and it is conceivable that a plurality of overlapping objects are erroneously detected as a single object in such a case.
  • The present invention has been made in view of the aforementioned problem and provides a technology capable of distinctively detecting objects even when the objects overlap one another in image data.
  • Solution to Problem
  • An information processing apparatus according to the present invention includes: 1) a generation unit configured to acquire image data and generate likelihood data representing a likelihood of existence of a target object with respect to a position and a size for each of a plurality of partial regions included in the image data; 2) an extraction unit configured to compute a distribution of a likelihood of existence of a target object with respect to a position and a size by computing a total sum of likelihood data each piece of which is generated for each partial region and extract, from the computed distribution, one or more partial distributions each of which relates to one target object; and 3) an output unit configured to, for each extracted partial distribution, output a position and a size of a target object relating to the partial distribution, based on a statistic of the partial distribution.
  • A control method according to the present invention is executed by a computer. The control method includes: 1) a generation step of acquiring image data and generating likelihood data representing a likelihood of existence of a target object with respect to a position and a size for each of a plurality of partial regions included in the image data; 2) an extraction step of computing a distribution of a likelihood of existence of a target object with respect to a position and a size by computing a total sum of likelihood data each piece of which is generated for each partial region and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and 3) an output step of, for each extracted partial distribution, outputting a position and a size of a target object relating to the partial distribution, based on a statistic of the partial distribution.
  • A program according to the present invention causes a computer to execute each step included in the control method according to the present invention.
  • Advantageous Effects of the Invention
  • The present invention provides a technology capable of distinctively detecting objects even when the objects overlap one another in image data.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The aforementioned object, other objects, features and advantages will become more apparent by use of the following preferred example embodiments and accompanying drawings.
  • FIG. 1 is a diagram conceptually illustrating processing performed by an information processing apparatus according to the example embodiment 1.
  • FIG. 2 is a diagram illustrating image data including target objects significantly overlapping each other.
  • FIG. 3 is a diagram illustrating a functional configuration of the information processing apparatus according to the example embodiment 1.
  • FIG. 4 is a diagram illustrating a computer for providing the information processing apparatus.
  • FIG. 5 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
  • FIG. 6 is a diagram illustrating a method of extracting a partial region by use of a sliding window.
  • FIG. 7 is a diagram illustrating a neural network used for generation of likelihood data.
  • FIG. 8 is a diagram conceptually illustrating likelihood data generated based on a likelihood Li.
  • FIG. 9 is a diagram illustrating a neural network outputting parameters of a normal distribution indicated by likelihood data.
  • FIG. 10 is a flowchart illustrating a flow of processing of extracting a partial distribution on the basis of the maximum value of a PHD.
  • FIG. 11 is a diagram illustrating a position and a size of a target object determined based on a partial distribution.
  • FIG. 12 is a block diagram illustrating an information processing apparatus having a function of learning by a neural network.
  • FIG. 13 is a diagram illustrating an ideal PHD.
  • DESCRIPTION OF EMBODIMENTS
  • Example embodiments of the present invention will be described below by use of drawings. Note that, in all drawings, a similar sign is given to similar components, and description thereof is omitted as appropriate. Further, each block in each block diagram represents a function-based configuration rather than a hardware-based configuration unless otherwise described.
  • Example Embodiment 1
  • Outline
  • FIG. 1 is a diagram conceptually illustrating processing performed by an information processing apparatus 2000 according to the present example embodiment. The information processing apparatus 2000 acquires image data 10 and detects a target object from the image data 10. Detection of a target object means determination of a position and a size of an image region (such as a circumscribed rectangle) including the target object from the image data 10. Any object may be handled as a target object, or only a specific type of object (such as only a human) may be handled as a target object.
  • The information processing apparatus 2000 detects an object by a method described below. First, the information processing apparatus 2000 generates parameters representing likelihood data for each of a plurality of partial regions 12 in the image data 10. The likelihood data are data being associated with a position and a size on the image data 10 and indicating a distribution of a likelihood that a target object exists in an image region at the position with the size. Specifically, denoting a predetermined probability density function the integral of which is 1 as f and a generated parameter as L, likelihood data is expressed by L×f.
  • For example, a normal distribution the position and the variance of which vary for each partial region may be used as the probability density function f, or a δ function may be used for expressing existence at a specific position only, or another probability density function may be adopted. Note that a δ function represents a function taking infinity only at a specific value, taking 0 at the other values, and having an integral value of 1.
  • The integral value of the likelihood data L×f matches the value of the generated parameter L. The likelihood data in FIG. 1 indicate such a distribution. Further details of the likelihood data will be described later.
  • The information processing apparatus 2000 computes a distribution of an existence likelihood of a target object with respect to a position and a size by computing the total sum of likelihood data each piece of which is generated for each partial region 12. The distribution is a so-called probability hypothesis density (PHD). The PHD is a distribution function having a characteristic that the integrated value matches the number of existing objects. The information processing apparatus 2000 extracts, from the PHD, partial distributions each of which relates to one target object (hereinafter referred to as partial distributions). Ideally, each of the partial distributions is extracted in such a way that the integral value thereof is 1, and each partial distribution relates to one target object.
  • Three partial distributions are extracted from a PHD in FIG. 1 . The integrated value of the PHD is 3, and partial distributions are extracted in such a way that the integral of each partial distribution is 1. Note that while the three partial distributions are extracted in such a way as not to overlap each other in FIG. 1 , the partial distributions may be extracted in such a way as to overlap each other. For example, when the shape of each partial distribution is limited to a normal distribution, each integral value becomes 1; the partial distributions may then be determined in such a way as to minimize the error between the sum of the partial distributions and the PHD. Alternatively, each partial distribution may be limited to a normal distribution × a weight. The integral value matches the weight in the case of the limitation, and therefore the partial distributions may be determined in such a way as to minimize the total sum of the error between the sum of the partial distributions and the PHD, and the error between the weight value and 1. Alternatively, a distribution other than a normal distribution may be adopted as a limited distribution shape.
  • For each extracted partial distribution, the information processing apparatus 2000 outputs a position and a size of a target object represented by the partial distribution, based on a statistic such as the mean of the partial distribution. For example, a position of a target object is represented by coordinates of a predetermined position (such as an upper-left corner) of a circumscribed rectangle representing the target object. For example, a size of a target object can be represented by a width and a height of a rectangular region representing the target object.
  • Note that while each distribution illustrated in FIG. 1 is depicted two-dimensionally (horizontal axis: position/size×vertical axis: likelihood) for convenience of illustration, the distribution is actually a distribution on a three-or-more-dimensional space. For example, it is assumed that a position of an image region is represented by coordinates, the shape of the image region is a rectangle, and the size of the rectangle is represented by a width and a height. In this case, each distribution illustrated in FIG. 1 is expressed on a five-dimensional (X coordinate, Y coordinate, width, height×likelihood) space.
  • Advantageous Effects
  • As described above, the information processing apparatus 2000 according to the present example embodiment detects a target object by a method of computing a PHD by adding up likelihood data each piece of which is computed for each partial region, and extracting a partial distribution representing one target object. The method enables highly precise distinction even between significantly overlapping target objects and detection of the target objects as separate target objects. The reason will be described below with reference to FIG. 2 .
  • FIG. 2 is a diagram illustrating image data 10 including significantly overlapping target objects. The image data 10 is a captured image of a scene in which two persons pass each other. When persons are correctly detected from the image data 10, two persons are detected. However, it is difficult for existing techniques to distinctively detect persons that significantly overlap each other, and the probability of the two persons being collectively detected as one person is high.
  • With regard to this point, the information processing apparatus 2000 according to the present example embodiment generates a PHD acquired by adding up likelihood data each piece of which is generated for each partial region 12. The integrated value in any section of the PHD represents the number of target objects in the section. Thus, in the information processing apparatus 2000, information about the number of target objects is included in a PHD being information acquired by integrating information acquired from each partial region 12. By thus checking an integral value of a PHD including information about the number of target objects, each target object can be precisely detected even from image data including significantly overlapping target objects.
  • Specifically, a partial distribution the integral value of which is 1 is extracted from a PHD. This enables separation of significantly overlapping target objects and acquisition of a probability distribution of a position and a size of an image region relating to each target object. For example, a shaded partial distribution and a dotted partial distribution are extracted from a PHD in FIG. 2 . Then, by determining a position and a size of a target object for each extracted partial distribution, each target object can be detected.
  • Note that the aforementioned description with reference to FIG. 1 and FIG. 2 is an exemplification for ease of understanding of the information processing apparatus 2000 and does not limit the functions of the information processing apparatus 2000. The information processing apparatus 2000 according to the present example embodiment will be described in more detail below.
  • Example of Functional Configuration of Information Processing Apparatus 2000
  • FIG. 3 is a diagram illustrating a functional configuration of the information processing apparatus 2000 according to the example embodiment 1. The information processing apparatus 2000 includes a generation unit 2020, an extraction unit 2040, and an output unit 2060. The generation unit 2020 acquires image data 10 and generates likelihood data for each of a plurality of partial regions 12 included in the image data 10. The extraction unit 2040 computes a PHD by computing the total sum of the likelihood data generated for the partial regions 12 and extracts, from the computed PHD, one or more partial distributions each of which relates to one target object. For each extracted partial distribution, the output unit 2060 outputs a position and a size of a target object represented by the partial distribution, based on a statistic of the partial distribution.
  • Hardware Configuration of Information Processing Apparatus 2000
  • Each functional configuration unit in the information processing apparatus 2000 may be provided by hardware (such as a hardwired electronic circuit) providing each functional configuration unit or may be provided by a combination of hardware and software (such as a combination of an electronic circuit and a program controlling the circuit). The case of each functional configuration unit in the information processing apparatus 2000 being provided by a combination of hardware and software will be further described below.
  • FIG. 4 is a diagram illustrating a computer 1000 for providing the information processing apparatus 2000. The computer 1000 may be any computer. Examples of the computer 1000 include a personal computer (PC) and a server machine. The computer 1000 may be a dedicated computer designed for providing the information processing apparatus 2000 or may be a general-purpose computer.
  • The computer 1000 includes a bus 1020, a processor 1040, a memory 1060, a storage device 1080, an input-output interface 1100, and a network interface 1120. The bus 1020 is a data transmission channel through which the processor 1040, the memory 1060, the storage device 1080, the input-output interface 1100, and the network interface 1120 mutually transmit and receive data. However, a method of connecting the processor 1040 and the like to one another is not limited to the bus connection.
  • The processor 1040 includes various processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a field-programmable gate array (FPGA). The memory 1060 is a main storage provided by use of a random access memory (RAM) and/or the like. The storage device 1080 is an auxiliary storage provided by use of a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), and/or the like.
  • The input-output interface 1100 is an interface for connecting the computer 1000 to an input/output device. For example, the input-output interface 1100 is connected to an input apparatus such as a keyboard and an output apparatus such as a display apparatus. The network interface 1120 is an interface for connecting the computer 1000 to a communication network. Examples of the communication network include a local area network (LAN) and a wide area network (WAN). A method of connecting the network interface 1120 to the communication network may be a wireless connection or a wired connection.
  • The storage device 1080 stores a program module providing each functional configuration unit in the information processing apparatus 2000. The processor 1040 provides a function relating to each program module by reading the program module into the memory 1060 and executing the program module.
  • Processing Flow
  • FIG. 5 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1. The generation unit 2020 acquires image data 10 (S102). The generation unit 2020 generates likelihood data for each of a plurality of partial regions 12 included in the image data 10 (S104). The extraction unit 2040 computes a PHD by adding up likelihoods represented by the likelihood data (S106). The extraction unit 2040 extracts one or more partial distributions from the PHD (S108). For each partial distribution, the output unit 2060 outputs a position and a size of a target object relating to the partial distribution (S110).
  • The information processing apparatus 2000 may execute a series of processes illustrated in FIG. 5 in response to any trigger. For example, the information processing apparatus 2000 executes the aforementioned series of processes in response to input of the image data 10. In addition, for example, the information processing apparatus 2000 may execute the aforementioned series of processes in response to a predetermined input operation by a user.
  • Acquisition of Image Data 10: S102
  • The generation unit 2020 acquires image data 10 (S102). Any image data may be used as the image data 10. For example, the image data 10 is a captured image generated by a camera. The camera may be a still camera or a video camera. Note that "a captured image generated by a camera" may be a captured image as generated by the camera itself or an image acquired by applying some processing to such a captured image.
  • When a captured image is used as the image data 10, the information processing apparatus 2000 may be provided inside a camera generating the image data 10. For example, by providing the information processing apparatus 2000 inside a surveillance camera, an object can be detected in real time from a surveillance video generated by the surveillance camera. For example, types of camera called an intelligent camera, an Internet Protocol (IP) camera, and a network camera can be used as a camera incorporating the function of the information processing apparatus 2000.
  • The generation unit 2020 may acquire image data 10 by any method. For example, the generation unit 2020 acquires image data 10 from a storage storing the image data 10. The storage storing the image data 10 may be provided inside the information processing apparatus 2000 or may be provided outside. In addition, for example, the information processing apparatus 2000 acquires image data 10 input by an input operation by a user. In addition, for example, the generation unit 2020 acquires image data 10 by receiving the image data 10 transmitted by another apparatus.
  • Partial Region 12
  • A partial region 12 is a partial image region included in the image data 10. Each partial region 12 differs from the other partial regions 12 in at least one of position and size.
  • The generation unit 2020 extracts each partial region 12 included in the image data 10 and, by analyzing the extracted partial region 12, generates likelihood data for the partial region 12. For example, a partial region 12 can be extracted by use of a sliding window. FIG. 6 is a diagram illustrating a method of extracting a partial region 12 by use of a sliding window. The information processing apparatus 2000 moves a sliding window with a predetermined size (width: Ws, height: Hs) at a predetermined stride d. A plurality of image regions with different sizes are extracted at the various positions of the sliding window, and each image region is handled as a partial region 12. In this way, partial regions 12 with varying positions and sizes can be extracted. Note that, for example, a technique using an Anchor box disclosed in Patent Document 1 can be used to extract such partial regions 12.
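  • A minimal sketch of the sliding-window extraction, assuming a window of fixed size (Ws, Hs) moved at stride d (the function name is hypothetical); each placement yields one partial region 12 given as (x, y, w, h), and running several window sizes yields partial regions with varying sizes as well as positions:

```python
# Enumerate sliding-window placements over an img_w x img_h image;
# each (x, y, w, h) tuple is one partial region 12.
def sliding_windows(img_w, img_h, Ws, Hs, d):
    regions = []
    for y in range(0, img_h - Hs + 1, d):
        for x in range(0, img_w - Ws + 1, d):
            regions.append((x, y, Ws, Hs))
    return regions

# Example: windows of two sizes over a 64x48 image.
regions = sliding_windows(64, 48, 16, 16, 8) + sliding_windows(64, 48, 32, 32, 8)
```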
  • A partial region 12 may be extracted from a feature map generated from the image data 10 instead of being extracted directly from the image data 10. In this case, for example, the neural network 20 to be described later is composed of a layer for extracting a feature map from the image data 10 (such as a convolutional layer in a convolutional neural network) and a layer for extracting a partial region 12 from the feature map output from that layer and generating likelihood data.
  • A shape of a partial region 12 is not necessarily limited to a rectangle. For example, when the shape of a partial region 12 is a perfect circle, the partial region 12 can be represented by center coordinates and a radius length. Further, when a partial region 12 is represented by a set of vertices, a polygon of any shape can be handled as a partial region 12. In this case, both the position and the size of the partial region 12 are determined by the set of vertices of the partial region 12.
  • Generation of Likelihood Data: S104
  • For each of the plurality of partial regions 12 included in the image data 10, the generation unit 2020 computes parameters representing likelihood data and generates the likelihood data (S104). For example, the parameters representing likelihood data are generated by use of a neural network. FIG. 7 is a diagram illustrating a neural network used for generation of parameters representing likelihood data. In response to input of the image data 10, a neural network 20 outputs, for each partial region 12 included in the image data 10, a likelihood Li that a target object exists in an image region with the position and the size of the partial region 12. Li is the likelihood output for the i-th partial region 12.
  • For example, the generation unit 2020 sets a distribution determined based on a likelihood Li as likelihood data.
  • FIG. 8 is a diagram conceptually illustrating likelihood data generated based on a likelihood Li. In the upper part of FIG. 8, the likelihood data represent a distribution having a variance of 0 and being generated based on the likelihood Li. By use of a δ function, this distribution is expressed as Li × δ.
  • On the other hand, the likelihood data in the lower part of FIG. 8 represent a distribution with a nonzero variance. For example, a distribution conforming to a predetermined model such as a normal distribution is predetermined as a reference (hereinafter referred to as a reference distribution). When a normal distribution is used, for example, the reference distribution may be determined as a distribution having an integral value of 1, the position and the size of the partial region 12 as the mean, and a predetermined value as the variance. Any value may be set as the variance.
  • The generation unit 2020 generates likelihood data by multiplying the reference distribution by the likelihood Li. For example, in the lower part of FIG. 8, the reference distribution model is a normal distribution. Based on the position (xi, yi) and the size (wi, hi) of the partial region 12, the mean of the reference distribution is (xi, yi, wi, hi), and its variance is vi. The reference distribution is thus N[(xi, yi, wi, hi), vi]. Furthermore, the likelihood output from the neural network 20 is Li. The generation unit 2020 then generates the distribution indicated by the likelihood data by multiplying the reference distribution by Li. The integral value of the resulting likelihood-data distribution is Li.
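  • A minimal sketch of this construction, assuming an isotropic normal reference distribution over the four-dimensional (x, y, w, h) space; the function name and the shared scalar variance vi are illustrative assumptions:

```python
import numpy as np

# One piece of likelihood data: the reference distribution
# N[(xi, yi, wi, hi), vi] (unit integral) scaled by the likelihood Li.
def likelihood_data(Li, xi, yi, wi, hi, vi):
    mean = np.array([xi, yi, wi, hi], dtype=float)
    def density(state):  # state = (x, y, w, h)
        diff = np.asarray(state, dtype=float) - mean
        norm = (2.0 * np.pi * vi) ** (len(mean) / 2.0)
        return Li * np.exp(-np.dot(diff, diff) / (2.0 * vi)) / norm
    return density  # integrates to Li over the (x, y, w, h) space

# The PHD is then the pointwise sum of such densities over all i.
```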
  • A reference distribution conforming to a distribution model may not be predetermined, and parameters of a distribution model may be output from the neural network 20. For example, when a normal distribution is used, parameters of a distribution model are the aforementioned mean and variance. Then, the neural network 20 outputs a mean and a variance for each partial region 12.
  • FIG. 9 is a diagram illustrating the neural network 20 outputting parameters of a normal distribution indicated by likelihood data. In FIG. 9 , “a likelihood Li, (xiu, yiu, wiu, hiu) representing the mean of a normal distribution, and the variance vi of the normal distribution” are output for each partial region 12. Then, by multiplying the normal distribution determined by the mean and the variance output from the neural network 20 by the likelihood Li for each partial region 12, the generation unit 2020 generates a distribution indicated by the likelihood data.
  • The position (xi, yi) output from the neural network 20 may be different from the original position of the relating i-th partial region 12. Similarly, the size (wi, hi) output from the neural network 20 may be different from the original size of the relating i-th partial region 12. The reason is that, through the learning described later in which the neural network 20 is trained to output an ideal PHD, the neural network 20 learns to adjust the output position and size of the partial region 12 in such a way as to increase the likelihood that a target object is included in the partial region 12.
  • Note that the neural network 20 does not necessarily output all parameters of the distribution model and may output only part of the parameters. For example, the mean of the normal distribution is output from the neural network 20, and a predetermined value is used as the variance.
  • In order to make the neural network 20 perform the operation described above, the neural network 20 needs to be trained in advance to perform such an operation. A learning method of the neural network 20 will be described later. Note that any internal structure (such as the number and order of layers, the type of each layer, and the connection relations between layers) may be used for the neural network 20. For example, the same structure as that of the region proposal network (RPN) described in Patent Document 1 may be adopted as the structure of the neural network 20. Alternatively, the network described in Non Patent Document 1 may be used.
  • Note that generation of likelihood data does not necessarily need to be performed by use of a neural network, and another existing technique of, for each of a plurality of partial regions in image data, computing a likelihood that a target object is included in the partial region may be used.
  • Extraction of Partial Distribution: S108
  • The extraction unit 2040 extracts one or more partial distributions from the PHD. A partial distribution is a probability distribution that represents, for a partial region including one target object, the existence probability of the target object as a function of the position and the size of the partial region. Being a probability distribution, a partial distribution has an integral value of 1.
  • First, the extraction unit 2040 computes the number of target objects included in the image data 10, based on the PHD. Specifically, the extraction unit 2040 computes the integral value of the PHD and determines the computed integral value to be the number of target objects included in the image data 10. However, due to errors and the like, the integral value of the PHD may not exactly match the number of target objects and may not be a natural number. In this case, the extraction unit 2040 handles an approximate value (such as the value acquired by dropping the fractional portion) of the integral value of the PHD as the number of target objects.
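  • A minimal sketch of this count estimate, assuming the PHD has been sampled on a grid (the array of samples and the cell volume are illustrative assumptions):

```python
import numpy as np

# Integrate a sampled PHD and drop the fractional portion of the result.
def object_count(phd_samples, cell_volume):
    return int(np.floor(np.sum(phd_samples) * cell_volume))
```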
  • The extraction unit 2040 extracts the computed number of partial distributions from the PHD. For example, the extraction unit 2040 extracts partial distributions from the PHD on the basis of the maximum value of the PHD. FIG. 10 is a flowchart illustrating a flow of processing of extracting partial distributions on the basis of the maximum value of the PHD. The loop processing illustrated in the flowchart in FIG. 10 is repeatedly executed while a counter i is less than the integral value S of the PHD. The counter i is initialized to 0 at first and is incremented by 1 every time the loop processing is executed. In this case, the number of extracted partial distributions is the maximum integer equal to or less than S.
  • In S202, the extraction unit 2040 determines whether the counter i is less than S. When i is less than S, the processing in FIG. 10 advances to S204. On the other hand, when i is equal to or greater than S, the processing in FIG. 10 ends.
  • The extraction unit 2040 determines the position and the size relating to the maximum value of the PHD (S204). The extraction unit 2040 then extracts, from the PHD, a partial distribution centered on that position and size and having an integral value of 1 (that is, removes the partial distribution from the PHD) (S206). Since S208 is the end of the loop processing, the processing returns to S202.
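  • A one-dimensional sketch of the loop in FIG. 10, under the assumption that the PHD is sampled on a grid and that a unit-mass Gaussian is used as the shape removed at each peak (the function name and the variance are illustrative):

```python
import numpy as np

# Peel unit-mass bumps off a sampled 1-D PHD, one per estimated object.
def extract_partials(phd, x, var=0.25):
    dx = x[1] - x[0]
    n = int(np.floor(np.sum(phd) * dx))      # number of objects (S in the text)
    phd = phd.copy()
    peaks = []
    for _ in range(n):
        c = x[np.argmax(phd)]                # S204: position of the maximum
        bump = np.exp(-(x - c) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
        phd = np.maximum(phd - bump, 0.0)    # S206: remove a unit-mass partial
        peaks.append(c)
    return peaks
```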
  • In addition to the method illustrated in FIG. 10, any spatial clustering technique may be used to extract partial distributions from a PHD. For example, denoting each output result as Li and a preset probability density function as fi, a PHD can be written as the total sum Σi(Li×fi) of the output results. Hierarchical clustering may be adopted in which the distances between the positions represented by all output results are computed, output results at a short distance from each other are added together, and the total number is thereby decreased to a predetermined number. At this time, since it is desirable that each Li be as close to 1 as possible, the following test may be performed when adding an output i and an output i′: compare the mean of the squares of (1−Li) and (1−Li′) with the square of (Li+Li′−1), and skip the addition when the former is smaller. Alternatively, various clustering techniques may be applied, and the result with the minimum sum of squares of (1−Li) may be selected.
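  • The merge test of this hierarchical clustering can be sketched as follows, assuming each output result is a pair of a likelihood Li and a state vector (the function names are illustrative):

```python
import numpy as np

# Skip a merge when keeping i and i' separate already gives a smaller
# error: compare the mean of (1-Li)^2 and (1-Li')^2 with (Li+Li'-1)^2.
def should_merge(Li, Lj):
    separate = ((1.0 - Li) ** 2 + (1.0 - Lj) ** 2) / 2.0
    merged = (Li + Lj - 1.0) ** 2
    return merged < separate

def merge(out_i, out_j):
    # out = (L, state): combine weights, take a likelihood-weighted mean state
    (Li, si), (Lj, sj) = out_i, out_j
    s = (Li * np.asarray(si) + Lj * np.asarray(sj)) / (Li + Lj)
    return (Li + Lj, s)
```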
  • Output of Result: S110
  • For each extracted partial distribution, the output unit 2060 outputs a position and a size of a target object represented by the partial distribution (S110). Specifically, the output unit 2060 determines the position and the size of the target object, based on a statistic of the partial distribution. For example, the output unit 2060 determines the mean of the partial distribution to be the position and the size of the target object. In addition, for example, the output unit 2060 may determine a position and a size relating to the maximum value of the partial distribution to be the position and the size of the target object. Then, the output unit 2060 outputs the determined position and size for each partial distribution.
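  • A minimal sketch of S110 for a partial distribution sampled on a one-dimensional grid, computing both statistics mentioned above (the names are illustrative):

```python
import numpy as np

# Report either the mean or the mode (argmax) of a sampled partial
# distribution as the detected position/size.
def report(partial, x):
    mean = np.sum(x * partial) / np.sum(partial)   # the grid spacing cancels
    mode = x[np.argmax(partial)]
    return mean, mode
```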
  • FIG. 11 is a diagram illustrating a position and a size of a target object determined based on a partial distribution. In FIG. 11 , two partial distributions D1 and D2 are extracted from a PHD. The output unit 2060 determines a position (x1, y1) and a size (w1, h1) of a target object, based on the partial distribution D1. Similarly, the output unit 2060 determines a position (x2, y2) and a size (w2, h2) of a target object, based on the partial distribution D2. From the above, each of an image region at the position (x1, y1) with a width w1 and a height h1, and an image region at the position (x2, y2) with a width w2 and a height h2 represents a target object.
  • The output unit 2060 outputs a position and a size of a target object in various forms. For example, the output unit 2060 stores, into a storage, data (such as a list) indicating, for each target object, a combination of “an identifier assigned to the target object, the position of the target object, and the size of the target object” in association with the image data 10. Note that any method may be used as a method of assigning an identifier to an object detected from image data.
  • In addition, for example, the output unit 2060 may output a display (such as a frame) indicating the determined position and size of a target object, the display being superimposed on the image data 10, as illustrated in FIG. 11. The display may be output to any destination, for example, a storage and/or a display apparatus.
  • Note that the output unit 2060 may further output the number of target objects. A computation method of the number of target objects is as described above.
  • Learning by Neural Network 20
  • As described above, learning by the neural network 20 needs to be performed in advance. The learning by the neural network 20 may be performed by the information processing apparatus 2000 or may be performed by an apparatus other than the information processing apparatus 2000. The description herein assumes that the information processing apparatus 2000 performs the learning by the neural network 20. FIG. 12 is a block diagram illustrating the information processing apparatus 2000 having a function of performing learning by the neural network 20. The learning by the information processing apparatus 2000 is executed by a learning unit 2080.
  • The learning unit 2080 computes a predicted loss between a PHD based on an actual output of the neural network 20 and an ideal PHD. The ideal PHD may be expressed as the sum of normal distributions, each of which has a previously specified variance and is centered on the position of a rectangle representing a correct-answer object. Alternatively, the ideal PHD may be handled as a δ function whose variance is 0, or another function may be used. Next, learning by the neural network 20 is performed based on the predicted loss. More specifically, the learning unit 2080 performs the learning by updating the parameters (weight values and bias values) of the neural network 20 by propagating the computed predicted loss in inverse order (back propagation) from the output node of the neural network 20. Various existing methods such as gradient descent may be used as a method of performing learning by a neural network through back propagation based on a predicted loss. A determination method and a computation method of the predicted loss used in learning by the neural network 20 will be described below.
  • The learning unit 2080 computes a PHD relating to an actual output by use of the actual output acquired by inputting image data for learning (hereinafter referred to as learning image data) to the neural network 20. The learning unit 2080 further computes a predicted loss between the PHD relating to the actual output and an ideal PHD predetermined based on the learning image data. For example, the square error between the two PHDs may be used as the predicted loss. Alternatively, since a PHD divided by its integral value can be handled as a probability density function whose integral value is 1, any technique capable of expressing a loss as an error between probability density functions may be used. For example, the negative of the product of the ideal probability density function and the probability density function relating to the actual output may be determined as a loss. Alternatively, an error in the integral value may be handled as a loss, or several of these losses may be combined.
  • As a more specific example, denoting each output result as Li and a preset probability density function as fi, a PHD relating to an actual output can be written as Σi(Li×fi). Further, denoting the position of the rectangle of each correct-answer object j as yj and the distribution, centered there, used as a basis for computing the PHD as gj, an ideal PHD can be written as Σj(gj). As a technique of minimizing the error between the two, one or a plurality of neighboring outputs i are previously assigned to each correct answer j. Denoting the number of assigned outputs as Nj, an error between Li for each assigned i and 1/Nj, such as the square of (Li−1/Nj), may be minimized. This is a technique for learning Li in such a way that the integral values match.
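  • A minimal sketch of this assignment-based loss, assuming each output i carries a likelihood Li and a box (x, y, w, h), that each correct answer j is assigned its Nj nearest outputs, and that Nj is fixed (all names and the fixed Nj are illustrative):

```python
import numpy as np

# Pull each assigned Li toward 1/Nj so that the likelihoods assigned to
# one correct-answer object integrate to 1.
# L: (n,) array of likelihoods; out_boxes: (n, 4); gt_boxes: (m, 4).
def weight_loss(L, out_boxes, gt_boxes, Nj=2):
    loss = 0.0
    for j in range(len(gt_boxes)):
        d = np.linalg.norm(out_boxes - gt_boxes[j], axis=1)
        assigned = np.argsort(d)[:Nj]            # Nj nearest outputs for j
        loss += np.sum((L[assigned] - 1.0 / Nj) ** 2)
    return loss
```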
  • With respect to each image region in which a target object exists in learning image data, an ideal PHD indicates a distribution (a δ function) that has a variance of 0 and a mass of 1 at the point determined by the position and the size of the image region. FIG. 13 is a diagram illustrating an ideal PHD. In learning image data 30 in FIG. 13, target objects are included in two image regions 40-1 and 40-2. The position and the size of the image region 40-1 are (x1, y1) and (w1, h1), respectively. Therefore, the ideal PHD indicates a δ function with a peak at (x1, y1, w1, h1). Further, the position and the size of the image region 40-2 are (x2, y2) and (w2, h2), respectively. Therefore, the ideal PHD also indicates a δ function with a peak at (x2, y2, w2, h2).
  • For example, an ideal PHD relating to learning image data is generated by hand in advance and is stored in a storage in association with the learning image data. The learning unit 2080 performs learning by the neural network 20 by use of one or more combinations of learning image data and an ideal PHD prepared in this way.
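  • A minimal sketch of preparing an ideal PHD from correct-answer boxes, assuming unit-mass isotropic normals with a previously specified variance; letting the variance approach 0 approximates the δ-function form in FIG. 13 (the names are illustrative):

```python
import numpy as np

# Ideal PHD: a sum of unit-mass normals, one per correct-answer box.
def ideal_phd(gt_boxes, var=1.0):
    gt = np.asarray(gt_boxes, dtype=float)         # shape (num_objects, 4)
    def density(state):                            # state = (x, y, w, h)
        diff = gt - np.asarray(state, dtype=float)
        norm = (2.0 * np.pi * var) ** (gt.shape[1] / 2.0)
        return np.sum(np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * var))) / norm
    return density
```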
  • Example Embodiment 2
  • An information processing apparatus 2000 according to an example embodiment 2 distinctively handles a plurality of types of target objects. To do so, the generation unit 2020 according to the example embodiment 2 generates likelihood data for each of mutually different types of target objects. Therefore, likelihood data are generated for each type of target object for one partial region 12.
  • Further, an extraction unit 2040 according to the example embodiment 2 generates a PHD for each type of target object. This is achieved by adding up likelihood data for each type of target object. Then, the extraction unit 2040 extracts a partial distribution from each PHD.
  • An output unit 2060 according to the example embodiment 2 outputs a position and a size of a target object relating to each partial distribution. Each partial distribution relates to one type of target object. Then, the output unit 2060 outputs a position and a size of a target object relating to a partial distribution along with the type of the target object.
  • When the information processing apparatus 2000 is provided by use of a neural network 20, for example, the information processing apparatus 2000 includes a neural network 20 for each type of target object. Each neural network 20 is previously trained in such a way as to detect the relating type of target object. For example, for a neural network 20 handling a human as the target object, an ideal PHD is set to indicate a likelihood of 1 for the position and the size of an image region representing a human in the learning image data and a likelihood of 0 for the position and the size of any other image region (an image region in which no object exists or in which an object other than a human exists).
  • Consequently, an ideal PHD is prepared for each type of target object for learning image data. A learning unit 2080 causes a neural network 20 for detecting a certain type of target object to perform learning by use of a combination of “learning image data and an ideal PHD for the type of target object.”
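  • A minimal sketch of the per-type flow, assuming one trained network per type and taking the PHD computation and the partial-distribution extraction as supplied functions (all names here are illustrative):

```python
# Detect every type of target object with its own trained network.
def detect_all_types(image, networks_by_type, compute_phd, extract_partials):
    results = {}
    for obj_type, net in networks_by_type.items():
        phd = compute_phd(image, net)              # sum of per-region likelihood data
        results[obj_type] = extract_partials(phd)  # partial distributions
    return results
```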
  • Hardware Configuration Example
  • For example, a hardware configuration of a computer providing the information processing apparatus 2000 according to the example embodiment 2 is illustrated by FIG. 4 , similarly to the example embodiment 1. However, a storage device 1080 in a computer 1000 providing the information processing apparatus 2000 according to the present example embodiment further stores a program module providing the function of the information processing apparatus 2000 according to the present example embodiment.
  • Advantageous Effects
  • The information processing apparatus 2000 according to the present example embodiment can detect a target object for each type thereof. Accordingly, positions of mutually different types of target objects can be recognized including the types thereof.
  • While the example embodiments of the present invention have been described above with reference to the drawings, the drawings are exemplifications of the present invention, and various configurations other than the above may be adopted.

Claims (11)

1. An information processing apparatus comprising:
at least one memory configured to store instructions; and
at least one processor configured to execute the instructions to perform:
training a neural network by use of one or more combinations of prepared learning image data and an ideal PHD for each of mutually different types of target objects;
acquiring image data;
generating likelihood data for each of a plurality of partial regions included in the image data by inputting the acquired image data to the trained neural network;
computing a distribution of a likelihood of existence of the target objects with respect to a position and a size by computing a total sum of the likelihood data, and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and
outputting, for each of the one or more partial distributions, a position and a size of the one target object relating to the partial distribution, based on a statistic of the partial distribution.
2. The information processing apparatus according to claim 1, wherein
the likelihood data is represented by a distribution conforming to a predetermined model, and
for each partial region, the trained neural network outputs a likelihood that a target object exists in the partial region and a parameter value of the predetermined model.
3. The information processing apparatus according to claim 1, wherein
the at least one processor is configured to execute the instructions to perform:
computing a number of target objects included in the image data, based on an integral value of the distribution represented by the total sum of the likelihood data, and
extracting as many as the number of the partial distributions from the distribution represented by the total sum of the likelihood data.
4. The information processing apparatus according to claim 1, wherein
the at least one processor is configured to execute the instructions to perform:
extracting the partial distributions an integral value of each of which is 1 from the distribution represented by the total sum of the likelihood data.
5. The information processing apparatus according to claim 1, wherein
the at least one processor is configured to execute the instructions to perform:
generating the likelihood data for each of mutually different types of the target objects;
computing, for each of mutually different types of the target objects, a distribution of a likelihood of existence of the target objects and extracting the partial distribution from the distribution; and
outputting a position and a size of a target object relating to each partial distribution along with a type of the target object relating to the partial distribution.
6. A control method executed by at least one computer, the control method comprising:
training a neural network by use of one or more combinations of prepared learning image data and an ideal PHD for each of mutually different types of target objects;
acquiring image data;
generating likelihood data for each of a plurality of partial regions included in the image data by inputting the acquired image data to the trained neural network;
computing a distribution of a likelihood of existence of the target objects with respect to a position and a size by computing a total sum of the likelihood data, and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and
outputting, for each of the one or more partial distributions, a position and a size of the one target object relating to the partial distribution, based on a statistic of the partial distribution.
7. The control method according to claim 6, wherein
the likelihood data is represented by a distribution conforming to a predetermined model, and
for each partial region, the trained neural network outputs a likelihood that a target object exists in the partial region and a parameter value of the predetermined model.
8. The control method according to claim 6, wherein
the control method comprises:
computing a number of target objects included in the image data, based on an integral value of the distribution represented by the total sum of the likelihood data; and
extracting as many as the number of the partial distributions from a distribution represented by the total sum of the likelihood data.
9. The control method according to claim 6, wherein
the control method comprises:
extracting the partial distributions an integral value of each of which is 1 from a distribution represented by the total sum of the likelihood data.
10. The control method according to claim 6, wherein
the control method comprises:
generating the likelihood data for each of mutually different types of the target objects;
computing, for each of mutually different types of the target objects, a distribution of a likelihood of existence of the target objects and extracting the partial distribution from the distribution; and
outputting a position and a size of a target object relating to each partial distribution along with a type of the target object relating to the partial distribution.
11. A non-transitory recording medium storing a program causing at least one computer to execute:
training a neural network by use of one or more combinations of prepared learning image data and an ideal PHD for each of mutually different types of target objects;
acquiring image data;
generating likelihood data for each of a plurality of partial regions included in the image data by inputting the acquired image data to the trained neural network;
computing a distribution of a likelihood of existence of the target objects with respect to a position and a size by computing a total sum of the likelihood data, and extracting, from the computed distribution, one or more partial distributions each of which relates to one target object; and
outputting, for each of the one or more partial distributions, a position and a size of the one target object relating to the partial distribution, based on a statistic of the partial distribution.