WO2018163786A2 - Target subject analysis apparatus, target subject analysis method, learning apparatus, and learning method - Google Patents

Target subject analysis apparatus, target subject analysis method, learning apparatus, and learning method Download PDF

Info

Publication number
WO2018163786A2
Authority
WO
WIPO (PCT)
Prior art keywords
target subject
neural network
data
attribute
learning
Prior art date
Application number
PCT/JP2018/005819
Other languages
French (fr)
Other versions
WO2018163786A3 (en)
Inventor
Tanichi Ando
Original Assignee
Omron Corporation
Priority date
Filing date
Publication date
Application filed by Omron Corporation filed Critical Omron Corporation
Publication of WO2018163786A2 publication Critical patent/WO2018163786A2/en
Publication of WO2018163786A3 publication Critical patent/WO2018163786A3/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/64 - Three-dimensional objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present invention relates to a target subject analysis apparatus, a target subject analysis method, a learning apparatus, and a learning method.
  • JP H06-124120A proposes a motion controller that inputs image information obtained by a CCD camera and range information obtained by an ultrasonic range sensor to the same neural network, and drives a motion device based on an output signal from this neural network.
  • JP 2005-346297A proposes a three-dimensional object recognition device that identifies a three-dimensional object in a range image, which is generated using a neural network based on a pair of images shot by a stereo camera. Specifically, this three-dimensional object recognition device generates a range image utilizing parallax that occurs between the pair of images obtained by the stereo camera, and groups pieces of range data that represent the same three-dimensional object in the generated range image.
  • the three-dimensional object recognition device sets, in the range image, a smallest area that contains the grouped pieces of range data of the three-dimensional object, and sets an input value with typical range data serving as an element, for each sub-area, which is obtained by dividing the aforementioned smallest area by a set division number.
  • the three-dimensional object recognition apparatus then identifies the type of the three-dimensional object based on the pattern of output values that are obtained by inputting the set input values to the neural network.
  • the method in JP H06-124120A uses range information obtained by an ultrasonic range sensor.
  • the values of the distance represented by this range information do not directly correspond to the pixels of the image obtained by the CCD camera.
  • the area in which the values of the distance can be acquired by the ultrasonic range sensor is a portion of a shooting area that appears in the image. That is to say, the ultrasonic range sensor cannot acquire the values of the distance to a target subject that appears on all of the pixels constituting the image. For this reason, with the method in JP H06-124120A, it is difficult to increase recognition accuracy with respect to attributes of a target subject.
  • In JP 2005-346297A, image processing, such as stereo matching for searching for corresponding points between a pair of images and grouping of pieces of range data that represent the same three-dimensional object, is performed before data is input to the neural network. For this reason, the method in JP 2005-346297A makes the system configuration complex.
  • the present invention has been made in view of the foregoing points in an aspect, and aims to provide a technique that is able to increase recognition accuracy with respect to attributes of a target subject, with a simple configuration.
  • the present invention employs the following configuration.
  • a target subject analysis apparatus includes: a data acquisition unit configured to acquire image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image; a neural network computing unit configured to obtain an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and an attribute specifying unit configured to specify the attribute of the target subject based on the output value obtained from the neural network.
  • the above-described target subject analysis apparatus uses, as the input to the neural network, the image data that represents an image including a figure of a target subject, and also the range data that represents the value of the distance at each pixel constituting the image. Accordingly, since the values of the distance are obtained for the respective pixels in the image, the recognition accuracy of the neural network with respect to the attribute of the target subject can be increased.
  • the above-described target subject analysis apparatus can specify the attribute of the target subject simply by inputting the image data and the range data to the neural network, without performing advanced image processing on the image. It is therefore possible to realize processing for analyzing the attributes of the target subject with a simple configuration, reduce the processing load on a CPU, and reduce the capacity of a memory to be used.
  • the recognition accuracy with respect to the attributes of the target subject can be increased with a simple configuration.
  • the target subject may include anything that can be shot by an image capturing device.
  • the attribute of the target subject to be specified may include any kind of feature of the target subject that appears in the image.
  • the attribute specifying unit may specify, as the attribute of the target subject, at least one of an uneven state, material quality, a three-dimensional shape, and a flat state of the target subject.
  • the uneven state indicates the shape, size, or the like of a raised portion and a recessed portion that are present on the target subject.
  • the raised portion includes a protrusion.
  • the recessed portion includes an opening and a hole.
  • the flat state indicates the degree of expansion of a face of the target subject, the degree of tilt thereof, or the like.
  • the uneven state, material quality, three-dimensional shape, and flat state are attributes that are difficult to analyze using a single image. This configuration uses not only the image data but also the range data, and it is therefore possible to relatively accurately identify these attributes that are difficult to analyze using a single image.
  • the attribute specifying unit may specify, as the attribute of the target subject, a plurality of physical characteristics of the target subject.
  • the target subject can be identified relatively accurately.
  • the physical characteristics refer to features that physically appear on the target subject.
  • the physical characteristics include geometric features such as the size, shape, and posture of the target subject, and features concerning material quality such as the composition of the target subject.
  • the target subject analysis apparatus may further include a neural network selection unit configured to select, in accordance with designation by a user, a neural network to be used by the neural network computing unit from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects.
  • the image data and the range data may be obtained by shooting a situation outside a vehicle as the target subject, and the attribute specifying unit may specify, as the attribute of the target subject, at least one of a state of a road surface, presence of an obstacle, and a type of obstacle, based on the output value obtained from the neural network.
  • This configuration can provide a target subject analysis apparatus able to accurately identify the situation outside the vehicle.
  • the image data and the range data may be obtained by shooting, as the target subject, a product manufactured on a production line, and the attribute specifying unit may specify, as the attribute of the target subject, at least one of a size of the product, a shape of the product, and presence of damage on the product, based on the output value obtained from the neural network.
  • the image data and the range data may be obtained by shooting a human as the target subject, and the attribute specifying unit may specify, as the attribute of the target subject, at least one of a body shape of the human, a facial expression of the human, and a posture of the human, based on the output value obtained from the neural network.
  • a learning apparatus includes: a learning data acquisition unit configured to acquire, as learning data, a set of image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and a learning processing unit configured to train a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data upon the image data and the range data being input.
  • a trained neural network to be used in the above-described target subject analysis apparatus can be constructed in accordance with a desired target of analysis.
  • other modes of the target subject analysis apparatus and the learning apparatus according to the above respective aspects may include an information processing method and a program for realizing the above respective configurations, as well as a storage medium that records such a program and can be read by a computer or other kind of apparatus, machine, or the like.
  • a storage medium that can be read by a computer or the like is a medium that accumulates information such as a program by means of an electric, magnetic, optical, mechanical, or chemical effect.
  • a target subject analysis method is an information processing method wherein a computer executes: a data acquisition step of acquiring image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image; a computation processing step of obtaining an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and an attribute specifying step of specifying the attribute of the target subject based on the output value obtained from the neural network.
  • the computer may specify, as the attribute of the target subject, at least one of an uneven state, material quality, a three-dimensional shape, and a flat state of the target subject.
  • the computer may specify a plurality of physical characteristics of the target subject as the attribute of the target subject.
  • the computer may further execute a selection step of selecting, in accordance with designation by a user, a neural network to be used in the computation processing step from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects.
  • the image data and the range data may be obtained by shooting a situation outside a vehicle as the target subject, and in the attribute specifying step, the computer may specify at least one of a state of a road surface, presence of an obstacle, and a type of obstacle as the attribute of the target subject, based on the output value obtained from the neural network.
  • the image data and the range data may be obtained by shooting, as the target subject, a product manufactured on a production line, and in the attribute specifying step, the computer may specify at least one of a size of the product, a shape of the product, and presence of damage on the product as the attribute of the target subject, based on the output value obtained from the neural network.
  • the image data and the range data may be obtained by shooting a human as the target subject, and in the attribute specifying step, the computer may specify at least one of a body shape of the human, a facial expression of the human, and a posture of the human as the attribute of the target subject, based on the output value obtained from the neural network.
  • a target subject analysis program is a program for causing a computer to execute: a data acquisition step of acquiring image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image; a computation processing step of obtaining an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and an attribute specifying step of specifying the attribute of the target subject based on the output value obtained from the neural network.
  • a learning method is an information processing method for executing a learning data acquisition step of acquiring, as learning data, a set of image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and a learning processing step of training a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data if the image data and the range data are input.
  • a learning program is a program for causing a computer to execute a learning data acquisition step of acquiring, as learning data, a set of image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and a learning processing step of training a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data if the image data and the range data are input.
  • recognition accuracy with respect to attributes of a target subject can be increased with a simple configuration.
  • FIG. 1 schematically shows an example of an instance where a target subject analysis apparatus and a learning apparatus according to an embodiment are applied.
  • FIG. 2 schematically shows an example of a hardware configuration of the target subject analysis apparatus according to an embodiment.
  • FIG. 3 schematically shows an example of a hardware configuration of the learning apparatus according to an embodiment.
  • FIG. 4 schematically shows an example of a functional configuration of the target subject analysis apparatus according to an embodiment.
  • FIG. 5 is a diagram illustrating image data and range data according to an embodiment.
  • FIG. 6 schematically shows an example of a functional configuration of the learning apparatus according to an embodiment.
  • FIG. 7 shows an example of a processing procedure of the target subject analysis apparatus.
  • FIG. 8A shows an example of an instance where attributes of a target subject are analyzed.
  • FIG. 8B shows an example of an instance where attributes of a target subject are analyzed.
  • FIG. 8C shows an example of an instance where attributes of a target subject are analyzed.
  • FIG. 9 shows an example of a processing procedure of the learning apparatus.
  • FIG. 1 schematically shows an example of an instance where a target subject analysis apparatus 1 and a learning apparatus 2 according to the embodiment are applied.
  • the target subject analysis apparatus 1 according to the embodiment is an information processing apparatus for analyzing attributes of a target subject 6 using a neural network.
  • the target subject analysis apparatus 1 acquires, via a camera 3, image data that represents an image including a figure of the target subject 6, and range data that represents values of the distance at respective pixels constituting the image.
  • the target subject 6 may include anything that can be shot by a photographic device, and may be, for example, a scene such as a situation outside a vehicle, a product manufactured on a production line, or a predetermined object such as a person.
  • the camera 3 is not particularly limited as long as it is a photographic device able to take a general image (e.g. a monochrome image or a color image) and measure the distance at the respective pixels constituting the image, and may be selected as appropriate, as per an embodiment.
  • the camera 3 may be OPTEX’s ZC-1000L-HP series, Microsoft’s Kinect, ASUS’s Xtion, or LYTRO Japan’s ILLUM, for example.
  • the target subject analysis apparatus 1 performs computation processing with a trained neural network that is for determining attributes of the target subject 6 using the acquired image data and range data as the input to this neural network, thereby obtaining output values from the neural network.
  • the target subject analysis apparatus 1 specifies the attributes of the target subject 6 based on the output values obtained from the neural network.
  • the attributes of the target subject 6 to be specified are not particularly limited as long as they are features of the target subject 6 that appear in the image, and may be selected as appropriate, as per an embodiment.
  • the learning apparatus 2 is an information processing apparatus for creating a neural network to be used by the target subject analysis apparatus 1, i.e. an apparatus for training the neural network. Specifically, the learning apparatus 2 acquires, as learning data, a set of data including the image data that represents an image including a figure of the target subject 6, the range data that represents the values of the distance at the respective pixels constituting the image, and attribute data that represents the attributes of the target subject 6. The learning data is created as appropriate in accordance with the attributes of the target subject 6 for which learning is desired.
  • the learning apparatus 2 trains the neural network, using the learning data, to output output values that correspond to the attributes represented by the attribute data upon the image data and the range data being input.
  • a trained neural network to be used in the target subject analysis apparatus 1 is constructed.
  • the target subject analysis apparatus 1 may also acquire, via a network 10, a trained neural network that is constructed by the learning apparatus 2.
  • the type of the network 10 may be selected as appropriate from among the Internet, a wireless communication network, a telecommunication network, a telephone network, a dedicated network, and the like, for example.
  • the image data that represents the image including the figure of the target subject 6, as well as the range data that represents the values of the distance at the respective pixels constituting the image are used as the input to the neural network during analysis of the attributes of the target subject 6. That is to say, the image including the figure of the target subject 6 as well as the values of the distance obtained for the respective pixels in the image are used during the analysis of the attributes of the target subject 6 by the neural network.
  • recognition accuracy of the neural network with respect to the attributes of the target subject 6 can be increased.
  • the attributes of the target subject 6 can be analyzed simply by inputting the image data and the range data to the neural network, without performing advanced image processing on the image.
  • FIG. 2 schematically shows an example of a hardware configuration of the target subject analysis apparatus 1 according to the embodiment.
  • the target subject analysis apparatus 1 is a computer in which a control unit 11, a storage unit 12, a communication interface 13, an input device 14, an output device 15, an external interface 16, and a drive 17 are electrically connected to one another.
  • in FIG. 2, the communication interface and the external interface are denoted as “communication I/F” and “external I/F”, respectively.
  • the control unit 11 includes a CPU (Central Processing Unit), which is a hardware processor, a RAM (Random Access Memory), a ROM (Read Only Memory), and the like, and controls the constituent elements in accordance with information processing.
  • the storage unit 12 is an auxiliary storage device such as a hard disk drive or a solid-state drive, for example, and stores a target subject analysis program 121, which is to be executed by the control unit 11, learning result data 122 that represents information regarding the trained neural network, and the like.
  • the storage unit 12 corresponds to a memory.
  • the target subject analysis program 121 is a program for causing the target subject analysis apparatus 1 to execute later-described processing for analyzing the attributes of the target subject 6 (FIG. 7).
  • the learning result data 122 includes a configuration of the neural network, a connection weight between neurons, and information that represents a threshold value for each neuron, and is used for setting the trained neural network to be used in processing for analyzing the attributes of the target subject 6.
  • the storage unit 12 stores a plurality of pieces of learning result data 122.
  • the communication interface 13 is a wired LAN (Local Area Network) module, a wireless LAN module, or the like, for example, and is an interface for performing wired or wireless communication via a network.
  • the input device 14 is a device for input, such as a mouse or a keyboard, for example.
  • the output device 15 is a device for output, such as a display or a speaker, for example.
  • the external interface 16 is a USB (Universal Serial Bus) port or the like, and is an interface for connecting to an external device such as the camera 3.
  • the drive 17 is a CD (Compact Disk) drive, a DVD (Digital Versatile Disk) drive, or the like, for example, and is a device for loading the program stored in a storage medium 91.
  • the type of the drive 17 may be selected as appropriate in accordance with the type of the storage medium 91.
  • the target subject analysis program 121 and/or the learning result data 122 may also be stored in this storage medium 91.
  • the storage medium 91 is a medium that accumulates information such as that of a program by means of an electric, magnetic, optical, mechanical, or chemical effect, so that a computer, other kind of apparatus, machine, or the like can read information such as that of the program.
  • the target subject analysis apparatus 1 may also acquire the target subject analysis program 121 and/or the learning result data 122 from this storage medium 91.
  • FIG. 2 shows a storage medium of a disk type such as a CD or DVD as an example of the storage medium 91.
  • the type of the storage medium 91 is not limited to a disk type, and may also be a type other than a disk type.
  • Examples of a storage medium of a type other than the disk type may include a semiconductor memory such as a flash memory.
  • the control unit 11 may also include a plurality of hardware processors.
  • Each of the hardware processors may be constituted by a microprocessor, an FPGA (field-programmable gate array), or the like.
  • the target subject analysis apparatus 1 may also be constituted by a plurality of information processing apparatuses.
  • the target subject analysis apparatus 1 may be an information processing apparatus designed exclusively for the service to be provided, or may be a general-purpose desktop PC (Personal Computer), a tablet PC, or the like.
  • FIG. 3 schematically shows an example of a hardware configuration of the learning apparatus 2 according to the embodiment.
  • the learning apparatus 2 is a computer in which a control unit 21, a storage unit 22, a communication interface 23, an input device 24, an output device 25, an external interface 26, and a drive 27 are electrically connected to one another.
  • the communication interface and the external interface are denoted as “communication I/F” and “external I/F”, respectively, similar to FIG. 2.
  • the constituent elements ranging from the control unit 21 to the drive 27 and a storage medium 92 are similar to the constituent elements ranging from the control unit 11 to the drive 17 and the storage medium 91 in the target subject analysis apparatus 1.
  • a camera 5, which is connected via the external interface 26, is similar to the camera 3 that is connected to the target subject analysis apparatus 1.
  • the storage unit 22 in the learning apparatus 2 stores a learning program 221, which is to be executed by the control unit 21, learning data 222, which is to be used in the training of the neural network, and the like.
  • the learning program 221 is a program for causing the learning apparatus 2 to execute later-described learning processing of the neural network (FIG. 9).
  • the learning data 222 is data for training the neural network to be able to analyze a desired attribute of the target subject 6, and includes image data, range data, and attribute data. The details of the learning data 222 will be described later.
  • the learning program 221 and/or the learning data 222 may also be stored in the storage medium 92, as in the target subject analysis apparatus 1. Accordingly, the learning apparatus 2 may also acquire the learning program 221 and/or the learning data 222 to be used from the storage medium 92.
  • the learning apparatus 2 may be an information processing apparatus designed exclusively for the service to be provided, or may be a general-purpose server device, a desktop PC, or the like.

Functional configuration

Target subject analysis apparatus
  • FIG. 4 schematically shows an example of the functional configuration of the target subject analysis apparatus 1 according to the embodiment.
  • the control unit 11 in the target subject analysis apparatus 1 loads, to the RAM, the target subject analysis program 121 stored in the storage unit 12.
  • the control unit 11 interprets and executes, using a CPU, the target subject analysis program 121 loaded to the RAM, and controls the constituent elements.
  • the target subject analysis apparatus 1 functions as a computer including a data acquisition unit 111, a neural network computing unit 112, an attribute specifying unit 113, and a neural network selection unit 114.
  • the data acquisition unit 111 acquires image data 123, which represents an image including a figure of the target subject 6, and range data 124, which represents values of the distance at respective pixels constituting the image.
  • the neural network computing unit 112 obtains output values from the neural network 7 by performing computation processing with this neural network 7 using the image data 123 and the range data 124 as the input to the trained neural network 7 for determining attributes of the target subject 6.
  • the attribute specifying unit 113 specifies the attributes of the target subject 6 based on the output values obtained from the neural network 7.
  • the neural network selection unit 114 selects a neural network 7 to be used by the neural network computing unit 112 from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects, in accordance with designation by a user. Note that the neural network 7 to be used is set based on the learning result data 122.
  • FIG. 5 is a diagram illustrating the image data 123 and the range data 124 that are acquired by the camera 3.
  • the camera 3 according to the embodiment is configured to be able to capture a figure of the target subject so as to form an image, and to measure the distance to the target subject for the respective pixels of the formed image.
  • the camera 3 is a shooting device that includes a phototransmitter unit for radiating infrared light, such as an infrared LED (Light Emitting Diode), and a photodetector unit for receiving infrared light and visible light, such as a CMOS (Complementary MOS) image sensor.
  • the camera 3 can acquire the image data 123, which expresses the colors of the pixels constituting the image using pixel values, by forming an image of the visible light reflected from the target subject using the photodetector unit.
  • the pixel values of the pixels may also be expressed in an RGB color space, or may also be expressed in a gray scale color space, for example.
  • a method of expressing the pixel values of the pixels can be selected as appropriate, as per an embodiment.
  • the camera 3 can also acquire the range data 124, which represents, for the respective pixels, values of the distance (depth) d from the camera 3 to a figure that appears in the pixels by measuring, for the respective pixels, the amount of time taken until the infrared light projected from the phototransmitter unit reaches the target subject and then returns to the photodetector unit (TOF method: Time Of Flight).
  • This distance d may also be expressed as a linear distance d1 between the camera 3 and the target subject, or may also be expressed as a distance d2 on a horizontal axis from the camera 3 to a vertical line extending from the subject. Since the distance d1 and the distance d2 can be converted to each other based on the Pythagorean theorem or the like, the same description is applied to both cases of employing the distance d1 and employing the distance d2.
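As a rough illustration of the TOF measurement and the conversion between the two distance expressions described above, the following Python sketch derives the line-of-sight distance d1 from a round-trip time and converts it into the horizontal distance d2 via the Pythagorean theorem. The function names and the vertical-offset parameter are illustrative assumptions and are not defined in the embodiment.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_distance(round_trip_time_s: float) -> float:
    """Line-of-sight distance d1 from the round-trip time of the infrared pulse."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

def horizontal_distance(d1: float, vertical_offset: float) -> float:
    """Convert the line-of-sight distance d1 into the horizontal distance d2,
    given the vertical offset between the camera and the measured point
    (Pythagorean theorem: d1**2 = d2**2 + vertical_offset**2)."""
    return math.sqrt(max(d1 * d1 - vertical_offset * vertical_offset, 0.0))

# Example: a pulse returning after about 20 ns corresponds to roughly 3 m.
d1 = tof_distance(20e-9)
d2 = horizontal_distance(d1, vertical_offset=1.2)
```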
  • the camera 3 acquires the pixel values and the values of the distance d with respect to the respective pixels constituting the image.
  • the values of the distance d to the target subject can accordingly be acquired over the entire area in which the image is captured.
  • the pixel values and the values of the distance d can be acquired in one-to-one correspondence with the pixels.
  • the values of the distance d may not necessarily be acquired for all pixels constituting the image. That is to say, in the acquired range data, there may be pixels for which the values of the distance d to the target subject cannot be acquired for some reason, e.g. because the reflection of infrared light is interrupted.
  • since the range data 124 represents the values of the distance at the respective pixels constituting the image, an image can also be represented by this range data 124.
  • This image represented by the range data 124 may also be called a “range image”, distinguished from the image represented by the image data 123.
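Since the pixel values and the distance values are obtained in one-to-one correspondence with the pixels, one minimal way to feed both to the input layer is to stack them per pixel and flatten the result. The following NumPy sketch assumes this layout; the array shapes and the function name are assumptions, and pixels with missing distance values would additionally need a sentinel value or similar handling.

```python
import numpy as np

def build_network_input(image_data: np.ndarray, range_data: np.ndarray) -> np.ndarray:
    """Stack the colour image (H x W x 3) and the range image (H x W) so that
    each pixel contributes its pixel values and its distance value, then
    flatten the result into a single input vector for the neural network."""
    assert image_data.shape[:2] == range_data.shape, "pixels must correspond one-to-one"
    rgbd = np.dstack([image_data.astype(np.float32), range_data.astype(np.float32)])
    return rgbd.reshape(-1)  # one value per input-layer neuron

# Example with a tiny 4 x 4 image: 4 * 4 * (3 + 1) = 64 input values.
image = np.random.rand(4, 4, 3)
depth = np.random.rand(4, 4)
x = build_network_input(image, depth)
assert x.size == 64
```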
  • the neural network 7 to be used is a multi-layer neural network that is used in so-called deep learning, and includes an input layer 71, an intermediate layer (hidden layer) 72, and an output layer 73 in this order from the input side.
  • the neural network 7 includes one intermediate layer 72, the output from the input layer 71 is input to the intermediate layer 72, and the output from the intermediate layer 72 is input to the output layer 73.
  • the number of intermediate layers 72 is not limited to one, and the neural network 7 may also include two or more intermediate layers 72.
  • Each of the layers 71 to 73 includes one or more neurons.
  • the number of neurons in the input layer 71 can be set in accordance with the number of pixels in the image data 123 and the range data 124.
  • the number of neurons in the intermediate layer 72 can be set as appropriate, as per an embodiment.
  • the number of neurons in the output layer 73 can be set in accordance with the number of types of attributes of the target subject 6 that are to be targets of analysis.
  • Neurons in adjacent layers are coupled as appropriate, and a weight (connection weight) is set for each connection.
  • each neuron is coupled to all neurons in an adjacent layer.
  • the connection of neurons is not limited to this example, and may be set as appropriate, as per an embodiment.
  • a threshold value is set for each neuron. Basically, the output of each neuron is determined based on whether or not the sum of products of the input and the weight exceeds a threshold value.
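The computation described above, layer by layer, amounts to repeated weighted sums compared against per-neuron thresholds. The sketch below is a simplified Python/NumPy illustration; the sigmoid firing function and the layer sizes are assumptions, since the embodiment only specifies weights, thresholds, and the layer ordering.

```python
import numpy as np

def forward(x, layers):
    """Propagate an input vector through fully connected layers.

    `layers` is a list of (weights, thresholds) pairs, one per layer after the
    input layer.  Each neuron's output is determined from the sum of products
    of its inputs and weights relative to its threshold; a sigmoid is used
    here as the firing function, which is an assumption of this sketch.
    """
    a = np.asarray(x, dtype=np.float32)
    for weights, thresholds in layers:
        z = weights @ a - thresholds       # sum of products minus threshold
        a = 1.0 / (1.0 + np.exp(-z))       # firing (activation) function
    return a                               # output values of the output layer

# Example: 64 input values -> 16 hidden neurons -> 3 output values.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(16, 64)), np.zeros(16)),
          (rng.normal(size=(3, 16)), np.zeros(3))]
outputs = forward(rng.random(64), layers)
```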
  • the target subject analysis apparatus 1 specifies the attributes of the target subject 6 based on the output values obtained from the output layer 73 by inputting the image data 123 and the range data 124 to the input layer 71 in this neural network 7.

Learning apparatus
  • FIG. 6 schematically shows an example of the functional configuration of the learning apparatus 2 according to the embodiment.
  • the control unit 21 in the learning apparatus 2 loads, to the RAM, the learning program 221 stored in the storage unit 22.
  • the control unit 21 interprets and executes, using a CPU, the learning program 221 loaded to the RAM, and controls the constituent elements.
  • the learning apparatus 2 functions as a computer including a learning data acquisition unit 211 and a learning processing unit 212.
  • the learning data acquisition unit 211 acquires, as the learning data 222, a set of data that includes image data 223 that represents an image including a figure of the target subject 6, range data 224 that represents the values of the distance at the respective pixels constituting the image, and attribute data 225 that represents attributes of the target subject 6.
  • the learning processing unit 212 trains a neural network 8 to output output values that correspond to the attributes represented by the attribute data 225 upon the image data 223 and the range data 224 being input.
  • the neural network 8 which is to be trained, includes an input layer 81, an intermediate layer (hidden layer) 82, and an output layer 83, and is configured similar to the aforementioned neural network 7.
  • the layers 81 to 83 are similar to the layers 71 to 73, respectively.
  • the neural network 8 is constructed to output output values corresponding to the attributes of the target subject 6 upon the image data and the range data being input.
  • the learning processing unit 212 stores, in the storage unit 22, information indicating the configuration of the constructed neural network 8, connection weights between neurons, and threshold values for the respective neurons as the learning result data 122.

Others
  • the functions of the target subject analysis apparatus 1 and the learning apparatus 2 are described in detail in a later-described operation example.
  • the embodiment describes an example in which all of the functions of the target subject analysis apparatus 1 and the learning apparatus 2 are realized by a general-purpose CPU.
  • some or all of the aforementioned functions may also be realized by one or more dedicated processors.
  • functions may be omitted, replaced, and added as appropriate, as per an embodiment.

Operation example
  • FIG. 7 is a flowchart showing an example of a processing procedure of the target subject analysis apparatus 1. Note that the processing procedure described below is merely an example, and processing may be modified as much as possible. In the processing procedure described below, steps may be omitted, replaced, and added as appropriate, as per an embodiment.
  • In step S101, the control unit 11 functions as the neural network selection unit 114, and selects a neural network 7 to be used in later-described step S103 from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects, in accordance with designation by a user.
  • the target subject analysis apparatus 1 holds a plurality of sets of learning result data 122 for each target subject 6 that is a target of analysis and each type of attributes thereof, in the storage unit 12.
  • the control unit 11 outputs the target subject 6 that is the target of analysis and the types of attributes thereof to the output device 15, and accepts designation of the target of analysis performed by the user using the input device 14.
  • the control unit 11 selects learning result data 122 to be used, in accordance with the designation accepted from the user, and sets the neural network 7 using the selected learning result data 122.
  • the learning result data 122 includes the information indicating the configuration of the neural network 7, the connection weights between neurons, and the threshold values for the respective neurons so as to be able to set the neural network 7 that outputs output values corresponding to desired attributes of a target subject 6 of a desired type upon the image data and the range data being input.
  • Based on the information indicating the configuration of the neural network 7, the control unit 11 sets the structure of the neural network 7, the number of neurons included in the respective layers 71 to 73, the connection state between neurons in adjacent layers, and the like. Also, the control unit 11 sets the values of the parameters of the neural network 7 based on the information indicating the connection weights between neurons and the threshold values for the respective neurons. The setting of the neural network 7 to be used is thus completed.
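As a sketch of how the learning result data 122 might be turned back into the layer structure used for computation, the following assumes a hypothetical dictionary layout holding per-layer weight matrices and threshold vectors; the embodiment does not prescribe a storage format, only that the data holds the network configuration, the connection weights, and the thresholds.

```python
import numpy as np

def restore_network(learning_result: dict):
    """Rebuild the layers of the trained neural network from learning result
    data.  The dictionary layout used here is a hypothetical serialization."""
    layers = []
    for layer in learning_result["layers"]:
        weights = np.asarray(layer["weights"], dtype=np.float32)
        thresholds = np.asarray(layer["thresholds"], dtype=np.float32)
        layers.append((weights, thresholds))
    return layers

# Example: a tiny 2-input -> 2-hidden -> 1-output network.
result_data = {
    "layers": [
        {"weights": [[0.5, -0.2], [0.1, 0.3]], "thresholds": [0.0, 0.0]},
        {"weights": [[0.7, -0.4]], "thresholds": [0.1]},
    ]
}
layers = restore_network(result_data)
```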
  • the target subject analysis apparatus 1 may also acquire each piece of the learning result data 122 from the learning apparatus 2 via the network 10, or may also acquire each piece from the storage medium 91 via the drive 17, in accordance with an operation made by the user to the input device 14.
  • the target subject analysis apparatus 1 may also acquire each piece of the learning result data 122 by accepting distribution thereof from the learning apparatus 2.
  • each piece of the learning result data 122 may also be stored in another information processing apparatus (storage device) such as a NAS (Network Attached Storage).
  • the target subject analysis apparatus 1 may also acquire each piece of the learning result data 122 by accessing the other information processing apparatus when performing processing in step S101.
  • In step S102, the control unit 11 functions as the data acquisition unit 111, and acquires the image data 123 and the range data 124.
  • the camera 3 is configured to be able to acquire the image data 123 and the range data 124. For this reason, the control unit 11 acquires the image data 123 and the range data 124 from the camera 3 via the external interface 16.
  • In the next step S103, the control unit 11 functions as the neural network computing unit 112, and obtains output values from the neural network 7 by performing computation processing with this neural network 7 using the image data 123 and the range data 124 as the input to the trained neural network 7 for determining attributes of the target subject 6.
  • The control unit 11 inputs the pixel values of the respective pixels included in the image data 123 and the values of the distance at the respective pixels included in the range data 124 to the neurons included in the input layer 71 in the neural network 7 that is set in step S101.
  • a correspondence relationship between the respective values and the neurons may be set as appropriate, as per an embodiment.
  • the control unit 11 performs determination regarding ignition of the neurons included in the layers 71 to 73, in the direction of forward propagation. The control unit 11 can thus obtain an output value from each neuron included in the output layer 73 of the neural network 7.
  • In the next step S104, the control unit 11 functions as the attribute specifying unit 113, and specifies attributes of the target subject 6 based on the output values obtained from the neural network 7 in step S103.
  • the neural network 7 has already been trained to output output values corresponding to desired attributes of the target subject 6 of a desired type upon the image data 123 and the range data 124 being input. As many output values as there are neurons included in the output layer 73 can be obtained, and one or more of the obtained output values can be associated with one attribute (attribute value) of the target subject 6. Information indicating a correspondence relationship between the attributes (attribute values) of the target subject 6 and the output values of the neural network 7 can be given by data in a table form, for example.
  • the control unit 11 specifies the attributes (attribute values) of the target subject 6 based on the output values obtained in step S103, by referencing the information indicating the correspondence relationship between the attributes (attribute values) of the target subject 6 and the output values of the neural network 7.
  • the number of attributes of the target subject 6 to be specified may be selected as appropriate, as per an embodiment. Specific examples will be described later.
  • the control unit 11 ends processing for analyzing the target subject 6 according to this operation example.

Analysis example
  • Three specific examples of attribute analysis of the target subject 6 using the target subject analysis apparatus 1 are described below with reference to FIGS. 8A to 8C.
  • FIG. 8A schematically shows, as the first specific example, an instance where the target subject analysis apparatus 1 is used for analyzing a situation outside a vehicle such as an automobile.
  • the camera 3 is an in-vehicle camera that is installed so as to be able to shoot a situation outside the vehicle (e.g. front side of the vehicle).
  • the target subject analysis apparatus 1 is an information processing apparatus such as an in-vehicle device that can be connected to the camera 3 or a general-purpose PC.
  • In the above-described step S101, in response to accepting designation of analysis of a situation outside the vehicle from a user, the control unit 11 sets a neural network 7A based on the corresponding learning result data 122. Subsequently, in the above-described step S102, the control unit 11 acquires, from the camera 3, image data 123A and range data 124A, which are obtained by shooting the situation outside the vehicle as a target subject 6A.
  • the control unit 11 inputs the pixel values of the respective pixels included in the image data 123A and the values of the distance at the respective pixels included in the range data 124A, to neurons included in an input layer 71A of the neural network 7A. Furthermore, the control unit 11 performs computation for determination regarding ignition of the neurons included in the input layer 71A, an intermediate layer 72A, and an output layer 73A in the direction of forward propagation, and obtains output values from the neurons included in the output layer 73A.
  • The control unit 11 specifies, as an attribute of the target subject 6A, at least one of the state of a road surface, the presence of an obstacle, and the type of obstacle, based on the output values obtained from the neural network 7A.
  • the type of the situation outside the vehicle that is to be a target of analysis may be selected as appropriate, as per an embodiment.
  • the number of neurons included in the output layer 73A may also be three or more.
  • the target subject analysis apparatus 1 may set at least two types of attributes, namely the state of the road surface and the presence of an obstacle(s), as targets of analysis.
  • the respective states of the road surface that are targets of analysis (flat, uneven, upslope, and downslope) may be associated with output values (0, 0), (0, 1), (1, 0), and (1, 1), for example.
  • The attribute values of the presence of an obstacle, which is also a target of analysis, may be “no obstacle present” and “an obstacle(s) present”, and these attribute values may be associated with output values 0 and 1, respectively.
  • The control unit 11 can recognize that the state of the road surface within the shooting area is flat, uneven, upslope, or downslope, based on the output values corresponding to the state of the road surface from the neural network 7A being (0, 0), (0, 1), (1, 0), or (1, 1).
  • the control unit 11 can also recognize that no obstacle is present or an obstacle(s) is present within the shooting area, based on the output value corresponding to the presence of an obstacle(s) from the neural network 7A being 0 or 1.
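A concrete decoding of this example could look like the following sketch, which splits a three-value output (two neurons for the road-surface state, one for obstacle presence) into the two attributes. The neuron ordering and the 0.5 binarization threshold are assumptions of this sketch.

```python
ROAD_SURFACE_STATES = {
    (0, 0): "flat",
    (0, 1): "uneven",
    (1, 0): "upslope",
    (1, 1): "downslope",
}

def decode_outside_vehicle_outputs(output_values, threshold=0.5):
    """Split the output values of the output layer 73A into the road-surface
    state (first two values) and the presence of an obstacle (third value)."""
    bits = [int(v >= threshold) for v in output_values]
    road_state = ROAD_SURFACE_STATES[(bits[0], bits[1])]
    obstacle_present = bool(bits[2])
    return road_state, obstacle_present

# Example: outputs (0.2, 0.8, 0.9) -> ("uneven", True)
state, obstacle = decode_outside_vehicle_outputs([0.2, 0.8, 0.9])
```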
  • the control unit 11 may also control the driving speed of the vehicle based on the result of recognition of the state of the road surface. Also, for example, when it is recognized that an obstacle(s) is present, the control unit 11 may also control the vehicle so as to stop before reaching the obstacle.
  • FIG. 8B schematically shows, as the second specific example, an instance where the target subject analysis apparatus 1 is used for analyzing a state of a product manufactured on a production line.
  • the camera 3 is installed so as to be able to shoot a product that is flowing on the production line.
  • In the above-described step S101, in response to accepting designation of analysis of a product manufactured on the production line from a user, the control unit 11 sets a neural network 7B based on the corresponding learning result data 122. Subsequently, in the above-described step S102, the control unit 11 acquires, from the camera 3, image data 123B and range data 124B that are obtained by shooting the product manufactured on the production line as a target subject 6B.
  • the control unit 11 inputs the pixel values of the respective pixels included in the image data 123B and the values of the distance at the respective pixels included in the range data 124B, to neurons included in an input layer 71B of the neural network 7B. Furthermore, the control unit 11 performs computation for determination regarding ignition of the neurons included in the input layer 71B, an intermediate layer 72B, and an output layer 73B in the direction of forward propagation, and obtains output values from the neurons included in the output layer 73B.
  • the control unit 11 specifies at least one of the size and the shape of the product, and the presence of damage thereon as an attribute of the target subject 6B, based on the output values obtained from the neural network 7B.
  • the type of state of the product that is to be the target of analysis may be selected as appropriate, as per an embodiment, as in the above-described first specific example.
  • FIG. 8B shows an instance where at least two types of attributes, namely the shape of the product and the presence of damage are targets of analysis, based on the output values from the output layer 73B.
  • a correspondence relationship between the respective attribute values and the output values may be set as appropriate, as per an embodiment.
  • the results of recognizing the state of the product may also be used to determine whether or not there is any abnormality on the produced product. For example, if the control unit 11 recognizes that there is damage on the product, it may also determine that there is an abnormality on the product that appears in the image data 123B and the range data 124B. If not, the control unit 11 may also determine that there is no abnormality on the target product. The control unit 11 may also control the production line so as to carry the product for which it has been determined that there is an abnormality to a line other than the line for products with no abnormality.
  • FIG. 8C schematically shows, as the third specific example, an instance where the target subject analysis apparatus 1 is used for analyzing human features.
  • the camera 3 is installed so as to be able to shoot a person, who is to be a target of feature analysis.
  • In the above-described step S101, in response to accepting designation of feature analysis of a target person from a user, the control unit 11 sets a neural network 7C based on the corresponding learning result data 122. Subsequently, in the above-described step S102, the control unit 11 acquires, from the camera 3, image data 123C and range data 124C that are obtained by shooting the target person as a target subject 6C.
  • the control unit 11 inputs the pixel values of the respective pixels included in the image data 123C and the values of the distance at the respective pixels included in the range data 124C, to neurons included in an input layer 71C of the neural network 7C. Furthermore, the control unit 11 performs computation for determination regarding ignition of the neurons included in the input layer 71C, an intermediate layer 72C, and an output layer 73C in the direction of forward propagation, and obtains output values from the neurons included in the output layer 73C.
  • the control unit 11 specifies at least one of the body shape, facial expression, and posture of the human as the attributes of the target subject 6C, based on the output values obtained from the neural network 7C.
  • the types of human features that are to be targets of analysis may be selected as appropriate, as per an embodiment, as in the above-described first and second specific examples.
  • FIG. 8C shows an instance where at least two types of attributes, namely the body shape and the posture are the targets of analysis, based on the output values from the output layer 73C.
  • a correspondence relationship between the respective attribute values and the output values may be set as appropriate, as per an embodiment.
  • results of recognizing the features of the person may also be used in determination of the state of health or the like.
  • For example, if the recognized body shape indicates obesity, the control unit 11 may also give a warning prompting the person to pay attention to their health.
  • Each of the above-described three specific examples is an example of usage of the target subject analysis apparatus 1.
  • the usage of the target subject analysis apparatus 1 may be modified as appropriate, as per an embodiment.
  • the target subject analysis apparatus 1 may also be configured to specify a three-dimensional shape of the target subject 6 based on the output values obtained from the neural network 7 by inputting image data and range data that partially render the target subject 6.
  • the target subject analysis apparatus 1 may also be configured to specify a plurality of physical characteristics of the target subject 6 as the attributes of the target subject 6 in the above-described step S104.
  • the physical characteristics refer to features that physically appear on the target subject 6, such as the size, shape, and orientation of the target subject 6.
  • the physical characteristics include geometric features such as the size, shape, and orientation of the target subject 6, and features concerning material quality such as the composition of the target subject 6.
  • the range data 124, which constitutes a range image having the values of the distance at the respective pixels, is used as the input to the neural network 7. In this range image, physical characteristics of the target subject 6 (particularly, geometric features) are relatively likely to appear. For this reason, by configuring the target subject analysis apparatus 1 so as to specify a plurality of physical characteristics of the target subject 6, the target subject analysis apparatus 1 can identify the target subject 6 relatively accurately.
  • the target subject analysis apparatus 1 may also be configured to specify at least one of an uneven state, material quality, a three-dimensional shape, and a flat state of the target subject 6 as the physical characteristics of the target subject 6.
  • the uneven state indicates the shapes, sizes, or the like of a raised portion, such as a protrusion that is present on the target subject 6, and a recessed portion, such as an opening or a hole.
  • the flat state indicates the degree of expansion of a face of the target subject 6, the degree of tilt thereof, or the like.
  • the target subject analysis apparatus 1 can specify, as the uneven state, whether the shape of the road surface is raised or recessed relative to the camera 3.
  • the target subject analysis apparatus 1 can specify, as an attribute of the target subject 6, whether a transparent object placed on the road surface is glass or a puddle.
  • the target subject analysis apparatus 1 can specify, as the three-dimensional shape of the target subject 6, whether the target subject 6 is a flat object such as a sign board, or a three-dimensional object that is relatively thick.
  • the uneven state, material quality, three-dimensional shape, and flat state are attributes that are difficult to analyze using a single image. According to the embodiment, due to using not only the image data 123 but also the range data 124, these attributes, which are difficult to analyze using a single image, can be identified relatively accurately.

Learning apparatus
  • FIG. 9 is a flowchart showing an example of a processing procedure of the learning apparatus 2. Note that the processing procedure described below is merely an example, and processing may be modified as much as possible. In the processing procedure described below, steps may be omitted, replaced, and added as appropriate, as per an embodiment.
  • In step S201, the control unit 21 functions as the learning data acquisition unit 211, and acquires, as the learning data 222, a set of data that includes the image data 223 that represents an image including a figure of the target subject 6, the range data 224 that represents the values of the distance at the respective pixels constituting the image, and the attribute data 225 that represents attributes of the target subject 6.
  • the learning data 222 is data for training the neural network 8 to be able to analyze attributes of a desired target subject 6.
  • This learning data 222 can be created by shooting a prepared target subject 6 using the camera 5 under various shooting conditions, and associating obtained images with the shooting conditions.
  • the control unit 21 shoots the target subject 6 in a state where the attributes that are targets of analysis appear, using the camera 5. Since the camera 5 is configured similar to the above-described camera 3, the control unit 21 can acquire, through this shooting, the image data 223 that represents an image of the target subject 6 in which the attributes that are the targets of analysis appear, as well as the range data 224. The control unit 21 can then create each piece of the learning data 222 by accepting, as appropriate, the input of the attribute data 225 (i.e. training data) that represents the attributes of the target subject 6 that appear in the image, and associating the input attribute data 225 with the acquired image data 223 and range data 224.
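One possible in-memory representation of a single piece of the learning data 222 is a record grouping the three elements described above; the field names and array shapes in the following sketch are illustrative assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LearningSample:
    """One piece of learning data 222: image data, range data, and the
    attribute (training) data that the network should learn to output."""
    image_data: np.ndarray      # H x W x 3 pixel values from the camera 5
    range_data: np.ndarray      # H x W distance values for the same pixels
    attribute_data: np.ndarray  # desired output values for the attributes

# Example: a sample labelled "uneven road surface, obstacle present".
sample = LearningSample(
    image_data=np.zeros((4, 4, 3), dtype=np.float32),
    range_data=np.zeros((4, 4), dtype=np.float32),
    attribute_data=np.array([0, 1, 1], dtype=np.float32),
)
```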
  • the learning data 222 may also be created manually by an operator or the like, or may also be created automatically by a robot or the like.
  • the learning data 222 may also be created by the learning apparatus 2 as described above, or may also be created by an information processing apparatus other than the learning apparatus 2.
  • the control unit 21 can acquire the learning data 222 by executing processing for creating the learning data 222 in step S201.
  • the learning apparatus 2 can acquire the learning data 222 that is created by another information processing apparatus via a network, the storage medium 92, or the like.
  • the number of pieces of learning data 222 to be acquired in step S201 may be determined as appropriate, as per an embodiment, so that the neural network 8 can be trained.
  • Step S202 In the next step S202, the control unit 21 functions as the learning processing unit 212, and trains the neural network 8, using the learning data 222 acquired in step S201, to output output values corresponding to the attributes represented by the attribute data 225 upon the image data 223 and the range data 224 being input.
  • the control unit 21 prepares the neural network 8 for which learning processing is to be performed.
  • the configuration of the neural network 8 to be prepared, initial values of the connection weights between neurons, and initial values of the threshold values for the respective neurons may also be given by a template, or may also be given by input made by the operator.
  • the control unit 21 may also prepare the neural network 8, based on the learning result data 122 that is to be re-trained.
  • The control unit 21 causes the neural network 8 to undergo training using, as input data, the image data 223 and the range data 224 included in each piece of the learning data 222 acquired in step S201, and also using the attribute data 225 as training data.
  • the neural network 8 may be trained using a gradient descent method, stochastic gradient descent method, or the like.
  • The control unit 21 calculates errors between the output values that are output from the output layer 83 as a result of inputting the image data 223 and the range data 224 to the input layer 81 and the desired values corresponding to the attributes represented by the attribute data 225. Subsequently, the control unit 21 calculates errors in the connection weights between neurons and in the threshold values for the respective neurons, using an error back-propagation method. Based on the calculated errors, the control unit 21 then updates the values of the connection weights between neurons and the threshold values for the respective neurons.
  • the control unit 21 trains the neural network 8 by repeating this series of processing for each piece of the learning data 222 until the output values output from the neural network 8 coincide with the desired values corresponding to the attributes represented by the attribute data 225.
  • a neural network 8 can be constructed that outputs output values corresponding to the attributes represented by the attribute data 225 when the image data 223 and the range data 224 are input thereto.
  • Step S203 In the next step S203, the control unit 21 functions as the learning processing unit 212, and stores, in the storage unit 22, information indicating the configuration of the constructed neural network 8, the connection weights between neurons, and the threshold values for the respective neurons as the learning result data 122.
  • the control unit 21 ends learning processing for the neural network 8 according to this operation example.
  • the image data 123 that represents an image including a figure of the target subject 6, as well as the range data 124 that represents the distance at the respective pixels constituting the image are acquired through the above-described step S102. Then, through the above-described steps S103 and S104, the image data 123 and the range data 124 are input to the neural network 7, and attributes of the target subject 6 are specified based on the output values obtained from the neural network 7.
  • a two-dimensional image (image data 123) in which the target subject 6 appears, as well as a range image (range data 124) in which the target subject 6 also appears are used as the input to the neural network 7.
  • the recognition accuracy of the neural network 7 with respect to the attributes of the target subject 6 can be increased.
  • the attributes of the target subject 6 can be analyzed simply by inputting the image data 123 and the range data 124 to the neural network 7 in step S103 as in the above-described analysis processing, without performing advanced image processing on the images in which the target subject 6 appears. Accordingly, in the embodiment, it is possible to increase the recognition accuracy with respect to the attributes of the target subject 6 with a simple configuration, reduce the processing load on the CPU, and reduce the capacity of the memory to be used.
  • the target subject analysis apparatus 1 holds a plurality of pieces of learning result data 122, and sets the neural network 7 to be used in accordance with designation by a user, through the processing in step S101. Accordingly, in the embodiment, it is possible to prepare, in advance, the learning result data 122 suitable for the respective attributes of the target subject 6, and thus realize analysis processing suitable for the respective attributes of the target subject 6.
§4 Modifications
  • the target subject analysis apparatus 1 is configured to be able to hold a plurality of pieces of learning result data 122, and set a neural network 7 for analyzing attributes of a desired target subject 6 in accordance with designation by a user.
  • the configuration of the target subject analysis apparatus 1 may not be limited to this example.
  • the target subject analysis apparatus 1 may also be configured to hold one piece of learning result data 122.
  • the above-described neural network selection unit 114 and step S101 may also be omitted.
  • the target subject analysis apparatus 1 that analyzes attributes of a target subject and the learning apparatus 2 that trains a neural network are configured by different computers.
  • the configurations of the target subject analysis apparatus 1 and the learning apparatus 2 may not be limited to this example.
  • a system having functions of both the target subject analysis apparatus 1 and the learning apparatus 2 may be realized by one or more computers.
  • the types of the neural networks (7, 8) are general, forward-propagation multi-layer neural networks.
  • the types of the neural networks (7, 8) may not be limited to this example, and may be selected as appropriate, as per an embodiment.
  • the neural networks (7, 8) may also be convolutional neural networks that use the input layer 71 and the intermediate layer 72 as a convolution layer and a pooling layer, respectively.
  • the neural networks (7, 8) may be recurrent neural networks having connections that recur from the output side to the input side, e.g. from the intermediate layer 72 to the input layer 71.
  • a target subject analysis apparatus including: a hardware processor; and a memory configured to hold a program to be executed by the hardware processor, wherein the hardware processor is configured to execute the program to perform: a data acquisition step of acquiring image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image; a computation processing step of obtaining an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and an attribute specifying step of specifying the attribute of the target subject based on the output value obtained from the neural network.
  • a target subject analysis method including: a data acquisition step of acquiring image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image, by a hardware processor; a computation processing step of obtaining an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network, by the hardware processor; and an attribute specifying step of specifying the attribute of the target subject based on the output value obtained from the neural network, by the hardware processor.
  • a learning apparatus including: a hardware processor; and a memory configured to hold a program to be executed by the hardware processor, wherein the hardware processor is configured to execute the program to perform: a learning data acquisition step of acquiring, as learning data, a set of data that includes image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and a learning processing step of training a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data if the image data and the range data are input.
  • a learning method including: a learning data acquisition step of acquiring, as learning data, a set of image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject, by a hardware processor; and a learning processing step of training a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data if the image data and the range data are input, by the hardware processor.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

A technique is provided that is able to increase recognition accuracy with respect to attributes of a target subject with a simple configuration. A target subject analysis apparatus according to an aspect of the present invention includes: a data acquisition unit configured to acquire image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image; a neural network computing unit configured to obtain an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and an attribute specifying unit configured to specify the attribute of the target subject based on the output value obtained from the neural network.

Description

TARGET SUBJECT ANALYSIS APPARATUS, TARGET SUBJECT ANALYSIS METHOD, LEARNING APPARATUS, AND LEARNING METHOD
The present invention relates to a target subject analysis apparatus, a target subject analysis method, a learning apparatus, and a learning method.
Recently, with an improvement in computer processing capabilities, deep learning using a multi-layer neural network is increasingly being used for various industrial applications. For example, JP H06-124120A proposes a motion controller that inputs image information obtained by a CCD camera and range information obtained by an ultrasonic range sensor to the same neural network, and drives a motion device based on an output signal from this neural network.
Also, for example, JP 2005-346297A proposes a three-dimensional object recognition device that identifies a three-dimensional object in a range image, which is generated using a neural network based on a pair of images shot by a stereo camera. Specifically, this three-dimensional object recognition device generates a range image utilizing parallax that occurs between the pair of images obtained by the stereo camera, and groups pieces of range data that represent the same three-dimensional object in the generated range image. Next, the three-dimensional object recognition device sets, in the range image, a smallest area that contains the grouped pieces of range data of the three-dimensional object, and sets an input value with typical range data serving as an element, for each sub-area, which is obtained by dividing the aforementioned smallest area by a set division number. The three-dimensional object recognition apparatus then identifies the type of the three-dimensional object based on the pattern of output values that are obtained by inputting the set input values to the neural network.
JP H06-124120A JP 2005-346297A
The method in JP H06-124120A uses range information obtained by an ultrasonic range sensor. The values of the distance represented by this range information do not directly correspond to the pixels of the image obtained by the CCD camera. In addition, the area in which the values of the distance can be acquired by the ultrasonic range sensor is a portion of a shooting area that appears in the image. That is to say, the ultrasonic range sensor cannot acquire the values of the distance to a target subject that appears on all of the pixels constituting the image. For this reason, with the method in JP H06-124120A, it is difficult to increase recognition accuracy with respect to attributes of a target subject.
In the method in JP 2005-346297A, image processing, such as stereo mapping for searching for corresponding points between a pair of images and grouping of pieces of range data that represent the same three-dimensional object, is performed before data is input to the neural network. For this reason, the method in JP 2005-346297A makes the system configuration complex.
The present invention has been made in view of the foregoing points in an aspect, and aims to provide a technique that is able to increase recognition accuracy with respect to attributes of a target subject, with a simple configuration.
To achieve the above-stated object, the present invention employs the following configuration.
That is to say, a target subject analysis apparatus according to an aspect of the present invention includes: a data acquisition unit configured to acquire image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image; a neural network computing unit configured to obtain an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and an attribute specifying unit configured to specify the attribute of the target subject based on the output value obtained from the neural network.
The above-described target subject analysis apparatus uses, as the input to the neural network, the image data that represents an image including a figure of a target subject, and also the range data that represents the value of the distance at each pixel constituting the image. Accordingly, since the values of the distance are obtained for the respective pixels in the image, the recognition accuracy of the neural network with respect to the attribute of the target subject can be increased. In addition, the above-described target subject analysis apparatus can specify the attribute of the target subject simply by inputting the image data and the range data to the neural network, without performing advanced image processing on the image. It is therefore possible to realize processing for analyzing the attributes of the target subject with a simple configuration, reduce the processing load on a CPU, and reduce the capacity of a memory to be used. Accordingly, with the above-described target subject analysis apparatus, the recognition accuracy with respect to the attributes of the target subject can be increased with a simple configuration. Note that the target subject may include anything that can be shot by an image capturing device. The attribute of the target subject to be specified may include any kind of feature of the target subject that appears in the image.
In another mode of the target subject analysis apparatus according to the above-described aspect, the attribute specifying unit may specify, as the attribute of the target subject, at least one of an uneven state, material quality, a three-dimensional shape, and a flat state of the target subject. The uneven state indicates the shape, size, or the like of a raised portion and a recessed portion that are present on the target subject. The raised portion includes a protrusion. The recessed portion includes an opening and a hole. The flat state indicates the degree of expansion of a face of the target subject, the degree of tilt thereof, or the like. The uneven state, material quality, three-dimensional shape, and flat state are attributes that are difficult to analyze using a single image. This configuration uses not only the image data but also the range data, and it is therefore possible to relatively accurately identify these attributes that are difficult to analyze using a single image.
In another mode of the target subject analysis apparatus according to the above-described aspect, the attribute specifying unit may specify, as the attribute of the target subject, a plurality of physical characteristics of the target subject. With this configuration, the target subject can be identified relatively accurately. Note that the physical characteristics refer to features that physically appear on the target subject. The physical characteristics include geometric features such as the size, shape, and posture of the target subject, and features concerning material quality such as the composition of the target subject.
In another mode of the target subject analysis apparatus according to the above-described aspect, the target subject analysis apparatus may further include a neural network selection unit configured to select, in accordance with designation by a user, a neural network to be used by the neural network computing unit from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects. With this configuration, analytical processing suitable for the type of the target subject can be realized.
In another mode of the target subject analysis apparatus according to the above-described aspect, the image data and the range data may be obtained by shooting a situation outside a vehicle as the target subject, and the attribute specifying unit may specify, as the attribute of the target subject, at least one of a state of a road surface, presence of an obstacle, and a type of obstacle, based on the output value obtained from the neural network. This configuration can provide a target subject analysis apparatus able to accurately identify the situation outside the vehicle.
In another mode of the target subject analysis apparatus according to the above-described aspect, the image data and the range data may be obtained by shooting, as the target subject, a product manufactured on a production line, and the attribute specifying unit may specify, as the attribute of the target subject, at least one of a size of the product, a shape of the product, and presence of damage on the product, based on the output value obtained from the neural network. This configuration can provide a target subject analysis apparatus able to accurately identify the quality of the product manufactured on the production line.
In another mode of the target subject analysis apparatus according to the above-described aspect, the image data and the range data may be obtained by shooting a human as the target subject, and the attribute specifying unit may specify, as the attribute of the target subject, at least one of a body shape of the human, a facial expression of the human, and a posture of the human, based on the output value obtained from the neural network. This configuration can provide a target subject analysis apparatus able to accurately identify a person.
A learning apparatus according to an aspect of the present invention includes: a learning data acquisition unit configured to acquire, as learning data, a set of image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and a learning processing unit configured to train a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data upon the image data and the range data being input. With this configuration, a trained neural network to be used in the above-described target subject analysis apparatus can be constructed in accordance with a desired target of analysis.
Note that other modes of the target subject analysis apparatus and the learning apparatus according to the above respective modes may also include an information processing method and a program for realizing the above respective configurations, and a storage medium that records such a program and can be read by a computer or other kind of apparatus, machine, or the like. Here, a storage medium that can be read by a computer or the like is a medium that accumulates information such as a program by means of an electric, magnetic, optical, mechanical, or chemical effect.
For example, a target subject analysis method according to an aspect of the present invention is an information processing method wherein a computer executes: a data acquisition step of acquiring image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image; a computation processing step of obtaining an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and an attribute specifying step of specifying the attribute of the target subject based on the output value obtained from the neural network.
In the target subject analysis method according to the above-described aspect, in the attribute specifying step, the computer may specify, as the attribute of the target subject, at least one of an uneven state, material quality, a three-dimensional shape, and a flat state of the target subject.
In the target subject analysis method according to the above-described aspect, in the attribute specifying step, the computer may specify a plurality of physical characteristics of the target subject as the attribute of the target subject.
In the target subject analysis method according to the above-described aspect, the computer may further execute a selection step of selecting, in accordance with designation by a user, a neural network to be used in the computation processing step from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects.
In the target subject analysis method according to the above-described aspect, the image data and the range data may be obtained by shooting a situation outside a vehicle as the target subject, and in the attribute specifying step, the computer may specify at least one of a state of a road surface, presence of an obstacle, and a type of obstacle as the attribute of the target subject, based on the output value obtained from the neural network.
In the target subject analysis method according to the above-described aspect, the image data and the range data may be obtained by shooting, as the target subject, a product manufactured on a production line, and in the attribute specifying step, the computer may specify at least one of a size of the product, a shape of the product, and presence of damage on the product as the attribute of the target subject, based on the output value obtained from the neural network.
In the target subject analysis method according to the above-described aspect, the image data and the range data may be obtained by shooting a human as the target subject, and in the attribute specifying step, the computer may specify at least one of a body shape of the human, a facial expression of the human, and a posture of the human as the attribute of the target subject, based on the output value obtained from the neural network.
For example, a target subject analysis program according to an aspect of the present invention is a program for causing a computer to execute: a data acquisition step of acquiring image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image; a computation processing step of obtaining an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and an attribute specifying step of specifying the attribute of the target subject based on the output value obtained from the neural network.
For example, a learning method according to an aspect of the present invention is an information processing method for executing a learning data acquisition step of acquiring, as learning data, a set of image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and a learning processing step of training a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data if the image data and the range data are input.
For example, a learning program according to an aspect of the present invention is a program for causing a computer to execute a learning data acquisition step of acquiring, as learning data, a set of image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and a learning processing step of training a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data if the image data and the range data are input.
According to the present invention, recognition accuracy with respect to attributes of a target subject can be increased with a simple configuration.
FIG. 1 schematically shows an example of an instance where a target subject analysis apparatus and a learning apparatus according to an embodiment are applied.
FIG. 2 schematically shows an example of a hardware configuration of the target subject analysis apparatus according to an embodiment.
FIG. 3 schematically shows an example of a hardware configuration of the learning apparatus according to an embodiment.
FIG. 4 schematically shows an example of a functional configuration of the target subject analysis apparatus according to an embodiment.
FIG. 5 is a diagram illustrating image data and range data according to an embodiment.
FIG. 6 schematically shows an example of a functional configuration of the learning apparatus according to an embodiment.
FIG. 7 shows an example of a processing procedure of the target subject analysis apparatus.
FIG. 8A shows an example of an instance where attributes of a target subject are analyzed.
FIG. 8B shows an example of an instance where attributes of a target subject are analyzed.
FIG. 8C shows an example of an instance where attributes of a target subject are analyzed.
FIG. 9 shows an example of a processing procedure of the learning apparatus.
Hereinafter, an embodiment according to an aspect of the present invention (also referred to as “the embodiment” below) is described based on the drawings. However, the embodiment described below is merely an example of the present invention in every respect. Needless to say, various improvements and modifications may be made without departing from the scope of the present invention. That is to say, to implement the present invention, a specific configuration corresponding to the embodiment may also be employed as appropriate. Note that, although data that is used in the embodiment is described using natural language, more specifically, the data is designated using pseudo-language, commands, parameters, machine language, or the like that can be recognized by a computer.
§1 Application examples
First, an example of an instance where the present invention is applied is described using FIG. 1. FIG. 1 schematically shows an example of an instance where a target subject analysis apparatus 1 and a learning apparatus 2 according to the embodiment are applied. The target subject analysis apparatus 1 according to the embodiment is an information processing apparatus for analyzing attributes of a target subject 6 using a neural network.
As shown in FIG. 1, the target subject analysis apparatus 1 acquires, via a camera 3, image data that represents an image including a figure of the target subject 6, and range data that represents values of the distance at respective pixels constituting the image. The target subject 6 may include anything that can be shot by a photographic device, and may be, for example, a scene such as a situation outside a vehicle, a product manufactured on a production line, or a predetermined object such as a person.
As will be described later, the camera 3 may not be particularly limited as long as it is a photographic device able to take a general image (e.g. monochrome image, color image) and measure the distance at the respective pixels constituting the image, and may be selected as appropriate, as per an embodiment. The camera 3 may be OPTEX’s ZC-1000L-HP series, Microsoft’s Kinect, ASUS’s Xtion, or LYTRO Japan’s ILLUM, for example.
Subsequently, the target subject analysis apparatus 1 performs computation processing with a trained neural network that is for determining attributes of the target subject 6 using the acquired image data and range data as the input to this neural network, thereby obtaining output values from the neural network. The target subject analysis apparatus 1 then specifies the attributes of the target subject 6 based on the output values obtained from the neural network. The attributes of the target subject 6 to be specified may not be particularly limited as long as the attributes are features of the target subject 6 that appear in the image, and may be selected as appropriate, as per an embodiment.
Meanwhile, the learning apparatus 2 according to the embodiment is an information processing apparatus for creating a neural network to be used by the target subject analysis apparatus 1, i.e. an apparatus for training the neural network. Specifically, the learning apparatus 2 acquires, as learning data, a set of data including the image data that represents an image including a figure of the target subject 6, the range data that represents the values of the distance at the respective pixels constituting the image, and attribute data that represents the attributes of the target subject 6. The learning data is created as appropriate in accordance with the attributes of the target subject 6 for which learning is desired.
Subsequently, the learning apparatus 2 trains the neural network, using the learning data, to output output values that correspond to the attributes represented by the attribute data upon the image data and the range data being input. Thus, a trained neural network to be used in the target subject analysis apparatus 1 is constructed. Note that, for example, the target subject analysis apparatus 1 may also acquire, via a network 10, a trained neural network that is constructed by the learning apparatus 2. The type of the network 10 may be selected as appropriate from among the Internet, a wireless communication network, a telecommunication network, a telephone network, a dedicated network, and the like, for example.
As described above, in the embodiment, the image data that represents the image including the figure of the target subject 6, as well as the range data that represents the values of the distance at the respective pixels constituting the image are used as the input to the neural network during analysis of the attributes of the target subject 6. That is to say, the image including the figure of the target subject 6 as well as the values of the distance obtained for the respective pixels in the image are used during the analysis of the attributes of the target subject 6 by the neural network. Thus, recognition accuracy of the neural network with respect to the attributes of the target subject 6 can be increased. In addition, in the embodiment, the attributes of the target subject 6 can be analyzed simply by inputting the image data and the range data to the neural network, without performing advanced image processing on the image. It is thus possible to realize processing for analyzing the attributes of the target subject 6 with a simple configuration, reduce the processing load on a CPU, and reduce the capacity of a memory to be used. Accordingly, the embodiment can increase the recognition accuracy with respect to the attributes of the target subject 6 with a simple configuration.
§2 Configuration example
(Hardware configuration)
<Target subject analysis apparatus>
Next, an example of a hardware configuration of the target subject analysis apparatus 1 according to the embodiment is described using FIG. 2. FIG. 2 schematically shows an example of a hardware configuration of the target subject analysis apparatus 1 according to the embodiment.
As shown in FIG. 2, the target subject analysis apparatus 1 according to the embodiment is a computer in which a control unit 11, a storage unit 12, a communication interface 13, an input device 14, an output device 15, an external interface 16, and a drive 17 are electrically connected to one another. However, in FIG. 2, the communication interface and the external interface are denoted as “communication I/F” and “external I/F”, respectively.
The control unit 11 includes a CPU (Central Processing Unit), which is a hardware processor, a RAM (Random Access Memory), a ROM (Read Only Memory), and the like, and controls the constituent elements in accordance with information processing. The storage unit 12 is an auxiliary storage device such as a hard disk drive or a solid-state drive, for example, and stores a target subject analysis program 121, which is to be executed by the control unit 11, learning result data 122 that represents information regarding the trained neural network, and the like. The storage unit 12 corresponds to a memory.
The target subject analysis program 121 is a program for causing the target subject analysis apparatus 1 to execute later-described processing for analyzing the attributes of the target subject 6 (FIG. 7). The learning result data 122 includes information that represents the configuration of the neural network, the connection weights between neurons, and the threshold values for the respective neurons, and is used for setting the trained neural network to be used in processing for analyzing the attributes of the target subject 6. Note that, in the embodiment, the storage unit 12 stores a plurality of pieces of learning result data 122.
The communication interface 13 is a wired LAN (Local Area Network) module, a wireless LAN module, or the like, for example, and is an interface for performing wired or wireless communication via a network. The input device 14 is a device for input, such as a mouse or a keyboard, for example. The output device 15 is a device for output, such as a display or a speaker, for example. The external interface 16 is a USB (Universal Serial Bus) port or the like, and is an interface for connecting to an external device such as the camera 3.
The drive 17 is a CD (Compact Disk) drive, a DVD (Digital Versatile Disk) drive, or the like, for example, and is a device for loading the program stored in a storage medium 91. The type of the drive 17 may be selected as appropriate in accordance with the type of the storage medium 91. The target subject analysis program 121 and/or the learning result data 122 may also be stored in this storage medium 91.
The storage medium 91 is a medium that accumulates information such as that of a program by means of an electric, magnetic, optical, mechanical, or chemical effect, so that a computer, other kind of apparatus, machine, or the like can read information such as that of the program. The target subject analysis apparatus 1 may also acquire the target subject analysis program 121 and/or the learning result data 122 from this storage medium 91.
Here, FIG. 2 shows a storage medium of a disk type such as a CD or DVD as an example of the storage medium 91. However, the type of the storage medium 91 is not limited to a disk type, and may also be a type other than a disk type. Examples of a storage medium of a type other than the disk type may include a semiconductor memory such as a flash memory.
Note that, regarding the specific hardware configuration of the target subject analysis apparatus 1, constituent elements may be omitted, replaced, and added as appropriate, as per an embodiment. For example, the control unit 11 may also include a plurality of hardware processors. Each of the hardware processors may be constituted by a microprocessor, an FPGA (field-programmable gate array), or the like. The target subject analysis apparatus 1 may also be constituted by a plurality of information processing apparatuses. The target subject analysis apparatus 1 may be an information processing apparatus designed exclusively for a service to be provided, as well as a general-purpose desktop PC (Personal Computer), a tablet PC, or the like.
<Learning apparatus>
Next, an example of a hardware configuration of the learning apparatus 2 according to the embodiment is described using FIG. 3. FIG. 3 schematically shows an example of a hardware configuration of the learning apparatus 2 according to the embodiment.
As shown in FIG. 3, the learning apparatus 2 according to the embodiment is a computer in which a control unit 21, a storage unit 22, a communication interface 23, an input device 24, an output device 25, an external interface 26, and a drive 27 are electrically connected to one another. Note that, in FIG. 3, the communication interface and the external interface are denoted as “communication I/F” and “external I/F”, respectively, similar to FIG. 2.
The constituent elements ranging from the control unit 21 to the drive 27 and a storage medium 92 are similar to the constituent elements ranging from the control unit 11 to the drive 17 and the storage medium 91 in the target subject analysis apparatus 1. Also, a camera 5, which is connected via the external interface 26, is similar to the camera 3 that is connected to the target subject analysis apparatus 1. However, the storage unit 22 in the learning apparatus 2 stores a learning program 221, which is to be executed by the control unit 21, learning data 222, which is to be used in the training of the neural network, and the like.
The learning program 221 is a program for causing the learning apparatus 2 to execute later-described learning processing of the neural network (FIG. 9). The learning data 222 is data for training the neural network to be able to analyze a desired attribute of the target subject 6, and includes image data, range data, and attribute data. The details of the learning data 222 will be described later.
Note that the learning program 221 and/or the learning data 222 may also be stored in the storage medium 92, as in the target subject analysis apparatus 1. Accordingly, the learning apparatus 2 may also acquire the learning program 221 and/or the learning data 222 to be used from the storage medium 92.
Also, as in the target subject analysis apparatus 1, regarding the specific hardware configuration of the learning apparatus 2, constituent elements may be omitted, replaced, and added as appropriate, as per an embodiment. Furthermore, the learning apparatus 2 may be an information processing apparatus designed exclusively for a service to be provided, as well as a general-purpose server device, a desktop PC, or the like.
(Functional configuration)
<Target subject analysis apparatus>
Next, an example of a functional configuration of the target subject analysis apparatus 1 according to the embodiment is described using FIG. 4. FIG. 4 schematically shows an example of the functional configuration of the target subject analysis apparatus 1 according to the embodiment.
The control unit 11 in the target subject analysis apparatus 1 loads, to the RAM, the target subject analysis program 121 stored in the storage unit 12. The control unit 11 then interprets and executes, using a CPU, the target subject analysis program 121 loaded to the RAM, and controls the constituent elements. Thus, as shown in FIG. 4, the target subject analysis apparatus 1 according to the embodiment functions as a computer including a data acquisition unit 111, a neural network computing unit 112, an attribute specifying unit 113, and a neural network selection unit 114.
The data acquisition unit 111 acquires image data 123, which represents an image including a figure of the target subject 6, and range data 124, which represents values of the distance at respective pixels constituting the image. The neural network computing unit 112 obtains output values from the neural network 7 by performing computation processing with this neural network 7 using the image data 123 and the range data 124 as the input to the trained neural network 7 for determining attributes of the target subject 6. The attribute specifying unit 113 specifies the attributes of the target subject 6 based on the output values obtained from the neural network 7. The neural network selection unit 114 selects a neural network 7 to be used by the neural network computing unit 112 from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects, in accordance with designation by a user. Note that the neural network 7 to be used is set based on the learning result data 122.
Now, the image data 123 and the range data 124 will be described also using FIG. 5. FIG. 5 is a diagram illustrating the image data 123 and the range data 124 that are acquired by the camera 3. The camera 3 according to the embodiment is configured to be able to capture a figure of the target subject as an image, and to measure the distance to the target subject at the respective pixels of the captured image. For example, the camera 3 is a shooting device that includes a phototransmitter unit for radiating infrared light, such as an infrared LED (Light Emitting Diode), and a photodetector unit for receiving infrared light and visible light, such as a CMOS (Complementary MOS) image sensor.
With this configuration, the camera 3 can acquire the image data 123, which expresses the colors of the pixels constituting the image using pixel values, by forming an image of the visible light reflected from the target subject using the photodetector unit. The pixel values of the pixels may also be expressed in an RGB color space, or may also be expressed in a gray scale color space, for example. A method of expressing the pixel values of the pixels can be selected as appropriate, as per an embodiment.
The camera 3 can also acquire the range data 124, which represents, for the respective pixels, values of the distance (depth) d from the camera 3 to a figure that appears in the pixels by measuring, for the respective pixels, the amount of time taken until the infrared light projected from the phototransmitter unit reaches the target subject and then returns to the photodetector unit (TOF method: Time Of Flight). This distance d may also be expressed as a linear distance d1 between the camera 3 and the target subject, or may also be expressed as a distance d2 on a horizontal axis from the camera 3 to a vertical line extending from the subject. Since the distance d1 and the distance d2 can be converted to each other based on the Pythagorean theorem or the like, the same description is applied to both cases of employing the distance d1 and employing the distance d2.
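To illustrate this interchangeability, the following minimal sketch (in Python, with a hypothetical vertical offset h between the camera 3 and the point on the target subject; it is not part of the described apparatus) converts between the linear distance d1 and the horizontal distance d2 using the Pythagorean relationship d1^2 = d2^2 + h^2.

    import math

    def linear_to_horizontal(d1, h):
        # d1: line-of-sight distance from the camera to the point on the target subject
        # h: vertical offset between the camera and that point (assumed known)
        return math.sqrt(max(d1 * d1 - h * h, 0.0))

    def horizontal_to_linear(d2, h):
        # Inverse conversion: recover the line-of-sight distance from the horizontal one.
        return math.sqrt(d2 * d2 + h * h)

For example, linear_to_horizontal(5.0, 3.0) returns 4.0, and horizontal_to_linear(4.0, 3.0) returns 5.0.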
As described above, in the embodiment, the camera 3 acquires the pixel values and the values of the distance d for the respective pixels constituting the image. Thus, the values of the distance d to the target subject can be acquired over the entire shooting area of the image. In addition, the pixel values and the values of the distance d can be acquired in one-to-one correspondence with the pixels.
However, the values of the distance d may not necessarily be acquired for all pixels constituting the image. That is to say, in the acquired range data, there may be pixels for which the values of the distance d to the target subject cannot be acquired for some reason, e.g. because the reflection of infrared light is interrupted. Note that, since the range data 124 represents the values of the distance at the respective pixels constituting the image, an image can also be represented by this range data 124. This image represented by the range data 124 may also be called a “range image”, distinguished from the image represented by the image data 123.
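Because such invalid pixels can occur, a practical implementation would typically flag or fill them before the range image is passed to the neural network. The following sketch assumes a hypothetical convention in which a missing distance is marked with a negative value; the embodiment itself does not prescribe how such pixels are handled.

    def clean_range_image(distances, fill_value=0.0):
        # distances: per-pixel distance values; a negative value marks a pixel
        # for which the distance to the target subject could not be measured.
        # Returns the filled distances together with a validity mask.
        cleaned, mask = [], []
        for d in distances:
            if d < 0:
                cleaned.append(fill_value)
                mask.append(0)
            else:
                cleaned.append(d)
                mask.append(1)
        return cleaned, mask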
Next, the neural network 7 will be described. As shown in FIG. 4, the neural network 7 to be used is a multi-layer neural network that is used in so-called deep learning, and includes an input layer 71, an intermediate layer (hidden layer) 72, and an output layer 73 in this order from the input side.
In FIG. 4, the neural network 7 includes one intermediate layer 72, the output from the input layer 71 is input to the intermediate layer 72, and the output from the intermediate layer 72 is input to the output layer 73. However, the number of intermediate layers 72 may not be limited to one, and the neural network 7 may also include two or more intermediate layers 72.
Each of the layers 71 to 73 includes one or more neurons. For example, the number of neurons in the input layer 71 can be set in accordance with the number of pixels in the image data 123 and the range data 124. The number of neurons in the intermediate layer 72 can be set as appropriate, as per an embodiment. The number of neurons in the output layer 73 can be set in accordance with the number of types of attributes of the target subject 6 that are to be targets of analysis.
Neurons in adjacent layers are coupled as appropriate, and a weight (connection weight) is set for each connection. In the example in FIG. 4, each neuron is coupled to all neurons in an adjacent layer. However, the connection of neurons may not be limited to this example, and may be set as appropriate, as per an embodiment.
A threshold value is set for each neuron. Basically, the output of each neuron is determined based on whether or not the sum of products of the input and the weight exceeds a threshold value. The target subject analysis apparatus 1 specifies the attributes of the target subject 6 based on the output values obtained from the output layer 73 by inputting the image data 123 and the range data 124 to the input layer 71 in this neural network 7.
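As a rough illustration of this computation, the following sketch implements a forward pass in which each neuron outputs 1 when the weighted sum of its inputs exceeds its threshold value and 0 otherwise. The step activation and the data layout are simplifying assumptions; an actual embodiment may use any activation function.

    def layer_output(inputs, weights, thresholds):
        # inputs: output values of the previous layer
        # weights[j][i]: connection weight from input i to neuron j
        # thresholds[j]: threshold value set for neuron j
        outputs = []
        for weight_row, threshold in zip(weights, thresholds):
            s = sum(w * x for w, x in zip(weight_row, inputs))
            outputs.append(1.0 if s > threshold else 0.0)
        return outputs

    def forward(inputs, layers):
        # layers: list of (weights, thresholds) pairs, e.g. one pair for the
        # intermediate layer 72 and one pair for the output layer 73
        values = inputs
        for weights, thresholds in layers:
            values = layer_output(values, weights, thresholds)
        return values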
<Learning apparatus>
Next, an example of a functional configuration of the learning apparatus 2 according to the embodiment is described using FIG. 6. FIG. 6 schematically shows an example of the functional configuration of the learning apparatus 2 according to the embodiment.
The control unit 21 in the learning apparatus 2 loads, to the RAM, the learning program 221 stored in the storage unit 22. The control unit 21 then interprets and executes, using a CPU, the learning program 221 loaded to the RAM, and controls the constituent elements. Thus, as shown in FIG. 6, the learning apparatus 2 according to the embodiment functions as a computer including a learning data acquisition unit 211 and a learning processing unit 212.
The learning data acquisition unit 211 acquires, as the learning data 222, a set of data that includes image data 223 that represents an image including a figure of the target subject 6, range data 224 that represents the values of the distance at the respective pixels constituting the image, and attribute data 225 that represents attributes of the target subject 6. Using the learning data 222, the learning processing unit 212 trains a neural network 8 to output output values that correspond to the attributes represented by the attribute data 225 upon the image data 223 and the range data 224 being input.
The neural network 8, which is to be trained, includes an input layer 81, an intermediate layer (hidden layer) 82, and an output layer 83, and is configured similar to the aforementioned neural network 7. The layers 81 to 83 are similar to the layers 71 to 73, respectively. With this configuration, the neural network 8 is constructed to output output values corresponding to the attributes of the target subject 6 upon the image data and the range data being input. The learning processing unit 212 stores, in the storage unit 22, information indicating the configuration of the constructed neural network 8, connection weights between neurons, and threshold values for the respective neurons as the learning result data 122.
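The following sketch outlines, under simplifying assumptions, how such training and storage might be implemented with a generic deep-learning library (PyTorch is used here purely as an example). The layer sizes, the mean-squared-error loss, the learning rate, and the file name are illustrative choices and are not fixed by the embodiment.

    import torch
    import torch.nn as nn

    # Illustrative sizes: a 32 x 32 image gives 1024 pixel values plus
    # 1024 distance values, i.e. 2048 inputs, and 3 output neurons.
    net = nn.Sequential(nn.Linear(2048, 128), nn.Sigmoid(),
                        nn.Linear(128, 3), nn.Sigmoid())
    optimizer = torch.optim.SGD(net.parameters(), lr=0.1)  # (stochastic) gradient descent
    loss_fn = nn.MSELoss()

    def train(learning_data, epochs=100):
        # learning_data: list of (image_data, range_data, attribute_values) tuples,
        # each element given as a flat list of floats (hypothetical format).
        for _ in range(epochs):
            for image_data, range_data, attribute_values in learning_data:
                x = torch.tensor(image_data + range_data, dtype=torch.float32)
                t = torch.tensor(attribute_values, dtype=torch.float32)
                optimizer.zero_grad()
                loss = loss_fn(net(x), t)  # error between output values and desired values
                loss.backward()            # error back-propagation
                optimizer.step()           # update connection weights and biases (thresholds)

    # train(learning_data_222)  # learning_data_222: the set acquired in step S201 (hypothetical name)
    # Store the network layout and the learned parameters as "learning result data".
    torch.save({"state_dict": net.state_dict(), "layout": [2048, 128, 3]},
               "learning_result_data.pt")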
(Others)
The functions of the target subject analysis apparatus 1 and the learning apparatus 2 are described in detail in a later-described operation example. Note that the embodiment describes an example in which all of the functions of the target subject analysis apparatus 1 and the learning apparatus 2 are realized by a general-purpose CPU. However, some or all of the aforementioned functions may also be realized by one or more dedicated processors. Regarding the functional configurations of the target subject analysis apparatus 1 and the learning apparatus 2, functions may be omitted, replaced, and added as appropriate, as per an embodiment.
§3 Operation example
(Target subject analysis apparatus)
Next, an operation example of the target subject analysis apparatus 1 is described using FIG. 7. FIG. 7 is a flowchart showing an example of a processing procedure of the target subject analysis apparatus 1. Note that the processing procedure described below is merely an example, and the processing may be changed to the extent possible. In the processing procedure described below, steps may be omitted, replaced, and added as appropriate, as per an embodiment.
(Step S101)
In step S101, the control unit 11 functions as the neural network selection unit 114, and selects a neural network 7 to be used in later-described step S103 from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects, in accordance with designation by a user.
In the embodiment, the target subject analysis apparatus 1 holds a plurality of sets of learning result data 122 for each target subject 6 that is a target of analysis and each type of attributes thereof, in the storage unit 12. The control unit 11 outputs the target subject 6 that is the target of analysis and the types of attributes thereof to the output device 15, and accepts designation of the target of analysis performed by the user using the input device 14. The control unit 11 then selects learning result data 122 to be used, in accordance with the designation accepted from the user, and sets the neural network 7 using the selected learning result data 122.
Specifically, the learning result data 122 includes the information indicating the configuration of the neural network 7, the connection weights between neurons, and the threshold values for the respective neurons so as to be able to set the neural network 7 that outputs output values corresponding to desired attributes of a target subject 6 of a desired type upon the image data and the range data being input. Based on the information indicating the configuration of the neural network 7, the control unit 11 sets the structure of the neural network 7, the number of neurons included in the respective layers 71 to 73, a connection state between neurons in adjacent layers, and the like. Also, the control unit 11 sets the values of parameters of the neural network 7 based on the information indicating the connection weights between neurons and the threshold values for the respective neurons. The setting of the neural network 7 to be used is thus completed.
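As a counterpart to the training sketch given earlier, the following sketch shows how a piece of stored learning result data (the hypothetical file produced above) could be used to rebuild and set the neural network 7; the record layout is an assumption, not a format defined by the embodiment.

    import torch
    import torch.nn as nn

    def load_network(path):
        # The stored record is assumed to hold the layer layout (structure of the
        # neural network) and the learned connection weights and biases (thresholds).
        record = torch.load(path)
        sizes = record["layout"]  # e.g. [2048, 128, 3]
        layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(n_in, n_out), nn.Sigmoid()]
        net = nn.Sequential(*layers)
        net.load_state_dict(record["state_dict"])
        net.eval()
        return net

    net7 = load_network("learning_result_data.pt")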
Note that the target subject analysis apparatus 1 may also acquire each piece of the learning result data 122 from the learning apparatus 2 via the network 10, or may also acquire each piece from the storage medium 91 via the drive 17, in accordance with an operation made by the user to the input device 14. The target subject analysis apparatus 1 may also acquire each piece of the learning result data 122 by accepting distribution thereof from the learning apparatus 2. Furthermore, each piece of the learning result data 122 may also be stored in another information processing apparatus (storage device) such as a NAS (Network Attached Storage). The target subject analysis apparatus 1 may also acquire each piece of the learning result data 122 by accessing the other information processing apparatus when performing processing in step S101.
(Step S102)
In the next step S102, the control unit 11 functions as the data acquisition unit 111, and acquires the image data 123 and the range data 124. As mentioned above, in the embodiment, the camera 3 is configured to be able to acquire the image data 123 and the range data 124. For this reason, the control unit 11 acquires the image data 123 and the range data 124 from the camera 3 via the external interface 16.
(Step S103)
In the next step S103, the control unit 11 functions as the neural network computing unit 112, and obtains output values from the neural network 7 by performing computation processing with this neural network 7 using the image data 123 and the range data 124 as the input to the trained neural network 7 for determining attributes of the target subject 6.
Specifically, the control unit 11 inputs the pixel values of the respective pixels included in the image data 123 and the values of the distance at the respective pixels included in the range data 124 to the neurons included in the input layer 71 in the neural network 7 that is set in step S101. A correspondence relationship between the respective values and the neurons may be set as appropriate, as per an embodiment. Next, the control unit 11 performs determination regarding firing of the neurons included in the layers 71 to 73, in the direction of forward propagation. The control unit 11 can thus obtain an output value from each neuron included in the output layer 73 of the neural network 7.
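A minimal sketch of this step is shown below, assuming the pixel values and the per-pixel distance values are held as flat lists in the same pixel order (a hypothetical layout), and that net7 is the network set in step S101, as in the loading sketch above.

    import torch

    def compute_output_values(net, image_data, range_data):
        # image_data: per-pixel values of the image (e.g. grayscale intensities)
        # range_data: per-pixel values of the distance d, in the same pixel order
        assert len(image_data) == len(range_data)
        # Feed both the pixel values and the distance values to the input layer.
        x = torch.tensor(list(image_data) + list(range_data), dtype=torch.float32)
        with torch.no_grad():
            return net(x).tolist()

    # output_values = compute_output_values(net7, image_data_123, range_data_124)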
(Step S104)
In the next step S104, the control unit 11 functions as the attribute specifying unit 113, and specifies attributes of the target subject 6 based on the output values obtained from the neural network 7 in step S103.
As mentioned above, the neural network 7 has already been trained to output output values corresponding to desired attributes of the target subject 6 of a desired type upon the image data 123 and the range data 124 being input. As many output values as there are neurons included in the output layer 73 can be obtained, and one or more of the obtained output values can be associated with one attribute (attribute value) of the target subject 6. Information indicating a correspondence relationship between the attributes (attribute values) of the target subject 6 and the output values of the neural network 7 can be given by data in a table form, for example.
The control unit 11 specifies the attributes (attribute values) of the target subject 6 based on the output values obtained in step S103, by referencing the information indicating the correspondence relationship between the attributes (attribute values) of the target subject 6 and the output values of the neural network 7. The number of attributes of the target subject 6 to be specified may be selected as appropriate, as per an embodiment. Specific examples will be described later. Here, the control unit 11 ends processing for analyzing the target subject 6 according to this operation example.
(Analysis example)
Next, three specific examples of attribute analysis of the target subject 6 using the target subject analysis apparatus 1 are described using FIGS. 8A to 8C.
(1) First specific example (analysis of situation outside vehicle)
First, a first specific example is described using FIG. 8A. FIG. 8A schematically shows, as the first specific example, an instance where the target subject analysis apparatus 1 is used for analyzing a situation outside a vehicle such as an automobile. In this case, as shown in FIG. 8A, the camera 3 is an in-vehicle camera that is installed so as to be able to shoot a situation outside the vehicle (e.g. front side of the vehicle). The target subject analysis apparatus 1 is an information processing apparatus such as an in-vehicle device that can be connected to the camera 3 or a general-purpose PC.
In the above-described step S101, in response to accepting designation of analysis of a situation outside the vehicle from a user, the control unit 11 sets a neural network 7A based on corresponding learning result data 122. Subsequently, in the above-described step S102, the control unit 11 acquires, from the camera 3, image data 123A and range data 124A, which are obtained by shooting the situation outside the vehicle as a target subject 6A.
Next, in the above-described step S103, the control unit 11 inputs the pixel values of the respective pixels included in the image data 123A and the values of the distance at the respective pixels included in the range data 124A, to neurons included in an input layer 71A of the neural network 7A. Furthermore, the control unit 11 performs firing determination computation for the neurons included in the input layer 71A, an intermediate layer 72A, and an output layer 73A in the direction of forward propagation, and obtains output values from the neurons included in the output layer 73A.
In the above-described step S104, the control unit 11 specifies, as an attribute of the target subject 6A, at least one of the state of a road surface, the presence of an obstacle, and the type of obstacle, based on the output values obtained from the neural network 7A. The type of the situation outside the vehicle that is to be a target of analysis may be selected as appropriate, as per an embodiment.
For example, as shown in FIG. 8A, the number of neurons included in the output layer 73A may also be three or more. The target subject analysis apparatus 1 may set at least two types of attributes, namely the state of the road surface and the presence of an obstacle(s), as targets of analysis. In addition, there may be four types of attribute values of the state of the road surface that is to be a target of analysis, namely flat, uneven, upward slope, and downward slope. The respective states of the road surface may be associated with output values (0, 0), (0, 1), (1, 0), and (1, 1). Furthermore, there may be two types of attribute values of the presence of an obstacle(s) that is to be a target of analysis, namely “no obstacle present” and “an obstacle(s) present”, and the respective attribute values may be associated with output values 0 and 1.
In this case, the control unit 11 can recognize that the state of the road surface within a shooting area is flat, uneven, an upward slope, or a downward slope, based on the output value corresponding to the state of the road surface from the neural network 7A being (0, 0), (0, 1), (1, 0), or (1, 1). The control unit 11 can also recognize that no obstacle is present or an obstacle(s) is present within the shooting area, based on the output value corresponding to the presence of an obstacle(s) from the neural network 7A being 0 or 1.
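A sketch of how such a correspondence table could be applied to the three output values is shown below. The table and function names are hypothetical, and the rounding of real-valued outputs to 0 or 1 is an assumption.

```python
# Hypothetical correspondence table for the first specific example.
ROAD_SURFACE_TABLE = {
    (0, 0): "flat",
    (0, 1): "uneven",
    (1, 0): "upward slope",
    (1, 1): "downward slope",
}

def decode_outside_vehicle_outputs(output_values):
    """Map three output values to (road surface state, obstacle presence).

    The first two values are assumed to encode the road surface state and
    the third the presence of an obstacle(s).
    """
    bits = tuple(1 if v >= 0.5 else 0 for v in output_values[:3])
    road_surface_state = ROAD_SURFACE_TABLE[(bits[0], bits[1])]
    obstacle_present = bool(bits[2])
    return road_surface_state, obstacle_present

# Example: decode_outside_vehicle_outputs((0.1, 0.9, 0.8)) -> ("uneven", True)
```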
Note that the results of recognition of the state of the road surface and the presence of an obstacle(s) may also be used in autonomous driving of a vehicle. For example, the control unit 11 may also control the driving speed of the vehicle based on the result of recognition of the state of the road surface. Also, for example, when it is recognized that an obstacle(s) is present, the control unit 11 may also control the vehicle so as to stop before reaching the obstacle.
(2) Second specific example (analysis of state of product manufactured on production line)
Next, a second specific example is described using FIG. 8B. FIG. 8B schematically shows, as the second specific example, an instance where the target subject analysis apparatus 1 is used for analyzing a state of a product manufactured on a production line. In this case, as shown in FIG. 8B, the camera 3 is installed so as to be able to shoot a product that is flowing on the production line.
In the above-described step S101, in response to accepting, from a user, designation of analysis of a product manufactured on the production line, the control unit 11 sets a neural network 7B based on corresponding learning result data 122. Subsequently, in the above-described step S102, the control unit 11 acquires, from the camera 3, image data 123B and range data 124B that are obtained by shooting the product manufactured on the production line as a target subject 6B.
Next, in the above-described step S103, the control unit 11 inputs the pixel values of the respective pixels included in the image data 123B and the values of the distance at the respective pixels included in the range data 124B, to neurons included in an input layer 71B of the neural network 7B. Furthermore, the control unit 11 performs firing determination computation for the neurons included in the input layer 71B, an intermediate layer 72B, and an output layer 73B in the direction of forward propagation, and obtains output values from the neurons included in the output layer 73B.
In the above-described step S104, the control unit 11 specifies, as an attribute of the target subject 6B, at least one of the size of the product, the shape of the product, and the presence of damage thereon, based on the output values obtained from the neural network 7B. The type of state of the product that is to be the target of analysis may be selected as appropriate, as per an embodiment, as in the above-described first specific example. For example, FIG. 8B shows an instance where at least two types of attributes, namely the shape of the product and the presence of damage, are targets of analysis, based on the output values from the output layer 73B. A correspondence relationship between the respective attribute values and the output values may be set as appropriate, as per an embodiment.
Note that the results of recognizing the state of the product may also be used to determine whether or not there is any abnormality on the produced product. For example, if the control unit 11 recognizes that there is damage on the product, it may also determine that there is an abnormality on the product that appears in the image data 123B and the range data 124B. If not, the control unit 11 may also determine that there is no abnormality on the target product. The control unit 11 may also control the production line so as to carry the product for which it has been determined that there is an abnormality to a line other than the line for products with no abnormality.
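As a small illustration of this routing decision, the sketch below uses a hypothetical attribute dictionary and a hypothetical callback for diverting the product; neither name comes from the embodiment.

```python
def route_product(attributes_6b, divert_to_abnormal_line):
    """Route a product based on the specified attributes (second example).

    `attributes_6b` is assumed to be a dict such as
    {"shape": "ok", "damage_present": True}; `divert_to_abnormal_line` is a
    hypothetical callback that moves the product to the line for products
    with an abnormality.
    """
    if attributes_6b.get("damage_present"):
        divert_to_abnormal_line()  # abnormality determined from the damage attribute
        return "abnormal"
    return "normal"
```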
(3) Third specific example (analysis of human features)
Next, a third specific example is described using FIG. 8C. FIG. 8C schematically shows, as the third specific example, an instance where the target subject analysis apparatus 1 is used for analyzing human features. In this case, as shown in FIG. 8C, the camera 3 is installed so as to be able to shoot a person, who is to be a target of feature analysis.
In the above-described step S101, in response to accepting, from a user, designation of feature analysis of a target person, the control unit 11 sets a neural network 7C based on corresponding learning result data 122. Subsequently, in the above-described step S102, the control unit 11 acquires, from the camera 3, image data 123C and range data 124C that are obtained by shooting the target person as a target subject 6C.
Next, in the above-described step S103, the control unit 11 inputs the pixel values of the respective pixels included in the image data 123C and the values of the distance at the respective pixels included in the range data 124C, to neurons included in an input layer 71C of the neural network 7C. Furthermore, the control unit 11 performs firing determination computation for the neurons included in the input layer 71C, an intermediate layer 72C, and an output layer 73C in the direction of forward propagation, and obtains output values from the neurons included in the output layer 73C.
In the above-described step S104, the control unit 11 specifies at least one of the body shape, facial expression, and posture of the human as the attributes of the target subject 6C, based on the output values obtained from the neural network 7C. The types of human features that are to be targets of analysis may be selected as appropriate, as per an embodiment, as in the above-described first and second specific examples. For example, FIG. 8C shows an instance where at least two types of attributes, namely the body shape and the posture are the targets of analysis, based on the output values from the output layer 73C. A correspondence relationship between the respective attribute values and the output values may be set as appropriate, as per an embodiment.
Note that the results of recognizing the features of the person may also be used in determination of the state of health or the like. For example, when the control unit 11 recognizes that the body shape of the target person is of an obese type, it may also issue a warning prompting the target person to pay attention to obesity.
(4) Others
Each of the above-described three specific examples is an example of usage of the target subject analysis apparatus 1. The usage of the target subject analysis apparatus 1 may be modified as appropriate, as per an embodiment. For example, the target subject analysis apparatus 1 may also be configured to specify a three-dimensional shape of the target subject 6 based on the output values obtained from the neural network 7 by inputting image data and range data in which the target subject 6 only partially appears.
Note that, as described in the above three specific examples, the target subject analysis apparatus 1 may also be configured to specify a plurality of physical characteristics of the target subject 6 as the attributes of the target subject 6 in the above-described step S104. The physical characteristics refer to features that physically appear on the target subject 6. They include geometric features such as the size, shape, and orientation of the target subject 6, and features concerning material quality such as the composition of the target subject 6. In the embodiment, the range data 124 that constitutes a range image having values of the distance at the respective pixels is used as the input to the neural network 7. In this range image, physical characteristics of the target subject 6 (particularly, geometric features) are relatively likely to appear. For this reason, by configuring the target subject analysis apparatus 1 so as to specify a plurality of physical characteristics of the target subject 6, the target subject analysis apparatus 1 can identify the target subject 6 relatively accurately.
In the above-described step S104, the target subject analysis apparatus 1 may also be configured to specify at least one of an uneven state, material quality, a three-dimensional shape, and a flat state of the target subject 6 as the physical characteristics of the target subject 6. The uneven state indicates the shape, size, or the like of a raised portion, such as a protrusion present on the target subject 6, or of a recessed portion, such as an opening or a hole. The flat state indicates the degree of spread of a surface of the target subject 6, the degree of tilt thereof, or the like.
For example, in the case of the above-described first specific example, the target subject analysis apparatus 1 can specify, as the uneven state, whether the shape of the road surface is raised or recessed relative to the camera 3. For example, the target subject analysis apparatus 1 can specify, as an attribute of the target subject 6, whether a transparent object placed on the road surface is glass or a puddle. For example, the target subject analysis apparatus 1 can specify, as the three-dimensional shape of the target subject 6, whether the target subject 6 is a flat object such as a sign board, or a three-dimensional object that is relatively thick. The uneven state, material quality, three-dimensional shape, and flat state are attributes that are difficult to analyze using a single image. According to the embodiment, due to using not only the image data 123 but also the range data 124, these attributes, which are difficult to analyze using a single image, can be identified relatively accurately.
(Learning apparatus)
Next, an operation example of the learning apparatus 2 is described using FIG. 9. FIG. 9 is a flowchart showing an example of a processing procedure of the learning apparatus 2. Note that the processing procedure described below is merely an example, and each process may be modified to the extent possible. In the processing procedure described below, steps may be omitted, replaced, and added as appropriate, as per an embodiment.
(Step S201)
In step S201, the control unit 21 functions as the learning data acquisition unit 211, and acquires, as the learning data 222, a set of data that includes the image data 223 that represents an image including a figure of the target subject 6, the range data 224 that represents the values of the distance at the respective pixels constituting the image, and the attribute data 225 that represents attributes of the target subject 6.
The learning data 222 is data for training the neural network 8 to be able to analyze attributes of a desired target subject 6. This learning data 222 can be created by shooting a prepared target subject 6 using the camera 5 under various shooting conditions, and associating obtained images with the shooting conditions.
Specifically, the control unit 21 shoots the target subject 6 in a state where the attributes that are targets of analysis appear, using the camera 5. Since the camera 5 is configured similarly to the above-described camera 3, the control unit 21 can acquire, through this shooting, the image data 223 that represents an image of the target subject 6 in which the attributes that are the targets of analysis appear, as well as the range data 224. The control unit 21 can then create each piece of the learning data 222 by accepting, as appropriate, the input of the attribute data 225 (i.e. training data) that represents the attributes of the target subject 6 that appear in the image, and associating the input attribute data 225 with the acquired image data 223 and range data 224. The learning data 222 may also be created manually by an operator or the like, or may also be created automatically by a robot or the like.
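A minimal sketch of assembling one such learning-data record is given below; the record layout, function name, and the `captured_shots` variable are assumptions made for illustration, since the embodiment does not fix a storage format.

```python
def make_learning_record(image_223, range_224, attribute_225):
    """Bundle one shot of the target subject into a learning-data record (step S201).

    `attribute_225` is the operator-supplied training data, e.g.
    {"road_surface": "uneven", "obstacle_present": True}.
    """
    return {"image": image_223, "range": range_224, "attributes": attribute_225}

# learning_data_222 = [make_learning_record(img, rng, labels)
#                      for (img, rng, labels) in captured_shots]  # captured_shots is hypothetical
```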
Note that the learning data 222 may also be created by the learning apparatus 2 as described above, or may also be created by an information processing apparatus other than the learning apparatus 2. In the case where the learning apparatus 2 creates the learning data 222, the control unit 21 can acquire the learning data 222 by executing processing for creating the learning data 222 in step S201. On the other hand, in the case where an information processing apparatus other than the learning apparatus 2 creates the learning data 222, the learning apparatus 2 can acquire the learning data 222 that is created by another information processing apparatus via a network, the storage medium 92, or the like. The number of pieces of learning data 222 to be acquired in step S201 may be determined as appropriate, as per an embodiment, so that the neural network 8 can be trained.
(Step S202)
In the next step S202, the control unit 21 functions as the learning processing unit 212, and trains the neural network 8, using the learning data 222 acquired in step S201, to output output values corresponding to the attributes represented by the attribute data 225 upon the image data 223 and the range data 224 being input.
Specifically, first, the control unit 21 prepares the neural network 8 for which learning processing is to be performed. The configuration of the neural network 8 to be prepared, initial values of the connection weights between neurons, and initial values of the threshold values for the respective neurons may also be given by a template, or may also be given by input made by the operator. In the case of performing re-training, the control unit 21 may also prepare the neural network 8, based on the learning result data 122 that is to be re-trained.
Next, the control unit 21 causes the neural network 8 to undergo training using, as input data, the image data 223 and the range data 224 included in each piece of the learning data 222 acquired in step S201, and also using the attribute data 225 as training data. The neural network 8 may be trained using a gradient descent method, stochastic gradient descent method, or the like.
For example, the control unit 21 calculates errors between the output values that are output from the output layer 83 as a result of inputting the image data 223 and the range data 224 to the input layer 81, and the desired values corresponding to the attributes represented by the attribute data 225. Subsequently, using the error back-propagation method, the control unit 21 calculates errors in the connection weights between neurons and in the threshold values for the respective neurons from the calculated output errors. Based on the calculated errors, the control unit 21 then updates the values of the connection weights between neurons and the threshold values for the respective neurons.
The control unit 21 trains the neural network 8 by repeating this series of processing for each piece of the learning data 222 until the output values output from the neural network 8 coincide with the desired values corresponding to the attributes represented by the attribute data 225. Thus, a neural network 8 can be constructed that outputs output values corresponding to the attributes represented by the attribute data 225 when the image data 223 and the range data 224 are input thereto.
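The following sketch shows one gradient-descent update of the weights and thresholds by error back-propagation. It assumes a squared-error loss and the sigmoid formulation used in the earlier forward-pass sketch; these are illustrative choices, not the embodiment's fixed method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, weights, thresholds, learning_rate=0.1):
    """One error-back-propagation update (step S202).

    `x` is the concatenated image/range input vector and `target` the desired
    output values for the attributes represented by the attribute data.
    """
    # Forward pass, keeping every layer's activation for the backward pass.
    activations = [np.asarray(x, dtype=float)]
    for W, b in zip(weights, thresholds):
        activations.append(sigmoid(W @ activations[-1] - b))

    # Output error, then propagate it backwards layer by layer.
    delta = (activations[-1] - target) * activations[-1] * (1.0 - activations[-1])
    for layer in reversed(range(len(weights))):
        grad_W = np.outer(delta, activations[layer])
        if layer > 0:  # error for the preceding layer, using the pre-update weights
            prev_delta = (weights[layer].T @ delta) * activations[layer] * (1.0 - activations[layer])
        weights[layer] -= learning_rate * grad_W
        thresholds[layer] += learning_rate * delta  # activation uses (W x - b), hence the sign
        if layer > 0:
            delta = prev_delta
    return 0.5 * float(np.sum((activations[-1] - target) ** 2))  # pre-update loss
```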
(Step S203)
In the next step S203, the control unit 21 functions as the learning processing unit 212, and stores, in the storage unit 22, information indicating the configuration of the constructed neural network 8, the connection weights between neurons, and the threshold values for the respective neurons as the learning result data 122. Here, the control unit 21 ends learning processing for the neural network 8 according to this operation example.
(Effects)
As described above, in the embodiment, the image data 123 that represents an image including a figure of the target subject 6, as well as the range data 124 that represents the distance at the respective pixels constituting the image are acquired through the above-described step S102. Then, through the above-described steps S103 and S104, the image data 123 and the range data 124 are input to the neural network 7, and attributes of the target subject 6 are specified based on the output values obtained from the neural network 7.
That is to say, in the embodiment, a two-dimensional image (image data 123) in which the target subject 6 appears, as well as a range image (range data 124) in which the target subject 6 also appears are used as the input to the neural network 7. Thus, the recognition accuracy of the neural network 7 with respect to the attributes of the target subject 6 can be increased. In addition, in the embodiment, as in the above-described analysis processing, the attributes of the target subject 6 can be analyzed simply by inputting the image data 123 and the range data 124 to the neural network 7 in step S103, without performing advanced image processing on the images in which the target subject 6 appears. Accordingly, with the embodiment, it is possible to increase the recognition accuracy with respect to the attributes of the target subject 6 with a simple configuration, reduce the processing load on the CPU, and reduce the capacity of the memory to be used.
In addition, in the embodiment, the target subject analysis apparatus 1 holds a plurality of pieces of learning result data 122, and sets the neural network 7 to be used in accordance with designation by a user, through the processing in step S101. Accordingly, with the embodiment, it is possible to prepare, in advance, learning result data 122 suitable for the respective attributes of the target subject 6, and thus realize analysis processing suitable for the respective attributes of the target subject 6.
§4 Modifications
Although the embodiment of the present invention has been described above in detail, the above descriptions are merely examples of the present invention in all aspects. Needless to say, various improvements and modifications may be made without departing from the scope of the present invention. For example, the following modifications are possible. Note that in the following description, the same constituent elements as the constituent elements described in the above embodiment are assigned the same signs, and descriptions of the same points as the points described in the above embodiment are omitted as appropriate. The following modifications may be combined as appropriate.
<4.1>
For example, in the above embodiment, the target subject analysis apparatus 1 is configured to be able to hold a plurality of pieces of learning result data 122, and set a neural network 7 for analyzing attributes of a desired target subject 6 in accordance with designation by a user. However, the configuration of the target subject analysis apparatus 1 need not be limited to this example. The target subject analysis apparatus 1 may also be configured to hold one piece of learning result data 122. In this case, the above-described neural network selection unit 114 and step S101 may also be omitted.
<4.2>
In the above embodiment, the target subject analysis apparatus 1 that analyzes attributes of a target subject and the learning apparatus 2 that trains a neural network are configured as different computers. However, the configurations of the target subject analysis apparatus 1 and the learning apparatus 2 need not be limited to this example. A system having functions of both the target subject analysis apparatus 1 and the learning apparatus 2 may be realized by one or more computers.
<4.3>
In the above embodiment, as shown in FIGS. 4 and 6, the types of the neural networks (7, 8) are general, forward-propagation multi-layer neural networks. However, the types of the neural networks (7, 8) need not be limited to this example, and may be selected as appropriate, as per an embodiment. For example, the neural networks (7, 8) may also be convolutional neural networks that use the input layer 71 and the intermediate layer 72 as a convolution layer and a pooling layer, respectively. Also, for example, the neural networks (7, 8) may be recurrent neural networks having connections that recur from the output side to the input side, e.g. from the intermediate layer 72 to the input layer 71.
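As one illustration of such a convolutional variant, the sketch below (written with the PyTorch library) stacks the image data and the range data as two input channels. The class name, channel counts, layer sizes, and number of output values are assumptions made for the sketch, not part of the embodiment.

```python
import torch
import torch.nn as nn

class ConvAttributeNet(nn.Module):
    """Illustrative convolutional variant of the neural networks (7, 8).

    A grayscale image channel and a distance channel are stacked as the input;
    two convolution/pooling stages feed a small output layer whose size matches
    the assumed number of attribute output values.
    """

    def __init__(self, num_output_values=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1),  # convolution layer
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.output_layer = nn.Linear(32, num_output_values)

    def forward(self, image, range_image):
        # image, range_image: (N, H, W) tensors of pixel values and distances
        x = torch.stack([image, range_image], dim=1)      # (N, 2, H, W)
        return torch.sigmoid(self.output_layer(self.features(x).flatten(1)))
```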
(Note 1)
A target subject analysis apparatus including:
a hardware processor; and
a memory configured to hold a program to be executed by the hardware processor,
wherein the hardware processor is configured to execute the program to perform:
a data acquisition step of acquiring image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image;
a computation processing step of obtaining an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and
an attribute specifying step of specifying the attribute of the target subject based on the output value obtained from the neural network.
(Note 2)
A target subject analysis method including:
a data acquisition step of acquiring image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image, by a hardware processor;
a computation processing step of obtaining an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network, by the hardware processor; and
an attribute specifying step of specifying the attribute of the target subject based on the output value obtained from the neural network, by the hardware processor.
(Note 3)
A learning apparatus including:
a hardware processor; and
a memory configured to hold a program to be executed by the hardware processor,
wherein the hardware processor is configured to execute the program to perform:
a learning data acquisition step of acquiring, as learning data, a set of data that includes image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and
a learning processing step of training a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data if the image data and the range data are input.
(Note 4)
A learning method including:
a learning data acquisition step of acquiring, as learning data, a set of image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject, by a hardware processor; and
a learning processing step of training a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data if the image data and the range data are input, by the hardware processor.

Claims (16)

  1. A target subject analysis apparatus comprising:
    a data acquisition unit configured to acquire image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image;
    a neural network computing unit configured to obtain an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and
    an attribute specifying unit configured to specify the attribute of the target subject based on the output value obtained from the neural network.
  2. The target subject analysis apparatus according to claim 1,
    wherein the attribute specifying unit specifies, as the attribute of the target subject, at least one of an uneven state, material quality, a three-dimensional shape, and a flat state of the target subject.
  3. The target subject analysis apparatus according to claim 1 or 2,
    wherein the attribute specifying unit specifies, as the attribute of the target subject, a plurality of physical characteristics of the target subject.
  4. The target subject analysis apparatus according to any one of claims 1 to 3, further comprising:
    a neural network selection unit configured to select, in accordance with designation by a user, a neural network to be used by the neural network computing unit from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects.
  5. The target subject analysis apparatus according to any one of claims 1 to 4,
    wherein the image data and the range data are obtained by shooting a situation outside a vehicle as the target subject, and
    the attribute specifying unit specifies, as the attribute of the target subject, at least one of a state of a road surface, presence of an obstacle, and a type of obstacle, based on the output value obtained from the neural network.
  6. The target subject analysis apparatus according to any one of claims 1 to 5,
    wherein the image data and the range data are obtained by shooting, as the target subject, a product manufactured on a production line, and
    the attribute specifying unit specifies, as the attribute of the target subject, at least one of a size of the product, a shape of the product, and presence of damage on the product, based on the output value obtained from the neural network.
  7. The target subject analysis apparatus according to any one of claims 1 to 6,
    wherein the image data and the range data are obtained by shooting a human as the target subject, and
    the attribute specifying unit specifies, as the attribute of the target subject, at least one of a body shape of the human, a facial expression of the human, and a posture of the human, based on the output value obtained from the neural network.
  8. A target subject analysis method comprising:
    acquiring, by a computer, image data that represents an image including a figure of a target subject, and range data that represents a value of a distance at each pixel constituting the image;
    obtaining, by the computer, an output value from a trained neural network for determining an attribute of the target subject, by performing computation processing with the neural network using the image data and the range data that are acquired as input to the neural network; and
    specifying, by the computer, the attribute of the target subject based on the output value obtained from the neural network.
  9. The target subject analysis method according to claim 8,
    wherein the computer specifies, as the attribute of the target subject, at least one of an uneven state, material quality, a three-dimensional shape, and a flat state of the target subject.
  10. The target subject analysis method according to claim 8 or 9,
    wherein the computer specifies a plurality of physical characteristics of the target subject as the attribute of the target subject.
  11. The target subject analysis method according to any one of claims 8 to 10, further comprising:
    selecting, by the computer, in accordance with designation by a user, a neural network to be used in the computation processing step from among a plurality of trained neural networks that have been trained to determine attributes of different target subjects.
  12. The target subject analysis method according to any one of claims 8 to 11,
    wherein the image data and the range data are obtained by shooting a situation outside a vehicle as the target subject, and
    the computer specifies at least one of a state of a road surface, presence of an obstacle, and a type of obstacle as the attribute of the target subject, based on the output value obtained from the neural network.
  13. The target subject analysis method according to any one of claims 8 to 12,
    wherein the image data and the range data are obtained by shooting, as the target subject, a product manufactured on a production line, and
    the computer specifies at least one of a size of the product, a shape of the product, and presence of damage on the product as the attribute of the target subject, based on the output value obtained from the neural network.
  14. The target subject analysis method according to any one of claims 8 to 13,
    wherein the image data and the range data are obtained by shooting a human as the target subject, and
    the computer specifies at least one of a body shape of the human, a facial expression of the human, and a posture of the human as the attribute of the target subject, based on the output value obtained from the neural network.
  15. A learning apparatus comprising:
    a learning data acquisition unit configured to acquire, as learning data, a set of data including image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and
    a learning processing unit configured to train a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data upon the image data and the range data being input.
  16. A learning method comprising:
    acquiring, by a computer, as learning data, a set of data including image data that represents an image including a figure of a target subject, range data that represents a value of a distance at each pixel constituting the image, and attribute data that represents an attribute of the target subject; and
    training, by the computer, a neural network, using the learning data, to output an output value corresponding to the attribute represented by the attribute data upon the image data and the range data being input.
PCT/JP2018/005819 2017-03-07 2018-02-20 Target subject analysis apparatus, target subject analysis method, learning apparatus, and learning method WO2018163786A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017-042661 2017-03-07
JP2017042661A JP2018147286A (en) 2017-03-07 2017-03-07 Object analyzing apparatus, object analyzing method, learning apparatus, and learning method

Publications (2)

Publication Number Publication Date
WO2018163786A2 true WO2018163786A2 (en) 2018-09-13
WO2018163786A3 WO2018163786A3 (en) 2018-11-01

Family

ID=61656281

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/005819 WO2018163786A2 (en) 2017-03-07 2018-02-20 Target subject analysis apparatus, target subject analysis method, learning apparatus, and learning method

Country Status (2)

Country Link
JP (1) JP2018147286A (en)
WO (1) WO2018163786A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7160606B2 (en) * 2018-09-10 2022-10-25 株式会社小松製作所 Working machine control system and method
WO2020115866A1 (en) * 2018-12-06 2020-06-11 株式会社DeepX Depth processing system, depth processing program, and depth processing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06124120A (en) 1992-10-14 1994-05-06 Ricoh Co Ltd Motion controller
JP2005346297A (en) 2004-06-01 2005-12-15 Fuji Heavy Ind Ltd Three-dimensional object recognition device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02287860A (en) * 1989-04-28 1990-11-27 Omron Corp Information processor
US7840342B1 (en) * 1997-10-22 2010-11-23 Intelligent Technologies International, Inc. Road physical condition monitoring techniques
JP4799104B2 (en) * 2005-09-26 2011-10-26 キヤノン株式会社 Information processing apparatus and control method therefor, computer program, and storage medium
JP2011204195A (en) * 2010-03-26 2011-10-13 Panasonic Electric Works Co Ltd Device and method for inspection of irregularity
JP5642049B2 (en) * 2011-11-16 2014-12-17 クラリオン株式会社 Vehicle external recognition device and vehicle system using the same
JP6754619B2 (en) * 2015-06-24 2020-09-16 三星電子株式会社Samsung Electronics Co.,Ltd. Face recognition method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111381663A (en) * 2018-12-28 2020-07-07 技嘉科技股份有限公司 Efficiency optimization method of processor and mainboard using same
WO2021128343A1 (en) * 2019-12-27 2021-07-01 Siemens Aktiengesellschaft Method and apparatus for product quality inspection
US11651587B2 (en) 2019-12-27 2023-05-16 Siemens Aktiengesellschaft Method and apparatus for product quality inspection
US12020479B2 (en) 2020-10-26 2024-06-25 Seiko Epson Corporation Identification method, identification system, and non-transitory computer-readable storage medium storing a program

Also Published As

Publication number Publication date
JP2018147286A (en) 2018-09-20
WO2018163786A3 (en) 2018-11-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18711416

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18711416

Country of ref document: EP

Kind code of ref document: A2