WO2021200329A1 - Information processing device, information processing method, and information processing program - Google Patents

Information processing device, information processing method, and information processing program

Info

Publication number
WO2021200329A1
Authority
WO
WIPO (PCT)
Prior art keywords
recognition
unit
feature amount
image
information processing
Prior art date
Application number
PCT/JP2021/011644
Other languages
French (fr)
Japanese (ja)
Inventor
Yusuke Hieida
Suguru Aoki
Ryuta Sato
Original Assignee
Sony Group Corporation
Priority date
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2021200329A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • This disclosure relates to an information processing device, an information processing method, and an information processing program.
  • In an image recognition function, the detection performance for an object can be improved by using a captured image with a higher resolution.
  • However, image recognition using a high-resolution captured image requires a large amount of calculation for the recognition processing, making it difficult to improve the simultaneity of the recognition processing with respect to the captured image.
  • An object of the present disclosure is to provide an information processing device, an information processing method, and an information processing program capable of improving the characteristics of recognition processing using captured images.
  • The information processing apparatus according to the present disclosure includes: a generation unit that generates a sampled image composed of sampling pixels acquired from imaging information composed of pixels, according to pixel positions set for each divided region obtained by dividing the imaging information with a predetermined pattern;
  • a calculation unit that calculates a feature amount of the sampled image; a storage unit that accumulates the calculated feature amounts; a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result;
  • and an output control unit that controls the recognition unit so as to output a recognition result based on a predetermined feature amount among the feature amounts accumulated in the storage unit.
  • 2. First Embodiment
    2-1. Outline of the first embodiment
    2-2. More specific configuration example according to the first embodiment
    2-3. More specific processing according to the first embodiment
    2-4. First modification of the first embodiment
    2-5. Second modification of the first embodiment
    2-6. Another modification of the first embodiment
    3. Second Embodiment
    3-1. Modification example of the second embodiment
    4. Third Embodiment
    4-1. Application example of the technology of the present disclosure
    4-2. Application example to a moving body
  • FIG. 1 is a block diagram showing a basic configuration example of an information processing apparatus applicable to each embodiment.
  • the information processing device 1a includes a sensor unit 10a and a recognition processing unit 20a.
  • the sensor unit 10a includes an imaging means (camera) and an imaging control unit that controls the imaging means.
  • the sensor unit 10a performs imaging under the control of the imaging control unit, and supplies the image data of the captured image acquired by the imaging to the recognition processing unit 20a.
  • The recognition processing unit 20a uses a DNN (Deep Neural Network) to perform recognition processing on the image data. More specifically, the recognition processing unit 20a includes a recognition model trained in advance by machine learning using predetermined training data, and performs recognition processing using the DNN based on this recognition model on the image data supplied from the sensor unit 10a.
  • the recognition processing unit 20a outputs the recognition result of the recognition processing to, for example, the outside of the information processing device 1a.
  • FIGS. 2A and 2B are diagrams schematically showing an example of recognition processing by DNN.
  • one image is input to the DNN as shown in FIG. 2A.
  • DNN recognition processing is performed on the input image, and the recognition result is output.
  • the DNN executes the feature extraction process and the recognition process.
  • the feature amount is extracted from the input image by the feature extraction process.
  • This feature extraction process is performed using, for example, a CNN (Convolutional Neural Network), which is a type of DNN.
  • the recognition process is executed on the extracted feature amount, and the recognition result is obtained.
  • recognition processing can be executed using time-series information.
  • FIGS. 3A and 3B are diagrams schematically showing an example of identification processing by a DNN when time-series information is used.
  • In this method, identification processing by the DNN is performed using a fixed number of pieces of past information in the time series.
  • In the example of FIG. 3A, the image [T] at time T, the image [T-1] at time T-1 before time T, and the image [T-2] at time T-2 before time T-1 are input to the DNN.
  • The identification process is executed on each of the input images [T], [T-1], and [T-2], and the recognition result [T] at time T is obtained.
  • FIG. 3B is a diagram for explaining the process of FIG. 3A in more detail.
  • In the DNN, the feature extraction process described with reference to FIG. 2B is executed one-to-one for each of the input images [T], [T-1], and [T-2], and the feature amounts corresponding to the images [T], [T-1], and [T-2] are extracted.
  • The feature amounts obtained from these images [T], [T-1], and [T-2] are integrated, the identification process is executed on the integrated feature amount, and the recognition result [T] at time T is obtained. Each feature amount obtained from the images [T], [T-1], and [T-2] can be regarded as intermediate data for obtaining the integrated feature amount used in the identification process.
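  • As a rough, hedged illustration of the scheme in FIG. 3B (a sketch under assumptions, not the implementation disclosed here), the following Python code applies one shared CNN feature extractor to the images [T], [T-1], and [T-2], concatenates the resulting feature amounts, and applies an identification head to the integrated feature amount. The layer sizes, the class name TimeSeriesRecognizer, and the number of classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TimeSeriesRecognizer(nn.Module):
    """Sketch of FIG. 3B: per-image feature extraction, integration, identification."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared CNN used for the feature extraction process (illustrative layers).
        self.extractor = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> one feature amount per image
        )
        # Identification process applied to the integrated feature amount.
        self.head = nn.Linear(32 * 3, num_classes)

    def forward(self, img_t, img_t1, img_t2):
        # Extract a feature amount (intermediate data) from each of [T], [T-1], [T-2].
        feats = [self.extractor(x) for x in (img_t, img_t1, img_t2)]
        integrated = torch.cat(feats, dim=1)     # integrate the feature amounts
        return self.head(integrated)             # recognition result [T]
```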
  • FIGS. 4A and 4B are diagrams schematically showing another example of identification processing by a DNN when time-series information is used.
  • In this method, the image [T] at time T is input to a DNN whose internal state has been updated to the state at time T-1, and the recognition result [T] at time T is obtained.
  • FIG. 4B is a diagram for explaining the process of FIG. 4A in more detail.
  • The feature extraction process described with reference to FIG. 2B is executed on the input image [T] at time T, and the feature amount corresponding to the image [T] is extracted.
  • The internal state of the DNN has been updated using the images before time T, and the feature amount related to the updated internal state is stored.
  • The stored feature amount related to the internal state and the feature amount of the image [T] are integrated, and the identification process is executed on the integrated feature amount.
  • Both the stored feature amount related to the internal state and the feature amount of the image [T] are intermediate data for obtaining the integrated feature amount used in the identification process.
  • The identification process shown in FIGS. 4A and 4B is executed using, for example, a DNN whose internal state has been updated using the immediately preceding recognition result, and is thus a recursive process.
  • A DNN that performs recursive processing in this way is called an RNN (Recurrent Neural Network).
  • Identification processing by an RNN is generally used for moving-image recognition and the like, and the identification accuracy can be improved by sequentially updating the internal state of the DNN with, for example, frame images updated in time series.
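  • As a similarly hedged sketch of the recursive scheme in FIGS. 4A and 4B, the following code keeps an internal state that holds the feature amount related to past frames and updates it each time a new image [T] arrives. The use of a GRU cell, the dimensions, and the class name RecurrentRecognizer are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentRecognizer(nn.Module):
    """Sketch of FIG. 4B: the internal state stores the feature amount of past frames."""
    def __init__(self, feat_dim: int = 32, num_classes: int = 10):
        super().__init__()
        self.extractor = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cell = nn.GRUCell(feat_dim, feat_dim)   # updates the internal state
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, img_t, state=None):
        feat_t = self.extractor(img_t)        # feature amount of image [T]
        state = self.cell(feat_t, state)      # integrate with the stored internal state
        return self.head(state), state        # recognition result [T] and updated state
```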
  • FIG. 5 is a block diagram schematically showing a hardware configuration example of an information processing device applicable to each embodiment.
  • The information processing apparatus 1 includes an imaging unit 1200, a memory 1202, a DSP (Digital Signal Processor) 1203, an interface (I/F) 1204, a CPU (Central Processing Unit) 1205, a ROM (Read Only Memory) 1206, and a RAM (Random Access Memory) 1207, which are communicatively connected to one another via a bus 1210.
  • the information processing device 1 can further include an input device that accepts user operations, a display device for displaying information to the user, and a storage device that non-volatilely stores data.
  • the CPU 1205 operates using the RAM 1207 as a work memory according to a program stored in the ROM 1206 in advance, and controls the overall operation of the information processing device 1.
  • The interface 1204 communicates with the outside of the information processing device 1 by wired or wireless communication. For example, when the information processing device 1 is used in a vehicle, it can communicate with the braking control system of the vehicle on which it is mounted via the interface 1204.
  • the imaging unit 1200 captures a moving image at a predetermined frame cycle and outputs pixel data for composing the frame image. More specifically, the imaging unit 1200 includes a plurality of photoelectric conversion elements that convert the received light into pixel signals that are electrical signals by photoelectric conversion, and a drive circuit that drives each photoelectric conversion element. In the imaging unit 1200, the plurality of photoelectric conversion elements are arranged in a matrix-like arrangement to form a pixel array.
  • the sensor unit 10a in FIG. 1 includes an image pickup unit 1200, and outputs pixel data output from the image pickup unit 1200 within one frame cycle as image data for one frame.
  • Each photoelectric conversion element corresponds to one pixel in the image data; in the pixel array unit, photoelectric conversion elements corresponding to, for example, 1920 pixels × 1080 pixels (rows × columns) are arranged in a matrix.
  • An image of one frame is formed by the pixel signals of the photoelectric conversion elements corresponding to these 1920 pixels × 1080 pixels.
  • the optical unit 1201 includes a lens, an autofocus mechanism, and the like, and irradiates the pixel array unit of the imaging unit 1200 with the light incident on the lens.
  • The imaging unit 1200 generates a pixel signal in each photoelectric conversion element according to the light incident on the pixel array unit via the optical unit 1201.
  • The imaging unit 1200 converts each pixel signal, which is an analog signal, into pixel data, which is a digital signal, and outputs the pixel data.
  • the pixel data output from the imaging unit 1200 is stored in the memory 1202.
  • the memory 1202 is, for example, a frame memory, and is capable of storing pixel data for at least one frame.
  • the DSP 1203 performs predetermined image processing on the pixel data stored in the memory 1202. Further, the DSP 1203 includes a recognition model learned in advance, and performs a recognition process using the above-mentioned DNN on the image data stored in the memory 1202 based on the recognition model.
  • the recognition result which is the result of the recognition process by the DSP 1203, is temporarily stored in, for example, the memory provided in the DSP 1203 or the RAM 1207, and is output from the interface 1204 to the outside.
  • the recognition result may be stored in the storage device.
  • The DSP 1203 may be realized by the CPU 1205. Further, a GPU (Graphics Processing Unit) may be used instead of the DSP 1203.
  • As the image pickup unit 1200, a CMOS image sensor (CIS) in which the parts included in the image pickup unit 1200 are integrally formed using CMOS (Complementary Metal Oxide Semiconductor) technology can be applied.
  • the imaging unit 1200 can be formed on one substrate.
  • the imaging unit 1200 may be a laminated CIS in which a plurality of semiconductor chips are laminated and integrally formed.
  • the imaging unit 1200 is not limited to this example, and may be another type of optical sensor such as an infrared light sensor that performs imaging with infrared light.
  • the imaging unit 1200 can be formed by a two-layer structure laminated CIS in which semiconductor chips are laminated in two layers.
  • FIG. 6A is a diagram showing an example in which the imaging unit 1200 is formed by a two-layer structure laminated CIS.
  • the pixel portion 2020a is formed on the semiconductor chip of the first layer
  • the memory + logic portion 2020b is formed on the semiconductor chip of the second layer.
  • the pixel unit 2020a includes at least a pixel array unit in the imaging unit 1200.
  • the memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit.
  • the memory + logic unit 2020b can further include the memory 1202.
  • The image pickup unit 1200 is configured as one solid-state image pickup element by bonding the first-layer semiconductor chip and the second-layer semiconductor chip so that they are in electrical contact with each other.
  • the imaging unit 1200 can be formed by a three-layer structure in which semiconductor chips are laminated in three layers.
  • FIG. 6B is a diagram showing an example in which the imaging unit 1200 is formed by a laminated CIS having a three-layer structure.
  • the pixel portion 2020a is formed on the semiconductor chip of the first layer
  • the memory portion 2020c is formed on the semiconductor chip of the second layer
  • the logic portion 2020d is formed on the semiconductor chip of the third layer.
  • the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit.
  • the memory unit 2020c can include a frame memory and a memory 1202.
  • The image pickup unit 1200 is configured as one solid-state image sensor by bonding the first-layer, second-layer, and third-layer semiconductor chips so that they are in electrical contact with one another.
  • the memory + logic unit 2020b may include configurations corresponding to the DSP 1203, the interface 1204, the CPU 1205, the ROM 1206, and the RAM 1207 shown in FIG.
  • FIG. 7 is a block diagram showing a configuration of an example of the imaging unit 1200 applicable to each embodiment.
  • The imaging unit 1200 includes a pixel array unit 1001, a vertical scanning unit 1002, an AD (Analog to Digital) conversion unit 1003, a pixel signal line 1006, a vertical signal line VSL, a control unit 1100, a signal processing unit 1101, and the like.
  • Note that, in FIG. 7, the control unit 1100 and the signal processing unit 1101 can also be realized by, for example, the CPU 1205 and the DSP 1203 shown in FIG. 5.
  • The pixel array unit 1001 includes a plurality of pixel circuits 1000, each including, for example, a photoelectric conversion element using a photodiode that performs photoelectric conversion on the received light and a circuit for reading out the charge from the photoelectric conversion element.
  • the plurality of pixel circuits 1000 are arranged in a matrix in the horizontal direction (row direction) and the vertical direction (column direction).
  • the arrangement in the row direction of the pixel circuit 1000 is called a line.
  • The pixel array unit 1001 includes at least 1080 lines, each including at least 1920 pixel circuits 1000.
  • An image (image data) of one frame is formed by the pixel signals read from the pixel circuits 1000 included in one frame.
  • A pixel signal line 1006 is connected to each row of the pixel circuits 1000, and a vertical signal line VSL is connected to each column.
  • the end of the pixel signal line 1006 that is not connected to the pixel array unit 1001 is connected to the vertical scanning unit 1002.
  • the vertical scanning unit 1002 transmits a control signal such as a drive pulse when reading a pixel signal from a pixel to the pixel array unit 1001 via the pixel signal line 1006 in accordance with the control of the control unit 1100 described later.
  • the end portion of the vertical signal line VSL that is not connected to the pixel array unit 1001 is connected to the AD conversion unit 1003.
  • the pixel signal read from the pixel is transmitted to the AD conversion unit 1003 via the vertical signal line VSL.
  • the reading control of the pixel signal from the pixel circuit 1000 will be schematically described.
  • The pixel signal is read out from the pixel circuit 1000 by transferring the charge accumulated in the photoelectric conversion element by exposure to a floating diffusion layer (FD) and converting the transferred charge into a voltage in the floating diffusion layer.
  • The voltage into which the charge has been converted in the floating diffusion layer is output to the vertical signal line VSL as a pixel signal via an amplifier.
  • In the pixel circuit 1000, the floating diffusion layer and the vertical signal line VSL are connected according to a selection signal supplied via the pixel signal line 1006. Further, the floating diffusion layer is connected to a supply line of the power supply voltage VDD or of a black-level voltage for a short period according to a reset pulse supplied via the pixel signal line 1006, and the floating diffusion layer is thereby reset.
  • The voltage of the floating diffusion layer at the reset level (referred to as voltage A) is output to the vertical signal line VSL.
  • Then, a transfer pulse supplied via the pixel signal line 1006 turns on (closes) the connection between the photoelectric conversion element and the floating diffusion layer, and the charge accumulated in the photoelectric conversion element is transferred to the floating diffusion layer.
  • A voltage corresponding to the amount of charge in the floating diffusion layer (referred to as voltage B) is output to the vertical signal line VSL.
  • the AD conversion unit 1003 includes an AD converter 1007 provided for each vertical signal line VSL, a reference signal generation unit 1004, and a horizontal scanning unit 1005.
  • the AD converter 1007 is a column AD converter that performs AD conversion processing on each column of the pixel array unit 1001.
  • The AD converter 1007 performs AD conversion processing on the pixel signal supplied from the pixel circuit 1000 via the vertical signal line VSL, and generates two digital values (values corresponding to voltage A and voltage B, respectively) for correlated double sampling (CDS: Correlated Double Sampling) processing for noise reduction.
  • the AD converter 1007 supplies the two generated digital values to the signal processing unit 1101.
  • The signal processing unit 1101 performs CDS processing based on the two digital values supplied from the AD converter 1007, and generates pixel data, which is the pixel signal as a digital signal.
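  • As a minimal illustration of the CDS computation itself (a sketch, not the circuit implementation), the digital value corresponding to the reset level (voltage A) is subtracted from the digital value corresponding to the signal level (voltage B):

```python
def correlated_double_sampling(digital_a: int, digital_b: int) -> int:
    """CDS sketch: subtract the reset-level value (voltage A) from the
    signal-level value (voltage B) to cancel the per-pixel offset component."""
    return digital_b - digital_a
```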
  • Based on the control signal input from the control unit 1100, the reference signal generation unit 1004 generates a ramp signal that each AD converter 1007 uses as a reference signal for converting the pixel signal into the two digital values.
  • The ramp signal is a signal whose level (voltage value) decreases with a constant slope with respect to time, or a signal whose level decreases stepwise.
  • The reference signal generation unit 1004 supplies the generated ramp signal to each AD converter 1007.
  • the reference signal generation unit 1004 is configured by using, for example, a DAC (Digital to Analog Converter) or the like.
  • In each AD converter 1007, a counter starts counting according to a clock signal.
  • A comparator compares the voltage of the pixel signal supplied from the vertical signal line VSL with the voltage of the ramp signal, and stops the counter at the timing when the voltage of the ramp signal crosses the voltage of the pixel signal.
  • The AD converter 1007 converts the analog pixel signal into a digital value by outputting a value corresponding to the count value at the time the counting was stopped.
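  • The counter/comparator behaviour can be sketched as follows; the ramp start level, step size, counter depth, and the function name single_slope_adc are illustrative assumptions, not values from this disclosure.

```python
def single_slope_adc(pixel_voltage: float,
                     ramp_start: float = 1.0,
                     ramp_step: float = 0.001,
                     max_count: int = 1024) -> int:
    """Sketch of the column AD converter: the counter runs while the falling ramp
    signal is still above the pixel voltage; the count at the crossing point is
    output as the digital value."""
    ramp = ramp_start
    for count in range(max_count):
        if ramp <= pixel_voltage:   # comparator: the ramp has crossed the pixel signal
            return count            # the counter stops; this count is the converted value
        ramp -= ramp_step           # the ramp level decreases with a constant slope
    return max_count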
  • The pixel data generated by the signal processing unit 1101 is stored in a frame memory (not shown), and when pixel data for one frame has been stored in the frame memory, it is output from the imaging unit 1200 as image data of one frame.
  • The horizontal scanning unit 1005 performs selective scanning in which the AD converters 1007 are selected in a predetermined order, so that the digital values temporarily held by each AD converter 1007 are sequentially output to the signal processing unit 1101.
  • the horizontal scanning unit 1005 is configured by using, for example, a shift register or an address decoder.
  • the control unit 1100 performs drive control of the vertical scanning unit 1002, the AD conversion unit 1003, the reference signal generation unit 1004, the horizontal scanning unit 1005, and the like according to the imaging control signal supplied from the sensor control unit 11.
  • the control unit 1100 generates various drive signals that serve as a reference for the operations of the vertical scanning unit 1002, the AD conversion unit 1003, the reference signal generation unit 1004, and the horizontal scanning unit 1005.
  • Based on, for example, a vertical synchronization signal or an external trigger signal and a horizontal synchronization signal included in the imaging control signal, the control unit 1100 generates a control signal for the vertical scanning unit 1002 to supply to each pixel circuit 1000 via the pixel signal line 1006.
  • the control unit 1100 supplies the generated control signal to the vertical scanning unit 1002.
  • control unit 1100 passes, for example, information indicating an analog gain included in the image pickup control signal supplied from the CPU 1205 to the AD conversion unit 1003.
  • the AD conversion unit 1003 controls the gain of the pixel signal input to each AD converter 1007 included in the AD conversion unit 1003 via the vertical signal line VSL according to the information indicating the analog gain.
  • Based on the control signal supplied from the control unit 1100, the vertical scanning unit 1002 supplies various signals, including a drive pulse, line by line to each pixel circuit 1000 via the pixel signal line 1006 of the selected pixel line of the pixel array unit 1001, and each pixel circuit 1000 outputs a pixel signal to the vertical signal line VSL.
  • the vertical scanning unit 1002 is configured by using, for example, a shift register or an address decoder. Further, the vertical scanning unit 1002 controls the exposure in each pixel circuit 1000 according to the information indicating the exposure supplied from the control unit 1100.
  • the imaging unit 1200 configured in this way is a column AD type CMOS (Complementary Metal Oxide Semiconductor) image sensor in which AD converters 1007 are arranged for each column.
  • FIGS. 8A and 8B are diagrams schematically showing examples of captured images 30a and 30b when the same imaging range is captured by using a low-resolution imaging device and a high-resolution imaging device, respectively.
  • the imaging range shown in FIGS. 8A and 8B includes a "person" in the central portion at a position somewhat distant from the imaging apparatus.
  • the recognition process for a high-resolution image requires a large amount of calculation as compared with the recognition process for a low-resolution image, and the processing takes time. Therefore, it is difficult to improve the simultaneity between the recognition result and the captured image.
  • the recognition process for a low-resolution image requires a small amount of calculation, so that the process can be performed in a short time, and the simultaneity with the captured image can be relatively easily increased.
  • Consider a case in which recognition processing is performed based on an image captured by an in-vehicle image pickup device, and a distant object (for example, an oncoming vehicle traveling in the opposite lane, in the direction opposite to the traveling direction of the own vehicle) must be recognized.
  • When recognition processing is performed on a low-resolution image, as in FIG. 8A, it is difficult to recognize such a distant object.
  • When a high-resolution captured image is used, it is relatively easy to recognize the distant object, but it is difficult to improve the simultaneity with the captured image, and there is a possibility that an emergency situation cannot be responded to in time.
  • Therefore, in the prerequisite technology of each embodiment, recognition processing is performed on a sampled image composed of pixels obtained by thinning out a high-resolution captured image by subsampling according to a predetermined rule.
  • The captured image acquired in the next frame is subsampled at pixel positions different from those used for the immediately preceding captured image, and recognition processing is performed on the sampled image composed of those sampling pixels.
  • This operation of performing recognition processing on a sampled image composed of pixels different from those of the preceding captured image is repeated in units of frames. This makes it possible to acquire recognition results at high speed while using a high-resolution captured image. Further, by sequentially integrating the feature amount extracted during each recognition process with the feature amount extracted in the recognition process for the next sampled image, a more accurate recognition result can be obtained.
  • FIG. 9 is a block diagram showing a configuration of an example of an information processing device according to the prerequisite technology of each embodiment of the present disclosure.
  • the information processing device 1b includes a sensor unit 10b and a recognition processing unit 20b.
  • The sensor unit 10b includes an imaging means (camera) and an imaging control unit that controls the imaging means, similarly to the sensor unit 10a described with reference to FIG. 1. This imaging means performs imaging at a high resolution (for example, 1920 pixels × 1080 pixels).
  • the sensor unit 10b supplies the image data of the captured image captured by the imaging means to the recognition processing unit 20b.
  • the recognition processing unit 20b includes a pre-processing unit 210 and a recognition unit 220.
  • the image data supplied from the sensor unit 10b to the recognition processing unit 20b is input to the preprocessing unit 210.
  • the preprocessing unit 210 performs subsampling on the input image data by thinning out the pixels according to a predetermined rule.
  • the sampled image in which the image data is subsampled is input to the recognition unit 220.
  • The recognition unit 220 uses a DNN to perform recognition processing on the image data in the same manner as the recognition processing unit 20a in FIG. 1. More specifically, the recognition unit 220 includes a recognition model trained in advance by machine learning using predetermined training data, and performs recognition processing using the DNN based on this recognition model on the image data supplied from the sensor unit 10b. At this time, sampled images subsampled in the same manner as by the preprocessing unit 210 are used as the training data.
  • the recognition unit 220 outputs the recognition result of the recognition process to, for example, the outside of the information processing device 1b.
  • FIG. 10 is a schematic diagram for explaining the recognition process by the recognizer according to the prerequisite technology of each embodiment.
  • the recognizer shown in FIG. 10 corresponds to, for example, the recognition processing unit 20b.
  • the image data 32 schematically shows one frame of image data based on the captured image captured by the sensor unit 10b.
  • the image data 32 includes a plurality of pixels 300 arranged in a matrix.
  • the image data 32 is input to the preprocessing unit 210 in the recognition processing unit 20b.
  • the preprocessing unit 210 subsamples the image data 32 by thinning out according to a predetermined rule (step S10).
  • the sampled image by the sub-sampled sampling pixels is input to the recognition unit 220.
  • the recognition unit 220 extracts the feature amount of the input sampled image by DNN (step S11).
  • For example, the recognition unit 220 extracts the feature amount using the CNN part of the DNN.
  • The recognition unit 220 stores the feature amount extracted in step S11 in a storage unit (for example, the RAM 1207, not shown). At this time, for example, when the feature amount extracted in the immediately preceding frame is already stored in the storage unit, the recognition unit 220 recursively uses the stored feature amount and integrates it with the newly extracted feature amount (step S12).
  • In other words, the recognition unit 220 accumulates in the storage unit the feature amounts extracted up to the immediately preceding frame and integrates them with the new one. The process in step S12 thus corresponds to a process using the RNN part of the DNN.
  • the recognition unit 220 executes the recognition process based on the features accumulated and integrated in step S12 (step S13).
  • FIG. 11 is a schematic diagram for explaining the sampling process according to the prerequisite technique of each embodiment.
  • section (a) schematically shows an example of image data 32.
  • the image data 32 includes a plurality of pixels 300 arranged in a matrix.
  • The preprocessing unit 210 divides the image data 32 into divided regions 35, each including two or more pixels 300.
  • In this example, the divided region 35 is a region of 4 pixels × 4 pixels and includes 16 pixels 300.
  • For each divided region 35, the preprocessing unit 210 sets pixel positions for selecting sampling pixels by subsampling from the pixels 300 included in that divided region 35. Further, the preprocessing unit 210 sets different pixel positions for each frame as the pixel positions for selecting the sampling pixels.
  • Section (b) of FIG. 11 shows an example of pixel positions set with respect to the division region 35 in a certain frame.
  • In this example, the pixel positions are set so that pixels 300 are selected every other pixel in both the row and column directions, and the pixels 300sa1, 300sa2, 300sa3, and 300sa4 at the set pixel positions are selected as the sampling pixels.
  • In this way, the preprocessing unit 210 performs subsampling in units of the divided region 35.
  • The preprocessing unit 210 generates, as a sampled image composed of sampling pixels, an image consisting of the pixels 300sa1 to 300sa4 selected as the sampling pixels in a certain frame.
  • Section (c) of FIG. 11 shows an example of the sampled image 36 generated from the pixels 300sa1 to 300sa4 selected as sampling pixels in section (b) of FIG. 11.
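  • A minimal NumPy sketch of this subsampling is given below. It assumes the phase-to-base-point order used in FIGS. 12A to 12D, namely (0, 0), (0, 1), (1, 0), (1, 1); the helper name subsample is an assumption of this sketch.

```python
import numpy as np

def subsample(image: np.ndarray, phase: int) -> np.ndarray:
    """Sketch of the subsampling in FIG. 11: inside every 4x4 divided region,
    pixels are selected every other row and column starting from a base point
    determined by the phase, so the sampled image has half the width and height
    of the input image."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]   # assumed phase -> (row, col) base point
    r0, c0 = offsets[phase % 4]
    return image[r0::2, c0::2]

# Example: a 1920 x 1080 frame yields a 960 x 540 sampled image for each phase.
frame = np.zeros((1080, 1920), dtype=np.uint8)
assert subsample(frame, phase=0).shape == (540, 960)
```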
  • the preprocessing unit 210 inputs the sampled image 36 to the recognition unit 220.
  • the recognition unit 220 executes a recognition process on the sampled image 36.
  • the preprocessing unit 210 sets different pixel positions for each frame as pixel positions for selecting sampling pixels.
  • the recognition unit 220 performs recognition processing for each frame based on a sampled image composed of each pixel 300 at each set pixel position.
  • FIGS. 12A to 12E show the recognition processing for the image data 32a to 32d and 32a' of frames #1 to #5, which are sequentially captured in time series by the sensor unit 10b.
  • the object 41 is located at a relatively short distance (medium distance) with respect to the sensor unit 10b.
  • the object 42 is located at a distance (referred to as a long distance) farther than the middle distance with respect to the sensor unit 10b, and the size in the image is smaller than the object 41.
  • The preprocessing unit 210 performs subsampling on each divided region 35 of the image data 32a of frame #1, with the pixel position in the upper left corner as the base point. More specifically, in each divided region 35 of the image data 32a, the preprocessing unit 210 performs subsampling that selects, as the sampling pixels 300sa1 to 300sa4, the pixels 300 located every other pixel in the row and column directions starting from the upper-left pixel position (step S10a).
  • The preprocessing unit 210 generates the sampled image 36φ1 of the first phase from the subsampled pixels 300sa1 to 300sa4.
  • The generated sampled image 36φ1 is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50a of the input sampled image 36φ1 using the DNN (step S11).
  • the recognition unit 220 stores and stores the feature amount 50a extracted in step S11 in the storage unit (step S12).
  • the recognition unit 220 can accumulate the feature amount 50a in the storage unit and integrate the feature amount with the already accumulated feature amount.
  • Section (b) of FIG. 12A shows how the first feature amount 50a is stored in the empty storage portion as the process of step S12.
  • the recognition unit 220 executes the recognition process based on the feature amount 50a accumulated in the storage unit (step S13).
  • the object 41 located at a medium distance is recognized and obtained as the recognition result 60.
  • the object 42 located at a long distance is not recognized.
  • For each divided region 35 of the image data 32b of frame #2, the preprocessing unit 210 performs subsampling in which the pixel positions shifted by one pixel in the horizontal direction with respect to the pixel positions set for each divided region 35 of the image data 32a of frame #1 shown in FIG. 12A are set as the pixel positions of the sampling pixels (step S10b). That is, each sampling pixel selected in step S10b is the pixel 300 at the pixel position adjacent to the right of the pixel position of the corresponding sampling pixel selected in step S10a of FIG. 12A.
  • The preprocessing unit 210 generates the sampled image 36φ2 of the second phase from the sampling pixels subsampled in step S10b.
  • The generated sampled image 36φ2 is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50b of the input sampled image 36φ2 using the DNN (step S11).
  • the recognition unit 220 stores and stores the feature amount 50b extracted in step S11 in the storage unit (step S12).
  • At this point in step S12, the feature amount 50a extracted from the sampled image 36φ1 of the first phase is already stored in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50b in the storage unit and integrates the feature amount 50b with the stored feature amount 50a.
  • the recognition unit 220 executes the recognition process based on the feature amount in which the feature amount 50a and the feature amount 50b are integrated (step S13).
  • By the recognition process in step S13 shown in section (b), the object 41 located at a medium distance is recognized and obtained as the recognition result 60, but the object 42 located at a long distance is not recognized at this point.
  • For each divided region 35 of the image data 32c of frame #3, the preprocessing unit 210 performs subsampling in which the pixel positions shifted by one pixel in the column direction with respect to the pixel positions set for each divided region 35 of the image data 32a of frame #1 shown in FIG. 12A are set as the pixel positions of the sampling pixels (step S10c). That is, each sampling pixel selected in step S10c is the pixel 300 at the pixel position adjacent below, in the figure, the pixel position of the corresponding sampling pixel selected in step S10a of FIG. 12A.
  • The preprocessing unit 210 generates the sampled image 36φ3 of the third phase from the sampling pixels subsampled in step S10c.
  • The generated sampled image 36φ3 is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50c of the input sampled image 36φ3 using the DNN (step S11).
  • The recognition unit 220 accumulates and stores the feature amount 50c extracted in step S11 in the storage unit (step S12).
  • At this point in step S12, the feature amounts 50a and 50b extracted from the sampled images 36φ1 and 36φ2 of the first and second phases are already stored in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50c in the storage unit and integrates the feature amount 50c with the accumulated feature amounts 50a and 50b.
  • the recognition unit 220 executes the recognition process based on the feature amount in which the feature amounts 50a and 50b and the feature amount 50c are integrated (step S13).
  • Here too, the object 41 located at a medium distance is recognized and obtained as the recognition result 60, but the object 42 located at a long distance is not recognized at this point.
  • For each divided region 35 of the image data 32d of frame #4, the preprocessing unit 210 performs subsampling in which the pixel positions shifted by one pixel in the horizontal direction with respect to the pixel positions set for each divided region 35 of the image data 32c of frame #3 shown in FIG. 12C are set as the pixel positions of the sampling pixels (step S10d). That is, each sampling pixel selected in step S10d is the pixel 300 at the pixel position adjacent to the right, in the figure, of the pixel position of the corresponding sampling pixel selected in step S10c of FIG. 12C.
  • The preprocessing unit 210 generates the sampled image 36φ4 of the fourth phase from the sampling pixels subsampled in step S10d.
  • The generated sampled image 36φ4 is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50d of the input sampled image 36φ4 using the DNN (step S11).
  • The recognition unit 220 accumulates and stores the feature amount 50d extracted in step S11 in the storage unit (step S12).
  • At this point in step S12, the feature amounts 50a to 50c extracted from the sampled images 36φ1 to 36φ3 of the first to third phases have already been accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50d in the storage unit and integrates the feature amount 50d with the accumulated feature amounts 50a to 50c.
  • the recognition unit 220 executes the recognition process based on the feature amount in which the feature amounts 50a to 50c and the feature amount 50d are integrated (step S13).
  • This time, the object 41 located at a medium distance is recognized and obtained as the recognition result 60, and in addition the object 42 located at a long distance is recognized and obtained as the recognition result 61.
  • By the processing of FIGS. 12A to 12D, the preprocessing unit 210 selects the pixel positions of all the pixels 300 included in one frame as pixel positions of sampling pixels. In other words, the preprocessing unit 210 selects the pixel positions of the 16 pixels 300 included in each divided region 35 while shifting the phase by one pixel.
  • The period until the pixel positions of all the pixels 300 included in each divided region 35, or in one frame, have been selected as the pixel positions of sampling pixels is one cycle. That is, the preprocessing unit 210 cycles through the pixel positions of each divided region 35 at a constant period, and sets all the pixel positions in the divided region 35 as pixel positions for acquiring sampling pixels.
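  • A short check, under the same assumed base-point order as the earlier sketch, that one cycle of four phases covers all 16 pixel positions of a 4 pixels × 4 pixels divided region:

```python
import numpy as np

# Sketch: over one cycle of four phases, the every-other-pixel sampling with the
# assumed base points selects every pixel position of a 4x4 divided region once.
covered = np.zeros((4, 4), dtype=bool)
for r0, c0 in [(0, 0), (0, 1), (1, 0), (1, 1)]:   # one base point per frame/phase
    covered[r0::2, c0::2] = True
assert covered.all()
```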
  • For each divided region 35 of the image data 32a' of frame #1', the preprocessing unit 210 performs subsampling with the pixel position in the upper left corner as the base point, in the same manner as in the example of FIG. 12A (step S10a'). As shown in section (b), the preprocessing unit 210 generates the sampled image 36φ1' of the first phase from the sampling pixels subsampled in step S10a'. The generated sampled image 36φ1' is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50a' of the input sampled image 36φ1' using the DNN (step S11).
  • The recognition unit 220 accumulates and stores the feature amount 50a' extracted in step S11 in the storage unit (step S12, shown in section (b)).
  • the recognition unit 220 may reset the storage unit every cycle of selecting the pixel position of the sampling pixel.
  • the storage unit can be reset, for example, by deleting the feature amounts 50a to 50d for one cycle accumulated in the storage unit from the storage unit.
  • Alternatively, the recognition unit 220 can always keep a fixed amount of feature amounts in the storage unit. For example, the recognition unit 220 keeps the feature amounts for one cycle, that is, for four frames, in the storage unit. In this case, when the new feature amount 50a' is extracted, the recognition unit 220 deletes, for example, the oldest feature amount 50d among the feature amounts 50a to 50d accumulated in the storage unit, and stores the new feature amount 50a' in the storage unit. The recognition unit 220 executes the recognition process based on the feature amount in which the feature amounts 50a to 50c remaining after the feature amount 50d is deleted and the new feature amount 50a' are integrated.
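  • A minimal sketch of such a fixed-size store, assuming a buffer of length four and simple concatenation as the integration; the class name FeatureStore and the integration method are assumptions of this sketch.

```python
from collections import deque
import numpy as np

class FeatureStore:
    """Sketch of a storage unit keeping one cycle (four frames) of feature amounts."""
    def __init__(self, max_frames: int = 4):
        self.buffer = deque(maxlen=max_frames)   # the oldest entry is dropped automatically

    def push_and_integrate(self, feature: np.ndarray) -> np.ndarray:
        self.buffer.append(feature)              # store the new feature amount
        # Integration is modeled here as concatenation of the stored feature amounts.
        return np.concatenate(list(self.buffer))
```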
  • Alternatively, the recognition unit 220 executes the recognition process based on the feature amount in which the feature amounts 50a to 50d already accumulated in the storage unit and the newly extracted feature amount 50a' are integrated (step S13).
  • In this case as well, the object 41 located at a medium distance is recognized and obtained as the recognition result 60, and in addition the object 42 located at a long distance is recognized and obtained as the recognition result 61.
  • As described above, the sampled image 36 is a thinned image in which pixels are thinned out from the original image data 32.
  • In this example, the sampled image 36 is image data obtained by reducing the image data 32 to 1/2 in each of the row and column directions, so its number of pixels is 1/4 of that of the original image data 32. Therefore, the recognition unit 220 can execute the recognition process for the sampled image 36 at high speed compared with a recognition process using all the pixels 300 included in the original image data 32.
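  • As a hedged numerical illustration using the 1920 pixels × 1080 pixels example resolution mentioned above:

```latex
1920 \times 1080 = 2\,073\,600 \text{ pixels (captured image)}, \qquad
960 \times 540 = 518\,400 = \tfrac{1}{4} \times 2\,073\,600 \text{ pixels (sampled image)}
```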
  • Further, the pixel positions of the pixels 300 set as sampling pixels for generating the sampled image 36 are selected while being shifted by one pixel within the divided region 35 for each frame. Therefore, a sampled image 36 whose phase is shifted by one pixel is obtained for each frame, and the pixel positions of all the pixels 300 included in the divided region 35 are eventually selected as the pixel positions of the sampling pixels.
  • In other words, the pixel positions of the pixels 300 used to generate the sampled images 36 are selected so as to cover the whole divided region 35, and the feature amounts calculated from each sampled image 36 are accumulated and integrated.
  • As a result, the pixels 300 at all the pixel positions included in the image data 32 can be involved in the recognition process, and, for example, a distant object can be recognized more easily.
  • In the above description, the pixel positions for selecting the sampling pixels are set by the preprocessing unit 210 according to a predetermined rule, but this is not limited to this example.
  • For example, the preprocessing unit 210 may set the pixel positions for selecting the sampling pixels in response to an instruction from outside the recognition processing unit 20b or from outside the information processing device 1b including the recognition processing unit 20b.
  • FIGS. 13A and 13B are schematic views for explaining the subsampling process in the recognition process according to the prerequisite technology of each embodiment.
  • In FIGS. 13A and 13B, for the sake of explanation, the divided region 35 is defined as a region of 2 pixels × 2 pixels.
  • The upper left pixel position is the origin coordinate [0,0],
  • and the upper right, lower left, and lower right pixel positions are the coordinates [1,0], [0,1], and [1,1], respectively.
  • Sampling of the pixels 300 in each divided region 35 is performed in the order of the coordinates [1,1], [1,0], [0,1], and [0,0], starting from the lower right pixel position [1,1].
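  • A one-line sketch of this cycling order; the helper name sampling_coordinate is an assumption of this sketch.

```python
def sampling_coordinate(frame_index: int):
    """Sketch: coordinate [x, y] sampled in each 2x2 divided region for a given frame,
    cycling in the order [1,1], [1,0], [0,1], [0,0]."""
    order = [(1, 1), (1, 0), (0, 1), (0, 0)]
    return order[frame_index % 4]
```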
  • the passage of time is shown from the bottom to the top of the figure.
  • The image data 32a is the image [T] at the newest time T; the image data 32b, 32c, and 32d correspond to the successively older times T-1, T-2, and T-3, and are the image [T-1], the image [T-2], and the image [T-3], each based on image data 32 that is one frame older than the previous one.
  • In FIG. 13A, the preprocessing unit 210 selects the pixels 300 at the coordinates [1,1] of each divided region 35 of the image data 32a as sampling pixels (step S10a), and the recognition unit 220 extracts the feature amount of the sampled image 36φ1 composed of the selected sampling pixels (step S11).
  • The recognition unit 220 integrates the feature amount 50a extracted from the sampled image 36φ1 with, for example, the feature amounts extracted during a predetermined preceding period (step S12), and performs the recognition process based on the integrated feature amount (step S13).
  • The subsampling process (step S10a) on each divided region 35 of the image data 32a described above yields the sampled image 36φ1, in which the image data 32a is uniformly thinned out.
  • Therefore, the recognition process for the entire image data 32a can be executed by the recognition process for the sampled image composed of the sampling pixels selected by subsampling from the image data 32a; that is, the recognition process for the image data 32 can be completed by the recognition process for its sampled image.
  • This series of processes, in which a sampled image is generated from the image data 32, a feature amount is extracted from the generated sampled image, and recognition processing is performed based on the extracted feature amount, is called one unit of processing.
  • For example, the subsampling process of step S10a, the feature amount extraction process of step S11 for the sampled image 36φ1 generated by the subsampling process, the feature amount integration process of step S12, and the recognition process of step S13 are included in one unit of processing.
  • the recognition unit 220 can execute the recognition process for the thinned-out image data 32 for each process of this one unit (step S13).
  • the recognition processing unit 20b executes the above-mentioned one-unit processing for each of the image data 32b, 32c, and 32d that are sequentially updated in the frame cycle, and executes the recognition processing.
  • Note that the feature amount integration process in step S12 and the recognition process in step S13 can be shared across the processing of the units.
  • FIG. 13B shows the next one unit of processing after one cycle of sampling-pixel selection has been completed for every pixel position included in each divided region 35. That is, when one unit of processing for each of the image data 32a, 32b, 32c, and 32d has been completed, one unit of processing is executed for the image data 32a' of the next frame input to the recognition processing unit 20b.
  • In the processing for the image data 32a', the feature amount 50d extracted based on the oldest image data 32d is discarded, and the feature amount 50a' is extracted from the new image data 32a'. That is, the preprocessing unit 210 selects the pixels 300 at the coordinates [1,1] of each divided region 35 of the image data 32a' as sampling pixels and generates a sampled image 36φ1.
  • The recognition unit 220 extracts the feature amount 50a' from the sampled image 36φ1 selected from the image data 32a'.
  • The recognition unit 220 integrates the feature amount 50a' with the feature amounts 50a, 50b, and 50c extracted up to the preceding units, and performs the recognition process based on the integrated feature amount. In this case, the recognition unit 220 may perform the feature amount extraction process only on the newly acquired image data 32a'.
  • As described above, the recognition process according to the prerequisite technology of each embodiment is performed by executing one unit of processing in the same processing system in the recognition processing unit 20b. More specifically, the recognition processing unit 20b repeats, as one unit of processing for each frame, the processing system of the subsampling process and the feature amount extraction process for the image data 32, and integrates the feature amounts extracted by this repetition to perform the recognition process.
  • In doing so, the recognition processing unit 20b performs the subsampling so that the pixel positions of all the pixels 300 included in the image data 32 are eventually covered, while periodically shifting the pixel positions for selecting the sampling pixels. Further, the recognition processing unit 20b integrates the feature amounts, as intermediate data extracted in step S11 from the sampled images composed of the sampling pixels selected from the image data 32 of each frame, to perform the recognition process.
  • Since the recognition process according to the prerequisite technology of each embodiment configured in this way is a processing system that can be completed within one unit of processing, the recognition result can be obtained more quickly. Further, since the sampling pixels in one unit are selected from the entire image data 32, a wide-ranging recognition result can be confirmed by one unit of processing. Further, since intermediate data (feature amounts) based on a plurality of image data 32 are integrated, a more detailed recognition result, acquired across a plurality of units, can be obtained. A minimal sketch of this per-frame unit of processing is given below.
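  • In the following end-to-end sketch, the callables subsample_fn (for example, the subsample helper sketched earlier), extract_features, and recognize stand in for the subsampling and DNN stages and are assumptions of this sketch, as is the concatenation used as the integration.

```python
from collections import deque
import numpy as np

def process_stream(frames, subsample_fn, extract_features, recognize, num_phases: int = 4):
    """Sketch of the one-unit processing repeated for every frame:
    subsampling (S10) -> feature extraction (S11) -> accumulation and
    integration (S12) -> recognition (S13)."""
    store = deque(maxlen=num_phases)                         # one cycle of feature amounts
    for i, frame in enumerate(frames):
        sampled = subsample_fn(frame, phase=i % num_phases)  # step S10: phase-shifted subsampling
        feature = extract_features(sampled)                  # step S11: feature amount (intermediate data)
        store.append(feature)                                # step S12: accumulate (oldest dropped after one cycle)
        integrated = np.concatenate(list(store))             #           and integrate
        yield recognize(integrated)                          # step S13: one recognition result per unit
```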
  • As described above, by using the information processing device 1b according to the prerequisite technology of each embodiment, it is possible to improve the simultaneity of the recognition results and to acquire recognition results that make use of the resolution of the captured image; that is, it is possible to improve the characteristics of recognition processing using captured images.
  • FIG. 14A is a schematic diagram for explaining the basic architecture of the recognition process according to the existing technology.
  • As shown in FIG. 14A, the recognizer in the existing technology executes recognition processing on one piece of input information (for example, an image) and basically outputs one recognition result for that input information.
  • FIG. 14B is a schematic diagram for explaining the basic architecture of the recognition process according to each embodiment.
  • The recognizer according to each embodiment corresponds to, for example, the recognition unit 220 of FIG. 9, and, as shown in FIG. 14B, executes recognition processing on one piece of input information (for example, an image) by expansion along the time axis, and can output a plurality of recognition results according to the processing.
  • As described with reference to FIGS. 10, 11, and 12A to 12E, the recognition process based on this time-axis expansion is a process in which subsampling is performed by thinning out the pixels of each divided region 35 and the recognition process is executed on each sampled image composed of the subsampled sampling pixels.
  • By this recognition process in the time-axis expansion, the recognizer according to each embodiment can output two types of recognition results for one piece of input information: a highly responsive breaking news result and a highly accurate integrated result.
  • the breaking news result is, for example, the recognition result by the recognition process performed on the sampled image acquired by the first subsampling in each divided region 35.
  • the integration result is, for example, a recognition result obtained by a recognition process performed based on the integrated feature amount of the feature amounts extracted from each sampled image acquired by each subsampling in each divided region 35.
  • The amount of calculation of the recognition process executed in the recognizer according to each embodiment shown in FIG. 14B is substantially the same as that of the recognition process executed in the recognizer according to the existing technology shown in FIG. 14A. Therefore, with the recognizer according to each embodiment, both the more responsive breaking news result and the more accurate integrated result can be obtained with substantially the same amount of calculation as the recognizer according to the existing technology.
  • FIG. 15 is an example time chart showing a first example of reading and recognition processing in the basic architecture of recognition processing according to each embodiment.
  • In the first example, sampling pixels are selected every other pixel in the divided region 35 of 4 pixels × 4 pixels described in section (b) of FIG. 11.
  • In this case, all pixel positions are selected by four subsamplings, and the image data 32 of one frame is divided into the four sampled images 36φ1 to 36φ4 of the first to fourth phases.
  • The sampled images 36φ1 to 36φ4 of the first to fourth phases are extracted by subsampling from the image data 32 of a plurality of frames that are consecutive in time series. That is, in this first example, the sampled images 36φ1 to 36φ4 of the first to fourth phases are extracted across the image data 32 of a plurality of consecutive frames.
  • the recognition process according to the first example is a recognition process performed between a plurality of frames, and is appropriately referred to as an inter-frame process.
  • In FIG. 15, the imaging cycle is one frame cycle, for example, 50 [ms] (20 [fps (frames per second)]). Further, here, reading from the pixel circuits 1000 arranged in a matrix in the pixel array unit 1001 is performed line-sequentially by a rolling shutter method. In FIG. 15, the passage of time is shown toward the right, and the line position is shown from top to bottom.
  • each line is exposed for a predetermined time, and after the exposure is completed, the pixel signal is transferred from each pixel circuit 1000 to the AD conversion unit 1003 via the vertical signal line VSL to perform AD conversion.
  • each AD converter 1007 converts the transferred analog pixel signal into pixel data which is a digital signal.
  • the image data 32a based on the pixel data of frame # 1 is input to the preprocessing unit 210.
  • The preprocessing unit 210 performs subsampling of the first phase φ1 on the input image data 32a by the subsampling process (indicated as "SS" in the figure) as described above.
  • The preprocessing unit 210 acquires the pixels 300 at the pixel positions of the sampling pixels selected for each divided region 35 by the subsampling of the first phase φ1, and generates the sampled image 36φ1 (step S10a).
  • The preprocessing unit 210 passes the sampled image 36φ1 to the recognition unit 220.
  • The sampled image 36φ1 passed from the preprocessing unit 210 to the recognition unit 220 is an image whose number of pixels is reduced with respect to the image data 32a by the thinning of the subsampling process.
  • The recognition unit 220 executes the recognition process on the sampled image 36φ1.
  • In FIG. 15, it is shown that the recognition process includes the feature amount extraction process (step S11), the feature amount integration process (step S12), and the recognition process (step S13).
  • The recognition result φ1 based on the sampled image 36φ1 is output to the outside of the recognition processing unit 20b.
  • The processes of steps S11 to S13 are performed within a period of one frame.
  • Here, the sampled image 36φ1 to be processed is an image whose number of pixels is reduced with respect to the image data 32a by the thinning of the subsampling process. Therefore, the amount of processing executed on the image data 32a is smaller than the amount of processing that would be executed on one frame of image data 32 that is not thinned out.
  • In the example of FIG. 15, the processing of steps S11 to S13 for the sampled image 36φ1 based on the image data 32a is completed in a period of approximately 1/4 of the one-frame period.
  • Image data 32b composed of pixel data of frame # 2 is input to the preprocessing unit 210.
  • the preprocessing unit 210 performs subsampling processing on the input image data 32b in a second phase ⁇ 2 different from that of the image data 32a to generate a sampled image 36 ⁇ 2.
  • the pre-processing unit 210 passes the sampled image 36 ⁇ 2, which has a smaller number of pixels than the image data 32b by subsampling, to the recognition unit 220.
  • the recognition unit 220 executes the recognition process on the sampled image 36 ⁇ 2 within a period of one frame. In this case as well, as described above, the recognition process is completed in a period of approximately 1/4 of the one-frame period.
  • the recognition unit 220 integrates the feature amount 50b extracted from the sampled image 36 ⁇ 2 and the feature amount 50a extracted by the feature amount extraction process for the image data 32a by the feature amount integration process in step S12.
  • the recognition unit 220 executes the recognition process using the integrated feature amount.
  • the recognition result ⁇ 2 by this recognition process is output to the outside of the recognition process unit 20b.
  • the preprocessing unit 210 executes subsampling processing with the third phase ⁇ 3 for the image data 32c of the next frame # 3 in parallel with the processing for the image data 32b of the immediately preceding frame # 2.
  • the recognition unit 220 extracts the feature amount 50c from the sampled image 36 ⁇ 3 generated by the subsampling process.
  • the recognition unit 220 further integrates the feature amount 50a and 50b extracted from the image data 32a and 32b, respectively, and the extracted feature amount 50c, and performs recognition processing based on the integrated feature amount. Run.
  • the recognition unit 220 outputs the recognition result ⁇ 3 obtained by this recognition process to the outside. In this case as well, as described above, the recognition process is completed in a period of approximately 1/4 of the one-frame period.
  • In parallel with the processing for the image data 32c of the immediately preceding frame # 3, the recognition processing unit 20b performs the subsampling process with the fourth phase φ4 and the feature amount extraction process on the image data 32d of the next frame # 4 to obtain the feature amount 50d.
  • The recognition unit 220 further integrates the feature amounts 50a to 50c extracted from the image data 32a to 32c with the extracted feature amount 50d, and executes the recognition process based on the integrated feature amount.
  • the recognition unit 220 outputs the recognition result ⁇ 4 obtained by this recognition process to the outside. In this case as well, as described above, the recognition process is completed in a period of approximately 1/4 of the one-frame period.
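  • The inter-frame flow described above (one subsampling phase per frame, with features accumulated across frames) can be pictured roughly as below. This is only a sketch: `extract_features`, `integrate`, and `recognize` are placeholders for the DNN stages of steps S11 to S13, and `subsample` is the helper assumed in the earlier sketch.

```python
def interframe_recognition(frames, extract_features, integrate, recognize):
    """One subsampling phase per frame; features accumulate across frames."""
    accumulated = None
    results = []
    for i, frame in enumerate(frames):
        phase = i % 4                        # phases phi1..phi4, repeating
        sampled = subsample(frame, phase)    # step S10: ~1/4 of the pixels
        feats = extract_features(sampled)    # step S11: feature amount extraction
        accumulated = feats if accumulated is None else integrate(accumulated, feats)  # step S12
        results.append(recognize(accumulated))                                         # step S13
    return results  # results[0] is the breaking news result; later ones are more detailed
```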
  • In FIG. 15, vertical arrows are shown, that is, arrows from each of the image data 32a to 32d to each of steps S10a to S10d, and arrows indicating the output of each of the recognition results φ1 to φ4 by each recognition process.
  • The thickness of each arrow schematically indicates the amount of information.
  • The amount of data of the sampled images 36φ1 to 36φ4, which are subsampled by the processes of steps S10a to S10d and passed to the recognition unit 220, is smaller than the amount of data of each of the image data 32a to 32d input to the preprocessing unit 210 for the processes of steps S10a to S10d.
  • It is also shown that the amount of information of each of the recognition results φ1 to φ4 by the recognition processes based on the image data 32a to 32d increases as the recognition process is repeated, and the obtained recognition result becomes more detailed with each recognition process.
  • This is because each recognition process uses a feature amount that integrates the feature amounts acquired up to the previous time while shifting the phase of the sampled image with the feature amount newly acquired by further shifting the phase for the immediately preceding sampled image.
  • FIG. 16 is an example time chart showing a second example of reading and recognition processing in the basic architecture of recognition processing according to each embodiment.
  • the sampled images 36 ⁇ 1 to 36 ⁇ 4 of the first to fourth phases by subsampling are extracted from the image data 32 of one frame, respectively. That is, in this second example, the recognition process by the sampled images 36 ⁇ 1 to 36 ⁇ 4 of the first to fourth phases is completed in one frame, and is hereinafter appropriately referred to as an intra-frame process.
  • each line is exposed for a predetermined time, and after the exposure is completed, the pixel signal is transferred from each pixel circuit 1000 to the AD conversion unit 1003 via the vertical signal line VSL to perform AD conversion.
  • each AD converter 1007 converts the transferred analog pixel signal into pixel data which is a digital signal.
  • the image data 32a based on the pixel data of frame # 1 is input to the preprocessing unit 210.
  • The preprocessing unit 210 performs the subsampling of the first phase φ1 as described above on the image data 32a of the first frame in FIG. 16, acquires the pixels 300 from the pixel positions of the sampling pixels selected for each divided region 35, and generates a sampled image 36φ1 with the first phase φ1 (step S10a).
  • the preprocessing unit 210 executes the subsampling of the second phase ⁇ 2 for the image data 32b.
  • the preprocessing unit 210 generates a sampled image 36 ⁇ 2 in the second phase ⁇ 2 from each sampling pixel acquired by the subsampling of the second phase ⁇ 2 (step S10b).
  • The preprocessing unit 210 then executes subsampling with further different phases (subsampling of the third phase φ3 and of the fourth phase φ4) on the image data 32a, and generates a sampled image 36φ3 with the third phase φ3 and a sampled image 36φ4 with the fourth phase φ4 (step S10c and step S10d), respectively.
  • the preprocessing unit 210 executes subsampling according to the first to fourth phases ⁇ 1 to ⁇ 4 for one frame of image data 32a within one frame period, respectively.
  • the recognition unit 220 executes a feature amount extraction process on the sampled image 36 ⁇ 1 of the first phase ⁇ 1 generated based on the image data 32a by the preprocessing unit 210 (step S11a), and extracts the feature amount.
  • the recognition unit 220 can integrate the feature amount extracted in step S11a with the accumulated feature amount that can be integrated (step S12a).
  • the recognition unit 220 executes the recognition process based on the feature quantity integrated in step S12a (step S13a), and outputs the recognition result ⁇ 1 by the first phase.
  • the recognition unit 220 executes a feature amount extraction process on the sampled image 36 ⁇ 2 of the second phase ⁇ 2 generated based on the image data 32a by the preprocessing unit 210 (step S11b), and extracts the feature amount.
  • the recognition unit 220 can integrate the feature amount extracted in step S11b with the accumulated feature amount that can be integrated (step S12b).
  • Here, the recognition unit 220 integrates the feature amount extracted in step S11b with the feature amount extracted in step S11a described above, performs the recognition process on the integrated feature amount (step S13b), and outputs the recognition result φ2 by the second phase φ2.
  • Similarly, the recognition unit 220 executes the feature amount extraction process on the sampled images 36φ3 and 36φ4 of the third and fourth phases φ3 and φ4 generated by the preprocessing unit 210 based on the image data 32a (step S11c and step S11d), and extracts the feature amounts.
  • the recognition unit 220 sequentially integrates each feature amount extracted in step S11c and step S11d with the feature amount integrated up to the immediately preceding integration process (step S12c, step S12d).
  • the recognition unit 220 executes recognition processing based on, for example, each feature quantity integrated in each phase ⁇ 3 and ⁇ 4, and outputs recognition results ⁇ 3 and ⁇ 4 of each phase ⁇ 3 and ⁇ 4, respectively.
  • Each feature amount extraction process (steps S11a to S11d), each integration process (steps S12a to S12d), and each recognition process (steps S13a to S13d) in each of the phases φ1 to φ4 described above are executed within the period of one frame. That is, the recognition unit 220 performs the recognition process on each of the sampled images 36φ1 to 36φ4 in which pixels are thinned out by subsampling the image data 32a of one frame. Therefore, the amount of calculation of each recognition process in the recognition unit 220 is small, and each recognition process can be executed in a short time.
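  • For this intra-frame case, the same stages run four times on the single frame within one frame period; a rough sketch under the same assumptions as the earlier snippets (placeholder function names, `subsample` as assumed above):

```python
def intraframe_recognition(frame, extract_features, integrate, recognize):
    """All four phases are taken from the same frame (steps S10a/S11a .. S13d)."""
    accumulated = None
    results = []
    for phase in range(4):
        sampled = subsample(frame, phase)
        feats = extract_features(sampled)
        accumulated = feats if accumulated is None else integrate(accumulated, feats)
        results.append(recognize(accumulated))
    return results  # results[0]: early breaking news result; results[3]: integration result
```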
  • FIG. 17 is a schematic diagram for explaining the effect of the processing (intraframe processing) according to the second example described above.
  • FIG. 17A is an example time chart comparing the processing according to the second example described above with the processing according to the existing technique, and shows the passage of time toward the right.
  • section (a) shows an example of reading and recognition processing by existing technology.
  • section (b) shows an example of reading and recognizing processing according to the second example described above.
  • the imaging process is performed during a period of time t 0 to t 1.
  • the imaging process includes exposure in the pixel array unit 1001 for a predetermined time, and transfer processing of each pixel data based on the electric charge generated by the photoelectric conversion element in response to the exposure.
  • Each pixel data transferred from the pixel array unit 1001 by the imaging process is stored in the frame memory as, for example, one frame of image data.
  • reading of the image data stored in the frame memory is started, for example, from time t 1.
  • the recognition processing for the image data for one frame is started after the reading of the image data for one frame is completed (time t 4).
  • this recognition process ends at the time t 6 when one frame period elapses from the time t 4.
  • the reading of the image data from the frame memory is started after the time t 1 as in the example of the section (a).
  • the reading of the sampled image 36 ⁇ 1 by the subsampling of the first phase ⁇ 1 is executed, for example, during the period t 1 to t 2 which is 1/4 of the one frame period, and the same is true.
  • the recognition process for the sampled image 36 ⁇ 1 is executed, for example, during the period t 2 to t 3 , which is 1/4 of the one frame period, and the recognition result ⁇ 1 is output.
  • Similarly, the reading of the sampled images 36φ2 to 36φ4 by the subsampling of the second to fourth phases φ2 to φ4 is executed in periods each of which is 1/4 of the one frame period, for example at times t2 to t3, ..., and ends at time t4, for example.
  • The recognition process for the sampled image 36φ2 starts at time t3, for example, and ends when 1/4 of the one frame period has elapsed, and the recognition result φ2 is output.
  • The recognition processes for the other sampled images 36φ3 and 36φ4 are likewise executed following the recognition process for the immediately preceding sampled image.
  • The recognition process for the sampled image 36φ4 by the last subsampling of the image data 32 for one frame ends at time t5.
  • FIG. 17B is a diagram schematically showing each recognition result according to the second example.
  • the upper stage, the middle stage, and the lower stage show examples of the recognition results ⁇ 1, ⁇ 2, and ⁇ 4 by the recognition processing for the first phase ⁇ 1, the second phase ⁇ 2, and the fourth phase ⁇ 4, respectively.
  • An example is shown in which images of three people, the recognition targets, located at different distances from the sensor unit 10b (information processing device 1b), are included in one frame.
  • three objects 96L, 96M, and 96S which are images of people and have different sizes, are included with respect to the frame 95.
  • the object 96L is the largest, and of the three persons included in the frame 95, the person corresponding to the object 96L is the closest to the sensor unit 10b.
  • The smallest object 96S among the objects 96L, 96M, and 96S represents the person who, among the three people included in the frame 95, is the farthest from the sensor unit 10b.
  • the recognition result ⁇ 1 is an example in which the recognition process is executed on the above-mentioned sampled image 36 ⁇ 1 and the largest object 96L is recognized.
  • the recognition result ⁇ 2 is an example in which the feature amount extracted from the sampled image 36 ⁇ 2 is further integrated with the feature amount in the recognition result ⁇ 1 and the next largest object 96M is recognized.
  • In the recognition result φ4, the feature amount extracted from the sampled image 36φ4 is further integrated with the feature amounts extracted from the preceding sampled images, and in addition to the objects 96L and 96M, the smallest object 96S is recognized.
  • a rough recognition result ⁇ 1 can be obtained based on the sampled image 36 ⁇ 1 by the first subsampling for the frame.
  • Therefore, the recognition result φ1 can be output at time t3 in FIG. 17A, and as shown by the arrow B in the figure, low latency is realized with respect to the time t6 at which the recognition result is output by the existing technology.
  • the recognition result ⁇ 1 based on the sampled image 36 ⁇ 1 by the first subsampling for the frame according to this second example is the breaking news result.
  • This breaking news result is also applicable to the first example described above.
  • On the other hand, the recognition process in the final subsampling for the frame is performed based on the feature amount that integrates the feature amounts extracted from each of the sampled images 36φ1 to 36φ4 in the frame, so that the more accurate recognition result φ4 can be obtained.
  • This recognition result ⁇ 4 can realize, for example, the same accuracy as the recognition processing by the existing technology.
  • The recognition process for this last subsampling ends at time t5, when, for example, 1/4 of the one frame period has elapsed with respect to the time t4 at which the read process is completed by the existing technique.
  • That is, accuracy equivalent to that of the existing technology can be obtained in a shorter time than the recognition process by the existing technology, as shown by the arrow A in the figure, resulting in lower latency.
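  • As a rough worked example, assuming the 50 [ms] (20 [fps]) frame period mentioned above and, purely for illustration, that in the existing flow reading one frame and recognizing one frame each take one frame period, while here each subsampled read and each recognition take about 1/4 of a frame period: the existing flow outputs its result roughly 100 [ms] after time t1, whereas the breaking news result φ1 becomes available about 25 [ms] after t1 and the integration result φ4 about 62.5 [ms] after t1.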
  • the recognition result ⁇ 4 is the integration result. This integration result is also applicable to the first example described above.
  • The preprocessing unit 210 sets the divided region 35 to 4 pixels × 4 pixels and performs subsampling by thinning out every other pixel, as described with reference to FIG. 11, expanding the image data 32 of one frame on the time axis; it is therefore assumed that four sampled images 36φ1, 36φ2, 36φ3, and 36φ4 that are out of phase with one another are generated.
  • FIG. 18 is a diagram schematically showing a configuration of an example of a recognizer according to the first embodiment.
  • the left end shows a state in which the image data 32 of one frame is divided into four according to the pixels 300 ⁇ 1, 300 ⁇ 2, 300 ⁇ 3, and 300 ⁇ 4 of the four phases of the first phase ⁇ 1 to the fourth phase ⁇ 4.
  • Sampling images 36 ⁇ 1 to 36 ⁇ 4 of each phase are generated by the subsampling process (steps S11a to S11d) according to the first to fourth phases ⁇ 1 to ⁇ 4.
  • sampled images 36 ⁇ 1 to 36 ⁇ 4 of each phase are generated in the order of the first phase ⁇ 1, the second phase ⁇ 2, the third phase ⁇ 3, and the fourth phase ⁇ 4.
  • the size of the divided area 35 may be 8 pixels ⁇ 8 pixels (in this case, 4 ⁇ 4 is divided into 16), or the divided area 35 may be further set to another size.
  • the divided region 35 does not have to be square and is not limited to a rectangle.
  • the entire image data 32 or an arbitrary pixel position of the predetermined division area 35 may be selected, and the pixel 300 at the selected pixel position may be used as the sampling pixel.
  • the plurality of pixel positions arbitrarily selected include, for example, a plurality of discrete and aperiodic pixel positions.
  • the preprocessing unit 210 can select the plurality of pixel positions by using pseudo-random numbers. Further, the selected pixel positions are preferably different for each frame, but some pixel positions may overlap between the frames.
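  • One possible way to realize such a pseudo-random, per-frame selection of discrete pixel positions is sketched below; the seeding scheme and function name are assumptions for illustration, not part of the disclosure.

```python
import numpy as np

def random_sampling_positions(height, width, n_samples, frame_index, seed=0):
    """Pick discrete, aperiodic pixel positions; changing the seed per frame
    makes the selected positions differ from frame to frame."""
    rng = np.random.default_rng(seed + frame_index)
    flat = rng.choice(height * width, size=n_samples, replace=False)
    return np.stack(np.unravel_index(flat, (height, width)), axis=1)  # (row, col) pairs
```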
  • Feature extraction processing is performed on the sampled images 36 ⁇ 1 to 36 ⁇ 4 of each phase (steps S11a to S11d).
  • the feature amount of the sampled image 36 ⁇ 1 first extracted in step S11a is integrated with the feature amount already accumulated in step S12a.
  • In FIG. 18, it is shown that the feature amount extracted from the sampled image 36φ1 is directly subjected to the recognition process and the recognition result φ1 is acquired (step S13a).
  • The recognition result φ1 of the recognition process in step S13a is called the breaking news result because it is the first of the recognition results φ1 to φ4 based on the sampled images 36φ1 to 36φ4 generated from the image data 32 of one frame.
  • the feature amount of the sampled image 36 ⁇ 2 extracted in step S11b is integrated with the feature amount extracted from the sampled image 36 ⁇ 1 in step S11a in step S12b.
  • the recognition process is performed on the features integrated in step S12b, and the recognition result ⁇ 2 is acquired (step S13b).
  • the integrated feature amount is integrated with the feature amount of the sampled image 36 ⁇ 3 extracted in step S11c (step S12c). That is, in step S12c, the feature quantities extracted from the sampled images 36 ⁇ 1, 36 ⁇ 2, and 36 ⁇ 3 are integrated.
  • Recognition processing is performed on the feature amount integrated in step S12c, and the recognition result ⁇ 3 is acquired (step S13c).
  • the integrated feature amount is integrated with the feature amount of the sampled image 36 ⁇ 4 extracted in step S11d (step S12d). That is, in step S12d, the feature quantities extracted from the sampled images 36 ⁇ 1, 36 ⁇ 2, 36 ⁇ 3, and 36 ⁇ 4 are integrated.
  • Recognition processing is performed on the integrated feature amount in step S13d, and the recognition result ⁇ 4 is acquired.
  • the recognition result ⁇ 4 of the recognition process in this step S13d is called an integration result because it is acquired based on the integrated feature amount that integrates the feature amounts extracted from all of the sampled images 36 ⁇ 1 to 36 ⁇ 4.
  • the integration result corresponds to the recognition result when the pixels 300 at all the pixel positions of the image data 32 of one frame are used as sampling pixels.
  • the recognition results ⁇ 2 and ⁇ 3 of the recognition processing in steps S13b and S13c are recognition results acquired by the recognition processing in the middle of acquiring the integration result, and are called intermediate results.
  • The recognizer according to the first embodiment can adaptively output any one of these recognition results φ1 to φ4, or a combination of a plurality of the recognition results φ1 to φ4, depending on prior information detected in advance, environmental information, and the like.
  • FIG. 19 is an example time chart showing how the recognition result is output according to the first embodiment, that is, at which timing it is determined which recognition result is to be output (any one of the recognition results φ1 to φ4, or a combination of a plurality of them) and on what condition the recognition result to be output is determined. In FIG. 19, the passage of time is shown in the right direction.
  • Timing P, timing Q, and timing R are determination timings, in order from the earliest in the time series.
  • Timing P detects prior information before a series of recognition processes are started.
  • the prior information is, for example, information to be detected in advance for executing the recognition process, and based on the prior information, for example, which of the recognition results ⁇ 1 to ⁇ 4 is to be output is determined.
  • the timing Q sets the recognition result to be output based on the recognition result in the predetermined period 100.
  • the predetermined period 100 can be, for example, a period of one frame or more in frame units.
  • the timing R sets the recognition result to be output based on the recognition result (for example, the above-mentioned breaking news result or the intermediate result) in the predetermined period 101 in the frame.
  • the recognition result to be output is determined based on the prior information detected before the recognition process is started, corresponding to the above-mentioned timing P process.
  • FIG. 20A is a functional block diagram of an example for explaining a more detailed function of the pretreatment unit 210 according to the first embodiment.
  • the preprocessing unit 210 includes a utilization area acquisition unit 211, a recognition result setting unit 212, a recognition result output calculation unit 213, and a storage unit 214.
  • the storage unit 214 includes a memory and a memory control unit for controlling reading and writing to the memory.
  • the used area acquisition unit 211, the recognition result setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 are realized by, for example, an information processing program running on the CPU 1205.
  • This information processing program can be stored in ROM 1206 in advance. Not limited to this, the information processing program can also be supplied from the outside via the interface 1204 and written to the ROM 1206.
  • the utilization area acquisition unit 211, the recognition result setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 may be realized by operating the CPU 1205 and the DSP 1203, respectively, according to the information processing program. .. Furthermore, a part or all of the usage area acquisition unit 211, the recognition result setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 (memory control unit) are configured by a hardware circuit that operates in cooperation with each other. You may.
  • the used area acquisition unit 211 includes a reading unit that reads the image data 32 from the sensor unit 10b.
  • the used area acquisition unit 211 performs subsampling processing on the image data 32 read from the sensor unit 10b by the reading unit according to a predetermined pattern (for example, a divided area 35 having a size of 4 pixels ⁇ 4 pixels). , Sampling pixels are extracted, and a sampled image 36 ⁇ x having a phase ⁇ x is generated from the extracted sampling pixels. That is, the utilization area acquisition unit 211 realizes the function of the generation unit that generates the sampled image.
  • the used area acquisition unit 211 passes the generated sampled image 36 ⁇ x to the recognition unit 220.
  • the use area acquisition unit 211 can perform read control on the sensor unit 10b to specify a line or the like for reading.
  • FIG. 20B is a functional block diagram of an example for explaining a more detailed function of the recognition unit 220 according to the first embodiment.
  • the recognition unit 220 includes a feature amount calculation unit 221, a feature amount accumulation control unit 222, a feature amount accumulation unit 223, and a recognition process execution unit 224.
  • the feature amount calculation unit 221 and the feature amount accumulation control unit 222, the feature amount storage unit 223, and the recognition process execution unit 224 are realized by, for example, an information processing program running on the CPU 1205.
  • This information processing program can be stored in ROM 1206 in advance. Not limited to this, the information processing program can also be supplied from the outside via the interface 1204 and written to the ROM 1206.
  • the feature amount calculation unit 221 and the feature amount accumulation control unit 222, the feature amount storage unit 223, and the recognition process execution unit 224 may be realized by operating the CPU 1205 and the DSP 1203, respectively, according to the information processing program. Furthermore, a part or all of the feature amount calculation unit 221 and the feature amount accumulation control unit 222, the feature amount storage unit 223, and the recognition processing execution unit 224 may be configured by a hardware circuit that operates in cooperation with each other. ..
  • the recognition unit 220 the feature amount calculation unit 221 and the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 constitute a recognizer that executes recognition processing based on image data.
  • the recognition unit 220 can construct the recognizer and change the configuration according to the recognizer information passed from the parameter storage unit 230.
  • the sampled image 36 ⁇ x passed from the usage area acquisition unit 211 is input to the feature amount calculation unit 221.
  • the feature amount calculation unit 221 includes one or more feature calculation units for calculating the feature amount, and calculates the feature amount based on the passed sampled image 36 ⁇ x. That is, the feature amount calculation unit 221 functions as a calculation unit for calculating the feature amount of the sampled image 36 ⁇ x composed of sampling pixels. Not limited to this, the feature amount calculation unit 221 may acquire information for setting the exposure and analog gain from, for example, the sensor unit 10b, and further use the acquired information to calculate the feature amount.
  • the feature amount calculation unit 221 passes the calculated feature amount to the feature amount accumulation control unit 222.
  • The feature amount accumulation control unit 222 accumulates the feature amount passed from the feature amount calculation unit 221 in the feature amount accumulation unit 223. At this time, the feature amount accumulation control unit 222 can integrate the past feature amount already accumulated in the feature amount accumulation unit 223 with the feature amount passed from the feature amount calculation unit 221 to generate an integrated feature amount. Further, when the feature amount accumulation unit 223 has been initialized and no feature amount exists, the feature amount accumulation control unit 222 accumulates the feature amount passed from the feature amount calculation unit 221 in the feature amount accumulation unit 223 as the first feature amount.
  • the feature amount accumulation control unit 222 can delete unnecessary feature amounts from the feature amounts accumulated in the feature amount accumulation unit 223.
  • An unnecessary feature amount is, for example, a feature amount related to a previous frame, or an already accumulated feature amount calculated based on a frame image of a scene different from the frame image for which a new feature amount is calculated.
  • the feature amount accumulation control unit 222 can also specify the feature amount to be deleted in response to an instruction from the outside. Further, the feature amount accumulation control unit 222 can also delete and initialize all the feature amounts accumulated in the feature amount accumulation unit 223, if necessary.
  • The feature amount accumulation control unit 222 passes to the recognition process execution unit 224 either the feature amount passed from the feature amount calculation unit 221, or the feature amount obtained by integrating the feature amount accumulated in the feature amount accumulation unit 223 with the feature amount passed from the feature amount calculation unit 221.
  • The recognition process execution unit 224 executes recognition processing such as object detection, person detection, and face detection based on the feature amount passed from the feature amount accumulation control unit 222. For example, when the feature amount is a feature amount passed from the feature amount calculation unit 221 to the feature amount accumulation control unit 222 as it is, that is, a feature amount that is not integrated with other feature amounts, the recognition process execution unit 224 outputs the breaking news result as the recognition result.
  • When the feature amount is an integration of all the feature amounts based on all the sampled images 36φx generated from the image data 32 of one frame, the recognition process execution unit 224 outputs the integration result as the result of the recognition process. Further, the recognition process execution unit 224 can also output an intermediate result, which is an intermediate recognition result between the breaking news result and the integration result (a structural sketch of these units follows below).
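  • The division of roles among the feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 could be organized roughly as in the following sketch; the class and method names are illustrative assumptions only.

```python
class FeatureAccumulationControl:
    """Holds the accumulated feature amount (role of unit 223) and integrates new ones (unit 222)."""
    def __init__(self, integrate):
        self._integrate = integrate
        self._accumulated = None          # accumulated feature amount (unit 223)

    def reset(self):
        self._accumulated = None          # delete / initialize accumulated feature amounts

    def add(self, features):
        if self._accumulated is None:
            self._accumulated = features  # first feature amount after initialization
        else:
            self._accumulated = self._integrate(self._accumulated, features)
        return self._accumulated


class RecognitionProcessExecution:
    """Runs detection on whatever feature amount it is handed (role of unit 224)."""
    def __init__(self, head):
        self._head = head                 # e.g. a DNN recognition head

    def run(self, features):
        return self._head(features)       # breaking news / intermediate / integration result
```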
  • the storage unit 214 accumulates the recognition results output by the recognition unit 220.
  • the recognition result accumulated in the storage unit 214 is passed to the recognition result output calculation unit 213.
  • the storage unit 214 can also directly pass the recognition result output from the recognition processing execution unit 224 to the recognition result output calculation unit 213.
  • the recognition result output calculation unit 213 obtains one or more recognition results to be output from the recognition unit 220 among the recognition results ⁇ 1 to ⁇ 4 based on the recognition result passed from the storage unit 214.
  • the recognition result output calculation unit 213 passes the obtained recognition result to the recognition result output setting unit 212.
  • The recognition result output setting unit 212 sets the recognition result to be output to the recognition unit 220 based on the recognition result passed from the recognition result output calculation unit 213 or, for example, prior information supplied from outside the recognition processing unit 20b. That is, the recognition result output setting unit 212 executes any of the timing P, Q, and R processes described with reference to FIG. 19 based on the recognition result or the prior information. In this way, the recognition result output setting unit 212 and the recognition result output calculation unit 213 function as an output control unit that controls the output of the recognition result by the recognition unit 220.
  • FIG. 21 is an example flowchart showing the recognition process according to the first embodiment.
  • the information processing device 1b according to the first embodiment will be described as being used for in-vehicle use.
  • the recognition processing unit 20b detects advance information by the recognition result output setting unit 212.
  • the prior information includes, for example, vehicle body information of a vehicle on which the information processing device 1b including the recognition processing unit 20b is mounted, position information indicating the current position of the vehicle (information processing device 1b), and time information indicating the current date and time. Can be applied.
  • the traveling speed of the vehicle can be applied as the vehicle body information.
  • the position information can be acquired by providing a self-position acquisition means such as GNSS (Global Navigation Satellite System) or SLAM (Simultaneous Localization and Mapping) in the vehicle or the information processing device 1b itself. ..
  • the country or region can be identified.
  • the area includes a wide area such as a prefecture and a specific area (shopping district, school zone, etc.) in the urban area.
  • the time information can be acquired from, for example, a timer or calendar mounted on the vehicle or the information processing device 1b itself, and it is possible to know the day and night and the season.
  • Next, in step S101, the recognition processing unit 20b determines, by the recognition result output setting unit 212, at which timing the recognition result (for example, any of the recognition results φ1 to φ4) is to be output, based on the prior information detected in step S100.
  • For example, when the prior information is traveling speed information, it is conceivable to output the breaking news result (recognition result φ1) if the traveling speed is equal to or higher than a predetermined value, and to output the integration result (recognition result φ4) otherwise.
  • Further, for example, when the road is not a highway, it is conceivable to output the breaking news result (recognition result φ1) in order to prioritize the recognition of short-distance objects, and, on a highway, to output the integration result (recognition result φ4) in order to prioritize the recognition of distant traffic conditions and oncoming vehicles (a small sketch of this decision logic follows below).
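  • As a hedged illustration of this decision from prior information (the threshold value and road-type labels are made up for the example):

```python
def decide_from_speed(speed_kmh, threshold_kmh=60.0):
    # Higher speed -> prioritize responsiveness: breaking news result phi1.
    return "phi1" if speed_kmh >= threshold_kmh else "phi4"

def decide_from_road_type(road_type):
    # Highway -> prioritize distant traffic and oncoming vehicles: integration result phi4.
    return "phi4" if road_type == "highway" else "phi1"
```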
  • the recognition processing unit 20b acquires the sampled image 36 ⁇ x in which the image data 32 is subsampled according to, for example, a preset pattern (for example, the division area 35) by the utilization area acquisition unit 211.
  • Next, the recognition result output setting unit 212 specifies to the recognition unit 220 at which timing of the recognition results φ1 to φ4 the recognition result is to be output, according to the determination made by the recognition result output setting unit 212 in step S101.
  • the recognition processing unit 20b executes the recognition processing on the sampled image 36 ⁇ x passed from the used area acquisition unit 211 by the recognition unit 220.
  • Next, the recognition processing unit 20b determines, by the recognition result output setting unit 212, whether or not the recognition result φx recognized by the recognition unit 220 is the recognition result specified to the recognition unit 220.
  • the recognition result output setting unit 212 can execute the determination by acquiring the recognition result ⁇ x output from the recognition unit 220 via the storage unit 214 and the recognition result output calculation unit 213.
  • When the recognition result output setting unit 212 determines in step S105 that the recognition result φx recognized by the recognition unit 220 is not the recognition result specified to the recognition unit 220 in step S103 (step S105, "No"), the process returns to step S102.
  • When the recognition result output setting unit 212 determines in step S105 that the recognition result φx recognized by the recognition unit 220 is the recognition result specified to the recognition unit 220 in step S103 (step S105, "Yes"), the process proceeds to step S106.
  • In step S106, the recognition result output setting unit 212 instructs the recognition unit 220 to output the recognition result φx obtained by the recognition process in step S104. In response to this instruction, the recognition result φx is output from the recognition unit 220.
  • the recognition processing unit 20b according to the first embodiment detects the prior information and determines at what timing the recognition result ⁇ x is output based on the detected prior information. Therefore, by applying the recognition processing unit 20b according to the first embodiment, it is possible to obtain a recognition result according to the situation. Further, this makes it possible to suppress the cost of calculation amount and the communication cost.
  • the first modification of the first embodiment corresponds to the processing of the timing R described with reference to FIG. 19, and based on the recognition result in the predetermined period 101 in the frame, the recognition result ⁇ x output by the recognition unit 220 is obtained. decide.
  • FIG. 22 is a flowchart of an example showing the recognition process according to the first modification of the first embodiment.
  • the recognition target for determining the recognition result ⁇ x output by the recognition unit 220 is set.
  • the information processing device 1b according to the first modification of the first embodiment is for in-vehicle use, for example, a person (pedestrian) can be considered as a recognition target. Not limited to this, it is also possible to recognize oncoming vehicles and road signs.
  • step S200 the recognition processing unit 20b acquires the sampled image 36 ⁇ x in which the image data 32 of the frame (t) is subsampled by the utilization area acquisition unit 211, for example, according to a preset pattern (for example, the division area 35). ..
  • the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results ⁇ 1 to ⁇ 4 is to be output by the recognition result output setting unit 212.
  • the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36 ⁇ x acquired in step S200.
  • the recognition processing unit 20b executes the recognition processing for the sampled image 36 ⁇ x acquired in step S200 by the recognition unit 220. More specifically, the recognition unit 220 extracts the feature amount of the sampled image 36 ⁇ x acquired in step S200, and executes the recognition process based on the extracted feature amount.
  • Next, based on the result of the recognition process in step S202, the recognition processing unit 20b determines, by the recognition result output setting unit 212, whether or not it has been decided at which timing of the recognition results φ1 to φ4 the recognition result is to be output in the next frame (t + 1) of the frame (t).
  • For example, when a recognition target designated in advance is detected based on the result of the recognition process in step S202, the recognition result output setting unit 212 determines that it has been decided, according to the recognition target, at which timing of the recognition results φ1 to φ4 the recognition result is to be output.
  • When the recognition result output setting unit 212 determines in step S203 that the timing at which the recognition result is to be output has not been decided (step S203, "No"), the process returns to step S200, and the sampled image 36φ(x + 1) of the phase next to that of the sampled image 36φx acquired in step S200 is acquired.
  • When the recognition result output setting unit 212 determines in step S203 that the timing of the recognition result to be output has been decided (step S203, "Yes"), the process proceeds to step S204.
  • step S204 the recognition processing unit 20b acquires the sampled image 36 ⁇ x according to the recognition result ⁇ x whose output is determined according to the determination in step S203 in the utilization area acquisition unit 211.
  • the sampled image 36 ⁇ x to be acquired may be the sampled image 36 ⁇ x of the image data 32 at the time (t), or may be the sampled image 36 ⁇ x of the image data 32 at the next time (t + 1).
  • the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results ⁇ 1 to ⁇ 4 is to be output by the recognition result output setting unit 212.
  • the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36 ⁇ x determined in step S203.
  • the recognition processing unit 20b executes the recognition processing for the sampled image 36 ⁇ x acquired in step S204 by the recognition unit 220.
  • In step S207, the recognition processing unit 20b determines, by the recognition result output setting unit 212, whether or not the recognition process has been performed based on the recognition result whose output was specified in step S205, for example, based on the sampled image 36φx determined in step S203. If it is determined that the recognition process has not been performed (step S207, "No"), the process returns to step S204. On the other hand, when the recognition result output setting unit 212 determines that the recognition process has been performed (step S207, "Yes"), the process proceeds to step S208.
  • step S208 the recognition processing unit 20b instructs the recognition unit 220 to output the recognition result ⁇ x by the recognition result output setting unit 212.
  • the recognition unit 220 outputs the recognition result ⁇ x in response to this instruction.
  • FIG. 23 is a diagram corresponding to FIG. 18 described above, and is a schematic diagram for explaining the recognition process according to the first modification of the first embodiment.
  • the recognition target is recognized by the recognition process (step S13a) based on the sampled image 36 ⁇ 1 by the first subsampling of the image data 32 (step S202).
  • the information processing device 1b performs communication for urging the own vehicle to brake.
  • the recognition result output setting unit 212 has at least one of the recognition results ⁇ 1 to ⁇ 4 based on the image data 32 of the next time (t + 1) (the image data 32 of the frame next to the image data 32 of the time (t)). Is determined as the recognition result for output.
  • the recognition unit 220 does not execute the recognition process (step S13b to step S13d) based on the remaining sampled images 36 ⁇ 2 to 36 ⁇ 4 for the image data 32 of the time (t), and starts from step S204 to the next time (t + 1). ) Is executed.
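  • The early termination described here, in which the remaining phases of the current frame are skipped once the target has been found in the breaking news result, might look roughly like the following sketch (the `target_found` and `on_alert` callbacks and the `subsample` helper are assumptions):

```python
def recognize_with_early_exit(frame, extract_features, integrate, recognize,
                              target_found, on_alert):
    """First modification: stop after the phase in which the target is detected."""
    accumulated = None
    result = None
    for phase in range(4):
        sampled = subsample(frame, phase)
        feats = extract_features(sampled)
        accumulated = feats if accumulated is None else integrate(accumulated, feats)
        result = recognize(accumulated)
        if target_found(result):
            on_alert(result)   # e.g. communication urging the own vehicle to brake
            break              # skip the remaining phases; move on to frame (t + 1)
    return result
```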
  • FIG. 24 is an example time chart for explaining the effect of the processing of FIG. 23.
  • In FIGS. 16 and 17A described above, it is shown that, when the sampled images 36φ1 to 36φ4 of the four phases φ1 to φ4 are acquired from the image data 32, each recognition process for each of the sampled images 36φ1 to 36φ4 is completed within one frame period.
  • However, the one-frame period includes an exposure period and a period for pixel data transfer processing, and in fact, for example, the recognition process for the first sampled image 36φ1 starts with a delay with respect to the frame start timing, time t0. Further, each recognition process is not always completed within 1/4 of the one frame period.
  • Therefore, the time t11 at which the recognition process for the final recognition result φ4 is completed may not be in time for the time t1 at which the first one-frame period ends. That is, in this case, the recognition process of the first frame period extends across frames into the next, second frame period. Therefore, the next recognition process starts from the time t2 at which the second frame period ends, which may cause a problem in responsiveness.
  • the recognition process for the first sampled image 36 ⁇ 1 is executed, and the recognition process for the subsequent sampled images ⁇ 2 to ⁇ 4 is not performed.
  • the recognition process for the image data 32 can be started from the time t 1 at which the second frame period starts. Therefore, it is superior in responsiveness as compared with the above example.
  • Further, if the recognition process for the sampled image 36φ1 ends at time t10, before the second frame period starts, it is possible to further improve the responsiveness.
  • the second modification of the first embodiment corresponds to the processing of the timing Q described with reference to FIG. 19, and sets the recognition result ⁇ x to be output based on the recognition result in the predetermined period 100 across the frames.
  • FIG. 25 is a flowchart of an example showing the recognition process according to the second modification of the first embodiment.
  • the recognition target for determining the recognition result ⁇ x output by the recognition unit 220 is set.
  • the information processing device 1b according to the second modification of the first embodiment is for in-vehicle use, for example, a person (pedestrian) can be considered as a recognition target. Not limited to this, it is also possible to recognize oncoming vehicles and road signs.
  • step S300 the recognition processing unit 20b acquires the sampled image 36 ⁇ x in which the image data 32 of one frame is subsampled by the utilization area acquisition unit 211, for example, according to a preset pattern (for example, the division area 35).
  • the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results ⁇ 1 to ⁇ 4 is to be output by the recognition result output setting unit 212.
  • the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36 ⁇ x acquired in step S300.
  • the recognition processing unit 20b executes the recognition processing for the sampled image 36 ⁇ x acquired in step S300 by the recognition unit 220. More specifically, the recognition unit 220 extracts the feature amount of the sampled image 36 ⁇ x acquired in step S300, and executes the recognition process based on the extracted feature amount. In the next step S303, the storage unit 214 accumulates the recognition result ⁇ x by the recognition process executed in step S302.
  • step S304 the recognition processing unit 20b determines whether or not the processing of steps S300 to S303 has been executed for a predetermined period (for example, several frame period) by the recognition result output setting unit 212.
  • In step S305, the recognition processing unit 20b decides, by the recognition result output setting unit 212, at which timing of the recognition results φ1 to φ4 the recognition result is to be output in the subsequent frames, based on the recognition results φx accumulated in the storage unit 214.
  • the recognition result output setting unit 212 can determine one or more recognition results as the output recognition result ⁇ x from the recognition results ⁇ 1 to ⁇ 4 based on the image data 32 of one frame.
  • step S306 the recognition processing unit 20b acquires the sampled image 36 ⁇ x according to the recognition result ⁇ x whose output is determined according to the determination in step S305 in the utilization area acquisition unit 211.
  • the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results ⁇ 1 to ⁇ 4 is to be output by the recognition result output setting unit 212.
  • the recognition result output setting unit 212 designates the recognition unit 220 to output the recognition result based on the sampled image 36 ⁇ x determined in step S303.
  • a plurality of recognition results are determined as the recognition results ⁇ x to be output in step S305, they are sequentially selected and determined in the loop of steps S306 to S309.
  • the recognition processing unit 20b executes the recognition processing for the sampled image 36 ⁇ x acquired in step S306 by the recognition unit 220.
  • step S309 the recognition processing unit 20b determines whether or not all the recognition processing based on the recognition result whose output is specified in step S305 has been performed by the recognition result output setting unit 212.
  • If it is determined that not all of the recognition processes have been performed (step S309, "No"), the recognition result output setting unit 212 returns the process to step S306.
  • When the recognition result output setting unit 212 determines that the recognition process has been performed for all the recognition results whose output has been decided (step S309, "Yes"), the process proceeds to step S310.
  • In step S310, the recognition processing unit 20b instructs the recognition unit 220, by the recognition result output setting unit 212, to output each recognition result φx whose output has been decided.
  • the recognition unit 220 outputs each recognition result ⁇ x in response to this instruction.
  • the process of step S310 may be executed between the process of step S308 and the process of step S309.
  • The explanation will be given using a more specific example. As above, an obstacle (a pedestrian, etc.) on the road is applied as the recognition target, and it is assumed that the recognition target can be recognized by the breaking news result (recognition result φ1) within the range of the braking target distance of the vehicle (own vehicle) on which the information processing device 1b is mounted.
  • FIG. 26 is a diagram corresponding to FIG. 18 described above, and is a schematic diagram for explaining the recognition process according to the second modification of the first embodiment.
  • The recognition result output setting unit 212 instructs the recognition unit 220 to output, for example, the recognition result φ4, that is, the integration result, and accumulates the recognition results φ4 for the predetermined period 100 in the storage unit 214. Based on the recognition results φ4 accumulated in the storage unit 214, the recognition result output setting unit 212 determines that the area where the own vehicle is currently located is an area with many pedestrians.
  • the recognition result output setting unit 212 outputs the recognition result ⁇ 1 (breaking news result) and the recognition result ⁇ 4 (integration result) as shown in the lower part of FIG. 26. (Step S305).
  • the recognition unit 220 outputs the recognition results ⁇ 1 and ⁇ 4 according to this determination. These recognition results ⁇ 1 and ⁇ 4 are sent to, for example, the braking system of the own vehicle.
  • the recognition result to be output is determined based on the recognition result for a predetermined period, it is possible to appropriately output the recognition result according to the situation. Become.
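  • One way to picture this timing-Q behaviour, accumulating integration results over a predetermined period and then choosing which results to output, is the sketch below; the pedestrian-count criterion is an assumed stand-in for judging "an area with many pedestrians".

```python
def choose_outputs_from_period(accumulated_results, pedestrian_threshold=5):
    """Second modification: decide the outputs from results accumulated over a period."""
    pedestrians = sum(r.get("pedestrians", 0) for r in accumulated_results)
    if pedestrians >= pedestrian_threshold:
        # Busy area: output both the breaking news result and the integration result.
        return ["phi1", "phi4"]
    return ["phi4"]
```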
  • In the above, the technique according to the present disclosure has been described as being applied to recognition processing using a DNN, but the technique is not limited to this example.
  • The technique can also be applied to other architectures that expand and use image information on the time axis.
  • a second embodiment of the present disclosure is an example in which a sensor unit 10b including a pixel array unit 1001, a recognition unit 220, and a configuration corresponding to a preprocessing unit 210 are integrally incorporated into a layered CIS. ..
  • FIG. 28 is a block diagram showing a configuration of an example of the information processing device according to the second embodiment.
  • the information processing device 1c includes a sensor unit 10c and a recognition unit 220. Further, the sensor unit 10c includes a pixel array unit 1001 and a read control unit 240.
  • the read control unit 240 includes, for example, a function corresponding to the preprocessing unit 210 described in the first embodiment and a function of the control unit 1100 in the imaging unit 1200.
  • the vertical scanning unit 1002, the AD conversion unit 1003, and the signal processing unit 1101 will be described as being included in the pixel array unit 1001.
  • the read control unit 240 supplies the pixel array unit 1001 with a control signal that specifies the pixel circuit 1000 that reads the pixel signal.
  • the read control unit 240 can selectively read a line including sampling pixels from the pixel array unit 1001.
  • the read control unit 240 can selectively specify the pixel circuit 1000 corresponding to the sampling pixel in the pixel circuit 1000 unit for the pixel array unit 1001.
  • That is, the read control unit 240 can specify to the pixel array unit 1001 the pixel circuits 1000 corresponding to the pixel positions of the sampling pixels obtained by the subsampling performed while shifting the phase, as described in the first embodiment (a sketch of such line selection follows below).
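  • Selecting only the lines that contain sampling pixels for a given phase could be expressed as in the following sketch (the interface to the actual pixel array unit is not described in this text, so the function is purely illustrative):

```python
def lines_for_phase(height, phase, stride=2):
    """Rows to read for one phase when sampling every other line and column."""
    phase_offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]  # assumed phase offsets phi1..phi4
    dy, _ = phase_offsets[phase]
    return list(range(dy, height, stride))  # only about half of the lines are read

# Example: for a 480-line array, phase phi1 reads lines 0, 2, 4, ..., 478.
```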
  • the pixel array unit 1001 converts the pixel signal read from the designated pixel circuit 1000 into digital pixel data, and passes this pixel data to the read control unit 240.
  • the read control unit 240 passes the pixel data for one frame passed from the pixel array unit 1001 to the recognition unit 220 as image data.
  • This image data is a sampled image by phase shift subsampling.
  • the recognition unit 220 executes a recognition process on the passed image data.
  • the information processing apparatus 1c can be configured by the laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers, which is described with reference to FIG. 6A.
  • the pixel portion 2020a is formed on the semiconductor chip of the first layer
  • the memory + logic portion 2020b is formed on the semiconductor chip of the second layer.
  • the pixel unit 2020a includes at least the sensor unit 10c in the information processing device 1c.
  • the memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit 1001, a read control unit 240, and a recognition unit 220.
  • the memory + logic unit 2020b can further include a frame memory.
  • the information processing apparatus 1c can be configured by the laminated CIS having a three-layer structure in which semiconductor chips are laminated in three layers, which is described with reference to FIG. 6B.
  • the pixel portion 2020a described above is formed on the semiconductor chip of the first layer
  • the memory portion 2020c including, for example, a frame memory is formed on the semiconductor chip of the second layer
  • The logic unit 2020d corresponding to the memory + logic unit 2020b described above is formed on the semiconductor chip of the third layer.
  • the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit, a read control unit 240, and a recognition unit 220.
  • the memory unit 2020c can include a frame memory and a memory 1202.
  • As described above, in the second embodiment, the sensor unit 10c performs the subsampling process. Therefore, it is not necessary to read from all of the pixel circuits 1000 included in the pixel array unit 1001, and the delay of the recognition process can be further shortened as compared with the first embodiment described above. Further, since the pixel circuits 1000 of the lines including the sampling pixels are selectively read from among all the pixel circuits 1000, the amount of pixel signal reading from the pixel array unit 1001 can be reduced, and the bus width can be reduced.
  • the pixel array unit 1001 selectively reads out the lines including the sampling pixels, and reads out by thinning out the lines. Therefore, it is possible to reduce the distortion of the captured image by the rolling shutter. Further, it is possible to reduce the power consumption at the time of imaging in the pixel array unit 1001. Further, in the lines thinned out by subsampling, it is possible to change the imaging conditions such as exposure to the lines to be read out by subsampling to perform imaging.
  • a modification of the second embodiment is an example in which the sensor unit 10c and the recognition unit 220 are separated from each other in the information processing device 1c according to the second embodiment described above.
  • FIG. 29 is a block diagram showing a configuration of an example of an information processing device according to a modified example of the second embodiment.
  • the information processing device 1d includes a sensor unit 10d and a recognition processing unit 20d
  • the sensor unit 10d includes a pixel array unit 1001 and a read control unit 240.
  • the recognition processing unit 20d includes a recognition unit 220.
  • the sensor unit 10d is formed by, for example, the laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers, which is described with reference to FIG. 6A.
  • the pixel portion 2020a is formed on the semiconductor chip of the first layer
  • the memory + logic portion 2020b is formed on the semiconductor chip of the second layer.
  • the pixel unit 2020a includes at least the pixel array unit 1001 in the sensor unit 10d.
  • the memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit 1001 and a read control unit 240.
  • the memory + logic unit 2020b can further include a frame memory.
  • the sensor unit 10d outputs the image data of the sampled image from the read control unit 240 and supplies it to the recognition processing unit 20d, which is configured as hardware different from the sensor unit 10d.
  • the recognition processing unit 20d inputs the image data supplied from the sensor unit 10d to the recognition unit 220.
  • the recognition unit 220 executes the recognition process based on the input image data, and outputs the recognition result to the outside.
  • the sensor unit 10d can be formed by the laminated CIS having a three-layer structure in which semiconductor chips are laminated in three layers, which is described with reference to FIG. 6B.
  • the pixel portion 2020a described above is formed on the semiconductor chip of the first layer
  • the memory portion 2020c including, for example, a frame memory is formed on the semiconductor chip of the second layer
  • the logic unit 2020d, corresponding to the memory + logic unit 2020b described above, is formed on the semiconductor chip of the third layer.
  • the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit 1001 and a read control unit 240.
  • the memory unit 2020c can include a frame memory and a memory 1202.
  • by configuring the recognition processing unit 20d (recognition unit 220) with hardware different from the sensor unit 10d, it is possible to easily change the configuration of the recognition unit 220, for example, the recognition model.
  • since the recognition process is performed based on the sampled image subsampled by the sensor unit 10d, the load of the recognition process can be reduced as compared with the case where the recognition process is performed using the image data 32 of the captured image as it is. Therefore, for example, a CPU, DSP, or GPU with a low processing capacity can be used in the recognition processing unit 20d, and the cost of the information processing device 1d can be reduced.
  • FIG. 30 is a diagram showing usage examples of the information processing devices 1b, 1c, and 1d according to the first embodiment and its modifications, and the second embodiment and its modification.
  • the information processing devices 1b, 1c and 1d will be represented by the information processing device 1b when it is not necessary to distinguish them.
  • the information processing device 1b described above can be used in various cases in which light such as visible light, infrared light, ultraviolet light, or X-rays is sensed and recognition processing is performed based on the sensing result, for example, as shown below.
  • Devices that capture images used for viewing, such as digital cameras and portable devices with camera functions.
  • Devices used for traffic, such as in-vehicle sensors that photograph the front, rear, surroundings, and interior of a vehicle, surveillance cameras that monitor traveling vehicles and roads, and distance measuring sensors that measure distances such as the inter-vehicle distance.
  • Devices used for home appliances such as TVs, refrigerators, and air conditioners, in order to photograph a user's gesture and operate the appliance according to the gesture.
  • Devices used for medical treatment and healthcare, such as endoscopes and devices that perform angiography by receiving infrared light.
  • Devices used for security, such as surveillance cameras for crime prevention and cameras for personal authentication.
  • Devices used for beauty care, such as skin measuring instruments that photograph the skin and microscopes that photograph the scalp.
  • Devices used for sports, such as action cameras and wearable cameras for sports applications.
  • Devices used for agriculture, such as cameras for monitoring the condition of fields and crops.
  • the technology according to the present disclosure (the present technology) can be applied to various products.
  • the technology according to the present disclosure may be realized as a device mounted on any kind of moving body, such as an automobile, an electric vehicle, a hybrid electric vehicle, a motorcycle, a bicycle, a personal mobility device, an airplane, a drone, a ship, or a robot.
  • FIG. 31 is a block diagram showing a schematic configuration example of a vehicle control system, which is an example of a mobile control system to which the technique according to the present disclosure can be applied.
  • the vehicle control system 12000 includes a plurality of electronic control units connected via the communication network 12001.
  • the vehicle control system 12000 includes a drive system control unit 12010, a body system control unit 12020, an outside information detection unit 12030, an in-vehicle information detection unit 12040, and an integrated control unit 12050.
  • a microcomputer 12051, an audio image output unit 12052, and an in-vehicle network I / F (interface) 12053 are shown as a functional configuration of the integrated control unit 12050.
  • the drive system control unit 12010 controls the operation of the device related to the drive system of the vehicle according to various programs.
  • the drive system control unit 12010 functions as a control device for a driving force generator for generating the driving force of the vehicle, such as an internal combustion engine or a driving motor, a driving force transmission mechanism for transmitting the driving force to the wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like.
  • the body system control unit 12020 controls the operation of various devices mounted on the vehicle body according to various programs.
  • the body system control unit 12020 functions as a keyless entry system, a smart key system, a power window device, or a control device for various lamps such as a head lamp, a back lamp, a brake lamp, a winker, or a fog lamp.
  • radio waves transmitted from a portable device that substitutes for a key, or signals from various switches, can be input to the body system control unit 12020.
  • the body system control unit 12020 receives inputs of these radio waves or signals and controls a vehicle door lock device, a power window device, a lamp, and the like.
  • the vehicle outside information detection unit 12030 detects information outside the vehicle equipped with the vehicle control system 12000.
  • the imaging unit 12031 is connected to the vehicle exterior information detection unit 12030.
  • the vehicle outside information detection unit 12030 causes the image pickup unit 12031 to capture an image of the outside of the vehicle and receives the captured image.
  • the vehicle exterior information detection unit 12030 may perform, based on the received image, object detection processing for detecting a person, a vehicle, an obstacle, a sign, a character on the road surface, or the like, or distance detection processing.
  • the imaging unit 12031 is an optical sensor that receives light and outputs an electric signal according to the amount of the light received.
  • the image pickup unit 12031 can output an electric signal as an image or can output it as distance measurement information. Further, the light received by the imaging unit 12031 may be visible light or invisible light such as infrared light.
  • the in-vehicle information detection unit 12040 detects the in-vehicle information.
  • a driver state detection unit 12041 that detects the driver's state is connected to the in-vehicle information detection unit 12040.
  • the driver state detection unit 12041 includes, for example, a camera that images the driver, and the in-vehicle information detection unit 12040 may calculate the degree of fatigue or the degree of concentration of the driver, or may determine whether the driver is dozing, based on the detection information input from the driver state detection unit 12041.
  • the microcomputer 12051 can calculate the control target value of the driving force generator, the steering mechanism, or the braking device based on the information inside and outside the vehicle acquired by the vehicle exterior information detection unit 12030 or the in-vehicle information detection unit 12040, and output a control command to the drive system control unit 12010.
  • for example, the microcomputer 12051 can perform cooperative control for the purpose of realizing ADAS (Advanced Driver Assistance System) functions, including collision avoidance or impact mitigation of the vehicle, follow-up driving based on the inter-vehicle distance, vehicle speed maintenance driving, vehicle collision warning, vehicle lane departure warning, and the like.
  • the microcomputer 12051 can also perform cooperative control for the purpose of automatic driving or the like, in which the vehicle travels autonomously without depending on the driver's operation, by controlling the driving force generator, the steering mechanism, the braking device, and the like based on the information around the vehicle acquired by the vehicle exterior information detection unit 12030 or the in-vehicle information detection unit 12040.
  • the microcomputer 12051 can output a control command to the body system control unit 12020 based on the information outside the vehicle acquired by the vehicle exterior information detection unit 12030.
  • for example, the microcomputer 12051 can perform cooperative control for the purpose of anti-glare, such as switching from high beam to low beam, by controlling the headlamps according to the position of the preceding vehicle or oncoming vehicle detected by the vehicle exterior information detection unit 12030.
  • the audio image output unit 12052 transmits the output signal of at least one of the audio and the image to the output device capable of visually or audibly notifying the passenger or the outside of the vehicle of the information.
  • an audio speaker 12061, a display unit 12062, and an instrument panel 12063 are exemplified as output devices.
  • the display unit 12062 may include, for example, at least one of an onboard display and a heads-up display.
  • FIG. 32 is a diagram showing an example of the installation position of the imaging unit 12031.
  • the vehicle 12100 has imaging units 12101, 12102, 12103, 12104, 12105 as imaging units 12031.
  • the imaging units 12101, 12102, 12103, 12104, 12105 are provided at positions such as the front nose, side mirrors, rear bumpers, back doors, and the upper part of the windshield in the vehicle interior of the vehicle 12100, for example.
  • the imaging unit 12101 provided on the front nose and the imaging unit 12105 provided on the upper part of the windshield in the vehicle interior mainly acquire images in front of the vehicle 12100.
  • the imaging units 12102 and 12103 provided in the side mirrors mainly acquire images of the side of the vehicle 12100.
  • the imaging unit 12104 provided on the rear bumper or the back door mainly acquires an image of the rear of the vehicle 12100.
  • the images in front acquired by the imaging units 12101 and 12105 are mainly used for detecting a preceding vehicle or a pedestrian, an obstacle, a traffic light, a traffic sign, a lane, or the like.
  • FIG. 32 shows an example of the photographing range of the imaging units 12101 to 12104.
  • the imaging range 12111 indicates the imaging range of the imaging unit 12101 provided on the front nose
  • the imaging ranges 12112 and 12113 indicate the imaging ranges of the imaging units 12102 and 12103 provided on the side mirrors, respectively
  • the imaging range 12114 indicates the imaging range of the imaging unit 12104 provided on the rear bumper or the back door. For example, by superimposing the image data captured by the imaging units 12101 to 12104, a bird's-eye view image of the vehicle 12100 as viewed from above can be obtained.
  • At least one of the imaging units 12101 to 12104 may have a function of acquiring distance information.
  • at least one of the image pickup units 12101 to 12104 may be a stereo camera composed of a plurality of image pickup elements, or an image pickup element having pixels for phase difference detection.
  • the microcomputer 12051 obtains the distance to each three-dimensional object within the imaging ranges 12111 to 12114 and the temporal change of this distance (the relative velocity with respect to the vehicle 12100) based on the distance information obtained from the imaging units 12101 to 12104. Further, the microcomputer 12051 can set in advance an inter-vehicle distance to be secured from the preceding vehicle, and can perform automatic braking control (including follow-up stop control), automatic acceleration control (including follow-up start control), and the like. In this way, it is possible to perform cooperative control for the purpose of automatic driving or the like, in which the vehicle travels autonomously without depending on the driver's operation.
  • the microcomputer 12051 can classify three-dimensional object data related to three-dimensional objects into two-wheeled vehicles, ordinary vehicles, large vehicles, pedestrians, utility poles, and other three-dimensional objects based on the distance information obtained from the imaging units 12101 to 12104, extract the data, and use it for automatic avoidance of obstacles. For example, the microcomputer 12051 distinguishes obstacles around the vehicle 12100 into obstacles that are visible to the driver of the vehicle 12100 and obstacles that are difficult for the driver to see. Then, the microcomputer 12051 determines the collision risk indicating the degree of risk of collision with each obstacle, and when the collision risk is equal to or higher than a set value and there is a possibility of collision, driving support for collision avoidance can be provided by outputting an alarm to the driver via the audio speaker 12061 or the display unit 12062, or by performing forced deceleration or avoidance steering via the drive system control unit 12010. A simple numerical sketch of such a risk determination is shown below.
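As a rough numerical illustration of the distance-based risk determination described above, the sketch below derives a relative velocity from successive distance samples and converts it into a simple time-to-collision based risk value. The threshold, sampling interval, and formulas are illustrative assumptions, not the actual criteria used by the microcomputer 12051.

```python
def relative_velocity(distances, dt):
    # Temporal change of the distance to an object (m/s);
    # negative values mean the object is approaching.
    return (distances[-1] - distances[0]) / (dt * (len(distances) - 1))

def collision_risk(distance_m, rel_velocity_mps):
    # Time-to-collision as a simple risk proxy (illustrative assumption).
    if rel_velocity_mps >= 0:          # not approaching
        return 0.0
    ttc = distance_m / -rel_velocity_mps
    return 1.0 / ttc                   # larger value = higher risk

if __name__ == "__main__":
    d = [30.0, 28.5, 27.0, 25.5]       # distance samples at 0.1 s intervals
    v = relative_velocity(d, dt=0.1)   # -> -15.0 m/s (approaching)
    risk = collision_risk(d[-1], v)
    RISK_THRESHOLD = 0.2               # "set value"; illustrative only
    if risk >= RISK_THRESHOLD:
        print("warn driver / request deceleration", v, risk)
```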
  • At least one of the imaging units 12101 to 12104 may be an infrared camera that detects infrared rays.
  • the microcomputer 12051 can recognize a pedestrian by determining whether or not a pedestrian is present in the captured image of the imaging units 12101 to 12104.
  • such pedestrian recognition is performed by, for example, a procedure of extracting feature points in the images captured by the imaging units 12101 to 12104 as infrared cameras, and a procedure of performing pattern matching processing on the series of feature points indicating the outline of an object to determine whether or not the object is a pedestrian.
  • when the microcomputer 12051 determines that a pedestrian is present in the captured images of the imaging units 12101 to 12104 and recognizes the pedestrian, the audio image output unit 12052 controls the display unit 12062 so as to superimpose and display a square contour line for emphasizing the recognized pedestrian. Further, the audio image output unit 12052 may control the display unit 12062 so as to display an icon or the like indicating the pedestrian at a desired position. A rough sketch of this recognize-and-highlight flow follows.
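The following is a minimal sketch of the recognize-and-highlight flow just described: feature points are extracted from an infrared image, a crude aspect-ratio test stands in for the pattern matching against pedestrian outlines, and a rectangular contour is drawn around a detected pedestrian. All functions and thresholds here are stand-ins assumed for illustration; they are not the recognition procedure actually implemented in the vehicle control system.

```python
import numpy as np

def extract_feature_points(ir_image):
    # Stand-in for the feature-point extraction step: here we simply take
    # bright pixels of the infrared image (illustrative assumption).
    ys, xs = np.nonzero(ir_image > ir_image.mean() + 2 * ir_image.std())
    return list(zip(xs.tolist(), ys.tolist()))

def matches_pedestrian_outline(points):
    # Stand-in for pattern matching on the series of feature points that
    # indicate an object outline; a real system would compare against
    # learned pedestrian templates.
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    # crude test: pedestrians are taller than they are wide
    if h > 0 and 1.5 < h / max(w, 1) < 5.0:
        return (min(xs), min(ys), max(xs), max(ys))
    return None

def highlight_pedestrian(display_frame, box):
    # Draw a square (rectangular) contour line to emphasise the pedestrian.
    x0, y0, x1, y1 = box
    display_frame[y0:y1, x0] = 255
    display_frame[y0:y1, x1 - 1] = 255
    display_frame[y0, x0:x1] = 255
    display_frame[y1 - 1, x0:x1] = 255
    return display_frame

if __name__ == "__main__":
    frame = (np.random.rand(120, 160) * 255).astype(np.uint8)
    box = matches_pedestrian_outline(extract_feature_points(frame))
    if box is not None:
        frame = highlight_pedestrian(frame, box)
    print("pedestrian box:", box)
```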
  • the above is an example of a vehicle control system to which the technology according to the present disclosure can be applied.
  • the technique according to the present disclosure can be applied to the imaging unit 12031 and the vehicle exterior information detection unit 12030 among the configurations described above.
  • the sensor unit 10b of the information processing device 1b is applied to the image pickup unit 12031
  • the recognition processing unit 20b is applied to the vehicle exterior information detection unit 12030.
  • the recognition result output from the recognition processing unit 20b is passed to the integrated control unit 12050 via, for example, the communication network 12001.
  • by applying the technique according to the present disclosure to the imaging unit 12031 and the vehicle exterior information detection unit 12030, it is possible to switch the subsampling pattern according to a predetermined condition, and to change the recognizer and the parameters used for the recognition process according to the switched pattern. Therefore, a prompt recognition result, that is, a recognition result that emphasizes immediacy, can be obtained with higher accuracy, and more reliable driving support becomes possible. The kind of condition-driven switching involved is sketched below.
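A condition-driven switch of the subsampling pattern and the associated recognizer could look roughly like the sketch below. The pattern definitions, model identifiers, and the speed/night rule are hypothetical placeholders used only to illustrate the idea of changing the pattern and the recognizer together.

```python
# Hypothetical pattern/recognizer registry; names and conditions are
# illustrative assumptions, not the embodiment's actual configuration.
PATTERNS = {
    "coarse": {"block": 8, "phase": 0},
    "fine":   {"block": 4, "phase": 0},
}

RECOGNIZERS = {
    "coarse": "model_coarse.onnx",   # placeholder model identifiers
    "fine":   "model_fine.onnx",
}

def select_mode(vehicle_speed_kmh, night):
    # Predetermined condition: e.g. prefer prompt, coarse results at
    # high speed or at night, finer sampling otherwise (illustrative rule).
    return "coarse" if vehicle_speed_kmh > 60 or night else "fine"

def configure(vehicle_speed_kmh, night):
    mode = select_mode(vehicle_speed_kmh, night)
    return PATTERNS[mode], RECOGNIZERS[mode]

if __name__ == "__main__":
    pattern, recognizer = configure(vehicle_speed_kmh=80, night=False)
    print(pattern, recognizer)
```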
  • the present technology can also have the following configurations.
  • a generation unit that generates a sampled image composed of sampling pixels obtained according to pixel positions set for each division area in which imaging information composed of pixels is divided in a predetermined pattern.
  • a calculation unit that calculates the feature amount of the sampled image,
  • a storage unit that accumulates the calculated features and
  • a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result.
  • An output control unit that controls the recognition unit to output the recognition result based on a predetermined feature amount among the feature amounts stored in the storage unit.
  • the predetermined feature amount is the feature amount calculated using one of the sampled images.
  • the predetermined feature amount is the feature amount calculated using a plurality of the sampled images, the plurality of sampled images including a part of the sampling pixels among all the pixel positions of the captured image in one frame.
  • the predetermined feature amount is the feature amount calculated using the sampling pixels at all the pixel positions of the imaging information in one frame.
  • the output control unit controls the recognition unit to output the recognition result based on, as the predetermined feature amount, a first feature amount calculated using one of the sampled images, a second feature amount calculated using a plurality of the sampled images including a part of the sampling pixels among all the pixel positions of the captured image in one frame, or a third feature amount calculated using the sampling pixels at all the pixel positions of the imaging information in one frame.
  • the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount to use for outputting the recognition result, based on prior information set in advance for the recognition process. The information processing device according to (5) above.
  • the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount to use for outputting the recognition result, based on the accumulated recognition results. The information processing device according to (5) above.
  • the output control unit determines, according to the recognition result based on the feature amount calculated using one of the sampled images, whether or not at least one of the calculation process by the calculation unit, the accumulation process by the storage unit, and the recognition process by the recognition unit needs to be executed in one or more frames of the imaging information to be acquired later in time series with respect to the frame from which the sampled image was acquired. The information processing device according to (1) above.
  • the information processing device determines whether or not at least one of the processes needs to be executed. The information processing device according to (1) above.
  • (11) The recognition unit performs the recognition process based on an integrated feature amount obtained by integrating the plurality of feature amounts accumulated in the storage unit. The information processing device according to any one of (1) to (10).
  • (12) The recognition unit integrates the feature amount calculated by the calculation unit in response to the acquisition of the imaging information with at least a part of the feature amounts accumulated in the storage unit up to immediately before the acquisition, and performs the recognition process based on the integrated feature amount. The information processing device according to (11) above.
  • the recognition unit performs the recognition process on the feature amount of the sampled image based on teacher data for each pixel corresponding to the pixel position in each of the divided regions. The information processing device according to any one of (1) to (12).
  • the recognition unit executes machine learning processing by an RNN (Recurrent Neural Network) using, among the imaging information, the sampling pixels set in first imaging information and the sampling pixels set in second imaging information acquired next to the first imaging information in time series, and performs the recognition process based on the result of the machine learning processing. The information processing device according to any one of (1) to (13).
  • An output control step that controls the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
  • An information processing program for causing a computer to execute the steps described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)

Abstract

Provided are an information processing device, an information processing method, and an information processing program by which characteristics of a recognition process, in which a captured image is used, can be improved. The information processing device of the present invention is provided with: a generation unit (211) that generates a sampling image constituted by sampling pixels acquired according to pixel positions set for each divided region in which captured image information constituted by pixels is divided by a predetermined pattern; a calculation unit (221) that calculates a feature amount of the sampling image; an accumulation unit (223) that accumulates calculated feature amounts; a recognition unit (224) that carries out recognition processing on the basis of at least a portion of the feature amounts accumulated by the accumulation unit, and outputs a recognition result; and an output control unit (212) that performs control such that the recognition unit outputs a recognition result based on a predetermined feature amount from among the feature amounts accumulated by the accumulation unit.

Description

Information processing device, information processing method, and information processing program

 This disclosure relates to an information processing device, an information processing method, and an information processing program.

 In recent years, with the increase in resolution of imaging devices such as digital still cameras, digital video cameras, and small cameras mounted on multifunctional mobile phones (smartphones), information processing devices equipped with an image recognition function for recognizing a predetermined object included in a captured image have been developed.
JP-A-2017-112409
 In the image recognition function, the detection performance for an object can be improved by using a captured image with a higher resolution. However, with the conventional technique, image recognition using a high-resolution captured image requires a large amount of calculation for the image recognition processing, and it is difficult to improve the simultaneity of the recognition processing with respect to the captured image.

 An object of the present disclosure is to provide an information processing device, an information processing method, and an information processing program capable of improving the characteristics of recognition processing using a captured image.

 The information processing device according to the present disclosure includes: a generation unit that generates a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region into which imaging information composed of pixels is divided by a predetermined pattern; a calculation unit that calculates a feature amount of the sampled image; a storage unit that accumulates the calculated feature amounts; a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result; and an output control unit that controls the recognition unit to output a recognition result based on a predetermined feature amount among the feature amounts accumulated in the storage unit. A minimal structural sketch of this configuration is given below.
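The following minimal Python sketch mirrors that configuration with one toy class per unit (generation, calculation, storage, recognition, and output control). The regular phase-shifted sampling pattern, the statistical feature, and the threshold-based recognition are illustrative assumptions; they only show how the units hand data to one another, not the actual recognition model of the embodiments.

```python
import numpy as np

class GenerationUnit:
    """Generates a sampled image by taking one pixel position per divided region."""
    def __init__(self, block=4):
        self.block = block
    def generate(self, frame, phase):
        py, px = divmod(phase, self.block)      # pixel position within each region
        return frame[py::self.block, px::self.block]

class CalculationUnit:
    """Calculates a feature amount of the sampled image (toy feature: mean and std)."""
    def calculate(self, sampled):
        return np.array([sampled.mean(), sampled.std()])

class StorageUnit:
    """Accumulates the calculated feature amounts."""
    def __init__(self):
        self.features = []
    def store(self, feature):
        self.features.append(feature)

class RecognitionUnit:
    """Performs recognition on (part of) the accumulated features (toy threshold rule)."""
    def recognize(self, features):
        integrated = np.mean(features, axis=0)  # integrate accumulated features
        return {"bright_scene": bool(integrated[0] > 128)}

class OutputControlUnit:
    """Decides which accumulated features the recognition result is based on."""
    def __init__(self, last_n=1):
        self.last_n = last_n
    def output(self, storage, recognizer):
        return recognizer.recognize(storage.features[-self.last_n:])

if __name__ == "__main__":
    gen, calc, store = GenerationUnit(), CalculationUnit(), StorageUnit()
    recog, ctrl = RecognitionUnit(), OutputControlUnit(last_n=2)
    for phase in range(4):                       # successive frames use different phases
        frame = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
        store.store(calc.calculate(gen.generate(frame, phase)))
    print(ctrl.output(store, recog))
```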
 A block diagram showing a basic configuration example of an information processing device applicable to each embodiment.
 A diagram schematically showing an example of recognition processing by a DNN.
 A diagram schematically showing an example of recognition processing by a DNN.
 A diagram schematically showing a first example of identification processing by a DNN when time-series information is used.
 A diagram schematically showing a first example of identification processing by a DNN when time-series information is used.
 A diagram schematically showing a second example of identification processing by a DNN when time-series information is used.
 A diagram schematically showing a second example of identification processing by a DNN when time-series information is used.
 A block diagram schematically showing a hardware configuration example of an imaging device as an information processing device applicable to each embodiment.
 A diagram showing an example in which the imaging unit is formed by a laminated CIS having a two-layer structure.
 A diagram showing an example in which the imaging unit 1200 is formed by a laminated CIS having a three-layer structure.
 A block diagram showing a configuration of an example of an imaging unit applicable to each embodiment.
 A diagram for explaining the resolution of an image used for recognition processing.
 A diagram for explaining the resolution of an image used for recognition processing.
 A block diagram showing a configuration of an example of the information processing device according to the first embodiment of the present disclosure.
 A schematic diagram for explaining recognition processing by the recognizer according to the first embodiment.
 A schematic diagram for explaining sampling processing according to the first embodiment.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A schematic diagram for explaining subsampling processing in the recognition processing according to the first embodiment.
 A schematic diagram for explaining subsampling processing in the recognition processing according to the first embodiment.
 A schematic diagram for explaining the basic architecture of recognition processing according to the existing technology.
 A schematic diagram for explaining the basic architecture of recognition processing according to each embodiment.
 A time chart of an example showing a first example of readout and recognition processing in the basic architecture of the recognition processing according to each embodiment.
 A time chart of an example showing a second example of readout and recognition processing in the basic architecture of the recognition processing according to each embodiment.
 A time chart of an example comparing intra-frame processing with processing by the existing technology.
 A diagram schematically showing recognition results obtained by intra-frame processing.
 A diagram schematically showing a configuration of an example of the recognizer according to the first embodiment.
 A time chart of an example showing an example of timing for determining how to output the recognition result according to the first embodiment.
 A functional block diagram of an example for explaining more detailed functions of the preprocessing unit according to the first embodiment.
 A functional block diagram of an example for explaining more detailed functions of the recognition unit according to the first embodiment.
 A flowchart of an example showing the recognition processing according to the first embodiment.
 A flowchart of an example showing the recognition processing according to the first modification of the first embodiment.
 A schematic diagram for explaining the recognition processing according to the first modification of the first embodiment.
 A time chart of an example for explaining the effect of the recognition processing according to the first modification of the first embodiment.
 A flowchart of an example showing the recognition processing according to the second modification of the first embodiment.
 A schematic diagram for explaining the recognition processing according to the second modification of the first embodiment.
 A block diagram showing a configuration of an example of the information processing device according to the second embodiment.
 A block diagram showing a configuration of an example of the information processing device according to a modification of the second embodiment.
 A diagram showing usage examples of the information processing devices according to the first embodiment and its modifications, and the second embodiment and its modification.
 A block diagram showing an example of a schematic configuration of a vehicle control system.
 An explanatory diagram showing an example of installation positions of the vehicle exterior information detection unit and the imaging unit.
 Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In the following embodiments, the same parts are designated by the same reference numerals, and duplicate description will be omitted.
 Hereinafter, the embodiments of the present disclosure will be described in the following order.
1. Techniques applicable to each embodiment
 1-0. Outline of recognition processing applicable to each embodiment
 1-1. Hardware configuration applicable to each embodiment
  1-1-1. Configuration example of the imaging unit applicable to each embodiment
  1-1-2. Resolution of the captured image
 1-2. Outline of recognition processing that is the premise of each embodiment
  1-2-1. Configuration related to the prerequisite technology of each embodiment
   1-2-1-1. Outline of the configuration applicable to the prerequisite technology of each embodiment
   1-2-1-2. Example of recognition processing related to the prerequisite technology of each embodiment
   1-2-1-3. Subsampling processing related to the prerequisite technology of each embodiment
 1-3. Basic architecture of recognition processing according to each embodiment
  1-3-1. More specific configuration
   1-3-1-1. First example
   1-3-1-2. Second example
2. First embodiment
 2-1. Outline of the first embodiment
 2-2. More specific configuration example according to the first embodiment
 2-3. More specific processing according to the first embodiment
 2-4. First modification of the first embodiment
 2-5. Second modification of the first embodiment
 2-6. Other modifications of the first embodiment
3. Second embodiment
 3-1. Modification of the second embodiment
4. Third embodiment
 4-1. Application examples of the technology of the present disclosure
 4-2. Application example to a moving body
[1. Techniques applicable to each embodiment]
 First, in order to facilitate understanding, the techniques applicable to each embodiment will be schematically described.
(1-0. Outline of recognition processing applicable to each embodiment)
 FIG. 1 is a block diagram showing a basic configuration example of an information processing device applicable to each embodiment. In FIG. 1, the information processing device 1a includes a sensor unit 10a and a recognition processing unit 20a. Although not shown, the sensor unit 10a includes an imaging means (camera) and an imaging control unit that controls the imaging means.

 The sensor unit 10a performs imaging under the control of the imaging control unit, and supplies image data of the captured image acquired by the imaging to the recognition processing unit 20a. The recognition processing unit 20a performs recognition processing on the image data using a DNN (Deep Neural Network). More specifically, the recognition processing unit 20a includes a recognition model trained in advance by machine learning using predetermined teacher data, and applies recognition processing using the DNN to the image data supplied from the sensor unit 10a based on the recognition model. The recognition processing unit 20a outputs the recognition result of the recognition processing to, for example, the outside of the information processing device 1a.
 FIGS. 2A and 2B are diagrams schematically showing an example of recognition processing by a DNN. In this example, as shown in FIG. 2A, one image is input to the DNN. The DNN performs recognition processing on the input image and outputs the recognition result.

 The processing of FIG. 2A will be described in more detail with reference to FIG. 2B. As shown in FIG. 2B, the DNN executes feature extraction processing and recognition processing. In the DNN, a feature amount is extracted from the input image by the feature extraction processing. This feature extraction processing is performed using, for example, a CNN (Convolutional Neural Network), which is a type of DNN. Then, in the DNN, recognition processing is executed on the extracted feature amount to obtain the recognition result.
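As a self-contained illustration of this two-stage structure (feature extraction followed by recognition), the sketch below uses a single hand-written convolution as a stand-in for the CNN feature extractor and a linear classifier as the recognition stage. The kernel, pooling, and weights are illustrative assumptions rather than a trained model.

```python
import numpy as np

def conv2d(image, kernel):
    # Minimal valid-mode 2-D convolution used as the feature extraction step.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def extract_features(image):
    # Feature extraction (CNN-like): one edge filter + ReLU + global pooling.
    edge_kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    fmap = np.maximum(conv2d(image, edge_kernel), 0.0)   # ReLU
    return np.array([fmap.mean(), fmap.max()])           # pooled feature amount

def recognize(feature, weights, bias):
    # Recognition step: a linear classifier on the extracted feature amount.
    score = float(feature @ weights + bias)
    return 1 / (1 + np.exp(-score))                       # probability-like output

if __name__ == "__main__":
    image = np.random.rand(32, 32)
    feature = extract_features(image)
    print(recognize(feature, weights=np.array([0.5, 0.1]), bias=-0.2))
```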
 In the DNN, recognition processing can also be executed using time-series information. FIGS. 3A and 3B are diagrams schematically showing an example of identification processing by the DNN when time-series information is used. In the examples of FIGS. 3A and 3B, identification processing by the DNN is performed using a fixed number of pieces of past information on the time series. In the example of FIG. 3A, an image [T] at time T, an image [T-1] at time T-1 before time T, and an image [T-2] at time T-2 before time T-1 are input to the DNN. The DNN executes identification processing on each of the input images [T], [T-1], and [T-2], and obtains the recognition result [T] at time T.

 FIG. 3B is a diagram for explaining the processing of FIG. 3A in more detail. As shown in FIG. 3B, in the DNN, the feature extraction processing described above with reference to FIG. 2B is executed one-to-one for each of the input images [T], [T-1], and [T-2], and the feature amounts corresponding to the images [T], [T-1], and [T-2] are extracted. In the DNN, the feature amounts obtained based on these images [T], [T-1], and [T-2] are integrated, identification processing is executed on the integrated feature amount, and the recognition result [T] at time T is obtained. The feature amounts obtained based on the images [T], [T-1], and [T-2] can be regarded as intermediate data for obtaining the integrated feature amount used in the recognition processing.
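A minimal sketch of this multi-image scheme is shown below: a feature amount is extracted from each of the images [T-2], [T-1], and [T] as intermediate data, the features are integrated, and identification is performed on the integrated feature amount. The toy statistics-based features and the linear decision rule are assumptions made purely for illustration.

```python
import numpy as np

def extract_feature(image):
    # Per-image feature extraction (intermediate data); toy statistics here.
    return np.array([image.mean(), image.std()])

def integrate(features):
    # Integrate the features of images [T-2], [T-1], and [T] into one feature amount.
    return np.concatenate(features)

def identify(integrated, weights, bias):
    # Identification on the integrated feature amount.
    return float(integrated @ weights + bias) > 0

if __name__ == "__main__":
    frames = [np.random.rand(64, 64) for _ in range(3)]   # [T-2], [T-1], [T]
    features = [extract_feature(f) for f in frames]
    result_T = identify(integrate(features),
                        weights=np.ones(6) * 0.1, bias=-0.3)
    print(result_T)
```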
 FIGS. 4A and 4B are diagrams schematically showing another example of identification processing by the DNN when time-series information is used. In the example of FIG. 4A, the image [T] at time T is input to the DNN whose internal state has been updated to the state at time T-1, and the recognition result [T] at time T is obtained.

 FIG. 4B is a diagram for explaining the processing of FIG. 4A in more detail. As shown in FIG. 4B, in the DNN, the feature extraction processing described above with reference to FIG. 2B is executed on the input image [T] at time T, and the feature amount corresponding to the image [T] is extracted. In the DNN, the internal state has been updated by the images before time T, and the feature amount related to the updated internal state is stored. The feature amount related to the stored internal information and the feature amount of the image [T] are integrated, and identification processing is executed on the integrated feature amount. In this case, each of the feature amount related to the stored internal information and the feature amount of the image [T] can be regarded as intermediate data for obtaining the integrated feature amount used in the recognition processing.

 The identification processing shown in FIGS. 4A and 4B is executed using, for example, a DNN whose internal state has been updated using the immediately preceding recognition result, and is therefore recursive processing. A DNN that performs recursive processing in this way is called an RNN (Recurrent Neural Network). Identification processing by an RNN is generally used for moving image recognition and the like, and identification accuracy can be improved by, for example, sequentially updating the internal state of the DNN with frame images updated in time series.
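The recursive variant can be sketched as follows: an internal state is kept, integrated with the feature extracted from each newly input frame, and updated before producing the recognition result for that frame. The state dimensions, random weights, and statistics-based feature are illustrative assumptions, not a trained RNN.

```python
import numpy as np

class RecurrentRecognizer:
    """Keeps an internal state that is updated each frame and integrated
    with the feature of the newly input image (RNN-style sketch)."""
    def __init__(self, feat_dim=2, state_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(size=(state_dim, feat_dim)) * 0.3
        self.w_rec = rng.normal(size=(state_dim, state_dim)) * 0.3
        self.w_out = rng.normal(size=state_dim) * 0.3
        self.state = np.zeros(state_dim)          # internal state

    def step(self, image):
        feature = np.array([image.mean(), image.std()])      # feature of image [T]
        # Integrate the stored internal state with the new feature, then update the state.
        self.state = np.tanh(self.w_in @ feature + self.w_rec @ self.state)
        score = float(self.w_out @ self.state)                # recognition result [T]
        return 1 / (1 + np.exp(-score))

if __name__ == "__main__":
    recognizer = RecurrentRecognizer()
    for t in range(5):                                        # frames updated in time series
        frame = np.random.rand(64, 64)
        print(t, recognizer.step(frame))
```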
(1-1. Hardware configuration applicable to each embodiment)
 FIG. 5 is a block diagram schematically showing a hardware configuration example of an information processing device applicable to each embodiment. In FIG. 5, the information processing device 1 includes an imaging unit 1200, a memory 1202, a DSP (Digital Signal Processor) 1203, an interface (I/F) 1204, a CPU (Central Processing Unit) 1205, a ROM (Read Only Memory) 1206, and a RAM (Random Access Memory) 1207, which are communicably connected to one another via a bus 1210. The information processing device 1 can further include an input device that accepts user operations, a display device for displaying information to the user, and a storage device that stores data in a nonvolatile manner.

 The CPU 1205 operates using the RAM 1207 as a work memory according to a program stored in advance in the ROM 1206, and controls the overall operation of the information processing device 1. The interface 1204 communicates with the outside of the information processing device 1 by wired or wireless communication. For example, when the information processing device 1 is used for in-vehicle applications, the information processing device 1 can communicate with the braking control system or the like of the vehicle on which the information processing device 1 is mounted, via the interface 1204.
 The imaging unit 1200 captures a moving image at a predetermined frame period and outputs pixel data for composing a frame image. More specifically, the imaging unit 1200 includes a plurality of photoelectric conversion elements, each of which converts received light into a pixel signal, which is an electrical signal, by photoelectric conversion, and a drive circuit that drives each photoelectric conversion element. In the imaging unit 1200, the plurality of photoelectric conversion elements are arranged in a matrix to form a pixel array.

 For example, the sensor unit 10a in FIG. 1 includes the imaging unit 1200, and outputs the pixel data output from the imaging unit 1200 within one frame period as image data for one frame.

 Here, each of the photoelectric conversion elements corresponds to a pixel in the image data, and in the pixel array unit, a number of photoelectric conversion elements corresponding to, for example, 1920 pixels x 1080 pixels as the number of pixels in rows x columns are arranged in a matrix. An image of one frame is formed by the pixel signals from the number of photoelectric conversion elements corresponding to 1920 pixels x 1080 pixels.
 The optical unit 1201 includes a lens, an autofocus mechanism, and the like, and irradiates the pixel array unit of the imaging unit 1200 with the light incident on the lens. The imaging unit 1200 generates a pixel signal for each photoelectric conversion element according to the light irradiated onto the pixel array unit via the optical unit 1201. The imaging unit 1200 converts the pixel signal, which is an analog signal, into pixel data, which is a digital signal, and outputs the pixel data. The pixel data output from the imaging unit 1200 is stored in the memory 1202. The memory 1202 is, for example, a frame memory, and can store at least one frame of pixel data.

 The DSP 1203 performs predetermined image processing on the pixel data stored in the memory 1202. The DSP 1203 also includes a recognition model trained in advance, and performs the recognition processing using the DNN described above on the image data stored in the memory 1202 based on the recognition model. The recognition result, which is the result of the recognition processing by the DSP 1203, is temporarily stored in, for example, a memory provided in the DSP 1203 or in the RAM 1207, and is output to the outside from the interface 1204. Not limited to this, when the information processing device 1 includes a storage device, the recognition result may be stored in the storage device.

 Not limited to this, the function of the DSP 1203 may be realized by the CPU 1205. A GPU (Graphics Processing Unit) may also be used instead of the DSP 1203.

 A CMOS image sensor (CIS) in which the units included in the imaging unit 1200 are integrally formed using CMOS (Complementary Metal Oxide Semiconductor) can be applied as the imaging unit 1200. The imaging unit 1200 can be formed on one substrate. Not limited to this, the imaging unit 1200 may be a laminated CIS in which a plurality of semiconductor chips are laminated and integrally formed. The imaging unit 1200 is not limited to this example, and may be another type of optical sensor, such as an infrared light sensor that performs imaging with infrared light.
 As an example, the imaging unit 1200 can be formed as a laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers. FIG. 6A is a diagram showing an example in which the imaging unit 1200 is formed by a laminated CIS having a two-layer structure. In the structure of FIG. 6A, a pixel unit 2020a is formed on the semiconductor chip of the first layer, and a memory + logic unit 2020b is formed on the semiconductor chip of the second layer. The pixel unit 2020a includes at least the pixel array unit of the imaging unit 1200. The memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit. The memory + logic unit 2020b can further include the memory 1202.

 As shown on the right side of FIG. 6A, the imaging unit 1200 is configured as one solid-state imaging element by bonding the semiconductor chip of the first layer and the semiconductor chip of the second layer while bringing them into electrical contact with each other.

 As another example, the imaging unit 1200 can be formed with a three-layer structure in which semiconductor chips are laminated in three layers. FIG. 6B is a diagram showing an example in which the imaging unit 1200 is formed by a laminated CIS having a three-layer structure. In the structure of FIG. 6B, a pixel unit 2020a is formed on the semiconductor chip of the first layer, a memory unit 2020c is formed on the semiconductor chip of the second layer, and a logic unit 2020d is formed on the semiconductor chip of the third layer. In this case, the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit. The memory unit 2020c can include a frame memory and the memory 1202.

 As shown on the right side of FIG. 6B, the imaging unit 1200 is configured as one solid-state imaging element by bonding the semiconductor chip of the first layer, the semiconductor chip of the second layer, and the semiconductor chip of the third layer while bringing them into electrical contact with one another.

 In the configurations of FIGS. 6A and 6B, the memory + logic unit 2020b can also include configurations corresponding to the DSP 1203, the interface 1204, the CPU 1205, the ROM 1206, and the RAM 1207 shown in FIG. 5.
(1-1-1. Configuration example of the imaging unit applicable to each embodiment)
 FIG. 7 is a block diagram showing a configuration of an example of the imaging unit 1200 applicable to each embodiment. In FIG. 7, the imaging unit 1200 includes a pixel array unit 1001, a vertical scanning unit 1002, an AD (Analog to Digital) conversion unit 1003, a pixel signal line 1006, a vertical signal line VSL, a control unit 1100, and a signal processing unit 1101. In FIG. 7, the control unit 1100 and the signal processing unit 1101 can also be realized by, for example, the CPU 1205 and the DSP 1203 shown in FIG. 5.

 The pixel array unit 1001 includes a plurality of pixel circuits 1000, each including a photoelectric conversion element, for example a photodiode, that performs photoelectric conversion on the received light, and a circuit that reads out the electric charge from the photoelectric conversion element. In the pixel array unit 1001, the plurality of pixel circuits 1000 are arranged in a matrix in the horizontal direction (row direction) and the vertical direction (column direction). In the pixel array unit 1001, the arrangement of the pixel circuits 1000 in the row direction is called a line. For example, when an image of one frame is formed by 1920 pixels x 1080 lines, the pixel array unit 1001 includes at least 1080 lines, each including at least 1920 pixel circuits 1000. An image (image data) of one frame is formed by the pixel signals read from the pixel circuits 1000 included in the frame.

 Further, in the pixel array unit 1001, a pixel signal line 1006 is connected to each row of the pixel circuits 1000, and a vertical signal line VSL is connected to each column. The end of the pixel signal line 1006 that is not connected to the pixel array unit 1001 is connected to the vertical scanning unit 1002. The vertical scanning unit 1002 transmits control signals, such as a drive pulse used when reading a pixel signal from a pixel, to the pixel array unit 1001 via the pixel signal lines 1006 in accordance with the control of the control unit 1100 described later. The end of the vertical signal line VSL that is not connected to the pixel array unit 1001 is connected to the AD conversion unit 1003. The pixel signal read from a pixel is transmitted to the AD conversion unit 1003 via the vertical signal line VSL.
 画素回路1000からの画素信号の読み出し制御について、概略的に説明する。画素回路1000からの画素信号の読み出しは、露出により光電変換素子に蓄積された電荷を浮遊拡散層(FD;Floating Diffusion)に転送し、浮遊拡散層において転送された電荷を電圧に変換することで行う。浮遊拡散層において電荷が変換された電圧は、画素信号としてアンプを介して垂直信号線VSLに出力される。 The reading control of the pixel signal from the pixel circuit 1000 will be schematically described. The pixel signal is read out from the pixel circuit 1000 by transferring the charge accumulated in the photoelectric conversion element due to exposure to the floating diffusion layer (FD) and converting the electric charge transferred in the floating diffusion layer into a voltage. conduct. The voltage at which the charge is converted in the floating diffusion layer is output as a pixel signal to the vertical signal line VSL via an amplifier.
More specifically, in the pixel circuit 1000, during exposure, the connection between the photoelectric conversion element and the floating diffusion layer is turned off (opened), and the photoelectric conversion element accumulates the charge generated by photoelectric conversion according to the incident light. After the exposure ends, the floating diffusion layer and the vertical signal line VSL are connected according to a selection signal supplied via the pixel signal line 1006. Further, the floating diffusion layer is briefly connected to a supply line of the power supply voltage VDD or of a black level voltage according to a reset pulse supplied via the pixel signal line 1006, resetting the floating diffusion layer. A reset-level voltage of the floating diffusion layer (referred to as voltage A) is output to the vertical signal line VSL. After that, a transfer pulse supplied via the pixel signal line 1006 turns on (closes) the connection between the photoelectric conversion element and the floating diffusion layer, and the charge accumulated in the photoelectric conversion element is transferred to the floating diffusion layer. A voltage corresponding to the amount of charge in the floating diffusion layer (referred to as voltage B) is output to the vertical signal line VSL.
The AD conversion unit 1003 includes an AD converter 1007 provided for each vertical signal line VSL, a reference signal generation unit 1004, and a horizontal scanning unit 1005. The AD converter 1007 is a column AD converter that performs AD conversion processing for each column of the pixel array unit 1001. The AD converter 1007 performs AD conversion processing on the pixel signal supplied from the pixel circuit 1000 via the vertical signal line VSL, and generates two digital values (values corresponding to voltage A and voltage B, respectively) for correlated double sampling (CDS) processing, which reduces noise.
The AD converter 1007 supplies the two generated digital values to the signal processing unit 1101. The signal processing unit 1101 performs CDS processing based on the two digital values supplied from the AD converter 1007, and generates pixel data, which is a pixel signal in digital form.
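As a rough illustration of the CDS step just described, the pixel value can be obtained as the difference between the two digitized levels (one corresponding to the reset voltage A, the other to the signal voltage B). The following is a minimal sketch in Python, not the device's actual implementation; the function name and the sign convention are assumptions for illustration.

```python
def cds_pixel_value(digital_a: int, digital_b: int) -> int:
    """Illustrative CDS: take the difference of the two digitized levels.

    digital_a: digital value corresponding to the reset level (voltage A)
    digital_b: digital value corresponding to the signal level (voltage B)
    The sign convention (A - B or B - A) depends on the sensor design; A - B is
    assumed here, and the result is clamped to be non-negative.
    """
    return max(0, digital_a - digital_b)
```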
The reference signal generation unit 1004 generates, as a reference signal, a ramp signal used by each AD converter 1007 to convert the pixel signal into the two digital values, based on a control signal input from the control unit 1100. The ramp signal is a signal whose level (voltage value) decreases with a constant slope over time, or a signal whose level decreases stepwise. The reference signal generation unit 1004 supplies the generated ramp signal to each AD converter 1007. The reference signal generation unit 1004 is configured by using, for example, a DAC (Digital to Analog Converter).
When the reference signal generation unit 1004 supplies a ramp signal whose voltage drops stepwise according to a predetermined slope, a counter starts counting according to a clock signal. A comparator compares the voltage of the pixel signal supplied from the vertical signal line VSL with the voltage of the ramp signal, and stops the counter at the timing when the ramp signal voltage crosses the pixel signal voltage. The AD converter 1007 converts the analog pixel signal into a digital value by outputting a value corresponding to the count value at the time the counting was stopped.
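The single-slope conversion described above can be pictured as counting clock cycles until the falling ramp crosses the pixel signal voltage. The sketch below is a behavioral illustration under assumed parameters (ramp start level, step per clock, maximum count), not the actual column ADC circuit.

```python
def single_slope_adc(pixel_voltage: float, ramp_start: float = 1.0,
                     ramp_step: float = 0.001, max_count: int = 1024) -> int:
    """Count clock cycles until the falling ramp voltage crosses pixel_voltage."""
    ramp = ramp_start
    for count in range(max_count):
        if ramp <= pixel_voltage:   # the ramp has crossed the pixel signal level
            return count
        ramp -= ramp_step           # the ramp falls by a fixed step every clock
    return max_count                # saturated: the ramp never crossed the signal
```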
The AD converter 1007 supplies the two generated digital values to the signal processing unit 1101. The signal processing unit 1101 performs CDS processing based on the two digital values supplied from the AD converter 1007, and generates a pixel signal (pixel data) in digital form. The pixel data generated by the signal processing unit 1101 is stored in a frame memory (not shown), and when one frame's worth of pixel data has been stored in the frame memory, it is output from the imaging unit 1200 as one frame of image data.
Under the control of the control unit 1100, the horizontal scanning unit 1005 performs selective scanning that selects the AD converters 1007 in a predetermined order, thereby sequentially outputting the digital values temporarily held by each AD converter 1007 to the signal processing unit 1101. The horizontal scanning unit 1005 is configured by using, for example, a shift register or an address decoder.
The control unit 1100 performs drive control of the vertical scanning unit 1002, the AD conversion unit 1003, the reference signal generation unit 1004, the horizontal scanning unit 1005, and the like, according to an imaging control signal supplied from the sensor control unit 11. The control unit 1100 generates various drive signals that serve as references for the operations of the vertical scanning unit 1002, the AD conversion unit 1003, the reference signal generation unit 1004, and the horizontal scanning unit 1005. The control unit 1100 generates, based on, for example, a vertical synchronization signal or an external trigger signal included in the imaging control signal and a horizontal synchronization signal, a control signal to be supplied by the vertical scanning unit 1002 to each pixel circuit 1000 via the pixel signal line 1006. The control unit 1100 supplies the generated control signal to the vertical scanning unit 1002.
Further, the control unit 1100 passes, for example, information indicating an analog gain included in the imaging control signal supplied from the CPU 1205 to the AD conversion unit 1003. According to this information indicating the analog gain, the AD conversion unit 1003 controls the gain of the pixel signals input via the vertical signal lines VSL to the AD converters 1007 included in the AD conversion unit 1003.
Based on the control signal supplied from the control unit 1100, the vertical scanning unit 1002 supplies various signals, including drive pulses, to the pixel signal line 1006 of the selected pixel row of the pixel array unit 1001, line by line, and causes each pixel circuit 1000 to output a pixel signal to the vertical signal line VSL. The vertical scanning unit 1002 is configured by using, for example, a shift register or an address decoder. Further, the vertical scanning unit 1002 controls the exposure in each pixel circuit 1000 according to information indicating the exposure supplied from the control unit 1100.
The imaging unit 1200 configured in this way is a column AD type CMOS (Complementary Metal Oxide Semiconductor) image sensor in which an AD converter 1007 is arranged for each column.
(1-1-2. Resolution of captured image)
Next, the resolution of the image used for the recognition process will be described with reference to FIGS. 8A and 8B. FIGS. 8A and 8B are diagrams schematically showing examples of captured images 30a and 30b obtained when the same imaging range is captured by a low-resolution imaging device and a high-resolution imaging device, respectively. The imaging range shown in FIGS. 8A and 8B includes a "person" in the central portion, at a position somewhat distant from the imaging apparatus. Consider the case of recognizing this "person" as an object by the recognition process.
In the low-resolution example of FIG. 8A, it is difficult to recognize the "person" included in the captured image 30a, and the recognition performance for the "person" by the recognition process is extremely low. On the other hand, in the high-resolution example of FIG. 8B, the "person" included in the captured image 30b is easily recognized, and the recognized "person" is obtained as the recognition result 40; compared with the low-resolution example of FIG. 8A, the recognition performance is high.
On the other hand, the recognition process for a high-resolution image requires a larger amount of calculation than the recognition process for a low-resolution image, and the processing takes time. This makes it difficult to improve the simultaneity between the recognition result and the captured image. In contrast, the recognition process for a low-resolution image requires only a small amount of calculation, so it can be performed in a short time, and the simultaneity with the captured image can be increased relatively easily.
As an example, consider a case where recognition processing is performed based on an image captured by an in-vehicle imaging device. In this case, a distant object (for example, an oncoming vehicle traveling in the opposite lane in the direction opposite to the traveling direction of the own vehicle) must be recognized with high simultaneity, so performing the recognition process on a low-resolution image can be considered. However, as described with reference to FIG. 8A, it is difficult to recognize a distant object when a low-resolution captured image is used. Conversely, when a high-resolution captured image is used, recognizing a distant object becomes relatively easy, but it is difficult to improve the simultaneity with the captured image, and there is a possibility that an emergency situation cannot be handled in time.
In each embodiment of the present disclosure, in order to make it possible to recognize a distant object easily and at high speed, recognition processing is performed on a sampled image composed of pixels obtained by thinning out a high-resolution captured image by subsampling according to a predetermined rule. For the captured image acquired in the next frame, pixels different from those of the subsampling applied to the immediately preceding captured image are sampled, and recognition processing is performed on the sampled image composed of those sampled pixels.
This operation, in which recognition processing is performed on a sampled image obtained by sampling, from the second captured image acquired next in time series after the first captured image, pixels different from those of the first captured image, is repeated frame by frame. This makes it possible to acquire recognition results at high speed while using a high-resolution captured image. Further, by sequentially integrating the feature amounts extracted during the recognition process with the feature amounts extracted in the recognition process for the next sampled image, a more accurate recognition result can be obtained.
(1-2. Outline of recognition processing that is a premise of each embodiment)
Next, the recognition processing technology (hereinafter referred to as the prerequisite technology) that is the premise of each embodiment of the present disclosure will be schematically described.
(1-2-1. Configuration related to the prerequisite technology of each embodiment)
(1-2-1-1. Outline of configuration applicable to the prerequisite technology of each embodiment)
FIG. 9 is a block diagram showing a configuration of an example of an information processing device according to the prerequisite technology of each embodiment of the present disclosure. In FIG. 9, the information processing device 1b includes a sensor unit 10b and a recognition processing unit 20b. Although not shown, the sensor unit 10b includes an imaging means (camera) and an imaging control unit that controls the imaging means, similarly to the sensor unit 10a described with reference to FIG. 1. This imaging means is assumed to perform imaging at a high resolution (for example, 1920 pixels × 1080 pixels). The sensor unit 10b supplies the image data of the captured image captured by the imaging means to the recognition processing unit 20b.
The recognition processing unit 20b includes a preprocessing unit 210 and a recognition unit 220. The image data supplied from the sensor unit 10b to the recognition processing unit 20b is input to the preprocessing unit 210. The preprocessing unit 210 performs subsampling on the input image data by thinning out pixels according to a predetermined rule. The sampled image obtained by subsampling the image data is input to the recognition unit 220.
The recognition unit 220 performs recognition processing on the image data using a DNN, in the same manner as the recognition processing unit 20a in FIG. 1. More specifically, the recognition processing unit 20a includes a recognition model trained in advance by machine learning using predetermined teacher data, and performs recognition processing using a DNN based on this recognition model on the image data supplied from the sensor unit 10a. Here, sampled images subsampled in the same manner as in the preprocessing unit 210 are used as the teacher data.
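Since the teacher data are stated to be subsampled in the same manner as in the preprocessing unit 210, training data could, for instance, be prepared as sketched below. The function name, the every-other-pixel pattern, and the four phase offsets are assumptions consistent with the divided-region example described later, not the embodiment's actual training pipeline.

```python
import numpy as np

def make_teacher_samples(image: np.ndarray, label):
    """Generate subsampled (image, label) pairs, one per sampling phase (assumed 2x2 offsets)."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [(image[dy::2, dx::2], label) for (dy, dx) in offsets]
```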
The recognition unit 220 outputs the recognition result of the recognition process to, for example, the outside of the information processing device 1b.
(1-2-1-2. Example of recognition processing related to the prerequisite technology of each embodiment)
FIG. 10 is a schematic diagram for explaining the recognition process by the recognizer according to the prerequisite technology of each embodiment. The recognizer shown in FIG. 10 corresponds to, for example, the recognition processing unit 20b. The image data 32 schematically shows one frame of image data based on the captured image captured by the sensor unit 10b. The image data 32 includes a plurality of pixels 300 arranged in a matrix. The image data 32 is input to the preprocessing unit 210 in the recognition processing unit 20b. The preprocessing unit 210 subsamples the image data 32 by thinning out pixels according to a predetermined rule (step S10).
The sampled image composed of the subsampled sampling pixels is input to the recognition unit 220. The recognition unit 220 extracts the feature amount of the input sampled image using a DNN (step S11). Here, the recognition unit 220 extracts the feature amount using a CNN, a type of DNN.
The recognition unit 220 stores the feature amount extracted in step S11 in a storage unit (for example, the RAM 1207, not shown). At this time, if the feature amount extracted in the immediately preceding frame, for example, is already stored in the storage unit, the recognition unit 220 recursively uses the feature amount stored in the memory and integrates it with the newly extracted feature amount (step S12). The recognition unit 220 stores, accumulates, and integrates in the storage unit the feature amounts extracted up to the immediately preceding frame. That is, the process in step S12 corresponds to processing using an RNN, a type of DNN.
The recognition unit 220 executes recognition processing based on the feature amounts accumulated and integrated in step S12 (step S13).
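The flow of steps S10 to S13 for a single frame can be summarized as in the sketch below. The feature extractor, integration operator, and recognition head are placeholders standing in for the CNN, the RNN-like integration, and the recognizer; the names and the every-other-pixel subsampling offsets are assumptions for illustration, not the embodiment's implementation.

```python
import numpy as np

# Placeholder stand-ins; the embodiment would use trained DNN components instead.
def extract_features(img):  return img.astype(np.float32).mean(axis=(0, 1), keepdims=True)
def integrate(acc, feat):   return 0.5 * acc + 0.5 * feat
def recognize(feat):        return {"score": float(feat.mean())}

def process_frame(frame, accumulated, row_offset, col_offset):
    """Steps S10-S13 for one frame: subsample, extract features, integrate, recognize."""
    sampled = frame[row_offset::2, col_offset::2]           # step S10: subsampling
    features = extract_features(sampled)                     # step S11: feature extraction
    accumulated = features if accumulated is None \
        else integrate(accumulated, features)                # step S12: accumulation/integration
    return recognize(accumulated), accumulated               # step S13: recognition result
```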
Here, the subsampling process by the preprocessing unit 210 in step S10 will be described in more detail. FIG. 11 is a schematic diagram for explaining the sampling process according to the prerequisite technology of each embodiment. In FIG. 11, section (a) schematically shows an example of the image data 32. As described above, the image data 32 includes a plurality of pixels 300 arranged in a matrix. The preprocessing unit 210 divides the image data 32 into divided regions 35, each including two or more pixels 300. In the example of FIG. 11, each divided region 35 is a region having a size of 4 pixels × 4 pixels and includes 16 pixels 300.
The preprocessing unit 210 sets, for each divided region 35, pixel positions for selecting sampling pixels by subsampling from the pixels 300 included in that divided region 35. Further, the preprocessing unit 210 sets different pixel positions for each frame as the pixel positions for selecting the sampling pixels.
Section (b) of FIG. 11 shows an example of the pixel positions set for a divided region 35 in a certain frame. In this example, in the divided region 35, the pixel positions are set so that every other pixel 300 is selected in both the row and column directions, and the pixels 300sa1, 300sa2, 300sa3, and 300sa4 at the set pixel positions are selected as the sampling pixels. In this way, the preprocessing unit 210 performs subsampling in units of the divided region 35.
The preprocessing unit 210 generates an image composed of the pixels 300sa1 to 300sa4 selected as sampling pixels in a certain frame, as a sampled image composed of sampling pixels. Section (c) of FIG. 11 shows an example of the sampled image 36 generated from the pixels 300sa1 to 300sa4 selected as sampling pixels in section (b) of FIG. 11. The preprocessing unit 210 inputs this sampled image 36 to the recognition unit 220. The recognition unit 220 executes recognition processing on this sampled image 36.
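With the 4 × 4 divided regions and every-other-pixel selection of FIG. 11, the sampled image can be obtained simply by taking the pixels whose row and column offsets match the selected positions. A minimal NumPy sketch under that assumption; the frame size in the usage line is the 1920 × 1080 example mentioned earlier.

```python
import numpy as np

def subsample_every_other(image: np.ndarray, row_offset: int, col_offset: int) -> np.ndarray:
    """Select every other pixel in both directions, starting at (row_offset, col_offset).

    With offsets in {0, 1}, this realizes the four sampling phases of the divided regions
    in which sampling pixels are chosen one pixel apart.
    """
    return image[row_offset::2, col_offset::2]

frame = np.zeros((1080, 1920), dtype=np.uint16)       # illustrative 1920 x 1080 frame
sampled_phase1 = subsample_every_other(frame, 0, 0)   # 540 x 960 sampled image
```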
The recognition process by the recognizer according to the prerequisite technology of each embodiment will be described more specifically with reference to FIGS. 12A to 12E. As described above, the preprocessing unit 210 sets different pixel positions for each frame as the pixel positions for selecting sampling pixels. The recognition unit 220 performs, for each frame, recognition processing based on the sampled image composed of the pixels 300 at the set pixel positions. FIGS. 12A to 12E show the recognition processes for the image data 32a to 32d and 32a' of frames #1 to #5, respectively, which are sequentially captured in time series by the sensor unit 10b.
Note that in each of FIGS. 12A to 12E, the images based on the image data 32a to 32d and 32a' each contain objects 41 and 42, which are persons. The object 41 is located at a relatively short distance (referred to as a medium distance) from the sensor unit 10b. On the other hand, the object 42 is located at a distance farther than the medium distance (referred to as a long distance) from the sensor unit 10b, and its size in the image is smaller than that of the object 41.
In section (a) of FIG. 12A, the preprocessing unit 210 performs subsampling on each divided region 35 of the image data 32a of frame #1 with, for example, the pixel position at the upper left corner as the base point. More specifically, in each divided region 35 of the image data 32a, the preprocessing unit 210 performs subsampling that selects, as the sampling pixels 300sa1 to 300sa4, every other pixel 300 in the row and column directions starting from the pixel position at the upper left corner (step S10a).
As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ1 of the first phase from the subsampled pixels 300sa1 to 300sa4. The generated sampled image 36φ1 is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50a of the input sampled image 36φ1 using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50a extracted in step S11 in the storage unit (step S12). When feature amounts are already accumulated in the storage unit, the recognition unit 220 can accumulate the feature amount 50a in the storage unit and integrate it with the already accumulated feature amounts. Section (b) of FIG. 12A shows how the first feature amount 50a is stored in the empty storage unit as the process of step S12.
The recognition unit 220 executes recognition processing based on the feature amount 50a accumulated in the storage unit (step S13). In the example of FIG. 12A, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60. On the other hand, the object 42 located at the long distance is not recognized.
In section (a) of FIG. 12B, the preprocessing unit 210 performs, on each divided region 35 of the image data 32b of frame #2, subsampling in which the pixel positions shifted horizontally by one pixel from the pixel positions set for each divided region 35 of the image data 32a of frame #1 shown in FIG. 12A are set as the pixel positions of the sampling pixels (step S10b). That is, each sampling pixel selected in step S10b is the pixel 300 at the pixel position adjacent, on the right in the figure, to the pixel position of the corresponding sampling pixel selected in step S10a of FIG. 12A.
As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ2 of the second phase from the sampling pixels subsampled in step S10b. The generated sampled image 36φ2 is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50b of the input sampled image 36φ2 using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50b extracted in step S11 in the storage unit (step S12). In this example, as shown as step S12 in section (b), the feature amount 50a extracted from the sampled image 36φ1 of the first phase is already accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50b in the storage unit and integrates the feature amount 50b with the accumulated feature amount 50a.
The recognition unit 220 executes recognition processing based on the feature amount obtained by integrating the feature amount 50a and the feature amount 50b (step S13). In the example of FIG. 12B, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60, but the object 42 located at the long distance is not recognized at this point.
In section (a) of FIG. 12C, the preprocessing unit 210 performs, on each divided region 35 of the image data 32c of frame #3, subsampling in which the pixel positions shifted by one pixel in the column direction from the pixel positions set for each divided region 35 of the image data 32a of frame #1 shown in FIG. 12A are set as the pixel positions of the sampling pixels (step S10c). That is, each sampling pixel selected in step S10c is the pixel 300 at the pixel position adjacent, below in the figure, to the pixel position of the corresponding sampling pixel selected in step S10a of FIG. 12A.
As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ3 of the third phase from the sampling pixels subsampled in step S10c. The generated sampled image 36φ3 is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50c of the input sampled image 36φ3 using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50c extracted in step S11 in the storage unit (step S12). In this example, as shown as step S12 in section (b), the feature amounts 50a and 50b extracted from the sampled images 36φ1 and 36φ2 of the first and second phases, respectively, are already accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50c in the storage unit and integrates the feature amount 50c with the accumulated feature amounts 50a and 50b.
The recognition unit 220 executes recognition processing based on the feature amount obtained by integrating the feature amounts 50a and 50b with the feature amount 50c (step S13). In the example of FIG. 12C, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60, but the object 42 located at the long distance is not recognized at this point.
In section (a) of FIG. 12D, the preprocessing unit 210 performs, on each divided region 35 of the image data 32d of frame #4, subsampling in which the pixel positions shifted horizontally by one pixel from the pixel positions set for each divided region 35 of the image data 32c of frame #3 shown in FIG. 12C are set as the pixel positions of the sampling pixels (step S10d). That is, each sampling pixel selected in step S10d is the pixel 300 at the pixel position adjacent, on the right in the figure, to the pixel position of the corresponding sampling pixel selected in step S10c of FIG. 12C.
As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ4 of the fourth phase from the sampling pixels subsampled in step S10d. The generated sampled image 36φ4 is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50d of the input sampled image 36φ4 using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50d extracted in step S11 in the storage unit (step S12). In this example, as shown as step S12 in section (b), the feature amounts 50a to 50c extracted from the sampled images 36φ1 to 36φ3 of the first to third phases, respectively, are already accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50d in the storage unit and integrates the feature amount 50d with the accumulated feature amounts 50a to 50c.
The recognition unit 220 executes recognition processing based on the feature amount obtained by integrating the feature amounts 50a to 50c with the feature amount 50d (step S13). In the example of FIG. 12D, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60, and the object 42 located at the long distance is also recognized and obtained as the recognition result 61.
Through the processes of FIGS. 12A to 12D, all of the pixel positions of the 16 pixels 300 included in each divided region 35 have been selected as pixel positions of sampling pixels. In other words, the preprocessing unit 210 selects the pixel positions of all the pixels 300 included in one frame as pixel positions of sampling pixels. It can also be said that the preprocessing unit 210 selects the pixel positions of the 16 pixels 300 included in each divided region 35 while shifting the phase by one pixel at a time.
The period from the time when pixel positions of sampling pixels are first selected for each divided region 35 (or for one frame) until the pixel positions of all the pixels 300 included in that divided region 35 (or that frame) have been selected as pixel positions of sampling pixels is defined as one cycle. That is, the preprocessing unit 210 cycles through the pixel positions of each divided region 35 with a constant period, and sets all the pixel positions in the divided region 35 as pixel positions for acquiring sampling pixels.
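The cyclic shifting of the sampling position shown in FIGS. 12A to 12D (base point, then right by one pixel, then down, then down-right) can be written as a repeating sequence of offsets within the divided region. A small sketch under that assumption:

```python
# Sampling offsets (row, col) inside each divided region, one per frame, following FIGS. 12A-12D:
# phase 1: base point, phase 2: shifted right, phase 3: shifted down, phase 4: shifted down-right.
PHASE_OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def phase_for_frame(frame_index: int) -> tuple:
    """Return the sampling offset for a given frame; the pattern repeats every 4 frames (one cycle)."""
    return PHASE_OFFSETS[frame_index % len(PHASE_OFFSETS)]
```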
When the subsampling and recognition processing for one cycle is completed, the subsampling and recognition processing for the next cycle is started.
That is, in section (a) of FIG. 12E, the preprocessing unit 210 performs, on each divided region 35 of the image data 32a' of frame #1', subsampling with the pixel position at the upper left corner as the base point, in the same manner as in the example of FIG. 12A (step S10a'). As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ1' of the first phase from the sampling pixels subsampled in step S10a'. The generated sampled image 36φ1' is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50a' of the input sampled image 36φ1' using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50a' extracted in step S11 in the storage unit (step S12). In this example, as shown as step S12 in section (b), the feature amounts 50a to 50d extracted from the sampled images 36φ1 to 36φ4 of the first to fourth phases in the immediately preceding cycle are already accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50a' in the storage unit and integrates the feature amount 50a' with the accumulated feature amounts 50a to 50d.
Alternatively, the recognition unit 220 may reset the storage unit at every cycle of sampling-pixel position selection. The storage unit can be reset, for example, by deleting from it the one cycle's worth of feature amounts 50a to 50d accumulated therein.
Further, the recognition unit 220 can also keep a constant amount of feature amounts accumulated in the storage unit at all times. For example, the recognition unit 220 accumulates one cycle's worth of feature amounts, that is, feature amounts for four frames, in the storage unit. In this case, when the new feature amount 50a' is extracted, the recognition unit 220 deletes, for example, the oldest feature amount 50d among the feature amounts 50a to 50d accumulated in the storage unit, and stores and accumulates the new feature amount 50a' in the storage unit. The recognition unit 220 executes recognition processing based on the integrated feature amount obtained from the feature amounts 50a to 50c remaining after the feature amount 50d is deleted and the new feature amount 50a'.
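Keeping exactly one cycle (four frames) of feature amounts in the storage unit and discarding an old one when a new one arrives behaves like a fixed-length queue. A sketch under that assumption; `integrate` is a placeholder for the actual integration operation, and dropping the entry held longest is an assumption of this sketch rather than the embodiment's exact policy.

```python
from collections import deque

feature_buffer = deque(maxlen=4)   # holds at most one cycle (four frames) of feature amounts

def integrate(a, b):               # placeholder for the actual feature integration
    return 0.5 * (a + b)

def store_and_integrate(new_features):
    """Append the newest feature amount; once full, the entry held longest is dropped automatically."""
    feature_buffer.append(new_features)
    integrated = feature_buffer[0]
    for f in list(feature_buffer)[1:]:
        integrated = integrate(integrated, f)
    return integrated
```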
The recognition unit 220 executes recognition processing based on the feature amount obtained by integrating the feature amounts 50a to 50d already accumulated in the storage unit with the newly extracted feature amount 50a' (step S13). In the example of FIG. 12E, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60, and the object 42 located at the long distance is also recognized and obtained as the recognition result 61.
Here, the sampled image 36 is a thinned image obtained by thinning out pixels from the original image data 32. In the example of FIG. 11, the sampled image 36 is image data obtained by reducing the image data 32 to 1/2 in each of the row and column directions, that is, a reduced image whose number of pixels is 1/4 that of the original image data 32. Therefore, the recognition unit 220 can execute the recognition process on the sampled image 36 faster than a recognition process using all the pixels 300 included in the original image data 32.
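For instance, with the 1920 pixel × 1080 line resolution mentioned above, one frame contains 1920 × 1080 = 2,073,600 pixels, whereas the corresponding sampled image contains 960 × 540 = 518,400 pixels, exactly one quarter of the original.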
Further, the pixel positions of the pixels 300 set as sampling pixels for generating the sampled image 36 are selected while being shifted by one pixel for each frame within the divided region 35. Therefore, sampled images 36 whose phases are shifted by one pixel from frame to frame can be obtained. At this time, the pixel positions of all the pixels 300 included in the divided region 35 are eventually selected as pixel positions of the pixels 300 set as sampling pixels.
The pixel positions of the pixels 300 forming the sampled images 36 are selected in this way, and the feature amounts calculated from the respective sampled images 36 are accumulated and integrated. As a result, the pixels 300 at all pixel positions included in the image data 32 can be involved in the recognition process, and, for example, a distant object can also be recognized easily.
In the above description, the pixel positions for selecting the sampling pixels are set by the preprocessing unit 210 according to a predetermined rule, but this is not limited to this example. For example, the preprocessing unit 210 may set the pixel positions for selecting sampling pixels in response to an instruction from outside the recognition processing unit 20b, or from outside the information processing device 1b that includes the recognition processing unit 20b.
(1-2-1-3. Subsampling processing related to the prerequisite technology of each embodiment)
Next, the subsampling process in the prerequisite technology of each embodiment will be described more specifically. 13A and 13B are schematic views for explaining the subsampling process in the recognition process according to the prerequisite technology of each embodiment. Here, for the sake of explanation, as shown in the section (b) of FIG. 13A, the divided region 35 is defined as a region of 2 pixels × 2 pixels. In each division region 35, the upper left pixel position is the origin coordinate [0,0], and the upper right, lower left, and lower right pixel positions are the coordinates [1,0] [0,1] and [1,1], respectively. And. Further, sampling of the pixel 300 is performed in each division region 35 with the coordinates [1,1], [1,0], [0,1], [0,1] starting from the lower right pixel position [1,1]. 0] shall be performed in this order.
In section (a) of FIG. 13A, the passage of time is represented from the bottom to the top of the figure. In the example of FIG. 13A, corresponding to FIGS. 12A to 12E described above, the image data 32a is the image [T] at the newest time T, and thereafter the image data 32b, 32c, and 32d are, in this order, the images [T-1], [T-2], and [T-3] based on image data older by one frame each, at times T-1, T-2, and T-3.
At time T-3, the preprocessing unit 210 selects, for the image data 32a, the pixels 300 at the coordinates [1,1] of each divided region 35 as sampling pixels (step S10a), and the recognition unit 220 extracts the feature amount of the sampled image 36φ1 composed of the selected sampling pixels (step S11). The recognition unit 220 integrates the feature amount 50a extracted from the sampled image 36φ1 with, for example, the feature amounts extracted in a predetermined period before that (step S12), and performs recognition processing based on the integrated feature amount (step S13).
Here, for example, the subsampling process in each divided region 35 of the image data 32a described above (step S10a) yields a sampled image 36φ1 in which the image data 32a is uniformly thinned out. Using the feature amount 50a extracted from this sampled image 36φ1 in step S11, recognition processing can be executed for the whole of the image data 32a. In other words, the recognition process on the sampled image composed of the sampling pixels selected from the image data 32 by subsampling can complete the recognition process for the image data 32.
This series of processes, in which a sampled image is generated from the image data 32, a feature amount is extracted from the generated sampled image, and recognition processing is performed based on the extracted feature amount, is called one unit of processing. In the example of FIG. 13A, for example, the subsampling process of step S10a, the feature amount extraction process of step S11 for the sampled image 36φ1 generated by that subsampling process, the feature amount integration process of step S12, and the recognition process of step S13 are included in one unit of processing. The recognition unit 220 can execute recognition processing on the thinned-out image data 32 for each such unit of processing (step S13).
Thereafter, in the same manner, the recognition processing unit 20b executes the above-described one unit of processing for each of the image data 32b, 32c, and 32d, which are sequentially updated at the frame period, and executes the recognition processing. At this time, the feature amount integration process of step S12 and the recognition process of step S13 can be common to the processing of each unit.
By performing one unit of processing for each of the image data 32a to 32d as described above, the selection of sampling pixels goes around all the pixel positions included in each divided region 35 once. FIG. 13B shows the next one unit of processing after this round of sampling-pixel selection for each pixel position included in each divided region 35. That is, when one unit of processing has been completed for each of the image data 32a, 32b, 32c, and 32d, one unit of processing is executed for the image data 32a' of the next frame input to the recognition processing unit 20b.
In this example, the feature amount 50d extracted based on the oldest image data 32d is discarded, and the feature amount 50a' is extracted from the new image data 32a'. That is, the preprocessing unit 210 selects the pixels 300 at the coordinates [1,1] of each divided region 35 of the image data 32a' as sampling pixels, and generates a sampled image 36φ1. The recognition unit 220 extracts the feature amount 50a' from this sampled image 36φ1 selected from the image data 32a'. The recognition unit 220 integrates this feature amount 50a' with the feature amounts 50a, 50b, and 50c extracted up to immediately before, and performs recognition processing based on the integrated feature amount. In this case, the recognition unit 220 only needs to perform the feature amount extraction process for the newly acquired image data 32a'.
As described above, the recognition process according to the prerequisite technology of each embodiment is performed by executing one unit of processing in the same processing system in the recognition processing unit 20b. More specifically, as one unit of processing, the recognition processing unit 20b repeats, for each frame, the processing system consisting of the subsampling process and the feature amount extraction process for the image data 32, and performs recognition processing by integrating the feature amounts extracted through this repetition.
Further, the recognition processing unit 20b performs the subsampling process so as to cover the pixel positions of all the pixels 300 included in the image data 32, while periodically shifting the pixel positions at which sampling pixels are selected. Furthermore, the recognition processing unit 20b performs recognition processing by integrating the feature amounts, as intermediate data, extracted in step S11 from the sampled images composed of the sampling pixels selected from the image data 32 of each frame.
Since the recognition process according to the prerequisite technology of each embodiment configured in this way is a processing system that can be completed within one unit of processing, a recognition result can be obtained more quickly. Further, since sampling pixels are selected from the whole of the image data 32 in one unit, a wide-range recognition result can be confirmed with one unit of processing. Furthermore, since intermediate data (feature amounts) based on a plurality of image data 32 are integrated, it is possible to obtain the more detailed recognition results that are acquired by spanning a plurality of units.
That is, by using the information processing device 1b according to the prerequisite technology of each embodiment, it is possible to both improve the simultaneity of the recognition results and obtain recognition results that make use of the resolution of the captured image, thereby improving the characteristics of the recognition process using captured images.
(1-3. Basic architecture of recognition processing according to each embodiment)
Next, the basic architecture of the recognition process according to each embodiment of the present disclosure will be described. FIG. 14A is a schematic diagram for explaining the basic architecture of the recognition process according to the existing technology. As shown in FIG. 14A, a recognizer of the existing technology executes recognition processing on one piece of input information (for example, an image) and basically outputs one recognition result for that input information.
FIG. 14B is a schematic diagram for explaining the basic architecture of the recognition process according to each embodiment. The recognizer according to each embodiment corresponds to, for example, the recognition unit 220 of FIG. 9, and, as shown in FIG. 14B, executes recognition processing on one piece of input information (for example, an image) with time-axis expansion and can output a plurality of recognition results according to that recognition processing. Here, the recognition process with time-axis expansion is, as described with reference to FIGS. 10, 11, and 12A to 12E, a process in which subsampling by pixel thinning is performed for each divided region 35 and recognition processing is executed for each sampled image composed of the subsampled sampling pixels.
In the example of FIG. 14B, the recognizer according to each embodiment can output, for one piece of input information, two recognition results through the recognition processing with time-axis expansion: a highly responsive prompt result and a highly accurate integrated result. Of these, the prompt result is, for example, the recognition result of the recognition process performed on the sampled image acquired by the first subsampling in each divided region 35. The integrated result is, for example, the recognition result of the recognition process performed based on the feature amount obtained by integrating the feature amounts extracted from the sampled images acquired by the respective subsamplings in each divided region 35.
The calculation amount of the recognition process executed by the recognizer according to each embodiment shown in FIG. 14B is substantially the same as the calculation amount of the recognition process executed by the recognizer based on the existing technology shown in FIG. 14A. Therefore, with the recognizer according to each embodiment, both recognition results, the more responsive prompt result and the more accurate integrated result, can be obtained with approximately the same amount of calculation as the recognizer based on the existing technology.
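One way to picture the two outputs is shown below: a prompt result produced right after the first sampled image of a cycle, and an integrated result produced from the features of all phases. The helper functions passed in are placeholders, and the offsets assume the every-other-pixel sampling used in the earlier figures; this is a conceptual sketch, not the embodiment's implementation.

```python
import numpy as np

def recognize_with_two_outputs(frames, extract, integrate, recognize):
    """Return (prompt_result, integrated_result) over one cycle of frames (time-axis expansion)."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]        # assumed sampling phases, one per frame
    state, prompt_result = None, None
    for (dy, dx), frame in zip(offsets, frames):
        feat = extract(frame[dy::2, dx::2])            # subsample, then extract features
        state = feat if state is None else integrate(state, feat)
        if prompt_result is None:
            prompt_result = recognize(state)           # responsive result from the first phase only
    return prompt_result, recognize(state)             # plus the high-accuracy integrated result

# Placeholder usage with stand-in functions (not the embodiment's DNN components):
frames = [np.full((8, 8), i, dtype=np.float32) for i in range(4)]
prompt, integrated = recognize_with_two_outputs(
    frames, extract=lambda s: float(s.mean()), integrate=lambda a, b: 0.5 * (a + b),
    recognize=lambda f: {"score": float(f)})
```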
(1-3-1. More specific configuration)
Next, a more specific configuration of the basic architecture of the recognition process according to each embodiment will be described.
(1-3-1-1. First example)
FIG. 15 is a time chart showing a first example of readout and recognition processing in the basic architecture of the recognition process according to each embodiment. In FIG. 15 and FIG. 16 described later, sampling pixels are selected every other pixel in the divided region 35 having a size of 4 pixels × 4 pixels, as described in section (b) of FIG. 11. In this case, all pixel positions in each divided region 35 are selected by four subsamplings, and the image data 32 of one frame is divided into four sampled images 36φ1 to 36φ4 of the first to fourth phases.
 In this first example, the sampled images 36φ1 to 36φ4 of the first to fourth phases obtained by subsampling are extracted from the image data 32 of a plurality of frames that are consecutive in time series, one phase per frame. That is, in this first example, the sampled images 36φ1 to 36φ4 of the first to fourth phases are extracted across the image data 32 of a plurality of consecutive frames. The recognition process according to this first example is performed across a plurality of frames and is hereinafter referred to as inter-frame processing where appropriate.
 In FIG. 15, the imaging cycle is the frame cycle, for example 50 [ms] (20 [fps]). Here, readout from the pixel circuits 1000 arranged in a matrix in the pixel array unit 1001 is performed line-sequentially by a rolling shutter method. In FIG. 15, time elapses toward the right, and the line position runs from top to bottom.
 For example, in the imaging process of frame #1, each line is exposed for a predetermined time, and after the exposure ends, the pixel signals are transferred from the pixel circuits 1000 to the AD conversion unit 1003 via the vertical signal lines VSL. In the AD conversion unit 1003, each AD converter 1007 converts the transferred analog pixel signals into pixel data, which are digital signals. When the pixel signals of all lines have been converted into pixel data, the image data 32a consisting of the pixel data of frame #1 is input to the preprocessing unit 210.
 The preprocessing unit 210 applies the subsampling process described above (shown as "SS" in the figure) to the input image data 32a with the first phase φ1. By this subsampling of the first phase φ1, the preprocessing unit 210 acquires the pixels 300 at the pixel positions of the sampling pixels selected for each divided region 35 and generates the sampled image 36φ1 (step S10a).
 The preprocessing unit 210 passes the sampled image 36φ1 to the recognition unit 220. The sampled image 36φ1 passed from the preprocessing unit 210 to the recognition unit 220 at this point is an image whose number of pixels has been reduced relative to the image data 32a by the thinning of the subsampling process. The recognition unit 220 executes recognition processing on this sampled image 36φ1. Here, the recognition processing is shown as including a feature amount extraction process (step S11), a feature amount integration process (step S12), and a recognition process (step S13). The recognition result φ1 based on the sampled image 36φ1 is output to the outside of the recognition processing unit 20b.
 The processes of steps S11 to S13 are performed within one frame period. The sampled image 36φ1 to be processed is an image whose number of pixels has been reduced relative to the image data 32a by the thinning of the subsampling process. Therefore, the amount of processing executed for the image data 32a is smaller than the amount of processing that would be executed for one frame of image data 32 without thinning. In the example of FIG. 15, the processing of steps S11 to S13 for the sampled image 36φ1 based on the image data 32a is completed in approximately 1/4 of one frame period.
 In parallel with the above processing for frame #1, processing for the next frame #2 is executed. The image data 32b consisting of the pixel data of frame #2 is input to the preprocessing unit 210. The preprocessing unit 210 applies the subsampling process to the input image data 32b with a second phase φ2 different from that used for the image data 32a, and generates the sampled image 36φ2.
 The preprocessing unit 210 passes the sampled image 36φ2, whose number of pixels has been reduced relative to the image data 32b by subsampling, to the recognition unit 220. The recognition unit 220 executes the recognition processing on this sampled image 36φ2 within one frame period. In this case as well, as described above, the recognition processing is completed in approximately 1/4 of one frame period.
 At this time, the recognition unit 220 integrates, in the feature amount integration process of step S12, the feature amount 50b extracted from the sampled image 36φ2 with the feature amount 50a extracted by the feature amount extraction process for the image data 32a. The recognition unit 220 executes recognition processing using the integrated feature amount. The recognition result φ2 of this recognition processing is output to the outside of the recognition processing unit 20b.
 Thereafter, in the same manner, the preprocessing unit 210 executes the subsampling process with the third phase φ3 on the image data 32c of the next frame #3 in parallel with the processing for the image data 32b of the immediately preceding frame #2, and the recognition unit 220 extracts the feature amount 50c from the sampled image 36φ3 generated by that subsampling process. The recognition unit 220 further integrates the feature amount obtained by integrating the feature amounts 50a and 50b extracted from the image data 32a and 32b with the extracted feature amount 50c, and executes recognition processing based on the integrated feature amount. The recognition unit 220 outputs the recognition result φ3 obtained by this recognition processing to the outside. In this case as well, as described above, the recognition processing is completed in approximately 1/4 of one frame period.
 For the image data 32d of the next frame #4 as well, the recognition processing unit 20b similarly performs, in parallel with the processing for the image data 32c of the immediately preceding frame #3, the subsampling process with the fourth phase φ4 and the feature amount extraction process to obtain the feature amount 50d. In the recognition processing unit 20b, the recognition unit 220 further integrates the feature amount obtained by integrating the feature amounts 50a to 50c extracted from the image data 32a to 32c with the extracted feature amount 50d, and executes recognition processing based on the integrated feature amount. The recognition unit 220 outputs the recognition result φ4 obtained by this recognition processing to the outside. In this case as well, as described above, the recognition processing is completed in approximately 1/4 of one frame period.
 In FIG. 15, the thickness of each vertical arrow, that is, the arrows from the image data 32a to 32d and from steps S10a to S10d to the respective recognition processes, and the arrows indicating the output of the recognition results φ1 to φ4 from the respective recognition processes, schematically represents the amount of information.
 More specifically, in the example of FIG. 15, the sampled images 36φ1 to 36φ4, which are subsampled by the processing of steps S10a to S10d in the preprocessing unit 210 and passed to the recognition unit 220, have a smaller amount of data than the image data 32a to 32d input to the preprocessing unit 210 for the processing of steps S10a to S10d.
 On the other hand, the amount of information of the recognition results φ1 to φ4 obtained by the recognition processing based on the image data 32a to 32d increases each time the recognition processing is repeated, and the obtained recognition result becomes more detailed with each recognition process. This is because each recognition process uses a feature amount obtained by integrating the feature amounts acquired up to the immediately preceding process, while shifting the phase of the sampled image each time, with the feature amount newly acquired by further shifting the phase relative to the immediately preceding sampled image.
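A minimal sketch of this inter-frame flow, under the assumption that the frames are NumPy-like arrays and that extract_features, integrate, and recognize are hypothetical callables standing in for steps S11 to S13, might look as follows; only the control flow (one subsampling phase per frame, with features accumulated across frames) reflects the description above:

```python
def run_inter_frame(frames, extract_features, integrate, recognize):
    """One subsampling phase per frame; features accumulate across frames."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]  # phases phi1..phi4
    accumulated = None
    results = []
    for i, frame in enumerate(frames):
        r, c = offsets[i % 4]
        sampled = frame[r::2, c::2]              # subsampling (step S10)
        feat = extract_features(sampled)         # step S11
        accumulated = feat if accumulated is None else integrate(accumulated, feat)  # step S12
        results.append(recognize(accumulated))   # step S13: the first result is the breaking news result
    return results
```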
(1-3-1-2. Second example)
FIG. 16 is a time chart showing a second example of readout and recognition processing in the basic architecture of the recognition process according to each embodiment. In this second example, the sampled images 36φ1 to 36φ4 of the first to fourth phases obtained by subsampling are all extracted from the image data 32 of a single frame. That is, in this second example, the recognition processing based on the sampled images 36φ1 to 36φ4 of the first to fourth phases is completed within one frame, and is hereinafter referred to as intra-frame processing where appropriate.
 Since the meaning of each part in FIG. 16 is the same as in FIG. 15 described above, a detailed description is omitted here.
 For example, in the imaging process of frame #1, each line is exposed for a predetermined time, and after the exposure ends, the pixel signals are transferred from the pixel circuits 1000 to the AD conversion unit 1003 via the vertical signal lines VSL. In the AD conversion unit 1003, each AD converter 1007 converts the transferred analog pixel signals into pixel data, which are digital signals. When the pixel signals of all lines have been converted into pixel data, the image data 32a consisting of the pixel data of frame #1 is input to the preprocessing unit 210.
 The preprocessing unit 210 applies the subsampling of the first phase φ1 described above to, for example, the image data 32a of the first frame in FIG. 16, acquires the pixels 300 at the pixel positions of the sampling pixels selected for each divided region 35, and generates the sampled image 36φ1 of the first phase φ1 (step S10a).
 When the subsampling of the first phase φ1 for the image data 32a is completed, the preprocessing unit 210 executes the subsampling of the second phase φ2 for the same image data 32a. From the sampling pixels acquired by this subsampling of the second phase φ2, the preprocessing unit 210 generates the sampled image 36φ2 of the second phase φ2 (step S10b). Thereafter, the preprocessing unit 210 executes subsamplings of further different phases on the image data 32a (the subsampling of the third phase φ3 and the subsampling of the fourth phase φ4), and generates the sampled image 36φ3 of the third phase φ3 and the sampled image 36φ4 of the fourth phase φ4 (steps S10c and S10d).
 In this way, the preprocessing unit 210 executes the subsamplings of the first to fourth phases φ1 to φ4 on the image data 32a of one frame, all within one frame period.
 The recognition unit 220 executes the feature amount extraction process on the sampled image 36φ1 of the first phase φ1 generated by the preprocessing unit 210 based on the image data 32a (step S11a), and extracts a feature amount. When feature amounts that can be integrated have been accumulated, the recognition unit 220 can integrate the feature amount extracted in step S11a with the accumulated feature amounts (step S12a). The recognition unit 220 executes recognition processing based on, for example, the feature amount integrated in step S12a (step S13a), and outputs the recognition result φ1 of the first phase.
 The recognition unit 220 executes the feature amount extraction process on the sampled image 36φ2 of the second phase φ2 generated by the preprocessing unit 210 based on the image data 32a (step S11b), and extracts a feature amount. When feature amounts that can be integrated have been accumulated, the recognition unit 220 can integrate the feature amount extracted in step S11b with the accumulated feature amounts (step S12b). In this example, for instance, the feature amount extracted in step S11b can be integrated with the feature amount extracted in step S11a described above. The recognition unit 220 performs recognition processing on the integrated feature amount (step S13b) and outputs the recognition result φ2 of the second phase φ2.
 Thereafter, in the same manner, the recognition unit 220 executes the feature amount extraction process on the sampled images 36φ3 and 36φ4 of the third and fourth phases φ3 and φ4 generated by the preprocessing unit 210 based on the image data 32a (steps S11c and S11d), and extracts feature amounts. The recognition unit 220 sequentially integrates the feature amounts extracted in steps S11c and S11d with the feature amounts integrated up to the immediately preceding integration process (steps S12c and S12d). The recognition unit 220 executes recognition processing based on, for example, the feature amounts integrated in the phases φ3 and φ4, and outputs the recognition results φ3 and φ4 of the phases φ3 and φ4, respectively.
 In the example of FIG. 16, the feature amount extraction processes (steps S11a to S11d), the integration processes (steps S12a to S12d), and the recognition processes (steps S13a to S13d) for the phases φ1 to φ4 described above are all executed within one frame period. That is, the recognition unit 220 performs recognition processing on each of the sampled images 36φ1 to 36φ4 obtained by thinning out the pixels of the image data 32a of one frame by the subsampling process. Therefore, the amount of calculation of each recognition process in the recognition unit 220 is small, and each recognition process can be executed in a short time.
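The intra-frame variant can be sketched in the same hedged way; here all four phases are taken from a single frame, so a breaking news result is available after the first phase and the integration result after the fourth. The helper names are again placeholders, not part of the embodiment:

```python
def run_intra_frame(frame, extract_features, integrate, recognize):
    """All four phases from one frame: the first result is the breaking news result,
    the fourth (all features integrated) is the integration result."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    accumulated = None
    results = []
    for r, c in offsets:
        sampled = frame[r::2, c::2]              # steps S10a..S10d
        feat = extract_features(sampled)         # steps S11a..S11d
        accumulated = feat if accumulated is None else integrate(accumulated, feat)  # steps S12a..S12d
        results.append(recognize(accumulated))   # steps S13a..S13d
    # results[0]: breaking news result, results[1:3]: intermediate results, results[3]: integration result
    return results
```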
 FIG. 17 is a schematic diagram for explaining the effect of the processing according to the second example described above (intra-frame processing). FIG. 17A is a time chart comparing the processing according to the second example with processing according to the existing technology, with time elapsing toward the right. In FIG. 17A, section (a) shows an example of readout and recognition processing according to the existing technology, and section (b) shows an example of readout and recognition processing according to the second example described above.
 In sections (a) and (b), the imaging process is executed during the period from time t0 to time t1. The imaging process includes exposure for a predetermined time in the pixel array unit 1001 and the transfer of each piece of pixel data based on the charge generated by the photoelectric conversion elements in response to the exposure. Each piece of pixel data transferred from the pixel array unit 1001 by the imaging process is stored in the frame memory as, for example, one frame of image data.
 In sections (a) and (b), readout of the image data stored in the frame memory starts, for example, at time t1. In the processing of the existing technology in section (a), recognition processing on one frame of image data starts after the readout of that frame of image data has finished (time t4). Here, for the sake of explanation, it is assumed that this recognition processing ends at time t6, one frame period after time t4.
 In the processing according to the second example in section (b), readout of the image data from the frame memory starts after time t1, as in the example of section (a). In the second example, the readout of the sampled image 36φ1 by the subsampling of the first phase φ1 is executed during the period from time t1 to time t2, which is, for example, 1/4 of one frame period. Similarly, the recognition processing on the sampled image 36φ1 is executed during the period from time t2 to time t3, which is also, for example, 1/4 of one frame period, and the recognition result φ1 is output.
 In the processing according to the second example, the readouts of the sampled images 36φ2 to 36φ4 by the subsamplings of the second to fourth phases φ2 to φ4 are thereafter executed in the same manner, each taking, for example, 1/4 of one frame period, starting from times t2, t3, and so on, and finishing, for example, at time t4.
 The recognition processing on the sampled image 36φ2 starts at time t2, ends, for example, at time t3 after 1/4 of one frame period, and the recognition result φ2 is output. The recognition processing on each of the other sampled images 36φ3 and 36φ4 is likewise executed following the recognition processing on the immediately preceding sampled image, each ending, for example, in 1/4 of one frame period, and the recognition results φ3 and φ4 are output, respectively. In the example of FIG. 17A, the recognition processing on the sampled image 36φ4 obtained by the last subsampling of the one frame of image data 32 ends at time t5.
 FIG. 17B is a diagram schematically showing the recognition results according to the second example. In FIG. 17B, the upper, middle, and lower rows show examples of the recognition results φ1, φ2, and φ4 obtained by the recognition processing for the first phase φ1, the second phase φ2, and the fourth phase φ4, respectively.
 The upper, middle, and lower rows of FIG. 17B each show a case in which the recognition target is a person and one frame contains images of three people at different distances from the sensor unit 10b (information processing device 1b). In each of these rows, the frame 95 contains three objects 96L, 96M, and 96S, which are images of people of different sizes. Of these, the object 96L is the largest, and of the three people included in the frame 95, the person corresponding to the object 96L is closest to the sensor unit 10b. The smallest of the objects 96L, 96M, and 96S, the object 96S, represents the person farthest from the sensor unit 10b among the three people included in the frame 95.
 In FIG. 17B, the recognition result φ1 is an example in which recognition processing is executed on the sampled image 36φ1 described above and the largest object 96L is recognized. The recognition result φ2 is an example in which the feature amount extracted from the sampled image 36φ2 is further integrated with the feature amount used for the recognition result φ1 and the next largest object 96M is recognized. The recognition result φ4 shows a state in which the feature amounts extracted from the sampled images up to and including the sampled image 36φ4 are integrated and the smallest object 96S is recognized in addition to the objects 96L and 96M.
 In this way, by extracting the feature amounts of the sampled images 36φ1, 36φ2, and so on from one frame of image data 32 and accumulating and integrating the extracted feature amounts, people located progressively farther away can be recognized. At the same time, as shown by the recognition result φ1, the largest object 96L is already recognized by the recognition processing based on the sampled image 36φ1 obtained by the first subsampling.
 Thus, in the second example, a rough recognition result φ1 can be obtained based on the sampled image 36φ1 obtained by the first subsampling of the frame. In FIG. 17A, the recognition result φ1 can be output at time t3; as indicated by the arrow B in the figure, this achieves lower latency than time t6, at which the recognition result is output by the existing technology.
 In this second example, the recognition result φ1 based on the sampled image 36φ1 obtained by the first subsampling of the frame is the breaking news result. This breaking news result is also applicable to the first example described above.
 Further, in the second example, the recognition processing for the last subsampling of the frame is performed based on the feature amount obtained by integrating the feature amounts extracted from all the sampled images 36φ1 to 36φ4 of that frame, so a more accurate recognition result φ4 can be obtained. This recognition result φ4 can achieve, for example, accuracy equivalent to that of the recognition processing of the existing technology. Moreover, the recognition processing for this last subsampling ends at time t5, which is, for example, 1/4 of a frame period after time t4, at which the readout processing of the existing technology ends. Thus, in the second example, accuracy equivalent to that of the existing technology can be obtained in a shorter time than the recognition processing of the existing technology, as indicated by the arrow A in the figure, and lower latency can be achieved.
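 As a rough numerical check under the timings assumed above (a frame period of 50 ms, with each subsampling readout and each per-phase recognition taking about 1/4 of a frame period), the existing technology outputs its result roughly 50 ms (full readout) + 50 ms (recognition) = 100 ms after readout starts, whereas the breaking news result φ1 becomes available after roughly 12.5 ms + 12.5 ms = 25 ms and the integration result φ4 after roughly 50 ms + 12.5 ms = 62.5 ms. These figures are only illustrative of the latency reductions indicated by the arrows B and A.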
 In this second example, the recognition result φ4, based on the feature amount obtained by integrating the feature amounts extracted from the sampled image 36φ4 of the last subsampling of the frame and from the sampled images 36φ1 to 36φ3 acquired before it, is the integration result. This integration result is also applicable to the first example described above.
 In the following, unless otherwise stated, the second of the first and second examples described above is applied to the readout by subsampling from the image data 32 and to the recognition processing.
[2. First Embodiment]
(2-1. Outline of the first embodiment)
Next, the first embodiment of the present disclosure will be described. In the first embodiment of the present disclosure, either any one of the recognition results φx obtained by performing recognition processing based on each of the plurality of sampled images 36φx generated by subsampling from one frame of image data 32, or a combination of a plurality of those recognition results φx, can be output adaptively. In this way, a recognition result suited to, for example, the environment or the situation can be obtained.
 In the following, it is assumed that the preprocessing unit 210 sets the divided region 35 to 4 pixels × 4 pixels and performs subsampling by thinning out every other pixel, as described with reference to FIG. 11, expanding one frame of image data 32 along the time axis to generate the four phase-shifted sampled images 36φ1, 36φ2, 36φ3, and 36φ4.
 FIG. 18 is a diagram schematically showing the configuration of an example of the recognizer according to the first embodiment. The left end of FIG. 18 shows one frame of image data 32 divided into four according to the pixels 300φ1, 300φ2, 300φ3, and 300φ4 of the four phases, the first phase φ1 to the fourth phase φ4. The sampled images 36φ1 to 36φ4 of the respective phases are generated by the subsampling processes of the first to fourth phases φ1 to φ4 (steps S10a to S10d).
 Here, it is assumed that the sampled images 36φ1 to 36φ4 of the respective phases are generated in the order of the first phase φ1, the second phase φ2, the third phase φ3, and the fourth phase φ4.
 Note that the method of dividing the image data 32 is not limited to the four-way division (= 2 × 2) based on the divided region 35 of 4 pixels × 4 pixels described above. For example, the size of the divided region 35 may be 8 pixels × 8 pixels (in this case, the division is into 16 parts, 4 × 4), or the divided region 35 may have yet another size. Furthermore, the divided region 35 does not have to be square, nor is it limited to a rectangle.
 Alternatively, arbitrary pixel positions may be selected from the entire image data 32 or from a predetermined divided region 35, and the pixels 300 at the selected pixel positions may be used as the sampling pixels. Here, the arbitrarily selected pixel positions include, for example, a plurality of discrete and aperiodic pixel positions. For example, the preprocessing unit 210 can select these pixel positions using pseudo-random numbers. The selected pixel positions are preferably different for each frame, but some pixel positions may overlap between frames.
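A minimal sketch of such a pseudo-random selection, assuming NumPy and a per-frame seed (both of which are assumptions for illustration, not part of the embodiment), might look like this:

```python
import numpy as np

def random_sampling_positions(height, width, num_samples, frame_index):
    """Pick discrete, aperiodic pixel positions; changing the seed per frame
    makes the selected positions differ from frame to frame."""
    rng = np.random.default_rng(seed=frame_index)
    flat = rng.choice(height * width, size=num_samples, replace=False)
    rows, cols = np.divmod(flat, width)  # flat index = row * width + col
    return rows, cols

# Example: select 1/4 of the pixel positions of a 640x480 frame.
rows, cols = random_sampling_positions(height=480, width=640,
                                        num_samples=480 * 640 // 4, frame_index=0)
```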
 Feature amount extraction processing is performed on each of the sampled images 36φ1 to 36φ4 of the respective phases (steps S11a to S11d). The feature amount of the sampled image 36φ1 extracted first in step S11a is integrated in step S12a with any feature amounts already accumulated. In the example of FIG. 18, recognition processing is performed directly on the feature amount extracted from the sampled image 36φ1 to obtain the recognition result φ1 (step S13a).
 The recognition result φ1 of the recognition processing in step S13a is called the breaking news result because it is the first recognition result obtained among the recognition results φ1 to φ4 based on the sampled images 36φ1 to 36φ4 generated from one frame of image data 32.
 Next, the feature amount of the sampled image 36φ2 extracted in step S11b is integrated in step S12b with the feature amount extracted from the sampled image 36φ1 in step S11a. Recognition processing is performed on the feature amount integrated in step S12b, and the recognition result φ2 is obtained (step S13b). At the same time, the integrated feature amount is integrated with the feature amount of the sampled image 36φ3 extracted in step S11c (step S12c). That is, in step S12c, the feature amounts extracted from the sampled images 36φ1, 36φ2, and 36φ3 are integrated.
 Recognition processing is performed on the feature amount integrated in step S12c, and the recognition result φ3 is obtained (step S13c). At the same time, the integrated feature amount is integrated with the feature amount of the sampled image 36φ4 extracted in step S11d (step S12d). That is, in step S12d, the feature amounts extracted from the sampled images 36φ1, 36φ2, 36φ3, and 36φ4 are integrated. Recognition processing is performed on this integrated feature amount in step S13d, and the recognition result φ4 is obtained.
 The recognition result φ4 of the recognition processing in step S13d is called the integration result because it is obtained based on the integrated feature amount in which the feature amounts extracted from all of the sampled images 36φ1 to 36φ4 are integrated. The integration result corresponds to the recognition result that would be obtained if the pixels 300 at all pixel positions of one frame of image data 32 were used as sampling pixels.
 The recognition results φ2 and φ3 of the recognition processing in steps S13b and S13c are recognition results obtained on the way to the integration result and are called intermediate results.
 As described above, the recognizer according to the first embodiment can adaptively output any one of the recognition results φ1 to φ4, or a combination of a plurality of the recognition results φ1 to φ4, according to prior information detected in advance, environmental information, and the like.
 FIG. 19 is a time chart showing examples of timings, according to the first embodiment, at which it is determined how the recognition result is to be output, that is, which of the recognition results φ1 to φ4, or which combination of a plurality of the recognition results φ1 to φ4, is to be output, and on what conditions the recognition result to be output is determined. In FIG. 19, time elapses toward the right.
 In the example of FIG. 19, three determination timings are shown, in chronological order: timing P, timing Q, and timing R.
 At timing P, prior information is detected before a series of recognition processes is started. The prior information is, for example, information detected in advance for executing the recognition processing, and based on the prior information it is determined, for example, which of the recognition results φ1 to φ4 is to be output. At timing Q, the recognition result to be output is set based on the recognition results in a predetermined period 100. This predetermined period 100 can be, for example, a frame-based period of one frame or longer. At timing R, the recognition result to be output is set based on a recognition result within a predetermined period 101 inside a frame (for example, the breaking news result or an intermediate result described above).
 The first embodiment corresponds to the processing at timing P described above, and the recognition result to be output is determined based on prior information detected before the recognition processing is started.
(2-2. More specific configuration example according to the first embodiment)
Next, a more specific configuration example according to the first embodiment will be described. FIG. 20A is a functional block diagram for explaining in more detail the functions of the preprocessing unit 210 according to the first embodiment. In FIG. 20A, the preprocessing unit 210 includes a utilization area acquisition unit 211, a recognition result output setting unit 212, a recognition result output calculation unit 213, and a storage unit 214. The storage unit 214 includes a memory and a memory control unit for controlling reading from and writing to the memory.
 The utilization area acquisition unit 211, the recognition result output setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 (memory control unit) are realized, for example, by an information processing program running on the CPU 1205. This information processing program can be stored in advance in the ROM 1206. Alternatively, the information processing program can be supplied from the outside via the interface 1204 and written into the ROM 1206.
 Furthermore, the utilization area acquisition unit 211, the recognition result output setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 (memory control unit) may be realized by the CPU 1205 and the DSP 1203 each operating according to the information processing program. Furthermore, some or all of the utilization area acquisition unit 211, the recognition result output setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 (memory control unit) may be configured by hardware circuits that operate in cooperation with each other.
 In the preprocessing unit 210, the utilization area acquisition unit 211 includes a readout unit that reads the image data 32 from the sensor unit 10b. The utilization area acquisition unit 211 applies the subsampling process to the image data 32 read from the sensor unit 10b by the readout unit according to a predetermined pattern (for example, the divided region 35 of 4 pixels × 4 pixels), extracts sampling pixels, and generates a sampled image 36φx of phase φx from the extracted sampling pixels. That is, the utilization area acquisition unit 211 realizes the function of a generation unit that generates the sampled image.
 The utilization area acquisition unit 211 passes the generated sampled image 36φx to the recognition unit 220. The utilization area acquisition unit 211 can also perform readout control on the sensor unit 10b, for example specifying the lines to be read.
 FIG. 20B is a functional block diagram for explaining in more detail the functions of the recognition unit 220 according to the first embodiment. In FIG. 20B, the recognition unit 220 includes a feature amount calculation unit 221, a feature amount accumulation control unit 222, a feature amount accumulation unit 223, and a recognition process execution unit 224.
 The feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 are realized, for example, by an information processing program running on the CPU 1205. This information processing program can be stored in advance in the ROM 1206. Alternatively, the information processing program can be supplied from the outside via the interface 1204 and written into the ROM 1206.
 Furthermore, the feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 may be realized by the CPU 1205 and the DSP 1203 each operating according to the information processing program. Furthermore, some or all of the feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 may be configured by hardware circuits that operate in cooperation with each other.
 In the recognition unit 220, the feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 constitute a recognizer that executes recognition processing based on image data. The recognition unit 220 can construct the recognizer and change its configuration according to recognizer information passed from the parameter storage unit 230.
 In the recognition unit 220, the sampled image 36φx passed from the utilization area acquisition unit 211 is input to the feature amount calculation unit 221. The feature amount calculation unit 221 includes one or more feature calculation units for calculating feature amounts, and calculates a feature amount based on the passed sampled image 36φx. That is, the feature amount calculation unit 221 functions as a calculation unit that calculates the feature amount of the sampled image 36φx composed of sampling pixels. In addition, the feature amount calculation unit 221 may, for example, acquire information for setting the exposure and the analog gain from the sensor unit 10b and further use this acquired information to calculate the feature amount. The feature amount calculation unit 221 passes the calculated feature amount to the feature amount accumulation control unit 222.
 The feature amount accumulation control unit 222 accumulates the feature amount passed from the feature amount calculation unit 221 in the feature amount accumulation unit 223. At this time, the feature amount accumulation control unit 222 can integrate the past feature amounts already accumulated in the feature amount accumulation unit 223 with the feature amount passed from the feature amount calculation unit 221 to generate an integrated feature amount. When the feature amount accumulation unit 223 has been initialized, for example, and no feature amount exists, the feature amount accumulation control unit 222 accumulates the feature amount passed from the feature amount calculation unit 221 in the feature amount accumulation unit 223 as the first feature amount.
 The feature amount accumulation control unit 222 can also delete feature amounts that are no longer needed from the feature amounts accumulated in the feature amount accumulation unit 223. Feature amounts that are no longer needed are, for example, feature amounts relating to a previous frame, or already accumulated feature amounts calculated from a frame image of a scene different from the frame image for which a new feature amount has been calculated. In addition, the feature amount accumulation control unit 222 can specify the feature amounts to be deleted in response to an external instruction. Furthermore, the feature amount accumulation control unit 222 can delete all the feature amounts accumulated in the feature amount accumulation unit 223 and initialize it as necessary.
 The feature amount accumulation control unit 222 passes to the recognition process execution unit 224 either the feature amount passed from the feature amount calculation unit 221, or a feature amount obtained by integrating the feature amounts accumulated in the feature amount accumulation unit 223 with the feature amount passed from the feature amount calculation unit 221.
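The accumulate-and-integrate behaviour of the feature amount accumulation control unit 222 described above can be sketched as a simplified model; the class name is hypothetical, and element-wise averaging is used purely as a stand-in for whatever integration the recognizer actually performs:

```python
class FeatureAccumulator:
    """Holds accumulated feature amounts and integrates newly calculated ones."""

    def __init__(self):
        self._accumulated = None

    def integrate(self, feature):
        # The first feature amount after initialization is stored as-is.
        if self._accumulated is None:
            self._accumulated = feature
        else:
            # Illustrative integration: element-wise average of old and new feature amounts.
            self._accumulated = 0.5 * (self._accumulated + feature)
        return self._accumulated

    def reset(self):
        # Delete all accumulated feature amounts, e.g. at a scene change.
        self._accumulated = None
```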
 The recognition process execution unit 224 executes recognition processing such as object detection, person detection, and face detection based on the feature amount passed from the feature amount accumulation control unit 222. For example, when that feature amount is the feature amount passed from the feature amount calculation unit 221 to the feature amount accumulation control unit 222 as-is, that is, a feature amount not integrated with other feature amounts, the recognition process execution unit 224 outputs the breaking news result as the recognition result of the processing.
 Also, for example, when the feature amount is one in which all the feature amounts based on all the sampled images 36φx generated from one frame of image data 32 are integrated, the recognition process execution unit 224 outputs the integration result as the recognition result of the processing. Furthermore, the recognition process execution unit 224 can also output an intermediate result, which is a recognition result between the breaking news result and the integration result.
 Returning to FIG. 20A, the storage unit 214 accumulates the recognition results output by the recognition unit 220. The recognition results accumulated in the storage unit 214 are passed to the recognition result output calculation unit 213. Alternatively, the storage unit 214 can pass the recognition result output from the recognition process execution unit 224 directly to the recognition result output calculation unit 213. Based on the recognition results passed from the storage unit 214, the recognition result output calculation unit 213 determines one or more of the recognition results φ1 to φ4 to be output from the recognition unit 220. The recognition result output calculation unit 213 passes the determined recognition result to the recognition result output setting unit 212.
 The recognition result output setting unit 212 sets the recognition result to be output by the recognition unit 220 based on the recognition result passed from the recognition result output calculation unit 213 or on prior information supplied, for example, from outside the recognition processing unit 20b. That is, the recognition result output setting unit 212 executes one of the processes at the timings P, Q, and R described with reference to FIG. 19 based on the recognition result or the prior information. In this way, the recognition result output setting unit 212 and the recognition result output calculation unit 213 function as an output control unit that controls the output of the recognition result by the recognition unit 220.
(2-3. More specific processing according to the first embodiment)
Next, the process according to the first embodiment will be described more specifically. FIG. 21 is an example flowchart showing the recognition process according to the first embodiment. In the following description, the information processing device 1b according to the first embodiment will be described as being used for in-vehicle use.
 In step S100, the recognition processing unit 20b detects prior information via the recognition result output setting unit 212. As the prior information, for example, vehicle body information of the vehicle on which the information processing device 1b including the recognition processing unit 20b is mounted, position information indicating the current position of that vehicle (information processing device 1b), and time information indicating the current date and time can be applied.
 More specifically, the traveling speed of the vehicle can be applied as the vehicle body information. The position information can be acquired by providing the vehicle or the information processing device 1b itself with self-position acquisition means such as GNSS (Global Navigation Satellite System) or SLAM (Simultaneous Localization and Mapping). By referring to map information based on the acquired position information, the country or region can be identified. Here, in the case of Japan, the region may be a wide area such as a prefecture or a specific area within an urban district (a shopping street, a school zone, and so on). The time information can be acquired, for example, from a timer or calendar mounted on the vehicle or on the information processing device 1b itself, and makes it possible to know whether it is day or night and what season it is.
 In the next step S101, the recognition processing unit 20b determines, via the recognition result output setting unit 212 and based on the prior information detected in step S100, the recognition result of which timing (for example, one of the recognition results φ1 to φ4) is to be output. For example, when the prior information is traveling speed information, it is conceivable to output the breaking news result (recognition result φ1) if the traveling speed is equal to or higher than a predetermined value, and to output the integration result (recognition result φ4) otherwise. Also, for example, when the prior information is position information, it is conceivable to output the breaking news result (recognition result φ1) if the current position is in a school zone, in order to prioritize the recognition of nearby objects, and to output the integration result (recognition result φ4) on an expressway, in order to prioritize the recognition of distant traffic conditions and oncoming vehicles.
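A hedged sketch of the decision in step S101 based on such prior information might look as follows; the speed threshold and the zone labels are invented for illustration and are not taken from the embodiment:

```python
def decide_output_phase(speed_kmh=None, zone=None):
    """Return which recognition result to request: 1 = breaking news result ... 4 = integration result."""
    if speed_kmh is not None and speed_kmh >= 60:  # hypothetical speed threshold
        return 1   # prioritise responsiveness: breaking news result (phi1)
    if zone == "school_zone":
        return 1   # prioritise nearby objects: breaking news result (phi1)
    if zone == "highway":
        return 4   # prioritise distant objects: integration result (phi4)
    return 4       # default: integration result (phi4)
```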
 In the next step S102, the recognition processing unit 20b uses the utilization area acquisition unit 211 to acquire a sampled image 36φx obtained by subsampling the image data 32 according to, for example, a preset pattern (for example, the divided region 35). In the next step S103, the recognition result output setting unit 212 specifies to the recognition unit 220 the recognition result of which timing among the recognition results φ1 to φ4 is to be output, in accordance with its determination in step S101.
 In the next step S104, the recognition processing unit 20b causes the recognition unit 220 to execute recognition processing on the sampled image 36φx passed from the utilization area acquisition unit 211. In the next step S105, the recognition processing unit 20b determines, via the recognition result output setting unit 212, whether the recognition result φx obtained by the recognition processing of the recognition unit 220 is the recognition result specified to the recognition unit 220 in step S103. For example, the recognition result output setting unit 212 can make this determination by acquiring the recognition result φx output from the recognition unit 220 via the storage unit 214 and the recognition result output calculation unit 213. When the recognition result output setting unit 212 determines that the recognition result φx obtained by the recognition processing of the recognition unit 220 is not the recognition result specified to the recognition unit 220 in step S103 (step S105, "No"), the processing returns to step S102.
 一方、認識結果出力設定部212は、認識部220により認識処理された認識結果φxがステップS103で認識部220に指定した認識結果であると判定した場合(ステップS105、「Yes」)、処理をステップS106に移行させる。ステップS106で、認識結果出力設定部212は、認識部220に対して、ステップS104で認識処理を行った認識結果φxを出力するように指示する。この指示に応じて、認識部220から認識結果φxが出力される。 On the other hand, when the recognition result output setting unit 212 determines that the recognition result φx recognized by the recognition unit 220 is the recognition result specified in the recognition unit 220 in step S103 (step S105, “Yes”), the processing is performed. The process proceeds to step S106. In step S106, the recognition result output setting unit 212 instructs the recognition unit 220 to output the recognition result φx obtained by the recognition process in step S104. In response to this instruction, the recognition result φx is output from the recognition unit 220.
 このように、第1の実施形態に係る認識処理部20bは、事前情報を検出し、検出した事前情報に基づき、どのタイミングの認識結果φxを出力するかを決定している。そのため、第1の実施形態に係る認識処理部20bを適用することで、状況に合わせた認識結果を得ることが可能となる。また、これにより、計算量のコストや通信コストを抑制することが可能となる。 In this way, the recognition processing unit 20b according to the first embodiment detects the prior information and determines at what timing the recognition result φx is output based on the detected prior information. Therefore, by applying the recognition processing unit 20b according to the first embodiment, it is possible to obtain a recognition result according to the situation. Further, this makes it possible to suppress the cost of calculation amount and the communication cost.
(2-4. First modification of the first embodiment)
Next, a first modification of the first embodiment will be described. The first modification of the first embodiment corresponds to the processing at the timing R described with reference to FIG. 19, and determines the recognition result φx to be output by the recognition unit 220 based on the recognition result obtained during the predetermined period 101 within a frame.
FIG. 22 is a flowchart showing an example of the recognition processing according to the first modification of the first embodiment. Here, it is assumed that, prior to the execution of the flowchart of FIG. 22, the recognition target used to determine the recognition result φx output by the recognition unit 220 has been set. When the information processing device 1b according to the first modification of the first embodiment is for in-vehicle use, a person (pedestrian), for example, can be considered as the recognition target. The recognition target is not limited to this, and oncoming vehicles and road signs may also be used as recognition targets.
The processing of steps S200 to S203 forms a loop. In step S200, the recognition processing unit 20b uses the utilization area acquisition unit 211 to acquire the sampled image 36φx obtained by subsampling the image data 32 of the frame (t) according to, for example, a preset pattern (for example, the divided regions 35).
 次のステップS201で、認識処理部20bは、認識結果出力設定部212により、認識結果φ1~φ4のうちどのタイミングの認識結果を出力するかを、認識部220に指定する。例えば、認識結果出力設定部212は、ステップS200で取得したサンプリング画像36φxに基づく認識結果を出力するように、認識部220に指定する。 In the next step S201, the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results φ1 to φ4 is to be output by the recognition result output setting unit 212. For example, the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36φx acquired in step S200.
 次のステップS202で、認識処理部20bは、認識部220により、ステップS200で取得したサンプリング画像36φxに対する認識処理を実行する。より詳細には、認識部220は、ステップS200で取得したサンプリング画像36φxの特徴量を抽出し、抽出した特徴量に基づき認識処理を実行する。 In the next step S202, the recognition processing unit 20b executes the recognition processing for the sampled image 36φx acquired in step S200 by the recognition unit 220. More specifically, the recognition unit 220 extracts the feature amount of the sampled image 36φx acquired in step S200, and executes the recognition process based on the extracted feature amount.
In the next step S203, the recognition processing unit 20b uses the recognition result output setting unit 212 to determine whether or not it has been decided, based on the result of the recognition processing in step S202, which of the recognition results φ1 to φ4 is to be output in the frame (t + 1) following the frame (t). For example, when a recognition target designated in advance has been detected based on the result of the recognition processing in step S202, the recognition result output setting unit 212 determines, according to that recognition target, that it has been decided which of the recognition results φ1 to φ4 is to be output.
When the recognition result output setting unit 212 determines in step S203 that it has not been decided which recognition result is to be output (step S203, "No"), the processing returns to step S200, and the sampled image 36φ(x + 1) of the phase following the sampled image 36φx acquired in step S200 is acquired.
On the other hand, when the recognition result output setting unit 212 determines in step S203 that it has been decided which recognition result is to be output (step S203, "Yes"), the processing proceeds to step S204.
In step S204, the recognition processing unit 20b causes the utilization area acquisition unit 211 to acquire, in accordance with the decision in step S203, the sampled image 36φx corresponding to the recognition result φx whose output has been decided. Here, the sampled image 36φx to be acquired may be the sampled image 36φx of the image data 32 at the time (t), or may be the sampled image 36φx of the image data 32 at the next time (t + 1).
 次のステップS205で、認識処理部20bは、認識結果出力設定部212により、認識結果φ1~φ4のうちどのタイミングの認識結果を出力するかを、認識部220に指定する。例えば、認識結果出力設定部212は、ステップS203で決定されたサンプリング画像36φxに基づく認識結果を出力するように、認識部220に指定する。 In the next step S205, the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results φ1 to φ4 is to be output by the recognition result output setting unit 212. For example, the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36φx determined in step S203.
 次のステップS206で、認識処理部20bは、認識部220により、ステップS204で取得したサンプリング画像36φxに対する認識処理を実行する。 In the next step S206, the recognition processing unit 20b executes the recognition processing for the sampled image 36φx acquired in step S204 by the recognition unit 220.
In the next step S207, the recognition processing unit 20b uses the recognition result output setting unit 212 to determine whether or not the recognition processing based on the recognition result whose output was designated in step S205, for example the sampled image 36φx decided in step S203, has been performed. When it is determined that this recognition processing has not been performed (step S207, "No"), the processing returns to step S204. On the other hand, when the recognition result output setting unit 212 determines that this recognition processing has been performed (step S207, "Yes"), the processing proceeds to step S208.
 ステップS208で、認識処理部20bは、認識結果出力設定部212により、認識部220に対して認識結果φxを出力するよう指示する。認識部220は、この指示に応じて、認識結果φxを出力する。 In step S208, the recognition processing unit 20b instructs the recognition unit 220 to output the recognition result φx by the recognition result output setting unit 212. The recognition unit 220 outputs the recognition result φx in response to this instruction.
The above will be described using a more specific example. Here, an obstacle on the road (such as a pedestrian) is applied as the recognition target, and it is assumed that, in the preliminary result (recognition result φ1), the recognition target can be recognized within the range of the braking target distance of the vehicle (own vehicle) on which the information processing device 1b is mounted.
 図23は、上述した図18に対応する図であって、第1の実施形態の第1の変形例に係る認識処理を説明するための模式図である。時間(t)において、画像データ32に対する最初のサブサンプリングによるサンプリング画像36φ1に基づく認識処理(ステップS13a)により、認識対象が認識されたものとする(ステップS202)。この場合、認識対象は、自車の制動対象距離の範囲内に位置するため、情報処理装置1bは、自車に対して制動を促す通信を行う。 FIG. 23 is a diagram corresponding to FIG. 18 described above, and is a schematic diagram for explaining the recognition process according to the first modification of the first embodiment. At time (t), it is assumed that the recognition target is recognized by the recognition process (step S13a) based on the sampled image 36φ1 by the first subsampling of the image data 32 (step S202). In this case, since the recognition target is located within the range of the braking target distance of the own vehicle, the information processing device 1b performs communication for urging the own vehicle to brake.
At time (t), there is no need to detect objects farther away than the recognition target. Therefore, for example, the recognition result output setting unit 212 decides that at least one of the recognition results φ1 to φ4 based on the image data 32 of the next time (t + 1) (the image data 32 of the frame following the image data 32 at time (t)) is to be the recognition result to be output. As a result, the recognition unit 220 does not execute the recognition processing (steps S13b to S13d) based on the remaining sampled images 36φ2 to 36φ4 of the image data 32 at time (t), and executes the recognition processing for the next time (t + 1) from step S204.
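The early-termination behavior of FIG. 23 can be sketched as follows. This is a hedged illustration, not the actual implementation of the recognition unit 220: the callable recognize(), the (label, distance) detection format, and the function name process_frame are assumptions used only to show how the remaining phases of a frame could be skipped once an urgent target is found.

def process_frame(sampled_images, recognize, braking_distance_m):
    """Process the phase-shifted sampled images 36phi1..36phi4 of one frame in order.

    `recognize(image)` is a hypothetical callable returning a list of detections,
    each a (label, distance_m) tuple. As soon as a designated target (here a
    pedestrian) is found within the braking target distance, the remaining
    phases are skipped so that processing of the next frame can start earlier.
    """
    detections = []
    for phase, image in enumerate(sampled_images, start=1):
        detections = recognize(image)              # corresponds to step S202
        urgent = [d for d in detections
                  if d[0] == "pedestrian" and d[1] <= braking_distance_m]
        if urgent:
            # Early output (e.g. the preliminary result phi1); steps S13b..S13d skipped.
            return phase, urgent
    # No urgent target: the last phase corresponds to the integrated result phi4.
    return len(sampled_images), detections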
FIG. 24 is a time chart of an example for explaining the effect of the processing of FIG. 23. FIGS. 16 and 17A described above show that, when the sampled images 36φ1 to 36φ4 of the four phases φ1 to φ4 are acquired from the image data 32, the recognition processing for each of the sampled images 36φ1 to 36φ4 is completed within one frame period.
In practice, however, one frame period includes the exposure period and the period for transferring the pixel data, so that, for example, the recognition processing for the first sampled image 36φ1 actually starts with a delay relative to the frame start timing, time t0. Furthermore, each recognition process is not necessarily completed within one quarter of a frame period.
Taking these points into account, when the recognition processing for each of the sampled images 36φ1 to 36φ4 is executed in the first frame period starting at time t0, the time t11 at which the recognition processing for the last recognition result φ4 is completed may not come before the time t1 at which the first frame period ends. That is, in this case, the recognition processing of the first frame period runs over the frame boundary into the following second frame period. The next recognition processing therefore starts at time t2, at which the second frame period ends, which may cause a problem in responsiveness.
In contrast, as described with reference to FIG. 23, by executing only the recognition processing for, for example, the first sampled image 36φ1 in the first frame period and not performing the recognition processing for the subsequent sampled images φ2 to φ4, the recognition processing for the next image data 32 can be started at time t1, at which the second frame period starts. Responsiveness is therefore better than in the example described above.
Note that in recognition processing it is often unnecessary to consider the visibility of the image; in this example, the second frame period can therefore be started at time t10, at which the recognition processing for the sampled image 36φ1 ends, which further improves responsiveness.
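The latency argument of FIG. 24 can be checked with simple arithmetic. The numbers below (frame period, readout delay, per-phase recognition time) are illustrative assumptions only; the disclosure does not specify concrete values.

# Illustrative latency budget for FIG. 24 (all numbers are assumptions).
frame_period_ms = 33.3          # e.g. 30 fps
readout_delay_ms = 8.0          # exposure + pixel data transfer before phi1 can start
recognition_ms_per_phase = 9.0  # per sampled image 36phi1..36phi4

all_phases = readout_delay_ms + 4 * recognition_ms_per_phase    # 44.0 ms
first_phase_only = readout_delay_ms + recognition_ms_per_phase  # 17.0 ms

print(all_phases > frame_period_ms)        # True: phi4 spills into the next frame
print(first_phase_only < frame_period_ms)  # True: the next frame can start at time t10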
(2-5. Second modification of the first embodiment)
Next, a second modification of the first embodiment will be described. The second modification of the first embodiment corresponds to the processing at the timing Q described with reference to FIG. 19, and sets the recognition result φx to be output based on the recognition results obtained during the predetermined period 100 spanning multiple frames.
FIG. 25 is a flowchart showing an example of the recognition processing according to the second modification of the first embodiment. Here, it is assumed that, prior to the execution of the flowchart of FIG. 25, the recognition target used to determine the recognition result φx output by the recognition unit 220 has been set. When the information processing device 1b according to the second modification of the first embodiment is for in-vehicle use, a person (pedestrian), for example, can be considered as the recognition target. The recognition target is not limited to this, and oncoming vehicles and road signs may also be used as recognition targets.
The processing of steps S300 to S304 forms a loop. In step S300, the recognition processing unit 20b uses the utilization area acquisition unit 211 to acquire the sampled image 36φx obtained by subsampling one frame of the image data 32 according to, for example, a preset pattern (for example, the divided regions 35).
 次のステップS301で、認識処理部20bは、認識結果出力設定部212により、認識結果φ1~φ4のうちどのタイミングの認識結果を出力するかを、認識部220に指定する。例えば、認識結果出力設定部212は、ステップS300で取得したサンプリング画像36φxに基づく認識結果を出力するように、認識部220に指定する。 In the next step S301, the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results φ1 to φ4 is to be output by the recognition result output setting unit 212. For example, the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36φx acquired in step S300.
 次のステップS302で、認識処理部20bは、認識部220により、ステップS300で取得したサンプリング画像36φxに対する認識処理を実行する。より詳細には、認識部220は、ステップS300で取得したサンプリング画像36φxの特徴量を抽出し、抽出した特徴量に基づき認識処理を実行する。次のステップS303で、蓄積部214は、ステップS302で実行された認識処理による認識結果φxを蓄積する。 In the next step S302, the recognition processing unit 20b executes the recognition processing for the sampled image 36φx acquired in step S300 by the recognition unit 220. More specifically, the recognition unit 220 extracts the feature amount of the sampled image 36φx acquired in step S300, and executes the recognition process based on the extracted feature amount. In the next step S303, the storage unit 214 accumulates the recognition result φx by the recognition process executed in step S302.
 次のステップS304で、認識処理部20bは、認識結果出力設定部212により、ステップS300~ステップS303の処理を、所定期間(例えば数フレーム期間)実行したか否かを判定する。認識結果出力設定部212は、処理を所定期間実行していないと判定した場合(ステップS304、「No」)、処理をステップS300に戻す。一方、認識結果出力設定部212は、処理を所定期間実行したと判定した場合(ステップS304、「Yes」)、処理をステップS305に移行させる。 In the next step S304, the recognition processing unit 20b determines whether or not the processing of steps S300 to S303 has been executed for a predetermined period (for example, several frame period) by the recognition result output setting unit 212. When the recognition result output setting unit 212 determines that the process has not been executed for a predetermined period (step S304, "No"), the process returns to step S300. On the other hand, when the recognition result output setting unit 212 determines that the process has been executed for a predetermined period (step S304, "Yes"), the process shifts to step S305.
In step S305, the recognition processing unit 20b uses the recognition result output setting unit 212 to decide, based on the recognition results φx accumulated in the storage unit 214, which of the recognition results φ1 to φ4 is to be output in the subsequent frames. Here, the recognition result output setting unit 212 can determine one or more of the recognition results φ1 to φ4 based on one frame of the image data 32 as the recognition results φx to be output.
 次のステップS306~ステップS309は、ループ処理となっている。ステップS306で、認識処理部20bは、利用領域取得部211において、ステップS305の決定に従い、出力が決定された認識結果φxに応じたサンプリング画像36φxを取得する。 The next steps S306 to S309 are loop processing. In step S306, the recognition processing unit 20b acquires the sampled image 36φx according to the recognition result φx whose output is determined according to the determination in step S305 in the utilization area acquisition unit 211.
 次のステップS307で、認識処理部20bは、認識結果出力設定部212により、認識結果φ1~φ4のうちどのタイミングの認識結果を出力するかを、認識部220に指定する。例えば、認識結果出力設定部212は、ステップS303で決定されたサンプリング画像36φxに基づく認識結果を出力するように、認識部220に指定する。ステップS305で複数の認識結果が出力する認識結果φxとして決定されている場合、ステップS306~ステップS309のループにおいて、それらを順次に選択して決定する。 In the next step S307, the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results φ1 to φ4 is to be output by the recognition result output setting unit 212. For example, the recognition result output setting unit 212 designates the recognition unit 220 to output the recognition result based on the sampled image 36φx determined in step S303. When a plurality of recognition results are determined as the recognition results φx to be output in step S305, they are sequentially selected and determined in the loop of steps S306 to S309.
 次のステップS308で、認識処理部20bは、認識部220により、ステップS306で取得したサンプリング画像36φxに対する認識処理を実行する。 In the next step S308, the recognition processing unit 20b executes the recognition processing for the sampled image 36φx acquired in step S306 by the recognition unit 220.
 次のステップS309で、認識処理部20bは、認識結果出力設定部212により、ステップS305で出力が指定された認識結果に基づく認識処理が全て行われたか否かを判定する。出力が決定された認識結果のうち、認識処理が行われていないものがあると判定した場合(ステップS309、「No」)、認識結果出力設定部212は、処理をステップS306に戻す。一方、認識結果出力設定部212は、出力が決定された認識結果の全てについて認識処理が行われたと判定した場合(ステップS309、「Yes」)、処理をステップS310に移行させる。 In the next step S309, the recognition processing unit 20b determines whether or not all the recognition processing based on the recognition result whose output is specified in step S305 has been performed by the recognition result output setting unit 212. When it is determined that some of the recognition results whose output has been determined have not been recognized (step S309, "No"), the recognition result output setting unit 212 returns the process to step S306. On the other hand, when the recognition result output setting unit 212 determines that the recognition process has been performed for all the recognition results whose output has been determined (step S309, “Yes”), the process shifts to step S310.
In step S310, the recognition processing unit 20b uses the recognition result output setting unit 212 to instruct the recognition unit 220 to output each recognition result φx whose output has been decided. In response to this instruction, the recognition unit 220 outputs each recognition result φx. Note that the processing of step S310 may be executed between the processing of step S308 and the processing of step S309.
The above will be described using a more specific example. As described above, an obstacle on the road (such as a pedestrian) is applied as the recognition target, and it is assumed that, in the preliminary result (recognition result φ1), the recognition target can be recognized within the range of the braking target distance of the vehicle (own vehicle) on which the information processing device 1b is mounted.
FIG. 26 is a diagram corresponding to FIG. 18 described above, and is a schematic diagram for explaining the recognition processing according to the second modification of the first embodiment. First, as shown in the upper part of FIG. 26, the recognition result output setting unit 212 instructs the recognition unit 220 to output, for example, the recognition result φ4, that is, the integrated result, and the recognition results φ4 over the predetermined period 100 are accumulated in the storage unit 214. It is assumed here that the recognition result output setting unit 212 has determined, based on the recognition results φ4 accumulated in the storage unit 214, that the area in which the own vehicle is currently located is an area with many pedestrians.
In this case, since many pedestrians can also be expected to run out into the road, the recognition result output setting unit 212 decides to output the recognition result φ1 (preliminary result) and the recognition result φ4 (integrated result), as shown in the lower part of FIG. 26 (step S305). The recognition unit 220 outputs the recognition results φ1 and φ4 in accordance with this decision. These recognition results φ1 and φ4 are sent, for example, to the braking system of the own vehicle.
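The decision in step S305 can be sketched as a simple rule over the accumulated results. The threshold PEDESTRIAN_AREA_THRESHOLD and the label-list representation are assumptions introduced only for illustration; the disclosure leaves the concrete criterion open.

PEDESTRIAN_AREA_THRESHOLD = 5  # hypothetical count per accumulation period

def decide_outputs(accumulated_labels):
    """Step S305 sketch: accumulated_labels holds, for each frame of the period
    100, the list of object labels recognized in the integrated result phi4.
    Returns which recognition results (1 = phi1, 4 = phi4) to output afterwards."""
    pedestrians = sum(labels.count("pedestrian") for labels in accumulated_labels)
    if pedestrians >= PEDESTRIAN_AREA_THRESHOLD:
        # Many pedestrians: keep the fast preliminary result phi1 for sudden
        # run-outs, in addition to the integrated result phi4.
        return [1, 4]
    return [4]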
As described above, in the second modification of the first embodiment, the recognition result to be output is determined based on the recognition results obtained over a predetermined period, so that a recognition result appropriate to the situation can be output.
(2-6. Another modification of the first embodiment)
Next, another modification of the first embodiment will be described. In the above description, the technique according to the present disclosure is applied to recognition processing for detecting objects, but this is not limited to this example. For example, the technique according to the present disclosure can also be applied to semantic segmentation and other similar tasks.
Also, in the above description, the technique according to the present disclosure is applied to recognition processing using a DNN, but this is not limited to this example. The technique is applicable to other technologies as well, as long as the architecture uses image information expanded along the time axis.
[3. Second Embodiment]
Next, a second embodiment of the present disclosure will be described. The second embodiment of the present disclosure is an example in which a sensor unit 10b including the pixel array unit 1001, the recognition unit 220, and a configuration corresponding to the preprocessing unit 210 are integrally incorporated into a layered CIS.
 図28は、第2の実施形態に係る情報処理装置の一例の構成を示すブロック図である。図28において、情報処理装置1cは、センサ部10cと、認識部220と、を含む。また、センサ部10cは、画素アレイ部1001と、読出制御部240と、を含む。読出制御部240は、例えば、第1の実施形態で説明した前処理部210に対応する機能と、撮像部1200における制御部1100の機能と、を含む。 FIG. 28 is a block diagram showing a configuration of an example of the information processing device according to the second embodiment. In FIG. 28, the information processing device 1c includes a sensor unit 10c and a recognition unit 220. Further, the sensor unit 10c includes a pixel array unit 1001 and a read control unit 240. The read control unit 240 includes, for example, a function corresponding to the preprocessing unit 210 described in the first embodiment and a function of the control unit 1100 in the imaging unit 1200.
 なお、図28において、図5を用いて説明した構成のうち、垂直走査部1002、AD変換部1003および信号処理部1101は、画素アレイ部1001に含まれるものとして説明を行う。 Note that, in FIG. 28, among the configurations described with reference to FIG. 5, the vertical scanning unit 1002, the AD conversion unit 1003, and the signal processing unit 1101 will be described as being included in the pixel array unit 1001.
The read control unit 240 supplies the pixel array unit 1001 with a control signal designating the pixel circuits 1000 from which pixel signals are to be read. For example, the read control unit 240 can selectively read, from the pixel array unit 1001, the lines that include sampling pixels. The read control unit 240 is not limited to this, and can also selectively designate, in units of pixel circuits 1000, the pixel circuits 1000 corresponding to the sampling pixels in the pixel array unit 1001. At this time, the read control unit 240 can designate, to the pixel array unit 1001, the pixel circuits 1000 corresponding to the pixel positions of the sampling pixels obtained by the subsampling performed while shifting the phase, as described in the first embodiment.
 画素アレイ部1001は、指定された画素回路1000から読み出した画素信号をデジタル方式の画素データに変換し、この画素データを読出制御部240に渡す。読出制御部240は、画素アレイ部1001から渡された、1フレーム分の画素データを、画像データとして認識部220に渡す。この画像データは、位相ずらしサブサンプリングによるサンプリング画像である。認識部220は、渡された画像データに対して認識処理を実行する。 The pixel array unit 1001 converts the pixel signal read from the designated pixel circuit 1000 into digital pixel data, and passes this pixel data to the read control unit 240. The read control unit 240 passes the pixel data for one frame passed from the pixel array unit 1001 to the recognition unit 220 as image data. This image data is a sampled image by phase shift subsampling. The recognition unit 220 executes a recognition process on the passed image data.
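One possible way for the read control unit 240 to designate the sampling pixels of each phase is sketched below. The 2 × 2 size of the divided regions and the function names are assumptions for illustration; any division pattern described for the divided regions 35 could be substituted.

def sampling_positions(width, height, block=2, phase=0):
    """Yield (x, y) positions of the sampling pixels for one phase.

    The frame is divided into block x block regions (the divided regions 35);
    each phase selects one pixel position inside every region, and the phase
    index is shifted from frame to frame. The 2x2 block size is an assumption.
    """
    dx, dy = phase % block, (phase // block) % block
    for y in range(dy, height, block):
        for x in range(dx, width, block):
            yield x, y

def lines_to_read(height, block=2, phase=0):
    """Rows that contain sampling pixels for this phase (line-thinned readout)."""
    dy = (phase // block) % block
    return list(range(dy, height, block))

Reading only the rows returned by lines_to_read() corresponds to the line-thinned readout described below, which is what reduces the readout amount and the required bus width.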
 第2の実施形態では、情報処理装置1cを、図6Aを用いて説明した、半導体チップを2層に積層した2層構造の積層型CISにより構成することができる。図6Aを参照し、第1層の半導体チップに画素部2020aを形成し、第2層の半導体チップにメモリ+ロジック部2020bを形成している。画素部2020aは、少なくとも情報処理装置1cにおけるセンサ部10cを含む。メモリ+ロジック部2020bは、例えば、画素アレイ部1001を駆動するための駆動回路を含むと共に、読出制御部240と、認識部220と、を含む。メモリ+ロジック部2020bに、フレームメモリをさらに含ませることができる。 In the second embodiment, the information processing apparatus 1c can be configured by the laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers, which is described with reference to FIG. 6A. With reference to FIG. 6A, the pixel portion 2020a is formed on the semiconductor chip of the first layer, and the memory + logic portion 2020b is formed on the semiconductor chip of the second layer. The pixel unit 2020a includes at least the sensor unit 10c in the information processing device 1c. The memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit 1001, a read control unit 240, and a recognition unit 220. The memory + logic unit 2020b can further include a frame memory.
 別の例として、情報処理装置1cを、図6Bを用いて説明した、半導体チップを3層に積層した3層構造の積層型CISにより構成することができる。この場合、第1層の半導体チップに上述の画素部2020aを形成し、第2層の半導体チップに例えばフレームメモリを含むメモリ部2020cを形成し、第3層の半導体チップに上述のメモリ+ロジック部2020bに対応するロジック部2020dを形成している。この場合、ロジック部2020dは、例えば画素アレイ部を駆動するための駆動回路と、読出制御部240と、認識部220と、を含む。また、メモリ部2020cは、フレームメモリやメモリ1202を含むことができる。 As another example, the information processing apparatus 1c can be configured by the laminated CIS having a three-layer structure in which semiconductor chips are laminated in three layers, which is described with reference to FIG. 6B. In this case, the pixel portion 2020a described above is formed on the semiconductor chip of the first layer, the memory portion 2020c including, for example, a frame memory is formed on the semiconductor chip of the second layer, and the memory + logic described above is formed on the semiconductor chip of the third layer. The logic unit 2020d corresponding to the unit 2020b is formed. In this case, the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit, a read control unit 240, and a recognition unit 220. Further, the memory unit 2020c can include a frame memory and a memory 1202.
As described above, in the second embodiment, the sensor unit 10c performs the subsampling processing. It is therefore unnecessary to read from all of the pixel circuits 1000 included in the pixel array unit 1001, and the delay of the recognition processing can be shortened further than in the first embodiment described above. In addition, since the pixel circuits 1000 of the lines that include sampling pixels are selectively read from among all the pixel circuits 1000, the amount of pixel signals read from the pixel array unit 1001 can be reduced, and the bus width can be reduced.
Also, in the second embodiment, the pixel array unit 1001 performs line-thinned readout in which only the lines that include sampling pixels are selectively read. The distortion of the captured image caused by the rolling shutter can therefore be reduced, and the power consumption of the pixel array unit 1001 during imaging can be reduced. Furthermore, for the lines thinned out by the subsampling, imaging can be performed with imaging conditions, such as exposure, changed with respect to the lines that are read out by the subsampling.
(3-1. Modification of the second embodiment)
Next, a modification of the second embodiment will be described. The modification of the second embodiment is an example in which the sensor unit 10c and the recognition unit 220 in the information processing device 1c according to the second embodiment described above are separated.
 図29は、第2の実施形態の変形例に係る情報処理装置の一例の構成を示すブロック図である。図29において、情報処理装置1dは、センサ部10dと、認識処理部20dと、を含む、センサ部10dは、画素アレイ部1001と、読出制御部240と、を含む。また、認識処理部20dは、認識部220を含む。 FIG. 29 is a block diagram showing a configuration of an example of an information processing device according to a modified example of the second embodiment. In FIG. 29, the information processing device 1d includes a sensor unit 10d and a recognition processing unit 20d, and the sensor unit 10d includes a pixel array unit 1001 and a read control unit 240. Further, the recognition processing unit 20d includes a recognition unit 220.
 ここで、センサ部10dは、例えば、図6Aを用いて説明した、半導体チップを2層に積層した2層構造の積層型CISにより形成する。図6Aを参照し、第1層の半導体チップに画素部2020aを形成し、第2層の半導体チップにメモリ+ロジック部2020bを形成している。画素部2020aは、少なくともセンサ部10dにおける画素アレイ部1001を含む。メモリ+ロジック部2020bは、例えば、画素アレイ部1001を駆動するための駆動回路と、読出制御部240とを含む。メモリ+ロジック部2020bに、フレームメモリをさらに含ませることができる。 Here, the sensor unit 10d is formed by, for example, the laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers, which is described with reference to FIG. 6A. With reference to FIG. 6A, the pixel portion 2020a is formed on the semiconductor chip of the first layer, and the memory + logic portion 2020b is formed on the semiconductor chip of the second layer. The pixel unit 2020a includes at least the pixel array unit 1001 in the sensor unit 10d. The memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit 1001 and a read control unit 240. The memory + logic unit 2020b can further include a frame memory.
 センサ部10dは、サンプリング画像の画像データを読出制御部240から出力し、センサ部10dとは異なるハードウェアに含まれる認識処理部20dに供給する。認識処理部20dは、センサ部10dから供給された画像データを認識部220に入力する。認識部220は、入力された画像データに基づき認識処理を実行し、認識結果を外部に出力する。 The sensor unit 10d outputs the image data of the sampled image from the read control unit 240 and supplies it to the recognition processing unit 20d included in the hardware different from the sensor unit 10d. The recognition processing unit 20d inputs the image data supplied from the sensor unit 10d to the recognition unit 220. The recognition unit 220 executes the recognition process based on the input image data, and outputs the recognition result to the outside.
As another example, the sensor unit 10d can be formed by the layered CIS having a three-layer structure in which semiconductor chips are stacked in three layers, described with reference to FIG. 6B. In this case, the above-described pixel unit 2020a is formed on the first-layer semiconductor chip, a memory unit 2020c including, for example, a frame memory is formed on the second-layer semiconductor chip, and a logic unit corresponding to the above-described memory + logic unit 2020b is formed on the third-layer semiconductor chip. In this case, the logic unit includes, for example, a drive circuit for driving the pixel array unit 1001 and the read control unit 240. The memory unit 2020c can also include a frame memory and the memory 1202.
 このように、認識処理部20d(認識部220)をセンサ部10dとは別のハードウェアにより構成することで、認識部220の構成、例えば認識モデルなどの変更が容易とすることができる。 In this way, by configuring the recognition processing unit 20d (recognition unit 220) with hardware different from the sensor unit 10d, it is possible to easily change the configuration of the recognition unit 220, for example, the recognition model.
Also, since the recognition processing is performed based on the sampled image subsampled in the sensor unit 10d, the load of the recognition processing can be reduced compared with the case where the recognition processing is performed using the image data 32 of the captured image as it is. Therefore, a CPU, DSP, or GPU with lower processing capability can be used in the recognition processing unit 20d, for example, and the cost of the information processing device 1d can be reduced.
[4. Third Embodiment]
(4-1. Application examples of the technology of the present disclosure)
Next, as a third embodiment, application examples of the information processing devices 1b, 1c, and 1d according to the first embodiment and its modifications and the second embodiment and its modification of the present disclosure will be described. FIG. 30 is a diagram showing examples of use of the information processing devices 1b, 1c, and 1d according to the first embodiment and its modifications and the second embodiment and its modification. In the following description, the information processing devices 1b, 1c, and 1d are represented by the information processing device 1b when they do not need to be distinguished.
The information processing device 1b described above can be used, for example, in the following various cases in which light such as visible light, infrared light, ultraviolet light, or X-rays is sensed and recognition processing is performed based on the sensing result.
- Devices that capture images for viewing, such as digital cameras and portable devices with camera functions.
- Devices for traffic use, such as in-vehicle sensors that image the front, rear, surroundings, and interior of a vehicle for safe driving such as automatic stopping and for recognition of the driver's state, surveillance cameras that monitor traveling vehicles and roads, and distance measuring sensors that measure the distance between vehicles and the like.
- Devices for home appliances such as TVs, refrigerators, and air conditioners, which capture a user's gesture and operate the appliance according to that gesture.
- Devices for medical and healthcare use, such as endoscopes and devices that perform angiography by receiving infrared light.
- Devices for security use, such as surveillance cameras for crime prevention and cameras for personal authentication.
- Devices for beauty care use, such as skin measuring instruments that image the skin and microscopes that image the scalp.
- Devices for sports use, such as action cameras and wearable cameras for sports applications.
- Devices for agricultural use, such as cameras for monitoring the state of fields and crops.
(4-2. Application example to a mobile body)
The technology according to the present disclosure (the present technology) can be applied to various products. For example, the technology according to the present disclosure may be realized as a device mounted on any type of mobile body such as an automobile, an electric vehicle, a hybrid electric vehicle, a motorcycle, a bicycle, a personal mobility device, an airplane, a drone, a ship, or a robot.
 図31は、本開示に係る技術が適用され得る移動体制御システムの一例である車両制御システムの概略的な構成例を示すブロック図である。 FIG. 31 is a block diagram showing a schematic configuration example of a vehicle control system, which is an example of a mobile control system to which the technique according to the present disclosure can be applied.
 車両制御システム12000は、通信ネットワーク12001を介して接続された複数の電子制御ユニットを備える。図31に示した例では、車両制御システム12000は、駆動系制御ユニット12010、ボディ系制御ユニット12020、車外情報検出ユニット12030、車内情報検出ユニット12040、及び統合制御ユニット12050を備える。また、統合制御ユニット12050の機能構成として、マイクロコンピュータ12051、音声画像出力部12052、及び車載ネットワークI/F(interface)12053が図示されている。 The vehicle control system 12000 includes a plurality of electronic control units connected via the communication network 12001. In the example shown in FIG. 31, the vehicle control system 12000 includes a drive system control unit 12010, a body system control unit 12020, an outside information detection unit 12030, an in-vehicle information detection unit 12040, and an integrated control unit 12050. Further, as a functional configuration of the integrated control unit 12050, a microcomputer 12051, an audio image output unit 12052, and an in-vehicle network I / F (interface) 12053 are shown.
The drive system control unit 12010 controls the operation of devices related to the drive system of the vehicle according to various programs. For example, the drive system control unit 12010 functions as a control device for a driving force generating device for generating the driving force of the vehicle, such as an internal combustion engine or a driving motor, a driving force transmission mechanism for transmitting the driving force to the wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like.
 ボディ系制御ユニット12020は、各種プログラムにしたがって車体に装備された各種装置の動作を制御する。例えば、ボディ系制御ユニット12020は、キーレスエントリシステム、スマートキーシステム、パワーウィンドウ装置、あるいは、ヘッドランプ、バックランプ、ブレーキランプ、ウィンカー又はフォグランプ等の各種ランプの制御装置として機能する。この場合、ボディ系制御ユニット12020には、鍵を代替する携帯機から発信される電波又は各種スイッチの信号が入力され得る。ボディ系制御ユニット12020は、これらの電波又は信号の入力を受け付け、車両のドアロック装置、パワーウィンドウ装置、ランプ等を制御する。 The body system control unit 12020 controls the operation of various devices mounted on the vehicle body according to various programs. For example, the body system control unit 12020 functions as a keyless entry system, a smart key system, a power window device, or a control device for various lamps such as a head lamp, a back lamp, a brake lamp, a winker, or a fog lamp. In this case, the body system control unit 12020 may be input with radio waves transmitted from a portable device that substitutes for the key or signals of various switches. The body system control unit 12020 receives inputs of these radio waves or signals and controls a vehicle door lock device, a power window device, a lamp, and the like.
 車外情報検出ユニット12030は、車両制御システム12000を搭載した車両の外部の情報を検出する。例えば、車外情報検出ユニット12030には、撮像部12031が接続される。車外情報検出ユニット12030は、撮像部12031に車外の画像を撮像させるとともに、撮像された画像を受信する。車外情報検出ユニット12030は、受信した画像に基づいて、人、車、障害物、標識又は路面上の文字等の物体検出処理又は距離検出処理を行ってもよい。 The vehicle outside information detection unit 12030 detects information outside the vehicle equipped with the vehicle control system 12000. For example, the imaging unit 12031 is connected to the vehicle exterior information detection unit 12030. The vehicle outside information detection unit 12030 causes the image pickup unit 12031 to capture an image of the outside of the vehicle and receives the captured image. The vehicle exterior information detection unit 12030 may perform object detection processing or distance detection processing such as a person, a vehicle, an obstacle, a sign, or a character on the road surface based on the received image.
 撮像部12031は、光を受光し、その光の受光量に応じた電気信号を出力する光センサである。撮像部12031は、電気信号を画像として出力することもできるし、測距の情報として出力することもできる。また、撮像部12031が受光する光は、可視光であっても良いし、赤外線等の非可視光であっても良い。 The imaging unit 12031 is an optical sensor that receives light and outputs an electric signal according to the amount of the light received. The image pickup unit 12031 can output an electric signal as an image or can output it as distance measurement information. Further, the light received by the imaging unit 12031 may be visible light or invisible light such as infrared light.
 車内情報検出ユニット12040は、車内の情報を検出する。車内情報検出ユニット12040には、例えば、運転者の状態を検出する運転者状態検出部12041が接続される。運転者状態検出部12041は、例えば運転者を撮像するカメラを含み、車内情報検出ユニット12040は、運転者状態検出部12041から入力される検出情報に基づいて、運転者の疲労度合い又は集中度合いを算出してもよいし、運転者が居眠りをしていないかを判別してもよい。 The in-vehicle information detection unit 12040 detects the in-vehicle information. For example, a driver state detection unit 12041 that detects the driver's state is connected to the in-vehicle information detection unit 12040. The driver state detection unit 12041 includes, for example, a camera that images the driver, and the in-vehicle information detection unit 12040 determines the degree of fatigue or concentration of the driver based on the detection information input from the driver state detection unit 12041. It may be calculated, or it may be determined whether the driver is dozing.
The microcomputer 12051 can calculate control target values for the driving force generating device, the steering mechanism, or the braking device based on the information inside and outside the vehicle acquired by the vehicle exterior information detection unit 12030 or the vehicle interior information detection unit 12040, and can output control commands to the drive system control unit 12010. For example, the microcomputer 12051 can perform cooperative control aimed at realizing the functions of an ADAS (Advanced Driver Assistance System), including vehicle collision avoidance or impact mitigation, follow-up traveling based on inter-vehicle distance, vehicle speed maintenance traveling, vehicle collision warning, vehicle lane departure warning, and the like.
The microcomputer 12051 can also perform cooperative control aimed at automated driving or the like, in which the vehicle travels autonomously without depending on the driver's operation, by controlling the driving force generating device, the steering mechanism, the braking device, and the like based on the information around the vehicle acquired by the vehicle exterior information detection unit 12030 or the vehicle interior information detection unit 12040.
The microcomputer 12051 can also output control commands to the body system control unit 12020 based on the information outside the vehicle acquired by the vehicle exterior information detection unit 12030. For example, the microcomputer 12051 can perform cooperative control aimed at preventing glare, such as controlling the headlamps according to the position of a preceding vehicle or an oncoming vehicle detected by the vehicle exterior information detection unit 12030 and switching from high beam to low beam.
 音声画像出力部12052は、車両の搭乗者又は車外に対して、視覚的又は聴覚的に情報を通知することが可能な出力装置へ音声及び画像のうちの少なくとも一方の出力信号を送信する。図31の例では、出力装置として、オーディオスピーカ12061、表示部12062及びインストルメントパネル12063が例示されている。表示部12062は、例えば、オンボードディスプレイ及びヘッドアップディスプレイの少なくとも一つを含んでいてもよい。 The audio image output unit 12052 transmits the output signal of at least one of the audio and the image to the output device capable of visually or audibly notifying the passenger or the outside of the vehicle of the information. In the example of FIG. 31, an audio speaker 12061, a display unit 12062, and an instrument panel 12063 are exemplified as output devices. The display unit 12062 may include, for example, at least one of an onboard display and a heads-up display.
 図32は、撮像部12031の設置位置の例を示す図である。 FIG. 32 is a diagram showing an example of the installation position of the imaging unit 12031.
 図32では、車両12100は、撮像部12031として、撮像部12101,12102,12103,12104,12105を有する。 In FIG. 32, the vehicle 12100 has imaging units 12101, 12102, 12103, 12104, 12105 as imaging units 12031.
 撮像部12101,12102,12103,12104,12105は、例えば、車両12100のフロントノーズ、サイドミラー、リアバンパ、バックドア及び車室内のフロントガラスの上部等の位置に設けられる。フロントノーズに備えられる撮像部12101及び車室内のフロントガラスの上部に備えられる撮像部12105は、主として車両12100の前方の画像を取得する。サイドミラーに備えられる撮像部12102,12103は、主として車両12100の側方の画像を取得する。リアバンパ又はバックドアに備えられる撮像部12104は、主として車両12100の後方の画像を取得する。撮像部12101及び12105で取得される前方の画像は、主として先行車両又は、歩行者、障害物、信号機、交通標識又は車線等の検出に用いられる。 The imaging units 12101, 12102, 12103, 12104, 12105 are provided at positions such as the front nose, side mirrors, rear bumpers, back doors, and the upper part of the windshield in the vehicle interior of the vehicle 12100, for example. The image pickup unit 12101 provided on the front nose and the image pickup section 12105 provided on the upper part of the windshield in the vehicle interior mainly acquire an image in front of the vehicle 12100. The imaging units 12102 and 12103 provided in the side mirrors mainly acquire images of the side of the vehicle 12100. The imaging unit 12104 provided on the rear bumper or the back door mainly acquires an image of the rear of the vehicle 12100. The images in front acquired by the imaging units 12101 and 12105 are mainly used for detecting a preceding vehicle or a pedestrian, an obstacle, a traffic light, a traffic sign, a lane, or the like.
Note that FIG. 32 shows an example of the imaging ranges of the imaging units 12101 to 12104. The imaging range 12111 indicates the imaging range of the imaging unit 12101 provided on the front nose, the imaging ranges 12112 and 12113 indicate the imaging ranges of the imaging units 12102 and 12103 provided on the side mirrors, respectively, and the imaging range 12114 indicates the imaging range of the imaging unit 12104 provided on the rear bumper or the back door. For example, by superimposing the image data captured by the imaging units 12101 to 12104, a bird's-eye view image of the vehicle 12100 viewed from above can be obtained.
 撮像部12101ないし12104の少なくとも1つは、距離情報を取得する機能を有していてもよい。例えば、撮像部12101ないし12104の少なくとも1つは、複数の撮像素子からなるステレオカメラであってもよいし、位相差検出用の画素を有する撮像素子であってもよい。 At least one of the imaging units 12101 to 12104 may have a function of acquiring distance information. For example, at least one of the image pickup units 12101 to 12104 may be a stereo camera composed of a plurality of image pickup elements, or an image pickup element having pixels for phase difference detection.
For example, based on the distance information obtained from the imaging units 12101 to 12104, the microcomputer 12051 obtains the distance to each three-dimensional object within the imaging ranges 12111 to 12114 and the temporal change of this distance (the relative velocity with respect to the vehicle 12100), and can thereby extract, as a preceding vehicle, in particular the closest three-dimensional object on the traveling path of the vehicle 12100 that is traveling at a predetermined speed (for example, 0 km/h or more) in substantially the same direction as the vehicle 12100. Furthermore, the microcomputer 12051 can set in advance the inter-vehicle distance to be maintained to the preceding vehicle, and can perform automatic brake control (including follow-up stop control), automatic acceleration control (including follow-up start control), and the like. In this way, it is possible to perform cooperative control aimed at automated driving or the like, in which the vehicle travels autonomously without depending on the driver's operation.
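As a rough illustration of the preceding-vehicle extraction described above, the relative velocity can be estimated from successive distance samples and combined with a direction test. The functions below and the 10-degree heading tolerance are assumptions introduced only for illustration and are not taken from this disclosure.

def relative_speed_mps(distance_now_m, distance_prev_m, dt_s):
    """Relative velocity estimated from two distance samples (negative = closing in)."""
    return (distance_now_m - distance_prev_m) / dt_s

def is_preceding_vehicle(on_ego_path, heading_diff_deg, ego_speed_mps, rel_speed_mps,
                         max_heading_diff_deg=10.0):
    """Criterion sketched in the text: an object on the ego vehicle's path, moving in
    roughly the same direction, whose absolute speed is 0 km/h or more."""
    absolute_speed_mps = ego_speed_mps + rel_speed_mps
    return (on_ego_path
            and abs(heading_diff_deg) <= max_heading_diff_deg
            and absolute_speed_mps >= 0.0)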
For example, based on the distance information obtained from the imaging units 12101 to 12104, the microcomputer 12051 can classify three-dimensional object data on three-dimensional objects into two-wheeled vehicles, ordinary vehicles, large vehicles, pedestrians, utility poles, and other three-dimensional objects, extract them, and use them for automatic avoidance of obstacles. For example, the microcomputer 12051 distinguishes obstacles around the vehicle 12100 into obstacles that are visible to the driver of the vehicle 12100 and obstacles that are difficult for the driver to see. The microcomputer 12051 then determines a collision risk indicating the degree of risk of collision with each obstacle, and when the collision risk is equal to or higher than a set value and there is a possibility of collision, it can provide driving assistance for collision avoidance by outputting a warning to the driver via the audio speaker 12061 or the display unit 12062, or by performing forced deceleration or avoidance steering via the drive system control unit 12010.
At least one of the imaging units 12101 to 12104 may be an infrared camera that detects infrared rays. For example, the microcomputer 12051 can recognize a pedestrian by determining whether or not a pedestrian is present in the images captured by the imaging units 12101 to 12104. Such pedestrian recognition is performed by, for example, a procedure of extracting feature points from the images captured by the imaging units 12101 to 12104 as infrared cameras, and a procedure of performing pattern matching processing on a series of feature points indicating the outline of an object to determine whether or not it is a pedestrian. When the microcomputer 12051 determines that a pedestrian is present in the images captured by the imaging units 12101 to 12104 and recognizes the pedestrian, the audio image output unit 12052 controls the display unit 12062 so that a rectangular contour line for emphasis is superimposed on the recognized pedestrian. The audio image output unit 12052 may also control the display unit 12062 so that an icon or the like indicating the pedestrian is displayed at a desired position.
 以上、本開示に係る技術が適用され得る車両制御システムの一例について説明した。本開示に係る技術は、以上説明した構成のうち、撮像部12031および車外情報検出ユニット12030に適用され得る。具体的には、例えば、情報処理装置1bのセンサ部10bを撮像部12031に適用し、認識処理部20bを車外情報検出ユニット12030に適用する。認識処理部20bから出力された認識結果は、例えば通信ネットワーク12001を介して統合制御ユニット12050に渡される。 The above is an example of a vehicle control system to which the technology according to the present disclosure can be applied. The technique according to the present disclosure can be applied to the imaging unit 12031 and the vehicle exterior information detection unit 12030 among the configurations described above. Specifically, for example, the sensor unit 10b of the information processing device 1b is applied to the image pickup unit 12031, and the recognition processing unit 20b is applied to the vehicle exterior information detection unit 12030. The recognition result output from the recognition processing unit 20b is passed to the integrated control unit 12050 via, for example, the communication network 12001.
 By applying the technology according to the present disclosure to the imaging unit 12031 and the vehicle exterior information detection unit 12030 in this way, the subsampling pattern can be switched according to a predetermined condition, and the recognizer and parameters used for the recognition processing can be changed according to the switched pattern. As a result, a preliminary result, that is, a recognition result that prioritizes promptness, can be obtained with higher accuracy, enabling more reliable driving assistance.
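 The pattern-switching idea can be illustrated with a minimal sketch. The switching condition (vehicle speed), the pattern names, the sampling strides, and the recognizer registry below are assumptions introduced for illustration; the disclosure does not specify these concrete values.

```python
from typing import Callable
import numpy as np

# Hypothetical registry mapping a subsampling pattern to the recognizer
# (with its parameters) prepared for that pattern.
RECOGNIZERS: dict[str, Callable[[np.ndarray], list[str]]] = {
    "coarse_grid": lambda img: ["vehicle"],      # placeholder recognizers
    "dense_grid": lambda img: ["pedestrian"],
}

def choose_pattern(vehicle_speed_kmh: float) -> str:
    """Example condition: at high speed, favor a coarse pattern that can be
    processed quickly; at low speed, favor a dense pattern for accuracy."""
    return "coarse_grid" if vehicle_speed_kmh > 60.0 else "dense_grid"

def subsample(frame: np.ndarray, pattern: str) -> np.ndarray:
    """Pick pixels on a regular stride according to the selected pattern."""
    step = 8 if pattern == "coarse_grid" else 4
    return frame[::step, ::step]

def recognize(frame: np.ndarray, vehicle_speed_kmh: float) -> list[str]:
    pattern = choose_pattern(vehicle_speed_kmh)
    sampled = subsample(frame, pattern)
    return RECOGNIZERS[pattern](sampled)
```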
 The effects described in this specification are merely examples and are not limiting; other effects may also be obtained.
 The present technology can also have the following configurations.
(1)
 An information processing device comprising:
 a generation unit that generates a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
 a calculation unit that calculates a feature amount of the sampled image;
 a storage unit that accumulates the calculated feature amount;
 a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result; and
 an output control unit that controls the recognition unit to output the recognition result based on a predetermined feature amount among the feature amounts accumulated in the storage unit.
(2)
 The information processing device according to (1), wherein
 the predetermined feature amount is the feature amount calculated using one of the sampled images.
(3)
 The information processing device according to (1), wherein
 the predetermined feature amount is the feature amount calculated using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image.
(4)
 The information processing device according to (1), wherein
 the predetermined feature amount is the feature amount calculated using the sampling pixels at all pixel positions of one frame of the imaging information.
(5)
 The information processing device according to (1), wherein
 the output control unit determines which of the following the recognition unit outputs the recognition result based on:
 a first feature amount calculated, as the predetermined feature amount, using one of the sampled images;
 a second feature amount calculated, as the predetermined feature amount, using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image; and
 a third feature amount calculated, as the predetermined feature amount, using the sampling pixels at all pixel positions of one frame of the imaging information.
(6)
 The information processing device according to (5), wherein
 the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to prior information set in advance for the recognition processing.
(7)
 The information processing device according to (5), wherein
 the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to an intermediate result of the recognition processing.
(8)
 The information processing device according to (5), further comprising an accumulation unit that accumulates the recognition results, wherein
 the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to the recognition results accumulated in the accumulation unit.
(9)
 The information processing device according to (1), wherein
 the output control unit determines, according to the recognition result based on the feature amount calculated using one of the sampled images, whether or not to execute at least one of the calculation processing by the calculation unit, the accumulation processing by the storage unit, and the recognition processing by the recognition unit for one or more frames of the imaging information acquired later in time series than the frame from which the sampled image was acquired.
(10)
 The information processing device according to (1), wherein
 the output control unit determines, according to the recognition result based on the feature amount calculated using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image, whether or not to execute at least one of the calculation processing by the calculation unit, the accumulation processing by the storage unit, and the recognition processing by the recognition unit for one or more frames of the imaging information acquired later in time series than the frame from which the sampled images were acquired.
(11)
 The information processing device according to any one of (1) to (10), wherein
 the recognition unit performs the recognition processing based on an integrated feature amount obtained by integrating a plurality of the feature amounts accumulated in the storage unit.
(12)
 The information processing device according to (11), wherein
 the recognition unit integrates the feature amount calculated by the calculation unit in response to acquisition of the imaging information with at least a part of the feature amounts accumulated in the storage unit up to immediately before that acquisition, and performs the recognition processing based on the integrated feature amount.
(13)
 The information processing device according to any one of (1) to (12), wherein
 the recognition unit performs the recognition processing on the feature amount of the sampled image based on teacher data for each pixel corresponding to the pixel position in each of the divided regions.
(14)
 The information processing device according to any one of (1) to (13), wherein
 the recognition unit executes machine learning processing by an RNN (Recurrent Neural Network) using the sampling pixels set in first imaging information of the imaging information and the sampling pixels set in second imaging information acquired next to the first imaging information in time series, and performs the recognition processing based on a result of the machine learning processing.
(15)
 An information processing method executed by a processor, the method comprising:
 a generation step of generating a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
 a calculation step of calculating a feature amount of the sampled image;
 an accumulation step of accumulating the calculated feature amount;
 a recognition step of performing recognition processing based on at least a part of the feature amounts accumulated by the accumulation step and outputting a recognition result; and
 an output control step of controlling the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
(16)
 An information processing program for causing a computer to execute:
 a generation step of generating a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
 a calculation step of calculating a feature amount of the sampled image;
 an accumulation step of accumulating the calculated feature amount;
 a recognition step of performing recognition processing based on at least a part of the feature amounts accumulated by the accumulation step and outputting a recognition result; and
 an output control step of controlling the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
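 As an editorial illustration of the flow described in configurations (1), (11), and (15) above, the following is a minimal sketch of the generation, calculation, accumulation, recognition, and output-control stages. It assumes NumPy arrays for the imaging information, a trivial placeholder feature (where an actual implementation would use a learned network), and class and method names invented here; it is not the disclosed implementation.

```python
import numpy as np

class SamplingRecognitionPipeline:
    """Toy end-to-end flow: subsample -> feature -> accumulate -> recognize."""

    def __init__(self, region_size: int = 4):
        self.region_size = region_size           # side length of each divided region
        self.phase = 0                           # pixel position within a region for this pass
        self.accumulated: list[np.ndarray] = []  # storage unit (feature accumulation)

    def generate_sampled_image(self, frame: np.ndarray) -> np.ndarray:
        """Generation unit: pick one pixel per divided region according to the
        pixel position (phase) set for the current pass."""
        r = self.region_size
        dy, dx = divmod(self.phase, r)
        sampled = frame[dy::r, dx::r]
        self.phase = (self.phase + 1) % (r * r)  # the next pass uses the next position
        return sampled

    def calculate_feature(self, sampled: np.ndarray) -> np.ndarray:
        """Calculation unit: placeholder feature (a learned extractor in practice)."""
        return sampled.astype(np.float32) / 255.0

    def accumulate(self, feature: np.ndarray) -> None:
        """Storage unit: accumulate the calculated feature amounts."""
        self.accumulated.append(feature)

    def recognize(self, num_features: int) -> str:
        """Recognition unit: run recognition on an integrated feature amount
        built from the selected (most recent) accumulated features."""
        integrated = np.mean(self.accumulated[-num_features:], axis=0)
        return "object" if integrated.mean() > 0.5 else "background"

    def process(self, frame: np.ndarray, use_all: bool = False) -> str:
        """Output control: decide whether the result is based on a single
        sampled image (fast, preliminary) or on all accumulated features."""
        feature = self.calculate_feature(self.generate_sampled_image(frame))
        self.accumulate(feature)
        return self.recognize(len(self.accumulated) if use_all else 1)
```

 In this sketch, a preliminary result would come from `process(frame)` on the first pass, while a higher-accuracy result would use `use_all=True` after several passes have covered every pixel position in the divided regions.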
1a, 1b, 1c, 1d  Information processing device
10a, 10b, 10c, 10d  Sensor unit
20a, 20b, 20d  Recognition processing unit
30a, 30b  Captured image
32, 32a, 32a', 32b, 32c, 32d  Image data
35, 35'  Divided region
36, 36φ1, 36φ1', 36φ2, 36φ3, 36φ4, 36φ01, 36φx  Sampled image
50a, 50a', 50b, 50c, 50d  Feature amount
210  Preprocessing unit
211  Used area acquisition unit
212  Recognition result output setting unit
213  Recognition result output calculation unit
214  Accumulation unit
220  Recognition unit
221  Feature amount calculation unit
222  Feature amount accumulation control unit
223  Feature amount accumulation unit
224  Recognition processing execution unit
240  Readout control unit
300, 300φ1, 300φ2, 300φ3, 300φ4  Pixel

Claims (16)

  1.  An information processing device comprising:
     a generation unit that generates a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
     a calculation unit that calculates a feature amount of the sampled image;
     a storage unit that accumulates the calculated feature amount;
     a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result; and
     an output control unit that controls the recognition unit to output the recognition result based on a predetermined feature amount among the feature amounts accumulated in the storage unit.
  2.  The information processing device according to claim 1, wherein
     the predetermined feature amount is the feature amount calculated using one of the sampled images.
  3.  The information processing device according to claim 1, wherein
     the predetermined feature amount is the feature amount calculated using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image.
  4.  The information processing device according to claim 1, wherein
     the predetermined feature amount is the feature amount calculated using the sampling pixels at all pixel positions of one frame of the imaging information.
  5.  The information processing device according to claim 1, wherein
     the output control unit determines which of the following the recognition unit outputs the recognition result based on:
     a first feature amount calculated, as the predetermined feature amount, using one of the sampled images;
     a second feature amount calculated, as the predetermined feature amount, using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image; and
     a third feature amount calculated, as the predetermined feature amount, using the sampling pixels at all pixel positions of one frame of the imaging information.
  6.  The information processing device according to claim 5, wherein
     the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to prior information set in advance for the recognition processing.
  7.  The information processing device according to claim 5, wherein
     the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to an intermediate result of the recognition processing.
  8.  The information processing device according to claim 5, further comprising an accumulation unit that accumulates the recognition results, wherein
     the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to the recognition results accumulated in the accumulation unit.
  9.  The information processing device according to claim 1, wherein
     the output control unit determines, according to the recognition result based on the feature amount calculated using one of the sampled images, whether or not to execute at least one of the calculation processing by the calculation unit, the accumulation processing by the storage unit, and the recognition processing by the recognition unit for one or more frames of the imaging information acquired later in time series than the frame from which the sampled image was acquired.
  10.  The information processing device according to claim 1, wherein
     the output control unit determines, according to the recognition result based on the feature amount calculated using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image, whether or not to execute at least one of the calculation processing by the calculation unit, the accumulation processing by the storage unit, and the recognition processing by the recognition unit for one or more frames of the imaging information acquired later in time series than the frame from which the sampled images were acquired.
  11.  The information processing device according to claim 1, wherein
     the recognition unit performs the recognition processing based on an integrated feature amount obtained by integrating a plurality of the feature amounts accumulated in the storage unit.
  12.  The information processing device according to claim 11, wherein
     the recognition unit integrates the feature amount calculated by the calculation unit in response to acquisition of the imaging information with at least a part of the feature amounts accumulated in the storage unit up to immediately before that acquisition, and performs the recognition processing based on the integrated feature amount.
  13.  The information processing device according to claim 1, wherein
     the recognition unit performs the recognition processing on the feature amount of the sampled image based on teacher data for each pixel corresponding to the pixel position in each of the divided regions.
  14.  The information processing device according to claim 1, wherein
     the recognition unit executes machine learning processing by an RNN (Recurrent Neural Network) using the sampling pixels set in first imaging information of the imaging information and the sampling pixels set in second imaging information acquired next to the first imaging information in time series, and performs the recognition processing based on a result of the machine learning processing.
  15.  An information processing method executed by a processor, the method comprising:
     a generation step of generating a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
     a calculation step of calculating a feature amount of the sampled image;
     an accumulation step of accumulating the calculated feature amount;
     a recognition step of performing recognition processing based on at least a part of the feature amounts accumulated by the accumulation step and outputting a recognition result; and
     an output control step of controlling the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
  16.  An information processing program for causing a computer to execute:
     a generation step of generating a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
     a calculation step of calculating a feature amount of the sampled image;
     an accumulation step of accumulating the calculated feature amount;
     a recognition step of performing recognition processing based on at least a part of the feature amounts accumulated by the accumulation step and outputting a recognition result; and
     an output control step of controlling the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
PCT/JP2021/011644 2020-03-30 2021-03-22 Information processing device, information processing method, and information processing program WO2021200329A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-060853 2020-03-30
JP2020060853 2020-03-30

Publications (1)

Publication Number Publication Date
WO2021200329A1 true WO2021200329A1 (en) 2021-10-07

Family

ID=77928397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/011644 WO2021200329A1 (en) 2020-03-30 2021-03-22 Information processing device, information processing method, and information processing program

Country Status (1)

Country Link
WO (1) WO2021200329A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009005025A1 (en) * 2007-07-03 2009-01-08 Konica Minolta Holdings, Inc. Moving object detection device
JP2012053606A (en) * 2010-08-31 2012-03-15 Sony Corp Information processor, method and program
WO2019135270A1 (en) * 2018-01-04 2019-07-11 株式会社ソシオネクスト Motion video analysis device, motion video analysis system, motion video analysis method, and program


Similar Documents

Publication Publication Date Title
JP7105754B2 (en) IMAGING DEVICE AND METHOD OF CONTROLLING IMAGING DEVICE
JP7424051B2 (en) Solid-state imaging device, imaging device, imaging method, and imaging program
US20210218923A1 (en) Solid-state imaging device and electronic device
WO2022019026A1 (en) Information processing device, information processing system, information processing method, and information processing program
WO2018139187A1 (en) Solid-state image capturing device, method for driving same, and electronic device
WO2021200330A1 (en) Information processing device, information processing method, and information processing program
WO2021200329A1 (en) Information processing device, information processing method, and information processing program
WO2021200199A1 (en) Information processing device, information processing method, and information processing program
WO2017212722A1 (en) Control apparatus and control method
WO2022019025A1 (en) Information processing device, information processing system, information processing method, and information processing program
WO2024135307A1 (en) Solid-state imaging device
US20240078803A1 (en) Information processing apparatus, information processing method, computer program, and sensor apparatus
WO2020090272A1 (en) Electronic circuit, solid-state imaging element, and method for manufacturing electronic circuit
JP2024090345A (en) Photodetection device and method for controlling photodetection device
KR20240035570A (en) Solid-state imaging devices and methods of operating solid-state imaging devices
JP2022123986A (en) Solid-state imaging element, imaging method, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21779380

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21779380

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP