WO2021200329A1 - Information processing device, information processing method, and information processing program - Google Patents

Information processing device, information processing method, and information processing program

Info

Publication number
WO2021200329A1
Authority
WO
WIPO (PCT)
Prior art keywords
recognition
unit
feature amount
image
information processing
Prior art date
Application number
PCT/JP2021/011644
Other languages
French (fr)
Japanese (ja)
Inventor
Yusuke Hieida
Suguru Aoki
Ryuta Sato
Original Assignee
Sony Group Corporation
Priority date
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2021200329A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • This disclosure relates to an information processing device, an information processing method, and an information processing program.
  • In an image recognition function, the detection performance for an object can be improved by using a captured image with a higher resolution.
  • However, image recognition using a high-resolution captured image requires a large amount of calculation for the recognition processing, making it difficult to improve the simultaneity of the recognition processing with respect to the captured image.
  • An object of the present disclosure is to provide an information processing device, an information processing method, and an information processing program capable of improving the characteristics of recognition processing using captured images.
  • The information processing apparatus according to the present disclosure includes: a generation unit that generates a sampled image composed of sampling pixels acquired from imaging information composed of pixels, according to pixel positions set for each divided region obtained by dividing the imaging information with a predetermined pattern;
  • a calculation unit that calculates a feature amount of the sampled image; a storage unit that accumulates the calculated feature amounts; a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result;
  • and an output control unit that controls the recognition unit so as to output a recognition result based on a predetermined feature amount among the feature amounts accumulated in the storage unit.
  • 2. First Embodiment
    2-1. Outline of the first embodiment
    2-2. More specific configuration example according to the first embodiment
    2-3. More specific processing according to the first embodiment
    2-4. First modification of the first embodiment
    2-5. Second modification of the first embodiment
    2-6. Another modification of the first embodiment
    3. Second Embodiment
    3-1. Modification example of the second embodiment
    4. Third Embodiment
    4-1. Application example of the technology of the present disclosure
    4-2. Application example to a moving body
  • FIG. 1 is a block diagram showing a basic configuration example of an information processing apparatus applicable to each embodiment.
  • the information processing device 1a includes a sensor unit 10a and a recognition processing unit 20a.
  • the sensor unit 10a includes an imaging means (camera) and an imaging control unit that controls the imaging means.
  • the sensor unit 10a performs imaging under the control of the imaging control unit, and supplies the image data of the captured image acquired by the imaging to the recognition processing unit 20a.
  • The recognition processing unit 20a uses a DNN (Deep Neural Network) to perform recognition processing on the image data. More specifically, the recognition processing unit 20a includes a recognition model trained in advance by machine learning using predetermined training data, and performs recognition processing using the DNN based on this recognition model on the image data supplied from the sensor unit 10a.
  • the recognition processing unit 20a outputs the recognition result of the recognition processing to, for example, the outside of the information processing device 1a.
  • FIGS. 2A and 2B are diagrams schematically showing an example of recognition processing by DNN.
  • one image is input to the DNN as shown in FIG. 2A.
  • DNN recognition processing is performed on the input image, and the recognition result is output.
  • the DNN executes the feature extraction process and the recognition process.
  • the feature amount is extracted from the input image by the feature extraction process.
  • This feature extraction process is performed using, for example, a CNN (Convolutional Neural Network), which is a type of DNN.
  • the recognition process is executed on the extracted feature amount, and the recognition result is obtained.
  • recognition processing can be executed using time-series information.
  • FIGS. 3A and 3B are diagrams schematically showing an example of identification processing by a DNN when time-series information is used.
  • In this method, identification processing by the DNN is performed using a fixed number of pieces of past information in the time series.
  • In the example of FIG. 3A, the image [T] at time T, the image [T-1] at time T-1 before time T, and the image [T-2] at time T-2 before time T-1 are input to the DNN.
  • The identification process is executed on each of the input images [T], [T-1], and [T-2], and the recognition result [T] at time T is obtained.
  • FIG. 3B is a diagram for explaining the process of FIG. 3A in more detail.
  • In the DNN, the feature extraction process described with reference to FIG. 2B is executed one-to-one for each of the input images [T], [T-1], and [T-2], and the feature amounts corresponding to the images [T], [T-1], and [T-2] are extracted.
  • The feature amounts obtained from these images [T], [T-1], and [T-2] are integrated, the identification process is executed on the integrated feature amount, and the recognition result [T] at time T is obtained. Each feature amount obtained from the images [T], [T-1], and [T-2] can be regarded as intermediate data for obtaining the integrated feature amount used in the identification process.
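  • As a rough, hedged illustration of the scheme in FIG. 3B (a sketch under assumptions, not the implementation disclosed here), the following Python code applies one shared CNN feature extractor to the images [T], [T-1], and [T-2], concatenates the resulting feature amounts, and applies an identification head to the integrated feature amount. The layer sizes, the class name TimeSeriesRecognizer, and the number of classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TimeSeriesRecognizer(nn.Module):
    """Sketch of FIG. 3B: per-image feature extraction, integration, identification."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared CNN used for the feature extraction process (illustrative layers).
        self.extractor = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> one feature amount per image
        )
        # Identification process applied to the integrated feature amount.
        self.head = nn.Linear(32 * 3, num_classes)

    def forward(self, img_t, img_t1, img_t2):
        # Extract a feature amount (intermediate data) from each of [T], [T-1], [T-2].
        feats = [self.extractor(x) for x in (img_t, img_t1, img_t2)]
        integrated = torch.cat(feats, dim=1)     # integrate the feature amounts
        return self.head(integrated)             # recognition result [T]
```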
  • FIGS. 4A and 4B are diagrams schematically showing another example of identification processing by a DNN when time-series information is used.
  • In this method, the image [T] at time T is input to a DNN whose internal state has been updated to the state at time T-1, and the recognition result [T] at time T is obtained.
  • FIG. 4B is a diagram for explaining the process of FIG. 4A in more detail.
  • The feature extraction process described with reference to FIG. 2B is executed on the input image [T] at time T, and the feature amount corresponding to the image [T] is extracted.
  • The internal state of the DNN has been updated using the images before time T, and the feature amount related to the updated internal state is stored.
  • The stored feature amount related to the internal state and the feature amount of the image [T] are integrated, and the identification process is executed on the integrated feature amount.
  • Both the stored feature amount related to the internal state and the feature amount of the image [T] are intermediate data for obtaining the integrated feature amount used in the identification process.
  • The identification process shown in FIGS. 4A and 4B is executed using, for example, a DNN whose internal state has been updated using the immediately preceding recognition result, and is thus a recursive process.
  • A DNN that performs recursive processing in this way is called an RNN (Recurrent Neural Network).
  • Identification processing by an RNN is generally used for moving-image recognition and the like, and the identification accuracy can be improved by sequentially updating the internal state of the DNN with, for example, frame images updated in time series.
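  • As a similarly hedged sketch of the recursive scheme in FIGS. 4A and 4B, the following code keeps an internal state that holds the feature amount related to past frames and updates it each time a new image [T] arrives. The use of a GRU cell, the dimensions, and the class name RecurrentRecognizer are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentRecognizer(nn.Module):
    """Sketch of FIG. 4B: the internal state stores the feature amount of past frames."""
    def __init__(self, feat_dim: int = 32, num_classes: int = 10):
        super().__init__()
        self.extractor = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cell = nn.GRUCell(feat_dim, feat_dim)   # updates the internal state
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, img_t, state=None):
        feat_t = self.extractor(img_t)        # feature amount of image [T]
        state = self.cell(feat_t, state)      # integrate with the stored internal state
        return self.head(state), state        # recognition result [T] and updated state
```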
  • FIG. 5 is a block diagram schematically showing a hardware configuration example of an information processing device applicable to each embodiment.
  • The information processing apparatus 1 includes an imaging unit 1200, a memory 1202, a DSP (Digital Signal Processor) 1203, an interface (I/F) 1204, a CPU (Central Processing Unit) 1205, a ROM (Read Only Memory) 1206, and a RAM (Random Access Memory) 1207, which are communicatively connected to one another via a bus 1210.
  • the information processing device 1 can further include an input device that accepts user operations, a display device for displaying information to the user, and a storage device that non-volatilely stores data.
  • the CPU 1205 operates using the RAM 1207 as a work memory according to a program stored in the ROM 1206 in advance, and controls the overall operation of the information processing device 1.
  • The interface 1204 communicates with the outside of the information processing device 1 by wired or wireless communication. For example, when the information processing device 1 is used in a vehicle, it can communicate with the braking control system of the vehicle on which it is mounted via the interface 1204.
  • the imaging unit 1200 captures a moving image at a predetermined frame cycle and outputs pixel data for composing the frame image. More specifically, the imaging unit 1200 includes a plurality of photoelectric conversion elements that convert the received light into pixel signals that are electrical signals by photoelectric conversion, and a drive circuit that drives each photoelectric conversion element. In the imaging unit 1200, the plurality of photoelectric conversion elements are arranged in a matrix-like arrangement to form a pixel array.
  • the sensor unit 10a in FIG. 1 includes an image pickup unit 1200, and outputs pixel data output from the image pickup unit 1200 within one frame cycle as image data for one frame.
  • Each photoelectric conversion element corresponds to one pixel in the image data; in the pixel array unit, photoelectric conversion elements corresponding to, for example, 1920 pixels × 1080 pixels (rows × columns) are arranged in a matrix.
  • An image of one frame is formed by the pixel signals of the photoelectric conversion elements corresponding to these 1920 pixels × 1080 pixels.
  • the optical unit 1201 includes a lens, an autofocus mechanism, and the like, and irradiates the pixel array unit of the imaging unit 1200 with the light incident on the lens.
  • The imaging unit 1200 generates a pixel signal in each photoelectric conversion element according to the light incident on the pixel array unit via the optical unit 1201.
  • The imaging unit 1200 converts each pixel signal, which is an analog signal, into pixel data, which is a digital signal, and outputs the pixel data.
  • the pixel data output from the imaging unit 1200 is stored in the memory 1202.
  • the memory 1202 is, for example, a frame memory, and is capable of storing pixel data for at least one frame.
  • the DSP 1203 performs predetermined image processing on the pixel data stored in the memory 1202. Further, the DSP 1203 includes a recognition model learned in advance, and performs a recognition process using the above-mentioned DNN on the image data stored in the memory 1202 based on the recognition model.
  • the recognition result which is the result of the recognition process by the DSP 1203, is temporarily stored in, for example, the memory provided in the DSP 1203 or the RAM 1207, and is output from the interface 1204 to the outside.
  • the recognition result may be stored in the storage device.
  • The DSP 1203 may be realized by the CPU 1205. Further, a GPU (Graphics Processing Unit) may be used instead of the DSP 1203.
  • As the image pickup unit 1200, a CMOS image sensor (CIS) in which the parts included in the image pickup unit 1200 are integrally formed using CMOS (Complementary Metal Oxide Semiconductor) technology can be applied.
  • the imaging unit 1200 can be formed on one substrate.
  • the imaging unit 1200 may be a laminated CIS in which a plurality of semiconductor chips are laminated and integrally formed.
  • the imaging unit 1200 is not limited to this example, and may be another type of optical sensor such as an infrared light sensor that performs imaging with infrared light.
  • the imaging unit 1200 can be formed by a two-layer structure laminated CIS in which semiconductor chips are laminated in two layers.
  • FIG. 6A is a diagram showing an example in which the imaging unit 1200 is formed by a two-layer structure laminated CIS.
  • the pixel portion 2020a is formed on the semiconductor chip of the first layer
  • the memory + logic portion 2020b is formed on the semiconductor chip of the second layer.
  • the pixel unit 2020a includes at least a pixel array unit in the imaging unit 1200.
  • the memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit.
  • the memory + logic unit 2020b can further include the memory 1202.
  • The image pickup unit 1200 is configured as one solid-state image pickup element by bonding the first-layer semiconductor chip and the second-layer semiconductor chip so that they are in electrical contact with each other.
  • the imaging unit 1200 can be formed by a three-layer structure in which semiconductor chips are laminated in three layers.
  • FIG. 6B is a diagram showing an example in which the imaging unit 1200 is formed by a laminated CIS having a three-layer structure.
  • the pixel portion 2020a is formed on the semiconductor chip of the first layer
  • the memory portion 2020c is formed on the semiconductor chip of the second layer
  • the logic portion 2020d is formed on the semiconductor chip of the third layer.
  • the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit.
  • the memory unit 2020c can include a frame memory and a memory 1202.
  • The image pickup unit 1200 is configured as one solid-state image sensor by bonding the first-layer, second-layer, and third-layer semiconductor chips so that they are in electrical contact with one another.
  • the memory + logic unit 2020b may include configurations corresponding to the DSP 1203, the interface 1204, the CPU 1205, the ROM 1206, and the RAM 1207 shown in FIG.
  • FIG. 7 is a block diagram showing a configuration of an example of the imaging unit 1200 applicable to each embodiment.
  • The imaging unit 1200 includes a pixel array unit 1001, a vertical scanning unit 1002, an AD (Analog to Digital) conversion unit 1003, a pixel signal line 1006, a vertical signal line VSL, a control unit 1100, a signal processing unit 1101, and the like.
  • Note that, in FIG. 7, the control unit 1100 and the signal processing unit 1101 can also be realized by, for example, the CPU 1205 and the DSP 1203 shown in FIG. 5.
  • The pixel array unit 1001 includes a plurality of pixel circuits 1000, each including, for example, a photoelectric conversion element using a photodiode that performs photoelectric conversion on the received light and a circuit for reading out the charge from the photoelectric conversion element.
  • the plurality of pixel circuits 1000 are arranged in a matrix in the horizontal direction (row direction) and the vertical direction (column direction).
  • the arrangement in the row direction of the pixel circuit 1000 is called a line.
  • The pixel array unit 1001 includes at least 1080 lines, each including at least 1920 pixel circuits 1000.
  • An image (image data) of one frame is formed by the pixel signals read from the pixel circuits 1000 included in one frame.
  • A pixel signal line 1006 is connected to each row of the pixel circuits 1000, and a vertical signal line VSL is connected to each column.
  • the end of the pixel signal line 1006 that is not connected to the pixel array unit 1001 is connected to the vertical scanning unit 1002.
  • the vertical scanning unit 1002 transmits a control signal such as a drive pulse when reading a pixel signal from a pixel to the pixel array unit 1001 via the pixel signal line 1006 in accordance with the control of the control unit 1100 described later.
  • the end portion of the vertical signal line VSL that is not connected to the pixel array unit 1001 is connected to the AD conversion unit 1003.
  • the pixel signal read from the pixel is transmitted to the AD conversion unit 1003 via the vertical signal line VSL.
  • the reading control of the pixel signal from the pixel circuit 1000 will be schematically described.
  • The pixel signal is read out from the pixel circuit 1000 by transferring the charge accumulated in the photoelectric conversion element by exposure to a floating diffusion layer (FD) and converting the transferred charge into a voltage in the floating diffusion layer.
  • The voltage into which the charge has been converted in the floating diffusion layer is output to the vertical signal line VSL as a pixel signal via an amplifier.
  • In the pixel circuit 1000, the floating diffusion layer and the vertical signal line VSL are connected according to a selection signal supplied via the pixel signal line 1006. Further, the floating diffusion layer is connected to a supply line of the power supply voltage VDD or of a black-level voltage for a short period according to a reset pulse supplied via the pixel signal line 1006, and the floating diffusion layer is thereby reset.
  • The voltage of the floating diffusion layer at the reset level (referred to as voltage A) is output to the vertical signal line VSL.
  • Then, a transfer pulse supplied via the pixel signal line 1006 turns on (closes) the connection between the photoelectric conversion element and the floating diffusion layer, and the charge accumulated in the photoelectric conversion element is transferred to the floating diffusion layer.
  • A voltage corresponding to the amount of charge in the floating diffusion layer (referred to as voltage B) is output to the vertical signal line VSL.
  • the AD conversion unit 1003 includes an AD converter 1007 provided for each vertical signal line VSL, a reference signal generation unit 1004, and a horizontal scanning unit 1005.
  • the AD converter 1007 is a column AD converter that performs AD conversion processing on each column of the pixel array unit 1001.
  • The AD converter 1007 performs AD conversion processing on the pixel signal supplied from the pixel circuit 1000 via the vertical signal line VSL, and generates two digital values (values corresponding to voltage A and voltage B, respectively) for correlated double sampling (CDS: Correlated Double Sampling) processing for noise reduction.
  • the AD converter 1007 supplies the two generated digital values to the signal processing unit 1101.
  • The signal processing unit 1101 performs CDS processing based on the two digital values supplied from the AD converter 1007, and generates pixel data, which is the pixel signal as a digital signal.
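  • As a minimal illustration of the CDS computation itself (a sketch, not the circuit implementation), the digital value corresponding to the reset level (voltage A) is subtracted from the digital value corresponding to the signal level (voltage B):

```python
def correlated_double_sampling(digital_a: int, digital_b: int) -> int:
    """CDS sketch: subtract the reset-level value (voltage A) from the
    signal-level value (voltage B) to cancel the per-pixel offset component."""
    return digital_b - digital_a
```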
  • Based on the control signal input from the control unit 1100, the reference signal generation unit 1004 generates a ramp signal that each AD converter 1007 uses as a reference signal for converting the pixel signal into the two digital values.
  • The ramp signal is a signal whose level (voltage value) decreases with a constant slope with respect to time, or a signal whose level decreases stepwise.
  • The reference signal generation unit 1004 supplies the generated ramp signal to each AD converter 1007.
  • the reference signal generation unit 1004 is configured by using, for example, a DAC (Digital to Analog Converter) or the like.
  • In each AD converter 1007, a counter starts counting according to a clock signal.
  • A comparator compares the voltage of the pixel signal supplied from the vertical signal line VSL with the voltage of the ramp signal, and stops the counter at the timing when the voltage of the ramp signal crosses the voltage of the pixel signal.
  • The AD converter 1007 converts the analog pixel signal into a digital value by outputting a value corresponding to the count value at the time the counting was stopped.
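  • The counter/comparator behaviour can be sketched as follows; the ramp start level, step size, counter depth, and the function name single_slope_adc are illustrative assumptions, not values from this disclosure.

```python
def single_slope_adc(pixel_voltage: float,
                     ramp_start: float = 1.0,
                     ramp_step: float = 0.001,
                     max_count: int = 1024) -> int:
    """Sketch of the column AD converter: the counter runs while the falling ramp
    signal is still above the pixel voltage; the count at the crossing point is
    output as the digital value."""
    ramp = ramp_start
    for count in range(max_count):
        if ramp <= pixel_voltage:   # comparator: the ramp has crossed the pixel signal
            return count            # the counter stops; this count is the converted value
        ramp -= ramp_step           # the ramp level decreases with a constant slope
    return max_count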
  • The pixel data generated by the signal processing unit 1101 is stored in a frame memory (not shown), and when pixel data for one frame has been stored in the frame memory, it is output from the imaging unit 1200 as image data of one frame.
  • The horizontal scanning unit 1005 performs selective scanning in which the AD converters 1007 are selected in a predetermined order, so that the digital values temporarily held by each AD converter 1007 are sequentially output to the signal processing unit 1101.
  • the horizontal scanning unit 1005 is configured by using, for example, a shift register or an address decoder.
  • the control unit 1100 performs drive control of the vertical scanning unit 1002, the AD conversion unit 1003, the reference signal generation unit 1004, the horizontal scanning unit 1005, and the like according to the imaging control signal supplied from the sensor control unit 11.
  • the control unit 1100 generates various drive signals that serve as a reference for the operations of the vertical scanning unit 1002, the AD conversion unit 1003, the reference signal generation unit 1004, and the horizontal scanning unit 1005.
  • Based on, for example, a vertical synchronization signal or an external trigger signal and a horizontal synchronization signal included in the imaging control signal, the control unit 1100 generates a control signal for the vertical scanning unit 1002 to supply to each pixel circuit 1000 via the pixel signal line 1006.
  • the control unit 1100 supplies the generated control signal to the vertical scanning unit 1002.
  • control unit 1100 passes, for example, information indicating an analog gain included in the image pickup control signal supplied from the CPU 1205 to the AD conversion unit 1003.
  • the AD conversion unit 1003 controls the gain of the pixel signal input to each AD converter 1007 included in the AD conversion unit 1003 via the vertical signal line VSL according to the information indicating the analog gain.
  • Based on the control signal supplied from the control unit 1100, the vertical scanning unit 1002 supplies various signals, including a drive pulse, line by line to each pixel circuit 1000 via the pixel signal line 1006 of the selected pixel line of the pixel array unit 1001, and each pixel circuit 1000 outputs a pixel signal to the vertical signal line VSL.
  • the vertical scanning unit 1002 is configured by using, for example, a shift register or an address decoder. Further, the vertical scanning unit 1002 controls the exposure in each pixel circuit 1000 according to the information indicating the exposure supplied from the control unit 1100.
  • the imaging unit 1200 configured in this way is a column AD type CMOS (Complementary Metal Oxide Semiconductor) image sensor in which AD converters 1007 are arranged for each column.
  • FIGS. 8A and 8B are diagrams schematically showing examples of captured images 30a and 30b when the same imaging range is captured by using a low-resolution imaging device and a high-resolution imaging device, respectively.
  • the imaging range shown in FIGS. 8A and 8B includes a "person" in the central portion at a position somewhat distant from the imaging apparatus.
  • the recognition process for a high-resolution image requires a large amount of calculation as compared with the recognition process for a low-resolution image, and the processing takes time. Therefore, it is difficult to improve the simultaneity between the recognition result and the captured image.
  • the recognition process for a low-resolution image requires a small amount of calculation, so that the process can be performed in a short time, and the simultaneity with the captured image can be relatively easily increased.
  • Consider a case in which recognition processing is performed based on an image captured by an in-vehicle image pickup device, and a distant object (for example, an oncoming vehicle traveling in the opposite lane, in the direction opposite to the traveling direction of the own vehicle) must be recognized.
  • When recognition processing is performed on a low-resolution image, as in FIG. 8A, it is difficult to recognize such a distant object.
  • When a high-resolution captured image is used, it is relatively easy to recognize the distant object, but it is difficult to improve the simultaneity with the captured image, and there is a possibility that an emergency situation cannot be responded to in time.
  • Therefore, in the prerequisite technology of each embodiment, recognition processing is performed on a sampled image composed of pixels obtained by thinning out a high-resolution captured image by subsampling according to a predetermined rule.
  • The captured image acquired in the next frame is subsampled at pixel positions different from those used for the immediately preceding captured image, and recognition processing is performed on the sampled image composed of those sampling pixels.
  • This operation of performing recognition processing on a sampled image composed of pixels different from those of the preceding captured image is repeated in units of frames. This makes it possible to acquire recognition results at high speed while using a high-resolution captured image. Further, by sequentially integrating the feature amount extracted during each recognition process with the feature amount extracted in the recognition process for the next sampled image, a more accurate recognition result can be obtained.
  • FIG. 9 is a block diagram showing a configuration of an example of an information processing device according to the prerequisite technology of each embodiment of the present disclosure.
  • the information processing device 1b includes a sensor unit 10b and a recognition processing unit 20b.
  • The sensor unit 10b includes an imaging means (camera) and an imaging control unit that controls the imaging means, similarly to the sensor unit 10a described with reference to FIG. 1. This imaging means performs imaging at a high resolution (for example, 1920 pixels × 1080 pixels).
  • the sensor unit 10b supplies the image data of the captured image captured by the imaging means to the recognition processing unit 20b.
  • the recognition processing unit 20b includes a pre-processing unit 210 and a recognition unit 220.
  • the image data supplied from the sensor unit 10b to the recognition processing unit 20b is input to the preprocessing unit 210.
  • the preprocessing unit 210 performs subsampling on the input image data by thinning out the pixels according to a predetermined rule.
  • the sampled image in which the image data is subsampled is input to the recognition unit 220.
  • The recognition unit 220 uses a DNN to perform recognition processing on the image data in the same manner as the recognition processing unit 20a in FIG. 1. More specifically, the recognition unit 220 includes a recognition model trained in advance by machine learning using predetermined training data, and performs recognition processing using the DNN based on this recognition model on the image data supplied from the sensor unit 10b. At this time, sampled images subsampled in the same manner as by the preprocessing unit 210 are used as the training data.
  • the recognition unit 220 outputs the recognition result of the recognition process to, for example, the outside of the information processing device 1b.
  • FIG. 10 is a schematic diagram for explaining the recognition process by the recognizer according to the prerequisite technology of each embodiment.
  • the recognizer shown in FIG. 10 corresponds to, for example, the recognition processing unit 20b.
  • the image data 32 schematically shows one frame of image data based on the captured image captured by the sensor unit 10b.
  • the image data 32 includes a plurality of pixels 300 arranged in a matrix.
  • the image data 32 is input to the preprocessing unit 210 in the recognition processing unit 20b.
  • the preprocessing unit 210 subsamples the image data 32 by thinning out according to a predetermined rule (step S10).
  • the sampled image by the sub-sampled sampling pixels is input to the recognition unit 220.
  • the recognition unit 220 extracts the feature amount of the input sampled image by DNN (step S11).
  • For example, the recognition unit 220 extracts the feature amount using the CNN part of the DNN.
  • The recognition unit 220 stores the feature amount extracted in step S11 in a storage unit (for example, the RAM 1207, not shown). At this time, for example, when the feature amount extracted in the immediately preceding frame is already stored in the storage unit, the recognition unit 220 recursively uses the stored feature amount and integrates it with the newly extracted feature amount (step S12).
  • In other words, the recognition unit 220 accumulates in the storage unit the feature amounts extracted up to the immediately preceding frame and integrates them with the new one. The process in step S12 thus corresponds to a process using the RNN part of the DNN.
  • the recognition unit 220 executes the recognition process based on the features accumulated and integrated in step S12 (step S13).
  • FIG. 11 is a schematic diagram for explaining the sampling process according to the prerequisite technique of each embodiment.
  • section (a) schematically shows an example of image data 32.
  • the image data 32 includes a plurality of pixels 300 arranged in a matrix.
  • The preprocessing unit 210 divides the image data 32 into divided regions 35, each including two or more pixels 300.
  • In this example, the divided region 35 is a region of 4 pixels × 4 pixels and includes 16 pixels 300.
  • For each divided region 35, the preprocessing unit 210 sets pixel positions for selecting sampling pixels by subsampling from the pixels 300 included in that divided region 35. Further, the preprocessing unit 210 sets different pixel positions for each frame as the pixel positions for selecting the sampling pixels.
  • Section (b) of FIG. 11 shows an example of pixel positions set with respect to the division region 35 in a certain frame.
  • In this example, the pixel positions are set so that pixels 300 are selected every other pixel in both the row and column directions, and the pixels 300sa1, 300sa2, 300sa3, and 300sa4 at the set pixel positions are selected as the sampling pixels.
  • In this way, the preprocessing unit 210 performs subsampling in units of the divided region 35.
  • The preprocessing unit 210 generates, as a sampled image composed of sampling pixels, an image consisting of the pixels 300sa1 to 300sa4 selected as the sampling pixels in a certain frame.
  • Section (c) of FIG. 11 shows an example of the sampled image 36 generated from the pixels 300sa1 to 300sa4 selected as sampling pixels in section (b) of FIG. 11.
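  • A minimal NumPy sketch of this subsampling is given below. It assumes the phase-to-base-point order used in FIGS. 12A to 12D, namely (0, 0), (0, 1), (1, 0), (1, 1); the helper name subsample is an assumption of this sketch.

```python
import numpy as np

def subsample(image: np.ndarray, phase: int) -> np.ndarray:
    """Sketch of the subsampling in FIG. 11: inside every 4x4 divided region,
    pixels are selected every other row and column starting from a base point
    determined by the phase, so the sampled image has half the width and height
    of the input image."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]   # assumed phase -> (row, col) base point
    r0, c0 = offsets[phase % 4]
    return image[r0::2, c0::2]

# Example: a 1920 x 1080 frame yields a 960 x 540 sampled image for each phase.
frame = np.zeros((1080, 1920), dtype=np.uint8)
assert subsample(frame, phase=0).shape == (540, 960)
```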
  • the preprocessing unit 210 inputs the sampled image 36 to the recognition unit 220.
  • the recognition unit 220 executes a recognition process on the sampled image 36.
  • the preprocessing unit 210 sets different pixel positions for each frame as pixel positions for selecting sampling pixels.
  • the recognition unit 220 performs recognition processing for each frame based on a sampled image composed of each pixel 300 at each set pixel position.
  • FIGS. 12A to 12E show the recognition processing for the image data 32a to 32d and 32a' of frames #1 to #5, which are sequentially captured in time series by the sensor unit 10b.
  • the object 41 is located at a relatively short distance (medium distance) with respect to the sensor unit 10b.
  • the object 42 is located at a distance (referred to as a long distance) farther than the middle distance with respect to the sensor unit 10b, and the size in the image is smaller than the object 41.
  • The preprocessing unit 210 performs subsampling on each divided region 35 of the image data 32a of frame #1, with the pixel position in the upper left corner as the base point. More specifically, in each divided region 35 of the image data 32a, the preprocessing unit 210 performs subsampling that selects, as the sampling pixels 300sa1 to 300sa4, the pixels 300 located every other pixel in the row and column directions starting from the upper-left pixel position (step S10a).
  • The preprocessing unit 210 generates the sampled image 36φ1 of the first phase from the subsampled pixels 300sa1 to 300sa4.
  • The generated sampled image 36φ1 is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50a of the input sampled image 36φ1 using the DNN (step S11).
  • the recognition unit 220 stores and stores the feature amount 50a extracted in step S11 in the storage unit (step S12).
  • the recognition unit 220 can accumulate the feature amount 50a in the storage unit and integrate the feature amount with the already accumulated feature amount.
  • Section (b) of FIG. 12A shows how the first feature amount 50a is stored in the empty storage portion as the process of step S12.
  • the recognition unit 220 executes the recognition process based on the feature amount 50a accumulated in the storage unit (step S13).
  • the object 41 located at a medium distance is recognized and obtained as the recognition result 60.
  • the object 42 located at a long distance is not recognized.
  • For each divided region 35 of the image data 32b of frame #2, the preprocessing unit 210 performs subsampling in which the pixel positions shifted by one pixel in the horizontal direction with respect to the pixel positions set for each divided region 35 of the image data 32a of frame #1 shown in FIG. 12A are set as the pixel positions of the sampling pixels (step S10b). That is, each sampling pixel selected in step S10b is the pixel 300 at the pixel position adjacent to the right of the pixel position of the corresponding sampling pixel selected in step S10a of FIG. 12A.
  • The preprocessing unit 210 generates the sampled image 36φ2 of the second phase from the sampling pixels subsampled in step S10b.
  • The generated sampled image 36φ2 is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50b of the input sampled image 36φ2 using the DNN (step S11).
  • the recognition unit 220 stores and stores the feature amount 50b extracted in step S11 in the storage unit (step S12).
  • At this point in step S12, the feature amount 50a extracted from the sampled image 36φ1 of the first phase is already stored in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50b in the storage unit and integrates the feature amount 50b with the stored feature amount 50a.
  • the recognition unit 220 executes the recognition process based on the feature amount in which the feature amount 50a and the feature amount 50b are integrated (step S13).
  • By the recognition process in step S13 shown in section (b), the object 41 located at a medium distance is recognized and obtained as the recognition result 60, but the object 42 located at a long distance is not recognized at this point.
  • For each divided region 35 of the image data 32c of frame #3, the preprocessing unit 210 performs subsampling in which the pixel positions shifted by one pixel in the column direction with respect to the pixel positions set for each divided region 35 of the image data 32a of frame #1 shown in FIG. 12A are set as the pixel positions of the sampling pixels (step S10c). That is, each sampling pixel selected in step S10c is the pixel 300 at the pixel position adjacent below, in the figure, the pixel position of the corresponding sampling pixel selected in step S10a of FIG. 12A.
  • The preprocessing unit 210 generates the sampled image 36φ3 of the third phase from the sampling pixels subsampled in step S10c.
  • The generated sampled image 36φ3 is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50c of the input sampled image 36φ3 using the DNN (step S11).
  • The recognition unit 220 accumulates and stores the feature amount 50c extracted in step S11 in the storage unit (step S12).
  • At this point in step S12, the feature amounts 50a and 50b extracted from the sampled images 36φ1 and 36φ2 of the first and second phases are already stored in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50c in the storage unit and integrates the feature amount 50c with the accumulated feature amounts 50a and 50b.
  • the recognition unit 220 executes the recognition process based on the feature amount in which the feature amounts 50a and 50b and the feature amount 50c are integrated (step S13).
  • Here too, the object 41 located at a medium distance is recognized and obtained as the recognition result 60, but the object 42 located at a long distance is not recognized at this point.
  • For each divided region 35 of the image data 32d of frame #4, the preprocessing unit 210 performs subsampling in which the pixel positions shifted by one pixel in the horizontal direction with respect to the pixel positions set for each divided region 35 of the image data 32c of frame #3 shown in FIG. 12C are set as the pixel positions of the sampling pixels (step S10d). That is, each sampling pixel selected in step S10d is the pixel 300 at the pixel position adjacent to the right, in the figure, of the pixel position of the corresponding sampling pixel selected in step S10c of FIG. 12C.
  • The preprocessing unit 210 generates the sampled image 36φ4 of the fourth phase from the sampling pixels subsampled in step S10d.
  • The generated sampled image 36φ4 is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50d of the input sampled image 36φ4 using the DNN (step S11).
  • The recognition unit 220 accumulates and stores the feature amount 50d extracted in step S11 in the storage unit (step S12).
  • At this point in step S12, the feature amounts 50a to 50c extracted from the sampled images 36φ1 to 36φ3 of the first to third phases have already been accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50d in the storage unit and integrates the feature amount 50d with the accumulated feature amounts 50a to 50c.
  • the recognition unit 220 executes the recognition process based on the feature amount in which the feature amounts 50a to 50c and the feature amount 50d are integrated (step S13).
  • This time, the object 41 located at a medium distance is recognized and obtained as the recognition result 60, and in addition the object 42 located at a long distance is recognized and obtained as the recognition result 61.
  • By the processing of FIGS. 12A to 12D, the preprocessing unit 210 selects the pixel positions of all the pixels 300 included in one frame as pixel positions of sampling pixels. In other words, the preprocessing unit 210 selects the pixel positions of the 16 pixels 300 included in each divided region 35 while shifting the phase by one pixel.
  • The period until the pixel positions of all the pixels 300 included in each divided region 35, or in one frame, have been selected as the pixel positions of sampling pixels is one cycle. That is, the preprocessing unit 210 cycles through the pixel positions of each divided region 35 at a constant period, and sets all the pixel positions in the divided region 35 as pixel positions for acquiring sampling pixels.
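  • A short check, under the same assumed base-point order as the earlier sketch, that one cycle of four phases covers all 16 pixel positions of a 4 pixels × 4 pixels divided region:

```python
import numpy as np

# Sketch: over one cycle of four phases, the every-other-pixel sampling with the
# assumed base points selects every pixel position of a 4x4 divided region once.
covered = np.zeros((4, 4), dtype=bool)
for r0, c0 in [(0, 0), (0, 1), (1, 0), (1, 1)]:   # one base point per frame/phase
    covered[r0::2, c0::2] = True
assert covered.all()
```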
  • For each divided region 35 of the image data 32a' of frame #1', the preprocessing unit 210 performs subsampling with the pixel position in the upper left corner as the base point, in the same manner as in the example of FIG. 12A (step S10a'). As shown in section (b), the preprocessing unit 210 generates the sampled image 36φ1' of the first phase from the sampling pixels subsampled in step S10a'. The generated sampled image 36φ1' is input to the recognition unit 220.
  • The recognition unit 220 extracts the feature amount 50a' of the input sampled image 36φ1' using the DNN (step S11).
  • The recognition unit 220 accumulates and stores the feature amount 50a' extracted in step S11 in the storage unit (step S12, shown in section (b)).
  • the recognition unit 220 may reset the storage unit every cycle of selecting the pixel position of the sampling pixel.
  • the storage unit can be reset, for example, by deleting the feature amounts 50a to 50d for one cycle accumulated in the storage unit from the storage unit.
  • Alternatively, the recognition unit 220 can always keep a fixed amount of feature amounts in the storage unit. For example, the recognition unit 220 keeps the feature amounts for one cycle, that is, for four frames, in the storage unit. In this case, when the new feature amount 50a' is extracted, the recognition unit 220 deletes, for example, the oldest feature amount 50d among the feature amounts 50a to 50d accumulated in the storage unit, and stores the new feature amount 50a' in the storage unit. The recognition unit 220 executes the recognition process based on the feature amount in which the feature amounts 50a to 50c remaining after the feature amount 50d is deleted and the new feature amount 50a' are integrated.
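  • A minimal sketch of such a fixed-size store, assuming a buffer of length four and simple concatenation as the integration; the class name FeatureStore and the integration method are assumptions of this sketch.

```python
from collections import deque
import numpy as np

class FeatureStore:
    """Sketch of a storage unit keeping one cycle (four frames) of feature amounts."""
    def __init__(self, max_frames: int = 4):
        self.buffer = deque(maxlen=max_frames)   # the oldest entry is dropped automatically

    def push_and_integrate(self, feature: np.ndarray) -> np.ndarray:
        self.buffer.append(feature)              # store the new feature amount
        # Integration is modeled here as concatenation of the stored feature amounts.
        return np.concatenate(list(self.buffer))
```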
  • Alternatively, the recognition unit 220 executes the recognition process based on the feature amount in which the feature amounts 50a to 50d already accumulated in the storage unit and the newly extracted feature amount 50a' are integrated (step S13).
  • In this case as well, the object 41 located at a medium distance is recognized and obtained as the recognition result 60, and in addition the object 42 located at a long distance is recognized and obtained as the recognition result 61.
  • As described above, the sampled image 36 is a thinned image in which pixels are thinned out from the original image data 32.
  • In this example, the sampled image 36 is image data obtained by reducing the image data 32 to 1/2 in each of the row and column directions, so its number of pixels is 1/4 of that of the original image data 32. Therefore, the recognition unit 220 can execute the recognition process for the sampled image 36 at high speed compared with a recognition process using all the pixels 300 included in the original image data 32.
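  • As a hedged numerical illustration using the 1920 pixels × 1080 pixels example resolution mentioned above:

```latex
1920 \times 1080 = 2\,073\,600 \text{ pixels (captured image)}, \qquad
960 \times 540 = 518\,400 = \tfrac{1}{4} \times 2\,073\,600 \text{ pixels (sampled image)}
```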
  • Further, the pixel positions of the pixels 300 set as sampling pixels for generating the sampled image 36 are selected while being shifted by one pixel within the divided region 35 for each frame. Therefore, a sampled image 36 whose phase is shifted by one pixel is obtained for each frame, and the pixel positions of all the pixels 300 included in the divided region 35 are eventually selected as the pixel positions of the sampling pixels.
  • In other words, the pixel positions of the pixels 300 used to generate the sampled images 36 are selected so as to cover the whole divided region 35, and the feature amounts calculated from each sampled image 36 are accumulated and integrated.
  • As a result, the pixels 300 at all the pixel positions included in the image data 32 can be involved in the recognition process, and, for example, a distant object can be recognized more easily.
  • In the above description, the pixel positions for selecting the sampling pixels are set by the preprocessing unit 210 according to a predetermined rule, but this is not limited to this example.
  • For example, the preprocessing unit 210 may set the pixel positions for selecting the sampling pixels in response to an instruction from outside the recognition processing unit 20b or from outside the information processing device 1b including the recognition processing unit 20b.
  • FIGS. 13A and 13B are schematic views for explaining the subsampling process in the recognition process according to the prerequisite technology of each embodiment.
  • In FIGS. 13A and 13B, for the sake of explanation, the divided region 35 is defined as a region of 2 pixels × 2 pixels.
  • The upper left pixel position is the origin coordinate [0,0],
  • and the upper right, lower left, and lower right pixel positions are the coordinates [1,0], [0,1], and [1,1], respectively.
  • Sampling of the pixels 300 in each divided region 35 is performed in the order of the coordinates [1,1], [1,0], [0,1], and [0,0], starting from the lower right pixel position [1,1].
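  • A one-line sketch of this cycling order; the helper name sampling_coordinate is an assumption of this sketch.

```python
def sampling_coordinate(frame_index: int):
    """Sketch: coordinate [x, y] sampled in each 2x2 divided region for a given frame,
    cycling in the order [1,1], [1,0], [0,1], [0,0]."""
    order = [(1, 1), (1, 0), (0, 1), (0, 0)]
    return order[frame_index % 4]
```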
  • the passage of time is shown from the bottom to the top of the figure.
  • The image data 32a is the image [T] at the newest time T; the image data 32b, 32c, and 32d correspond to the successively older times T-1, T-2, and T-3, and are the image [T-1], the image [T-2], and the image [T-3], each based on image data 32 that is one frame older than the previous one.
  • In FIG. 13A, the preprocessing unit 210 selects the pixels 300 at the coordinates [1,1] of each divided region 35 of the image data 32a as sampling pixels (step S10a), and the recognition unit 220 extracts the feature amount of the sampled image 36φ1 composed of the selected sampling pixels (step S11).
  • The recognition unit 220 integrates the feature amount 50a extracted from the sampled image 36φ1 with, for example, the feature amounts extracted during a predetermined preceding period (step S12), and performs the recognition process based on the integrated feature amount (step S13).
  • The subsampling process (step S10a) on each divided region 35 of the image data 32a described above yields the sampled image 36φ1, in which the image data 32a is uniformly thinned out.
  • Therefore, the recognition process for the entire image data 32a can be executed by the recognition process for the sampled image composed of the sampling pixels selected by subsampling from the image data 32a; that is, the recognition process for the image data 32 can be completed by the recognition process for its sampled image.
  • This series of processes, in which a sampled image is generated from the image data 32, a feature amount is extracted from the generated sampled image, and recognition processing is performed based on the extracted feature amount, is called one unit of processing.
  • For example, the subsampling process of step S10a, the feature amount extraction process of step S11 for the sampled image 36φ1 generated by the subsampling process, the feature amount integration process of step S12, and the recognition process of step S13 are included in one unit of processing.
  • the recognition unit 220 can execute the recognition process for the thinned-out image data 32 for each process of this one unit (step S13).
  • the recognition processing unit 20b executes the above-mentioned one-unit processing for each of the image data 32b, 32c, and 32d that are sequentially updated in the frame cycle, and executes the recognition processing.
  • Note that the feature amount integration process in step S12 and the recognition process in step S13 can be shared across the processing of the units.
  • FIG. 13B shows the next one unit of processing after one cycle of sampling-pixel selection has been completed for every pixel position included in each divided region 35. That is, when one unit of processing for each of the image data 32a, 32b, 32c, and 32d has been completed, one unit of processing is executed for the image data 32a' of the next frame input to the recognition processing unit 20b.
  • In the processing for the image data 32a', the feature amount 50d extracted based on the oldest image data 32d is discarded, and the feature amount 50a' is extracted from the new image data 32a'. That is, the preprocessing unit 210 selects the pixels 300 at the coordinates [1,1] of each divided region 35 of the image data 32a' as sampling pixels and generates a sampled image 36φ1.
  • The recognition unit 220 extracts the feature amount 50a' from the sampled image 36φ1 selected from the image data 32a'.
  • The recognition unit 220 integrates the feature amount 50a' with the feature amounts 50a, 50b, and 50c extracted up to the preceding units, and performs the recognition process based on the integrated feature amount. In this case, the recognition unit 220 may perform the feature amount extraction process only on the newly acquired image data 32a'.
  • As described above, the recognition process according to the prerequisite technology of each embodiment is performed by executing one unit of processing in the same processing system in the recognition processing unit 20b. More specifically, the recognition processing unit 20b repeats, as one unit of processing for each frame, the processing system of the subsampling process and the feature amount extraction process for the image data 32, and integrates the feature amounts extracted by this repetition to perform the recognition process.
  • In doing so, the recognition processing unit 20b performs the subsampling so that the pixel positions of all the pixels 300 included in the image data 32 are eventually covered, while periodically shifting the pixel positions for selecting the sampling pixels. Further, the recognition processing unit 20b integrates the feature amounts, as intermediate data extracted in step S11 from the sampled images composed of the sampling pixels selected from the image data 32 of each frame, to perform the recognition process.
  • Since the recognition process according to the prerequisite technology of each embodiment configured in this way is a processing system that can be completed within one unit of processing, the recognition result can be obtained more quickly. Further, since the sampling pixels in one unit are selected from the entire image data 32, a wide-ranging recognition result can be confirmed by one unit of processing. Further, since intermediate data (feature amounts) based on a plurality of image data 32 are integrated, a more detailed recognition result, acquired across a plurality of units, can be obtained. A minimal sketch of this per-frame unit of processing is given below.
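  • In the following end-to-end sketch, the callables subsample_fn (for example, the subsample helper sketched earlier), extract_features, and recognize stand in for the subsampling and DNN stages and are assumptions of this sketch, as is the concatenation used as the integration.

```python
from collections import deque
import numpy as np

def process_stream(frames, subsample_fn, extract_features, recognize, num_phases: int = 4):
    """Sketch of the one-unit processing repeated for every frame:
    subsampling (S10) -> feature extraction (S11) -> accumulation and
    integration (S12) -> recognition (S13)."""
    store = deque(maxlen=num_phases)                         # one cycle of feature amounts
    for i, frame in enumerate(frames):
        sampled = subsample_fn(frame, phase=i % num_phases)  # step S10: phase-shifted subsampling
        feature = extract_features(sampled)                  # step S11: feature amount (intermediate data)
        store.append(feature)                                # step S12: accumulate (oldest dropped after one cycle)
        integrated = np.concatenate(list(store))             #           and integrate
        yield recognize(integrated)                          # step S13: one recognition result per unit
```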
  • As described above, by using the information processing device 1b according to the prerequisite technology of each embodiment, it is possible to improve the simultaneity of the recognition results and to acquire recognition results that make use of the resolution of the captured image; that is, it is possible to improve the characteristics of recognition processing using captured images.
  • FIG. 14A is a schematic diagram for explaining the basic architecture of the recognition process according to the existing technology.
  • As shown in FIG. 14A, the recognizer in the existing technology executes recognition processing on one piece of input information (for example, an image) and basically outputs one recognition result for that input information.
  • FIG. 14B is a schematic diagram for explaining the basic architecture of the recognition process according to each embodiment.
  • The recognizer according to each embodiment corresponds to, for example, the recognition unit 220 of FIG. 9, and, as shown in FIG. 14B, executes recognition processing on one piece of input information (for example, an image) by expansion along the time axis, and can output a plurality of recognition results according to the processing.
  • As described with reference to FIGS. 10, 11, and 12A to 12E, the recognition process based on this time-axis expansion is a process in which subsampling is performed by thinning out the pixels of each divided region 35 and the recognition process is executed on each sampled image composed of the subsampled sampling pixels.
  • By this recognition process in the time-axis expansion, the recognizer according to each embodiment can output two types of recognition results for one piece of input information: a highly responsive breaking news result and a highly accurate integrated result.
  • the breaking news result is, for example, the recognition result by the recognition process performed on the sampled image acquired by the first subsampling in each divided region 35.
  • the integration result is, for example, a recognition result obtained by a recognition process performed based on the integrated feature amount of the feature amounts extracted from each sampled image acquired by each subsampling in each divided region 35.
  • The amount of calculation of the recognition process executed in the recognizer according to each embodiment shown in FIG. 14B is substantially the same as that of the recognition process executed in the recognizer according to the existing technology shown in FIG. 14A. Therefore, with the recognizer according to each embodiment, both the more responsive breaking news result and the more accurate integrated result can be obtained with substantially the same amount of calculation as the recognizer according to the existing technology.
  • FIG. 15 is an example time chart showing a first example of reading and recognition processing in the basic architecture of recognition processing according to each embodiment.
  • In the first example, sampling pixels are selected every other pixel in the divided region 35 of 4 pixels × 4 pixels described in section (b) of FIG. 11.
  • In this case, all pixel positions are selected by four subsamplings, and the image data 32 of one frame is divided into the four sampled images 36φ1 to 36φ4 of the first to fourth phases.
  • The sampled images 36φ1 to 36φ4 of the first to fourth phases are extracted by subsampling from the image data 32 of a plurality of frames that are consecutive in time series. That is, in this first example, the sampled images 36φ1 to 36φ4 of the first to fourth phases are extracted across the image data 32 of a plurality of consecutive frames.
  • the recognition process according to the first example is a recognition process performed between a plurality of frames, and is appropriately referred to as an inter-frame process.
  • In FIG. 15, the imaging cycle is one frame cycle, for example, 50 [ms] (20 [fps (frames per second)]). Further, here, reading from the pixel circuits 1000 arranged in a matrix in the pixel array unit 1001 is performed line-sequentially by a rolling shutter method. In FIG. 15, the passage of time is shown toward the right, and the line position is shown from top to bottom.
  • each line is exposed for a predetermined time, and after the exposure is completed, the pixel signal is transferred from each pixel circuit 1000 to the AD conversion unit 1003 via the vertical signal line VSL to perform AD conversion.
  • each AD converter 1007 converts the transferred analog pixel signal into pixel data which is a digital signal.
  • the image data 32a based on the pixel data of frame # 1 is input to the preprocessing unit 210.
  • The preprocessing unit 210 performs subsampling of the first phase φ1 on the input image data 32a by the subsampling process (indicated as "SS" in the figure) as described above.
  • The preprocessing unit 210 acquires the pixels 300 at the pixel positions of the sampling pixels selected for each divided region 35 by the subsampling of the first phase φ1, and generates the sampled image 36φ1 (step S10a).
  • The preprocessing unit 210 passes the sampled image 36φ1 to the recognition unit 220.
  • The sampled image 36φ1 passed from the preprocessing unit 210 to the recognition unit 220 is an image whose number of pixels is reduced with respect to the image data 32a by the thinning of the subsampling process.
  • The recognition unit 220 executes the recognition process on the sampled image 36φ1.
  • In FIG. 15, it is shown that the recognition process includes the feature amount extraction process (step S11), the feature amount integration process (step S12), and the recognition process (step S13).
  • The recognition result φ1 based on the sampled image 36φ1 is output to the outside of the recognition processing unit 20b.
  • The processes of steps S11 to S13 are performed within a period of one frame.
  • Here, the sampled image 36φ1 to be processed is an image whose number of pixels is reduced with respect to the image data 32a by the thinning of the subsampling process. Therefore, the amount of processing executed on the image data 32a is smaller than the amount of processing that would be executed on one frame of image data 32 that is not thinned out.
  • In the example of FIG. 15, the processing of steps S11 to S13 for the sampled image 36φ1 based on the image data 32a is completed in a period of approximately 1/4 of the one-frame period.
  • Image data 32b composed of pixel data of frame # 2 is input to the preprocessing unit 210.
  • the preprocessing unit 210 performs subsampling processing on the input image data 32b in a second phase ⁇ 2 different from that of the image data 32a to generate a sampled image 36 ⁇ 2.
  • the pre-processing unit 210 passes the sampled image 36 ⁇ 2, which has a smaller number of pixels than the image data 32b by subsampling, to the recognition unit 220.
  • the recognition unit 220 executes the recognition process on the sampled image 36 ⁇ 2 within a period of one frame. In this case as well, as described above, the recognition process is completed in a period of approximately 1/4 of the one-frame period.
  • the recognition unit 220 integrates the feature amount 50b extracted from the sampled image 36 ⁇ 2 and the feature amount 50a extracted by the feature amount extraction process for the image data 32a by the feature amount integration process in step S12.
  • the recognition unit 220 executes the recognition process using the integrated feature amount.
  • the recognition result ⁇ 2 by this recognition process is output to the outside of the recognition process unit 20b.
  • the preprocessing unit 210 executes subsampling processing with the third phase ⁇ 3 for the image data 32c of the next frame # 3 in parallel with the processing for the image data 32b of the immediately preceding frame # 2.
  • the recognition unit 220 extracts the feature amount 50c from the sampled image 36 ⁇ 3 generated by the subsampling process.
  • the recognition unit 220 further integrates the feature amount 50a and 50b extracted from the image data 32a and 32b, respectively, and the extracted feature amount 50c, and performs recognition processing based on the integrated feature amount. Run.
  • the recognition unit 220 outputs the recognition result ⁇ 3 obtained by this recognition process to the outside. In this case as well, as described above, the recognition process is completed in a period of approximately 1/4 of the one-frame period.
  • In parallel with the processing for the image data 32c of the immediately preceding frame # 3, the recognition processing unit 20b performs the subsampling process with the fourth phase φ4 and the feature amount extraction process on the image data 32d of the next frame # 4 to obtain the feature amount 50d.
  • The recognition unit 220 further integrates the feature amounts 50a to 50c extracted from the image data 32a to 32c with the extracted feature amount 50d, and executes the recognition process based on the integrated feature amount.
  • the recognition unit 220 outputs the recognition result ⁇ 4 obtained by this recognition process to the outside. In this case as well, as described above, the recognition process is completed in a period of approximately 1/4 of the one-frame period.
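  • The inter-frame flow described above (one subsampling phase per frame, with features accumulated across frames) can be pictured roughly as below. This is only a sketch: `extract_features`, `integrate`, and `recognize` are placeholders for the DNN stages of steps S11 to S13, and `subsample` is the helper assumed in the earlier sketch.

```python
def interframe_recognition(frames, extract_features, integrate, recognize):
    """One subsampling phase per frame; features accumulate across frames."""
    accumulated = None
    results = []
    for i, frame in enumerate(frames):
        phase = i % 4                        # phases phi1..phi4, repeating
        sampled = subsample(frame, phase)    # step S10: ~1/4 of the pixels
        feats = extract_features(sampled)    # step S11: feature amount extraction
        accumulated = feats if accumulated is None else integrate(accumulated, feats)  # step S12
        results.append(recognize(accumulated))                                         # step S13
    return results  # results[0] is the breaking news result; later ones are more detailed
```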
  • In FIG. 15, vertical arrows are shown, that is, arrows from each of the image data 32a to 32d to each of steps S10a to S10d, and arrows indicating the output of each of the recognition results φ1 to φ4 by each recognition process.
  • The thickness of each arrow schematically indicates the amount of information.
  • The amount of data of the sampled images 36φ1 to 36φ4, which are subsampled by the processes of steps S10a to S10d and passed to the recognition unit 220, is smaller than the amount of data of each of the image data 32a to 32d input to the preprocessing unit 210 for the processes of steps S10a to S10d.
  • It is also shown that the amount of information of each of the recognition results φ1 to φ4 by the recognition processes based on the image data 32a to 32d increases as the recognition process is repeated, and the obtained recognition result becomes more detailed with each recognition process.
  • This is because each recognition process uses a feature amount that integrates the feature amounts acquired up to the previous time while shifting the phase of the sampled image with the feature amount newly acquired by further shifting the phase for the immediately preceding sampled image.
  • FIG. 16 is an example time chart showing a second example of reading and recognition processing in the basic architecture of recognition processing according to each embodiment.
  • the sampled images 36 ⁇ 1 to 36 ⁇ 4 of the first to fourth phases by subsampling are extracted from the image data 32 of one frame, respectively. That is, in this second example, the recognition process by the sampled images 36 ⁇ 1 to 36 ⁇ 4 of the first to fourth phases is completed in one frame, and is hereinafter appropriately referred to as an intra-frame process.
  • each line is exposed for a predetermined time, and after the exposure is completed, the pixel signal is transferred from each pixel circuit 1000 to the AD conversion unit 1003 via the vertical signal line VSL to perform AD conversion.
  • each AD converter 1007 converts the transferred analog pixel signal into pixel data which is a digital signal.
  • the image data 32a based on the pixel data of frame # 1 is input to the preprocessing unit 210.
  • The preprocessing unit 210 performs the subsampling of the first phase φ1 as described above on the image data 32a of the first frame in FIG. 16, acquires the pixels 300 from the pixel positions of the sampling pixels selected for each divided region 35, and generates a sampled image 36φ1 with the first phase φ1 (step S10a).
  • the preprocessing unit 210 executes the subsampling of the second phase ⁇ 2 for the image data 32b.
  • the preprocessing unit 210 generates a sampled image 36 ⁇ 2 in the second phase ⁇ 2 from each sampling pixel acquired by the subsampling of the second phase ⁇ 2 (step S10b).
  • The preprocessing unit 210 then executes subsampling with further different phases (subsampling of the third phase φ3 and of the fourth phase φ4) on the image data 32a, and generates a sampled image 36φ3 with the third phase φ3 and a sampled image 36φ4 with the fourth phase φ4 (step S10c and step S10d), respectively.
  • the preprocessing unit 210 executes subsampling according to the first to fourth phases ⁇ 1 to ⁇ 4 for one frame of image data 32a within one frame period, respectively.
  • the recognition unit 220 executes a feature amount extraction process on the sampled image 36 ⁇ 1 of the first phase ⁇ 1 generated based on the image data 32a by the preprocessing unit 210 (step S11a), and extracts the feature amount.
  • the recognition unit 220 can integrate the feature amount extracted in step S11a with the accumulated feature amount that can be integrated (step S12a).
  • the recognition unit 220 executes the recognition process based on the feature quantity integrated in step S12a (step S13a), and outputs the recognition result ⁇ 1 by the first phase.
  • the recognition unit 220 executes a feature amount extraction process on the sampled image 36 ⁇ 2 of the second phase ⁇ 2 generated based on the image data 32a by the preprocessing unit 210 (step S11b), and extracts the feature amount.
  • the recognition unit 220 can integrate the feature amount extracted in step S11b with the accumulated feature amount that can be integrated (step S12b).
  • Here, the recognition unit 220 integrates the feature amount extracted in step S11b with the feature amount extracted in step S11a described above, performs the recognition process on the integrated feature amount (step S13b), and outputs the recognition result φ2 by the second phase φ2.
  • Similarly, the recognition unit 220 executes the feature amount extraction process on the sampled images 36φ3 and 36φ4 of the third and fourth phases φ3 and φ4 generated by the preprocessing unit 210 based on the image data 32a (step S11c and step S11d), and extracts the feature amounts.
  • the recognition unit 220 sequentially integrates each feature amount extracted in step S11c and step S11d with the feature amount integrated up to the immediately preceding integration process (step S12c, step S12d).
  • the recognition unit 220 executes recognition processing based on, for example, each feature quantity integrated in each phase ⁇ 3 and ⁇ 4, and outputs recognition results ⁇ 3 and ⁇ 4 of each phase ⁇ 3 and ⁇ 4, respectively.
  • Each feature amount extraction process (steps S11a to S11d), each integration process (steps S12a to S12d), and each recognition process (steps S13a to S13d) in each of the phases φ1 to φ4 described above are executed within the period of one frame. That is, the recognition unit 220 performs the recognition process on each of the sampled images 36φ1 to 36φ4 in which pixels are thinned out by subsampling the image data 32a of one frame. Therefore, the amount of calculation of each recognition process in the recognition unit 220 is small, and each recognition process can be executed in a short time.
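  • For this intra-frame case, the same stages run four times on the single frame within one frame period; a rough sketch under the same assumptions as the earlier snippets (placeholder function names, `subsample` as assumed above):

```python
def intraframe_recognition(frame, extract_features, integrate, recognize):
    """All four phases are taken from the same frame (steps S10a/S11a .. S13d)."""
    accumulated = None
    results = []
    for phase in range(4):
        sampled = subsample(frame, phase)
        feats = extract_features(sampled)
        accumulated = feats if accumulated is None else integrate(accumulated, feats)
        results.append(recognize(accumulated))
    return results  # results[0]: early breaking news result; results[3]: integration result
```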
  • FIG. 17 is a schematic diagram for explaining the effect of the processing (intraframe processing) according to the second example described above.
  • FIG. 17A is an example time chart comparing the processing according to the second example described above with the processing according to the existing technique, and shows the passage of time toward the right.
  • section (a) shows an example of reading and recognition processing by existing technology.
  • section (b) shows an example of reading and recognizing processing according to the second example described above.
  • the imaging process is performed during a period of time t 0 to t 1.
  • the imaging process includes exposure in the pixel array unit 1001 for a predetermined time, and transfer processing of each pixel data based on the electric charge generated by the photoelectric conversion element in response to the exposure.
  • Each pixel data transferred from the pixel array unit 1001 by the imaging process is stored in the frame memory as, for example, one frame of image data.
  • reading of the image data stored in the frame memory is started, for example, from time t 1.
  • the recognition processing for the image data for one frame is started after the reading of the image data for one frame is completed (time t 4).
  • this recognition process ends at the time t 6 when one frame period elapses from the time t 4.
  • the reading of the image data from the frame memory is started after the time t 1 as in the example of the section (a).
  • the reading of the sampled image 36 ⁇ 1 by the subsampling of the first phase ⁇ 1 is executed, for example, during the period t 1 to t 2 which is 1/4 of the one frame period, and the same is true.
  • the recognition process for the sampled image 36 ⁇ 1 is executed, for example, during the period t 2 to t 3 , which is 1/4 of the one frame period, and the recognition result ⁇ 1 is output.
  • Similarly, the reading of the sampled images 36φ2 to 36φ4 by the subsampling of the second to fourth phases φ2 to φ4 is executed in periods each of which is 1/4 of the one frame period, for example at times t2 to t3, ..., and ends at time t4, for example.
  • The recognition process for the sampled image 36φ2 starts at time t3, for example, and ends when 1/4 of the one frame period has elapsed, and the recognition result φ2 is output.
  • The recognition processes for the other sampled images 36φ3 and 36φ4 are likewise executed following the recognition process for the immediately preceding sampled image.
  • The recognition process for the sampled image 36φ4 by the last subsampling of the image data 32 for one frame ends at time t5.
  • FIG. 17B is a diagram schematically showing each recognition result according to the second example.
  • the upper stage, the middle stage, and the lower stage show examples of the recognition results ⁇ 1, ⁇ 2, and ⁇ 4 by the recognition processing for the first phase ⁇ 1, the second phase ⁇ 2, and the fourth phase ⁇ 4, respectively.
  • An example is shown in which images of three people, the recognition targets, located at different distances from the sensor unit 10b (information processing device 1b), are included in one frame.
  • three objects 96L, 96M, and 96S which are images of people and have different sizes, are included with respect to the frame 95.
  • the object 96L is the largest, and of the three persons included in the frame 95, the person corresponding to the object 96L is the closest to the sensor unit 10b.
  • The smallest object 96S among the objects 96L, 96M, and 96S represents the person who, among the three people included in the frame 95, is the farthest from the sensor unit 10b.
  • the recognition result ⁇ 1 is an example in which the recognition process is executed on the above-mentioned sampled image 36 ⁇ 1 and the largest object 96L is recognized.
  • the recognition result ⁇ 2 is an example in which the feature amount extracted from the sampled image 36 ⁇ 2 is further integrated with the feature amount in the recognition result ⁇ 1 and the next largest object 96M is recognized.
  • In the recognition result φ4, the feature amount extracted from the sampled image 36φ4 is further integrated with the feature amounts extracted from the preceding sampled images, and in addition to the objects 96L and 96M, the smallest object 96S is recognized.
  • a rough recognition result ⁇ 1 can be obtained based on the sampled image 36 ⁇ 1 by the first subsampling for the frame.
  • Therefore, the recognition result φ1 can be output at time t3 in FIG. 17A, and as shown by the arrow B in the figure, low latency is realized with respect to the time t6 at which the recognition result is output by the existing technology.
  • the recognition result ⁇ 1 based on the sampled image 36 ⁇ 1 by the first subsampling for the frame according to this second example is the breaking news result.
  • This breaking news result is also applicable to the first example described above.
  • On the other hand, the recognition process in the final subsampling for the frame is performed based on the feature amount that integrates the feature amounts extracted from each of the sampled images 36φ1 to 36φ4 in the frame, so that the more accurate recognition result φ4 can be obtained.
  • This recognition result ⁇ 4 can realize, for example, the same accuracy as the recognition processing by the existing technology.
  • The recognition process for this last subsampling ends at time t5, when, for example, 1/4 of the one frame period has elapsed with respect to the time t4 at which the read process is completed by the existing technique.
  • That is, accuracy equivalent to that of the existing technology can be obtained in a shorter time than the recognition process by the existing technology, as shown by the arrow A in the figure, resulting in lower latency.
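  • As a rough worked example, assuming the 50 [ms] (20 [fps]) frame period mentioned above and, purely for illustration, that in the existing flow reading one frame and recognizing one frame each take one frame period, while here each subsampled read and each recognition take about 1/4 of a frame period: the existing flow outputs its result roughly 100 [ms] after time t1, whereas the breaking news result φ1 becomes available about 25 [ms] after t1 and the integration result φ4 about 62.5 [ms] after t1.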
  • the recognition result ⁇ 4 is the integration result. This integration result is also applicable to the first example described above.
  • The preprocessing unit 210 sets the divided region 35 to 4 pixels × 4 pixels and performs subsampling by thinning out every other pixel, as described with reference to FIG. 11, expanding the image data 32 of one frame on the time axis; it is therefore assumed that four sampled images 36φ1, 36φ2, 36φ3, and 36φ4 that are out of phase with one another are generated.
  • FIG. 18 is a diagram schematically showing a configuration of an example of a recognizer according to the first embodiment.
  • the left end shows a state in which the image data 32 of one frame is divided into four according to the pixels 300 ⁇ 1, 300 ⁇ 2, 300 ⁇ 3, and 300 ⁇ 4 of the four phases of the first phase ⁇ 1 to the fourth phase ⁇ 4.
  • Sampling images 36 ⁇ 1 to 36 ⁇ 4 of each phase are generated by the subsampling process (steps S11a to S11d) according to the first to fourth phases ⁇ 1 to ⁇ 4.
  • sampled images 36 ⁇ 1 to 36 ⁇ 4 of each phase are generated in the order of the first phase ⁇ 1, the second phase ⁇ 2, the third phase ⁇ 3, and the fourth phase ⁇ 4.
  • the size of the divided area 35 may be 8 pixels ⁇ 8 pixels (in this case, 4 ⁇ 4 is divided into 16), or the divided area 35 may be further set to another size.
  • the divided region 35 does not have to be square and is not limited to a rectangle.
  • the entire image data 32 or an arbitrary pixel position of the predetermined division area 35 may be selected, and the pixel 300 at the selected pixel position may be used as the sampling pixel.
  • the plurality of pixel positions arbitrarily selected include, for example, a plurality of discrete and aperiodic pixel positions.
  • the preprocessing unit 210 can select the plurality of pixel positions by using pseudo-random numbers. Further, the selected pixel positions are preferably different for each frame, but some pixel positions may overlap between the frames.
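  • One possible way to realize such a pseudo-random, per-frame selection of discrete pixel positions is sketched below; the seeding scheme and function name are assumptions for illustration, not part of the disclosure.

```python
import numpy as np

def random_sampling_positions(height, width, n_samples, frame_index, seed=0):
    """Pick discrete, aperiodic pixel positions; changing the seed per frame
    makes the selected positions differ from frame to frame."""
    rng = np.random.default_rng(seed + frame_index)
    flat = rng.choice(height * width, size=n_samples, replace=False)
    return np.stack(np.unravel_index(flat, (height, width)), axis=1)  # (row, col) pairs
```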
  • Feature extraction processing is performed on the sampled images 36 ⁇ 1 to 36 ⁇ 4 of each phase (steps S11a to S11d).
  • the feature amount of the sampled image 36 ⁇ 1 first extracted in step S11a is integrated with the feature amount already accumulated in step S12a.
  • In FIG. 18, it is shown that the feature amount extracted from the sampled image 36φ1 is directly subjected to the recognition process and the recognition result φ1 is acquired (step S13a).
  • The recognition result φ1 of the recognition process in step S13a is called the breaking news result because it is the first of the recognition results φ1 to φ4 based on the sampled images 36φ1 to 36φ4 generated from the image data 32 of one frame.
  • the feature amount of the sampled image 36 ⁇ 2 extracted in step S11b is integrated with the feature amount extracted from the sampled image 36 ⁇ 1 in step S11a in step S12b.
  • the recognition process is performed on the features integrated in step S12b, and the recognition result ⁇ 2 is acquired (step S13b).
  • the integrated feature amount is integrated with the feature amount of the sampled image 36 ⁇ 3 extracted in step S11c (step S12c). That is, in step S12c, the feature quantities extracted from the sampled images 36 ⁇ 1, 36 ⁇ 2, and 36 ⁇ 3 are integrated.
  • Recognition processing is performed on the feature amount integrated in step S12c, and the recognition result ⁇ 3 is acquired (step S13c).
  • the integrated feature amount is integrated with the feature amount of the sampled image 36 ⁇ 4 extracted in step S11d (step S12d). That is, in step S12d, the feature quantities extracted from the sampled images 36 ⁇ 1, 36 ⁇ 2, 36 ⁇ 3, and 36 ⁇ 4 are integrated.
  • Recognition processing is performed on the integrated feature amount in step S13d, and the recognition result ⁇ 4 is acquired.
  • the recognition result ⁇ 4 of the recognition process in this step S13d is called an integration result because it is acquired based on the integrated feature amount that integrates the feature amounts extracted from all of the sampled images 36 ⁇ 1 to 36 ⁇ 4.
  • the integration result corresponds to the recognition result when the pixels 300 at all the pixel positions of the image data 32 of one frame are used as sampling pixels.
  • the recognition results ⁇ 2 and ⁇ 3 of the recognition processing in steps S13b and S13c are recognition results acquired by the recognition processing in the middle of acquiring the integration result, and are called intermediate results.
  • The recognizer according to the first embodiment can adaptively output any one of these recognition results φ1 to φ4, or a combination of a plurality of the recognition results φ1 to φ4, depending on prior information detected in advance, environmental information, and the like.
  • FIG. 19 is an example time chart showing how the recognition result is output according to the first embodiment, that is, at which timing it is determined which recognition result is to be output (any one of the recognition results φ1 to φ4, or a combination of a plurality of them) and on what condition the recognition result to be output is determined. In FIG. 19, the passage of time is shown in the right direction.
  • Timing P, timing Q, and timing R are determination timings, in order from the earliest in the time series.
  • Timing P detects prior information before a series of recognition processes are started.
  • the prior information is, for example, information to be detected in advance for executing the recognition process, and based on the prior information, for example, which of the recognition results ⁇ 1 to ⁇ 4 is to be output is determined.
  • the timing Q sets the recognition result to be output based on the recognition result in the predetermined period 100.
  • the predetermined period 100 can be, for example, a period of one frame or more in frame units.
  • the timing R sets the recognition result to be output based on the recognition result (for example, the above-mentioned breaking news result or the intermediate result) in the predetermined period 101 in the frame.
  • the recognition result to be output is determined based on the prior information detected before the recognition process is started, corresponding to the above-mentioned timing P process.
  • FIG. 20A is a functional block diagram of an example for explaining a more detailed function of the pretreatment unit 210 according to the first embodiment.
  • the preprocessing unit 210 includes a utilization area acquisition unit 211, a recognition result setting unit 212, a recognition result output calculation unit 213, and a storage unit 214.
  • the storage unit 214 includes a memory and a memory control unit for controlling reading and writing to the memory.
  • the used area acquisition unit 211, the recognition result setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 are realized by, for example, an information processing program running on the CPU 1205.
  • This information processing program can be stored in ROM 1206 in advance. Not limited to this, the information processing program can also be supplied from the outside via the interface 1204 and written to the ROM 1206.
  • the utilization area acquisition unit 211, the recognition result setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 may be realized by operating the CPU 1205 and the DSP 1203, respectively, according to the information processing program. .. Furthermore, a part or all of the usage area acquisition unit 211, the recognition result setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 (memory control unit) are configured by a hardware circuit that operates in cooperation with each other. You may.
  • the used area acquisition unit 211 includes a reading unit that reads the image data 32 from the sensor unit 10b.
  • the used area acquisition unit 211 performs subsampling processing on the image data 32 read from the sensor unit 10b by the reading unit according to a predetermined pattern (for example, a divided area 35 having a size of 4 pixels ⁇ 4 pixels). , Sampling pixels are extracted, and a sampled image 36 ⁇ x having a phase ⁇ x is generated from the extracted sampling pixels. That is, the utilization area acquisition unit 211 realizes the function of the generation unit that generates the sampled image.
  • the used area acquisition unit 211 passes the generated sampled image 36 ⁇ x to the recognition unit 220.
  • the use area acquisition unit 211 can perform read control on the sensor unit 10b to specify a line or the like for reading.
  • FIG. 20B is a functional block diagram of an example for explaining a more detailed function of the recognition unit 220 according to the first embodiment.
  • the recognition unit 220 includes a feature amount calculation unit 221, a feature amount accumulation control unit 222, a feature amount accumulation unit 223, and a recognition process execution unit 224.
  • the feature amount calculation unit 221 and the feature amount accumulation control unit 222, the feature amount storage unit 223, and the recognition process execution unit 224 are realized by, for example, an information processing program running on the CPU 1205.
  • This information processing program can be stored in ROM 1206 in advance. Not limited to this, the information processing program can also be supplied from the outside via the interface 1204 and written to the ROM 1206.
  • the feature amount calculation unit 221 and the feature amount accumulation control unit 222, the feature amount storage unit 223, and the recognition process execution unit 224 may be realized by operating the CPU 1205 and the DSP 1203, respectively, according to the information processing program. Furthermore, a part or all of the feature amount calculation unit 221 and the feature amount accumulation control unit 222, the feature amount storage unit 223, and the recognition processing execution unit 224 may be configured by a hardware circuit that operates in cooperation with each other. ..
  • the recognition unit 220 the feature amount calculation unit 221 and the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 constitute a recognizer that executes recognition processing based on image data.
  • the recognition unit 220 can construct the recognizer and change the configuration according to the recognizer information passed from the parameter storage unit 230.
  • the sampled image 36 ⁇ x passed from the usage area acquisition unit 211 is input to the feature amount calculation unit 221.
  • the feature amount calculation unit 221 includes one or more feature calculation units for calculating the feature amount, and calculates the feature amount based on the passed sampled image 36 ⁇ x. That is, the feature amount calculation unit 221 functions as a calculation unit for calculating the feature amount of the sampled image 36 ⁇ x composed of sampling pixels. Not limited to this, the feature amount calculation unit 221 may acquire information for setting the exposure and analog gain from, for example, the sensor unit 10b, and further use the acquired information to calculate the feature amount.
  • the feature amount calculation unit 221 passes the calculated feature amount to the feature amount accumulation control unit 222.
  • The feature amount accumulation control unit 222 accumulates the feature amount passed from the feature amount calculation unit 221 in the feature amount accumulation unit 223. At this time, the feature amount accumulation control unit 222 can integrate the past feature amount already accumulated in the feature amount accumulation unit 223 with the feature amount passed from the feature amount calculation unit 221 to generate an integrated feature amount. Further, when the feature amount accumulation unit 223 has been initialized and no feature amount exists, the feature amount accumulation control unit 222 accumulates the feature amount passed from the feature amount calculation unit 221 in the feature amount accumulation unit 223 as the first feature amount.
  • the feature amount accumulation control unit 222 can delete unnecessary feature amounts from the feature amounts accumulated in the feature amount accumulation unit 223.
  • An unnecessary feature amount is, for example, a feature amount related to a previous frame, or an already accumulated feature amount calculated based on a frame image of a scene different from the frame image for which a new feature amount is calculated.
  • the feature amount accumulation control unit 222 can also specify the feature amount to be deleted in response to an instruction from the outside. Further, the feature amount accumulation control unit 222 can also delete and initialize all the feature amounts accumulated in the feature amount accumulation unit 223, if necessary.
  • The feature amount accumulation control unit 222 passes to the recognition process execution unit 224 either the feature amount passed from the feature amount calculation unit 221, or the feature amount obtained by integrating the feature amount accumulated in the feature amount accumulation unit 223 with the feature amount passed from the feature amount calculation unit 221.
  • The recognition process execution unit 224 executes recognition processing such as object detection, person detection, and face detection based on the feature amount passed from the feature amount accumulation control unit 222. For example, when the feature amount is a feature amount passed from the feature amount calculation unit 221 to the feature amount accumulation control unit 222 as it is, that is, a feature amount that is not integrated with other feature amounts, the recognition process execution unit 224 outputs the breaking news result as the recognition result.
  • When the feature amount is an integration of all the feature amounts based on all the sampled images 36φx generated from the image data 32 of one frame, the recognition process execution unit 224 outputs the integration result as the result of the recognition process. Further, the recognition process execution unit 224 can also output an intermediate result, which is an intermediate recognition result between the breaking news result and the integration result (a structural sketch of these units follows below).
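  • The division of roles among the feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 could be organized roughly as in the following sketch; the class and method names are illustrative assumptions only.

```python
class FeatureAccumulationControl:
    """Holds the accumulated feature amount (role of unit 223) and integrates new ones (unit 222)."""
    def __init__(self, integrate):
        self._integrate = integrate
        self._accumulated = None          # accumulated feature amount (unit 223)

    def reset(self):
        self._accumulated = None          # delete / initialize accumulated feature amounts

    def add(self, features):
        if self._accumulated is None:
            self._accumulated = features  # first feature amount after initialization
        else:
            self._accumulated = self._integrate(self._accumulated, features)
        return self._accumulated


class RecognitionProcessExecution:
    """Runs detection on whatever feature amount it is handed (role of unit 224)."""
    def __init__(self, head):
        self._head = head                 # e.g. a DNN recognition head

    def run(self, features):
        return self._head(features)       # breaking news / intermediate / integration result
```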
  • the storage unit 214 accumulates the recognition results output by the recognition unit 220.
  • the recognition result accumulated in the storage unit 214 is passed to the recognition result output calculation unit 213.
  • the storage unit 214 can also directly pass the recognition result output from the recognition processing execution unit 224 to the recognition result output calculation unit 213.
  • the recognition result output calculation unit 213 obtains one or more recognition results to be output from the recognition unit 220 among the recognition results ⁇ 1 to ⁇ 4 based on the recognition result passed from the storage unit 214.
  • the recognition result output calculation unit 213 passes the obtained recognition result to the recognition result output setting unit 212.
  • The recognition result output setting unit 212 sets the recognition result to be output to the recognition unit 220 based on the recognition result passed from the recognition result output calculation unit 213 or, for example, prior information supplied from outside the recognition processing unit 20b. That is, the recognition result output setting unit 212 executes any of the timing P, Q, and R processes described with reference to FIG. 19 based on the recognition result or the prior information. In this way, the recognition result output setting unit 212 and the recognition result output calculation unit 213 function as an output control unit that controls the output of the recognition result by the recognition unit 220.
  • FIG. 21 is an example flowchart showing the recognition process according to the first embodiment.
  • the information processing device 1b according to the first embodiment will be described as being used for in-vehicle use.
  • the recognition processing unit 20b detects advance information by the recognition result output setting unit 212.
  • the prior information includes, for example, vehicle body information of a vehicle on which the information processing device 1b including the recognition processing unit 20b is mounted, position information indicating the current position of the vehicle (information processing device 1b), and time information indicating the current date and time. Can be applied.
  • the traveling speed of the vehicle can be applied as the vehicle body information.
  • the position information can be acquired by providing a self-position acquisition means such as GNSS (Global Navigation Satellite System) or SLAM (Simultaneous Localization and Mapping) in the vehicle or the information processing device 1b itself. ..
  • the country or region can be identified.
  • the area includes a wide area such as a prefecture and a specific area (shopping district, school zone, etc.) in the urban area.
  • the time information can be acquired from, for example, a timer or calendar mounted on the vehicle or the information processing device 1b itself, and it is possible to know the day and night and the season.
  • Next, in step S101, the recognition processing unit 20b determines, by the recognition result output setting unit 212, at which timing the recognition result (for example, any of the recognition results φ1 to φ4) is to be output, based on the prior information detected in step S100.
  • For example, when the prior information is traveling speed information, it is conceivable to output the breaking news result (recognition result φ1) if the traveling speed is equal to or higher than a predetermined value, and to output the integration result (recognition result φ4) otherwise.
  • Further, for example, when the road is not a highway, it is conceivable to output the breaking news result (recognition result φ1) in order to prioritize the recognition of short-distance objects, and, on a highway, to output the integration result (recognition result φ4) in order to prioritize the recognition of distant traffic conditions and oncoming vehicles (a small sketch of this decision logic follows below).
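  • As a hedged illustration of this decision from prior information (the threshold value and road-type labels are made up for the example):

```python
def decide_from_speed(speed_kmh, threshold_kmh=60.0):
    # Higher speed -> prioritize responsiveness: breaking news result phi1.
    return "phi1" if speed_kmh >= threshold_kmh else "phi4"

def decide_from_road_type(road_type):
    # Highway -> prioritize distant traffic and oncoming vehicles: integration result phi4.
    return "phi4" if road_type == "highway" else "phi1"
```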
  • the recognition processing unit 20b acquires the sampled image 36 ⁇ x in which the image data 32 is subsampled according to, for example, a preset pattern (for example, the division area 35) by the utilization area acquisition unit 211.
  • Next, the recognition result output setting unit 212 specifies to the recognition unit 220 at which timing of the recognition results φ1 to φ4 the recognition result is to be output, according to the determination made by the recognition result output setting unit 212 in step S101.
  • the recognition processing unit 20b executes the recognition processing on the sampled image 36 ⁇ x passed from the used area acquisition unit 211 by the recognition unit 220.
  • Next, the recognition processing unit 20b determines, by the recognition result output setting unit 212, whether or not the recognition result φx recognized by the recognition unit 220 is the recognition result specified to the recognition unit 220.
  • the recognition result output setting unit 212 can execute the determination by acquiring the recognition result ⁇ x output from the recognition unit 220 via the storage unit 214 and the recognition result output calculation unit 213.
  • When the recognition result output setting unit 212 determines in step S105 that the recognition result φx recognized by the recognition unit 220 is not the recognition result specified to the recognition unit 220 in step S103 (step S105, "No"), the process returns to step S102.
  • When the recognition result output setting unit 212 determines in step S105 that the recognition result φx recognized by the recognition unit 220 is the recognition result specified to the recognition unit 220 in step S103 (step S105, "Yes"), the process proceeds to step S106.
  • In step S106, the recognition result output setting unit 212 instructs the recognition unit 220 to output the recognition result φx obtained by the recognition process in step S104. In response to this instruction, the recognition result φx is output from the recognition unit 220.
  • the recognition processing unit 20b according to the first embodiment detects the prior information and determines at what timing the recognition result ⁇ x is output based on the detected prior information. Therefore, by applying the recognition processing unit 20b according to the first embodiment, it is possible to obtain a recognition result according to the situation. Further, this makes it possible to suppress the cost of calculation amount and the communication cost.
  • the first modification of the first embodiment corresponds to the processing of the timing R described with reference to FIG. 19, and based on the recognition result in the predetermined period 101 in the frame, the recognition result ⁇ x output by the recognition unit 220 is obtained. decide.
  • FIG. 22 is a flowchart of an example showing the recognition process according to the first modification of the first embodiment.
  • the recognition target for determining the recognition result ⁇ x output by the recognition unit 220 is set.
  • the information processing device 1b according to the first modification of the first embodiment is for in-vehicle use, for example, a person (pedestrian) can be considered as a recognition target. Not limited to this, it is also possible to recognize oncoming vehicles and road signs.
  • step S200 the recognition processing unit 20b acquires the sampled image 36 ⁇ x in which the image data 32 of the frame (t) is subsampled by the utilization area acquisition unit 211, for example, according to a preset pattern (for example, the division area 35). ..
  • the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results ⁇ 1 to ⁇ 4 is to be output by the recognition result output setting unit 212.
  • the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36 ⁇ x acquired in step S200.
  • the recognition processing unit 20b executes the recognition processing for the sampled image 36 ⁇ x acquired in step S200 by the recognition unit 220. More specifically, the recognition unit 220 extracts the feature amount of the sampled image 36 ⁇ x acquired in step S200, and executes the recognition process based on the extracted feature amount.
  • Next, based on the result of the recognition process in step S202, the recognition processing unit 20b determines, by the recognition result output setting unit 212, whether or not it has been decided at which timing of the recognition results φ1 to φ4 the recognition result is to be output in the next frame (t + 1) of the frame (t).
  • For example, when a recognition target designated in advance is detected based on the result of the recognition process in step S202, the recognition result output setting unit 212 determines that it has been decided, according to the recognition target, at which timing of the recognition results φ1 to φ4 the recognition result is to be output.
  • When the recognition result output setting unit 212 determines in step S203 that the timing at which the recognition result is to be output has not been decided (step S203, "No"), the process returns to step S200, and the sampled image 36φ(x + 1) of the phase next to that of the sampled image 36φx acquired in step S200 is acquired.
  • When the recognition result output setting unit 212 determines in step S203 that the timing of the recognition result to be output has been decided (step S203, "Yes"), the process proceeds to step S204.
  • step S204 the recognition processing unit 20b acquires the sampled image 36 ⁇ x according to the recognition result ⁇ x whose output is determined according to the determination in step S203 in the utilization area acquisition unit 211.
  • the sampled image 36 ⁇ x to be acquired may be the sampled image 36 ⁇ x of the image data 32 at the time (t), or may be the sampled image 36 ⁇ x of the image data 32 at the next time (t + 1).
  • the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results ⁇ 1 to ⁇ 4 is to be output by the recognition result output setting unit 212.
  • the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36 ⁇ x determined in step S203.
  • the recognition processing unit 20b executes the recognition processing for the sampled image 36 ⁇ x acquired in step S204 by the recognition unit 220.
  • In step S207, the recognition processing unit 20b determines, by the recognition result output setting unit 212, whether or not the recognition process has been performed based on the recognition result whose output was specified in step S205, for example, based on the sampled image 36φx determined in step S203. If it is determined that the recognition process has not been performed (step S207, "No"), the process returns to step S204. On the other hand, when the recognition result output setting unit 212 determines that the recognition process has been performed (step S207, "Yes"), the process proceeds to step S208.
  • step S208 the recognition processing unit 20b instructs the recognition unit 220 to output the recognition result ⁇ x by the recognition result output setting unit 212.
  • the recognition unit 220 outputs the recognition result ⁇ x in response to this instruction.
  • FIG. 23 is a diagram corresponding to FIG. 18 described above, and is a schematic diagram for explaining the recognition process according to the first modification of the first embodiment.
  • the recognition target is recognized by the recognition process (step S13a) based on the sampled image 36 ⁇ 1 by the first subsampling of the image data 32 (step S202).
  • the information processing device 1b performs communication for urging the own vehicle to brake.
  • the recognition result output setting unit 212 has at least one of the recognition results ⁇ 1 to ⁇ 4 based on the image data 32 of the next time (t + 1) (the image data 32 of the frame next to the image data 32 of the time (t)). Is determined as the recognition result for output.
  • the recognition unit 220 does not execute the recognition process (step S13b to step S13d) based on the remaining sampled images 36 ⁇ 2 to 36 ⁇ 4 for the image data 32 of the time (t), and starts from step S204 to the next time (t + 1). ) Is executed.
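  • The early termination described here, in which the remaining phases of the current frame are skipped once the target has been found in the breaking news result, might look roughly like the following sketch (the `target_found` and `on_alert` callbacks and the `subsample` helper are assumptions):

```python
def recognize_with_early_exit(frame, extract_features, integrate, recognize,
                              target_found, on_alert):
    """First modification: stop after the phase in which the target is detected."""
    accumulated = None
    result = None
    for phase in range(4):
        sampled = subsample(frame, phase)
        feats = extract_features(sampled)
        accumulated = feats if accumulated is None else integrate(accumulated, feats)
        result = recognize(accumulated)
        if target_found(result):
            on_alert(result)   # e.g. communication urging the own vehicle to brake
            break              # skip the remaining phases; move on to frame (t + 1)
    return result
```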
  • FIG. 24 is an example time chart for explaining the effect of the processing of FIG. 23.
  • In FIGS. 16 and 17A described above, it is shown that, when the sampled images 36φ1 to 36φ4 of the four phases φ1 to φ4 are acquired from the image data 32, each recognition process for each of the sampled images 36φ1 to 36φ4 is completed within one frame period.
  • However, the one-frame period includes an exposure period and a period for pixel data transfer processing, and in fact, for example, the recognition process for the first sampled image 36φ1 starts with a delay with respect to the frame start timing, time t0. Further, each recognition process is not always completed within 1/4 of the one frame period.
  • Therefore, the time t11 at which the recognition process for the final recognition result φ4 is completed may not be in time for the time t1 at which the first one-frame period ends. That is, in this case, the recognition process of the first frame period extends across frames into the next, second frame period. Therefore, the next recognition process starts from the time t2 at which the second frame period ends, which may cause a problem in responsiveness.
  • the recognition process for the first sampled image 36 ⁇ 1 is executed, and the recognition process for the subsequent sampled images ⁇ 2 to ⁇ 4 is not performed.
  • the recognition process for the image data 32 can be started from the time t 1 at which the second frame period starts. Therefore, it is superior in responsiveness as compared with the above example.
  • Further, if the recognition process for the sampled image 36φ1 ends at time t10, before the second frame period starts, it is possible to further improve the responsiveness.
  • the second modification of the first embodiment corresponds to the processing of the timing Q described with reference to FIG. 19, and sets the recognition result ⁇ x to be output based on the recognition result in the predetermined period 100 across the frames.
  • FIG. 25 is a flowchart of an example showing the recognition process according to the second modification of the first embodiment.
  • the recognition target for determining the recognition result ⁇ x output by the recognition unit 220 is set.
  • the information processing device 1b according to the second modification of the first embodiment is for in-vehicle use, for example, a person (pedestrian) can be considered as a recognition target. Not limited to this, it is also possible to recognize oncoming vehicles and road signs.
  • step S300 the recognition processing unit 20b acquires the sampled image 36 ⁇ x in which the image data 32 of one frame is subsampled by the utilization area acquisition unit 211, for example, according to a preset pattern (for example, the division area 35).
  • the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results ⁇ 1 to ⁇ 4 is to be output by the recognition result output setting unit 212.
  • the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36 ⁇ x acquired in step S300.
  • the recognition processing unit 20b executes the recognition processing for the sampled image 36 ⁇ x acquired in step S300 by the recognition unit 220. More specifically, the recognition unit 220 extracts the feature amount of the sampled image 36 ⁇ x acquired in step S300, and executes the recognition process based on the extracted feature amount. In the next step S303, the storage unit 214 accumulates the recognition result ⁇ x by the recognition process executed in step S302.
  • step S304 the recognition processing unit 20b determines whether or not the processing of steps S300 to S303 has been executed for a predetermined period (for example, several frame period) by the recognition result output setting unit 212.
  • In step S305, the recognition processing unit 20b decides, by the recognition result output setting unit 212, at which timing of the recognition results φ1 to φ4 the recognition result is to be output in the subsequent frames, based on the recognition results φx accumulated in the storage unit 214.
  • the recognition result output setting unit 212 can determine one or more recognition results as the output recognition result ⁇ x from the recognition results ⁇ 1 to ⁇ 4 based on the image data 32 of one frame.
  • step S306 the recognition processing unit 20b acquires the sampled image 36 ⁇ x according to the recognition result ⁇ x whose output is determined according to the determination in step S305 in the utilization area acquisition unit 211.
  • the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results ⁇ 1 to ⁇ 4 is to be output by the recognition result output setting unit 212.
  • the recognition result output setting unit 212 designates the recognition unit 220 to output the recognition result based on the sampled image 36 ⁇ x determined in step S303.
  • a plurality of recognition results are determined as the recognition results ⁇ x to be output in step S305, they are sequentially selected and determined in the loop of steps S306 to S309.
  • the recognition processing unit 20b executes the recognition processing for the sampled image 36 ⁇ x acquired in step S306 by the recognition unit 220.
  • step S309 the recognition processing unit 20b determines whether or not all the recognition processing based on the recognition result whose output is specified in step S305 has been performed by the recognition result output setting unit 212.
  • If it is determined that not all of the recognition processes have been performed (step S309, "No"), the recognition result output setting unit 212 returns the process to step S306.
  • When the recognition result output setting unit 212 determines that the recognition process has been performed for all the recognition results whose output has been decided (step S309, "Yes"), the process proceeds to step S310.
  • In step S310, the recognition processing unit 20b instructs the recognition unit 220, by the recognition result output setting unit 212, to output each recognition result φx whose output has been decided.
  • the recognition unit 220 outputs each recognition result ⁇ x in response to this instruction.
  • the process of step S310 may be executed between the process of step S308 and the process of step S309.
  • The explanation will be given using a more specific example. As above, an obstacle (a pedestrian, etc.) on the road is applied as the recognition target, and it is assumed that the recognition target can be recognized by the breaking news result (recognition result φ1) within the range of the braking target distance of the vehicle (own vehicle) on which the information processing device 1b is mounted.
  • FIG. 26 is a diagram corresponding to FIG. 18 described above, and is a schematic diagram for explaining the recognition process according to the second modification of the first embodiment.
  • The recognition result output setting unit 212 instructs the recognition unit 220 to output, for example, the recognition result φ4, that is, the integration result, and accumulates the recognition results φ4 for the predetermined period 100 in the storage unit 214. Based on the recognition results φ4 accumulated in the storage unit 214, the recognition result output setting unit 212 determines that the area where the own vehicle is currently located is an area with many pedestrians.
  • the recognition result output setting unit 212 outputs the recognition result ⁇ 1 (breaking news result) and the recognition result ⁇ 4 (integration result) as shown in the lower part of FIG. 26. (Step S305).
  • the recognition unit 220 outputs the recognition results ⁇ 1 and ⁇ 4 according to this determination. These recognition results ⁇ 1 and ⁇ 4 are sent to, for example, the braking system of the own vehicle.
  • the recognition result to be output is determined based on the recognition result for a predetermined period, it is possible to appropriately output the recognition result according to the situation. Become.
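  • One way to picture this timing-Q behaviour, accumulating integration results over a predetermined period and then choosing which results to output, is the sketch below; the pedestrian-count criterion is an assumed stand-in for judging "an area with many pedestrians".

```python
def choose_outputs_from_period(accumulated_results, pedestrian_threshold=5):
    """Second modification: decide the outputs from results accumulated over a period."""
    pedestrians = sum(r.get("pedestrians", 0) for r in accumulated_results)
    if pedestrians >= pedestrian_threshold:
        # Busy area: output both the breaking news result and the integration result.
        return ["phi1", "phi4"]
    return ["phi4"]
```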
  • In the above, the technique according to the present disclosure has been described as being applied to recognition processing using a DNN, but the technique is not limited to this example.
  • The technique can also be applied to other architectures that expand and use image information on the time axis.
  • a second embodiment of the present disclosure is an example in which a sensor unit 10b including a pixel array unit 1001, a recognition unit 220, and a configuration corresponding to a preprocessing unit 210 are integrally incorporated into a layered CIS. ..
  • FIG. 28 is a block diagram showing a configuration of an example of the information processing device according to the second embodiment.
  • the information processing device 1c includes a sensor unit 10c and a recognition unit 220. Further, the sensor unit 10c includes a pixel array unit 1001 and a read control unit 240.
  • the read control unit 240 includes, for example, a function corresponding to the preprocessing unit 210 described in the first embodiment and a function of the control unit 1100 in the imaging unit 1200.
  • the vertical scanning unit 1002, the AD conversion unit 1003, and the signal processing unit 1101 will be described as being included in the pixel array unit 1001.
  • the read control unit 240 supplies the pixel array unit 1001 with a control signal that specifies the pixel circuit 1000 that reads the pixel signal.
  • the read control unit 240 can selectively read a line including sampling pixels from the pixel array unit 1001.
  • the read control unit 240 can selectively specify the pixel circuit 1000 corresponding to the sampling pixel in the pixel circuit 1000 unit for the pixel array unit 1001.
  • That is, the read control unit 240 can specify to the pixel array unit 1001 the pixel circuits 1000 corresponding to the pixel positions of the sampling pixels obtained by the subsampling performed while shifting the phase, as described in the first embodiment (a sketch of such line selection follows below).
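  • Selecting only the lines that contain sampling pixels for a given phase could be expressed as in the following sketch (the interface to the actual pixel array unit is not described in this text, so the function is purely illustrative):

```python
def lines_for_phase(height, phase, stride=2):
    """Rows to read for one phase when sampling every other line and column."""
    phase_offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]  # assumed phase offsets phi1..phi4
    dy, _ = phase_offsets[phase]
    return list(range(dy, height, stride))  # only about half of the lines are read

# Example: for a 480-line array, phase phi1 reads lines 0, 2, 4, ..., 478.
```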
  • the pixel array unit 1001 converts the pixel signal read from the designated pixel circuit 1000 into digital pixel data, and passes this pixel data to the read control unit 240.
  • the read control unit 240 passes the pixel data for one frame passed from the pixel array unit 1001 to the recognition unit 220 as image data.
  • This image data is a sampled image by phase shift subsampling.
  • the recognition unit 220 executes a recognition process on the passed image data.
  • the information processing apparatus 1c can be configured by the laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers, which is described with reference to FIG. 6A.
  • the pixel portion 2020a is formed on the semiconductor chip of the first layer
  • the memory + logic portion 2020b is formed on the semiconductor chip of the second layer.
  • the pixel unit 2020a includes at least the sensor unit 10c in the information processing device 1c.
  • the memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit 1001, a read control unit 240, and a recognition unit 220.
  • the memory + logic unit 2020b can further include a frame memory.
  • the information processing apparatus 1c can be configured by the laminated CIS having a three-layer structure in which semiconductor chips are laminated in three layers, which is described with reference to FIG. 6B.
  • the pixel portion 2020a described above is formed on the semiconductor chip of the first layer
  • the memory portion 2020c including, for example, a frame memory is formed on the semiconductor chip of the second layer
  • The logic unit 2020d corresponding to the memory + logic unit 2020b described above is formed on the semiconductor chip of the third layer.
  • the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit, a read control unit 240, and a recognition unit 220.
  • the memory unit 2020c can include a frame memory and a memory 1202.
  • As described above, in the second embodiment, the sensor unit 10c performs the subsampling process. Therefore, it is not necessary to read from all of the pixel circuits 1000 included in the pixel array unit 1001, and the delay of the recognition process can be further shortened as compared with the first embodiment described above. Further, since the pixel circuits 1000 of the lines including the sampling pixels are selectively read from among all the pixel circuits 1000, the amount of pixel signal reading from the pixel array unit 1001 can be reduced, and the bus width can be reduced.
  • the pixel array unit 1001 selectively reads out the lines including the sampling pixels, and reads out by thinning out the lines. Therefore, it is possible to reduce the distortion of the captured image by the rolling shutter. Further, it is possible to reduce the power consumption at the time of imaging in the pixel array unit 1001. Further, in the lines thinned out by subsampling, it is possible to change the imaging conditions such as exposure to the lines to be read out by subsampling to perform imaging.
  • a modification of the second embodiment is an example in which the sensor unit 10c and the recognition unit 220 are separated from each other in the information processing device 1c according to the second embodiment described above.
  • FIG. 29 is a block diagram showing a configuration of an example of an information processing device according to a modified example of the second embodiment.
  • the information processing device 1d includes a sensor unit 10d and a recognition processing unit 20d
  • the sensor unit 10d includes a pixel array unit 1001 and a read control unit 240.
  • the recognition processing unit 20d includes a recognition unit 220.
  • the sensor unit 10d is formed by, for example, the laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers, which is described with reference to FIG. 6A.
  • the pixel portion 2020a is formed on the semiconductor chip of the first layer
  • the memory + logic portion 2020b is formed on the semiconductor chip of the second layer.
  • the pixel unit 2020a includes at least the pixel array unit 1001 in the sensor unit 10d.
  • the memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit 1001 and a read control unit 240.
  • the memory + logic unit 2020b can further include a frame memory.
  • the sensor unit 10d outputs the image data of the sampled image from the read control unit 240 and supplies it to the recognition processing unit 20d, which is configured as hardware different from the sensor unit 10d.
  • the recognition processing unit 20d inputs the image data supplied from the sensor unit 10d to the recognition unit 220.
  • the recognition unit 220 executes the recognition process based on the input image data, and outputs the recognition result to the outside.
  • the sensor unit 10d can be formed by the laminated CIS having a three-layer structure in which semiconductor chips are laminated in three layers, which is described with reference to FIG. 6B.
  • the pixel portion 2020a described above is formed on the semiconductor chip of the first layer
  • the memory portion 2020c including, for example, a frame memory is formed on the semiconductor chip of the second layer
  • the logic unit 2020d, corresponding to the memory + logic unit 2020b described above, is formed on the semiconductor chip of the third layer.
  • the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit 1001 and a read control unit 240.
  • the memory unit 2020c can include a frame memory and a memory 1202.
  • by configuring the recognition processing unit 20d (recognition unit 220) with hardware different from the sensor unit 10d, it is possible to easily change the configuration of the recognition unit 220, for example, the recognition model.
  • since the recognition process is performed based on the sampled image subsampled by the sensor unit 10d, the load of the recognition process can be reduced as compared with the case where the recognition process is performed using the image data 32 of the captured image as it is. Therefore, for example, a CPU, DSP, or GPU with a low processing capacity can be used in the recognition processing unit 20d, and the cost of the information processing device 1d can be reduced.
  • FIG. 30 is a diagram showing usage examples of the information processing devices 1b, 1c, and 1d according to the first embodiment and its modifications, and the second embodiment and its modification.
  • the information processing devices 1b, 1c and 1d will be represented by the information processing device 1b when it is not necessary to distinguish them.
  • the information processing device 1b described above can be used in various cases in which light such as visible light, infrared light, ultraviolet light, or X-rays is sensed and recognition processing is performed based on the sensing result, for example, as shown below.
  • Devices that capture images used for viewing, such as digital cameras and portable devices with camera functions.
  • Devices used for traffic, such as in-vehicle sensors that photograph the front, rear, surroundings, and interior of a vehicle, surveillance cameras that monitor traveling vehicles and roads, and distance measuring sensors that measure distances such as the inter-vehicle distance.
  • Devices used for home appliances such as TVs, refrigerators, and air conditioners, in order to photograph a user's gesture and operate the appliance according to the gesture.
  • Devices used for medical treatment and healthcare, such as endoscopes and devices that perform angiography by receiving infrared light.
  • Devices used for security, such as surveillance cameras for crime prevention and cameras for personal authentication.
  • Devices used for beauty care, such as skin measuring instruments that photograph the skin and microscopes that photograph the scalp.
  • Devices used for sports, such as action cameras and wearable cameras for sports applications.
  • Devices used for agriculture, such as cameras for monitoring the condition of fields and crops.
  • the technology according to the present disclosure (the present technology) can be applied to various products.
  • the technology according to the present disclosure may be realized as a device mounted on any kind of moving body, such as an automobile, an electric vehicle, a hybrid electric vehicle, a motorcycle, a bicycle, a personal mobility device, an airplane, a drone, a ship, or a robot.
  • FIG. 31 is a block diagram showing a schematic configuration example of a vehicle control system, which is an example of a mobile control system to which the technique according to the present disclosure can be applied.
  • the vehicle control system 12000 includes a plurality of electronic control units connected via the communication network 12001.
  • the vehicle control system 12000 includes a drive system control unit 12010, a body system control unit 12020, an outside information detection unit 12030, an in-vehicle information detection unit 12040, and an integrated control unit 12050.
  • a microcomputer 12051, an audio image output unit 12052, and an in-vehicle network I / F (interface) 12053 are shown as a functional configuration of the integrated control unit 12050.
  • the drive system control unit 12010 controls the operation of the device related to the drive system of the vehicle according to various programs.
  • the drive system control unit 12010 functions as a control device for a driving force generator for generating the driving force of the vehicle, such as an internal combustion engine or a driving motor, a driving force transmission mechanism for transmitting the driving force to the wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like.
  • the body system control unit 12020 controls the operation of various devices mounted on the vehicle body according to various programs.
  • the body system control unit 12020 functions as a keyless entry system, a smart key system, a power window device, or a control device for various lamps such as a head lamp, a back lamp, a brake lamp, a winker, or a fog lamp.
  • radio waves transmitted from a portable device that substitutes for a key, or signals from various switches, can be input to the body system control unit 12020.
  • the body system control unit 12020 receives inputs of these radio waves or signals and controls a vehicle door lock device, a power window device, a lamp, and the like.
  • the vehicle outside information detection unit 12030 detects information outside the vehicle equipped with the vehicle control system 12000.
  • the imaging unit 12031 is connected to the vehicle exterior information detection unit 12030.
  • the vehicle outside information detection unit 12030 causes the image pickup unit 12031 to capture an image of the outside of the vehicle and receives the captured image.
  • the vehicle exterior information detection unit 12030 may perform, based on the received image, object detection processing for detecting a person, a vehicle, an obstacle, a sign, a character on the road surface, or the like, or distance detection processing.
  • the imaging unit 12031 is an optical sensor that receives light and outputs an electric signal according to the amount of the light received.
  • the image pickup unit 12031 can output an electric signal as an image or can output it as distance measurement information. Further, the light received by the imaging unit 12031 may be visible light or invisible light such as infrared light.
  • the in-vehicle information detection unit 12040 detects the in-vehicle information.
  • a driver state detection unit 12041 that detects the driver's state is connected to the in-vehicle information detection unit 12040.
  • the driver state detection unit 12041 includes, for example, a camera that images the driver, and the in-vehicle information detection unit 12040 may calculate the degree of fatigue or the degree of concentration of the driver, or may determine whether the driver is dozing, based on the detection information input from the driver state detection unit 12041.
  • the microcomputer 12051 can calculate the control target value of the driving force generator, the steering mechanism, or the braking device based on the information inside and outside the vehicle acquired by the vehicle exterior information detection unit 12030 or the in-vehicle information detection unit 12040, and output a control command to the drive system control unit 12010.
  • for example, the microcomputer 12051 can perform cooperative control for the purpose of realizing ADAS (Advanced Driver Assistance System) functions, including collision avoidance or impact mitigation of the vehicle, follow-up driving based on the inter-vehicle distance, vehicle speed maintenance driving, vehicle collision warning, vehicle lane departure warning, and the like.
  • the microcomputer 12051 can also perform cooperative control for the purpose of automatic driving or the like, in which the vehicle travels autonomously without depending on the driver's operation, by controlling the driving force generator, the steering mechanism, the braking device, and the like based on the information around the vehicle acquired by the vehicle exterior information detection unit 12030 or the in-vehicle information detection unit 12040.
  • the microcomputer 12051 can output a control command to the body system control unit 12020 based on the information outside the vehicle acquired by the vehicle exterior information detection unit 12030.
  • for example, the microcomputer 12051 can perform cooperative control for the purpose of anti-glare, such as switching from high beam to low beam, by controlling the headlamps according to the position of the preceding vehicle or oncoming vehicle detected by the vehicle exterior information detection unit 12030.
  • the audio image output unit 12052 transmits the output signal of at least one of the audio and the image to the output device capable of visually or audibly notifying the passenger or the outside of the vehicle of the information.
  • an audio speaker 12061, a display unit 12062, and an instrument panel 12063 are exemplified as output devices.
  • the display unit 12062 may include, for example, at least one of an onboard display and a heads-up display.
  • FIG. 32 is a diagram showing an example of the installation position of the imaging unit 12031.
  • the vehicle 12100 has imaging units 12101, 12102, 12103, 12104, 12105 as imaging units 12031.
  • the imaging units 12101, 12102, 12103, 12104, 12105 are provided at positions such as the front nose, side mirrors, rear bumpers, back doors, and the upper part of the windshield in the vehicle interior of the vehicle 12100, for example.
  • the imaging unit 12101 provided on the front nose and the imaging unit 12105 provided on the upper part of the windshield in the vehicle interior mainly acquire images in front of the vehicle 12100.
  • the imaging units 12102 and 12103 provided in the side mirrors mainly acquire images of the side of the vehicle 12100.
  • the imaging unit 12104 provided on the rear bumper or the back door mainly acquires an image of the rear of the vehicle 12100.
  • the images in front acquired by the imaging units 12101 and 12105 are mainly used for detecting a preceding vehicle or a pedestrian, an obstacle, a traffic light, a traffic sign, a lane, or the like.
  • FIG. 32 shows an example of the photographing range of the imaging units 12101 to 12104.
  • the imaging range 12111 indicates the imaging range of the imaging unit 12101 provided on the front nose
  • the imaging ranges 12112 and 12113 indicate the imaging ranges of the imaging units 12102 and 12103 provided on the side mirrors, respectively
  • the imaging range 12114 indicates the imaging range of the imaging unit 12104 provided on the rear bumper or the back door. For example, by superimposing the image data captured by the imaging units 12101 to 12104, a bird's-eye view image of the vehicle 12100 as viewed from above can be obtained.
  • At least one of the imaging units 12101 to 12104 may have a function of acquiring distance information.
  • at least one of the image pickup units 12101 to 12104 may be a stereo camera composed of a plurality of image pickup elements, or an image pickup element having pixels for phase difference detection.
  • the microcomputer 12051 obtains the distance to each three-dimensional object within the imaging ranges 12111 to 12114 and the temporal change of this distance (the relative velocity with respect to the vehicle 12100) based on the distance information obtained from the imaging units 12101 to 12104. Further, the microcomputer 12051 can set in advance an inter-vehicle distance to be secured from the preceding vehicle, and can perform automatic braking control (including follow-up stop control), automatic acceleration control (including follow-up start control), and the like. In this way, it is possible to perform cooperative control for the purpose of automatic driving or the like, in which the vehicle travels autonomously without depending on the driver's operation.
  • the microcomputer 12051 can classify three-dimensional object data related to three-dimensional objects into two-wheeled vehicles, ordinary vehicles, large vehicles, pedestrians, utility poles, and other three-dimensional objects based on the distance information obtained from the imaging units 12101 to 12104, extract the data, and use it for automatic avoidance of obstacles. For example, the microcomputer 12051 distinguishes obstacles around the vehicle 12100 into obstacles that are visible to the driver of the vehicle 12100 and obstacles that are difficult for the driver to see. Then, the microcomputer 12051 determines the collision risk indicating the degree of risk of collision with each obstacle, and when the collision risk is equal to or higher than a set value and there is a possibility of collision, driving support for collision avoidance can be provided by outputting an alarm to the driver via the audio speaker 12061 or the display unit 12062, or by performing forced deceleration or avoidance steering via the drive system control unit 12010. A simple numerical sketch of such a risk determination is shown below.
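As a rough numerical illustration of the distance-based risk determination described above, the sketch below derives a relative velocity from successive distance samples and converts it into a simple time-to-collision based risk value. The threshold, sampling interval, and formulas are illustrative assumptions, not the actual criteria used by the microcomputer 12051.

```python
def relative_velocity(distances, dt):
    # Temporal change of the distance to an object (m/s);
    # negative values mean the object is approaching.
    return (distances[-1] - distances[0]) / (dt * (len(distances) - 1))

def collision_risk(distance_m, rel_velocity_mps):
    # Time-to-collision as a simple risk proxy (illustrative assumption).
    if rel_velocity_mps >= 0:          # not approaching
        return 0.0
    ttc = distance_m / -rel_velocity_mps
    return 1.0 / ttc                   # larger value = higher risk

if __name__ == "__main__":
    d = [30.0, 28.5, 27.0, 25.5]       # distance samples at 0.1 s intervals
    v = relative_velocity(d, dt=0.1)   # -> -15.0 m/s (approaching)
    risk = collision_risk(d[-1], v)
    RISK_THRESHOLD = 0.2               # "set value"; illustrative only
    if risk >= RISK_THRESHOLD:
        print("warn driver / request deceleration", v, risk)
```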
  • At least one of the imaging units 12101 to 12104 may be an infrared camera that detects infrared rays.
  • the microcomputer 12051 can recognize a pedestrian by determining whether or not a pedestrian is present in the captured image of the imaging units 12101 to 12104.
  • such pedestrian recognition is performed by, for example, a procedure of extracting feature points in the images captured by the imaging units 12101 to 12104 as infrared cameras, and a procedure of performing pattern matching processing on the series of feature points indicating the outline of an object to determine whether or not the object is a pedestrian.
  • when the microcomputer 12051 determines that a pedestrian is present in the captured images of the imaging units 12101 to 12104 and recognizes the pedestrian, the audio image output unit 12052 controls the display unit 12062 so as to superimpose and display a square contour line for emphasizing the recognized pedestrian. Further, the audio image output unit 12052 may control the display unit 12062 so as to display an icon or the like indicating the pedestrian at a desired position. A rough sketch of this recognize-and-highlight flow follows.
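The following is a minimal sketch of the recognize-and-highlight flow just described: feature points are extracted from an infrared image, a crude aspect-ratio test stands in for the pattern matching against pedestrian outlines, and a rectangular contour is drawn around a detected pedestrian. All functions and thresholds here are stand-ins assumed for illustration; they are not the recognition procedure actually implemented in the vehicle control system.

```python
import numpy as np

def extract_feature_points(ir_image):
    # Stand-in for the feature-point extraction step: here we simply take
    # bright pixels of the infrared image (illustrative assumption).
    ys, xs = np.nonzero(ir_image > ir_image.mean() + 2 * ir_image.std())
    return list(zip(xs.tolist(), ys.tolist()))

def matches_pedestrian_outline(points):
    # Stand-in for pattern matching on the series of feature points that
    # indicate an object outline; a real system would compare against
    # learned pedestrian templates.
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    # crude test: pedestrians are taller than they are wide
    if h > 0 and 1.5 < h / max(w, 1) < 5.0:
        return (min(xs), min(ys), max(xs), max(ys))
    return None

def highlight_pedestrian(display_frame, box):
    # Draw a square (rectangular) contour line to emphasise the pedestrian.
    x0, y0, x1, y1 = box
    display_frame[y0:y1, x0] = 255
    display_frame[y0:y1, x1 - 1] = 255
    display_frame[y0, x0:x1] = 255
    display_frame[y1 - 1, x0:x1] = 255
    return display_frame

if __name__ == "__main__":
    frame = (np.random.rand(120, 160) * 255).astype(np.uint8)
    box = matches_pedestrian_outline(extract_feature_points(frame))
    if box is not None:
        frame = highlight_pedestrian(frame, box)
    print("pedestrian box:", box)
```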
  • the above is an example of a vehicle control system to which the technology according to the present disclosure can be applied.
  • the technique according to the present disclosure can be applied to the imaging unit 12031 and the vehicle exterior information detection unit 12030 among the configurations described above.
  • the sensor unit 10b of the information processing device 1b is applied to the image pickup unit 12031
  • the recognition processing unit 20b is applied to the vehicle exterior information detection unit 12030.
  • the recognition result output from the recognition processing unit 20b is passed to the integrated control unit 12050 via, for example, the communication network 12001.
  • by applying the technique according to the present disclosure to the imaging unit 12031 and the vehicle exterior information detection unit 12030, it is possible to switch the subsampling pattern according to a predetermined condition, and to change the recognizer and the parameters used for the recognition process according to the switched pattern. Therefore, a prompt recognition result, that is, a recognition result that emphasizes immediacy, can be obtained with higher accuracy, and more reliable driving support becomes possible. The kind of condition-driven switching involved is sketched below.
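A condition-driven switch of the subsampling pattern and the associated recognizer could look roughly like the sketch below. The pattern definitions, model identifiers, and the speed/night rule are hypothetical placeholders used only to illustrate the idea of changing the pattern and the recognizer together.

```python
# Hypothetical pattern/recognizer registry; names and conditions are
# illustrative assumptions, not the embodiment's actual configuration.
PATTERNS = {
    "coarse": {"block": 8, "phase": 0},
    "fine":   {"block": 4, "phase": 0},
}

RECOGNIZERS = {
    "coarse": "model_coarse.onnx",   # placeholder model identifiers
    "fine":   "model_fine.onnx",
}

def select_mode(vehicle_speed_kmh, night):
    # Predetermined condition: e.g. prefer prompt, coarse results at
    # high speed or at night, finer sampling otherwise (illustrative rule).
    return "coarse" if vehicle_speed_kmh > 60 or night else "fine"

def configure(vehicle_speed_kmh, night):
    mode = select_mode(vehicle_speed_kmh, night)
    return PATTERNS[mode], RECOGNIZERS[mode]

if __name__ == "__main__":
    pattern, recognizer = configure(vehicle_speed_kmh=80, night=False)
    print(pattern, recognizer)
```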
  • the present technology can also have the following configurations.
  • a generation unit that generates a sampled image composed of sampling pixels obtained according to pixel positions set for each division area in which imaging information composed of pixels is divided in a predetermined pattern.
  • a calculation unit that calculates the feature amount of the sampled image,
  • a storage unit that accumulates the calculated features and
  • a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result.
  • An output control unit that controls the recognition unit to output the recognition result based on a predetermined feature amount among the feature amounts stored in the storage unit.
  • the predetermined feature amount is the feature amount calculated using one of the sampled images.
  • the predetermined feature amount is the feature amount calculated using a plurality of the sampled images, the plurality of sampled images including a part of the sampling pixels among all the pixel positions of the captured image in one frame.
  • the predetermined feature amount is the feature amount calculated using the sampling pixels at all the pixel positions of the imaging information in one frame.
  • the output control unit controls the recognition unit to output the recognition result based on, as the predetermined feature amount, a first feature amount calculated using one of the sampled images, a second feature amount calculated using a plurality of the sampled images including a part of the sampling pixels among all the pixel positions of the captured image in one frame, or a third feature amount calculated using the sampling pixels at all the pixel positions of the imaging information in one frame.
  • the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount to use for outputting the recognition result, based on prior information set in advance for the recognition process. The information processing device according to (5) above.
  • the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount to use for outputting the recognition result, based on the accumulated recognition results. The information processing device according to (5) above.
  • the output control unit determines, according to the recognition result based on the feature amount calculated using one of the sampled images, whether or not at least one of the calculation process by the calculation unit, the accumulation process by the storage unit, and the recognition process by the recognition unit needs to be executed in one or more frames of the imaging information to be acquired later in time series with respect to the frame from which the sampled image was acquired. The information processing device according to (1) above.
  • the information processing device determines whether or not at least one of the processes needs to be executed. The information processing device according to (1) above.
  • (11) The recognition unit performs the recognition process based on an integrated feature amount obtained by integrating the plurality of feature amounts accumulated in the storage unit. The information processing device according to any one of (1) to (10).
  • (12) The recognition unit integrates the feature amount calculated by the calculation unit in response to the acquisition of the imaging information with at least a part of the feature amounts accumulated in the storage unit up to immediately before the acquisition, and performs the recognition process based on the integrated feature amount. The information processing device according to (11) above.
  • the recognition unit performs the recognition process on the feature amount of the sampled image based on teacher data for each pixel corresponding to the pixel position in each of the divided regions. The information processing device according to any one of (1) to (12).
  • the recognition unit executes machine learning processing by an RNN (Recurrent Neural Network) using, among the imaging information, the sampling pixels set in first imaging information and the sampling pixels set in second imaging information acquired next to the first imaging information in time series, and performs the recognition process based on the result of the machine learning processing. The information processing device according to any one of (1) to (13).
  • An output control step that controls the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
  • An information processing program for causing a computer to execute the steps described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)

Abstract

Provided are an information processing device, an information processing method, and an information processing program by which characteristics of a recognition process, in which a captured image is used, can be improved. The information processing device of the present invention is provided with: a generation unit (211) that generates a sampling image constituted by sampling pixels acquired according to pixel positions set for each divided region in which captured image information constituted by pixels is divided by a predetermined pattern; a calculation unit (221) that calculates a feature amount of the sampling image; an accumulation unit (223) that accumulates calculated feature amounts; a recognition unit (224) that carries out recognition processing on the basis of at least a portion of the feature amounts accumulated by the accumulation unit, and outputs a recognition result; and an output control unit (212) that performs control such that the recognition unit outputs a recognition result based on a predetermined feature amount from among the feature amounts accumulated by the accumulation unit.

Description

Information processing device, information processing method, and information processing program

 This disclosure relates to an information processing device, an information processing method, and an information processing program.

 In recent years, with the increase in resolution of imaging devices such as digital still cameras, digital video cameras, and small cameras mounted on multifunctional mobile phones (smartphones), information processing devices equipped with an image recognition function for recognizing a predetermined object included in a captured image have been developed.
JP-A-2017-112409
 In the image recognition function, the detection performance for an object can be improved by using a captured image with a higher resolution. However, with the conventional technique, image recognition using a high-resolution captured image requires a large amount of calculation for the image recognition processing, and it is difficult to improve the simultaneity of the recognition processing with respect to the captured image.

 An object of the present disclosure is to provide an information processing device, an information processing method, and an information processing program capable of improving the characteristics of recognition processing using a captured image.

 The information processing device according to the present disclosure includes: a generation unit that generates a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region into which imaging information composed of pixels is divided by a predetermined pattern; a calculation unit that calculates a feature amount of the sampled image; a storage unit that accumulates the calculated feature amounts; a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result; and an output control unit that controls the recognition unit to output a recognition result based on a predetermined feature amount among the feature amounts accumulated in the storage unit. A minimal structural sketch of this configuration is given below.
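The following minimal Python sketch mirrors that configuration with one toy class per unit (generation, calculation, storage, recognition, and output control). The regular phase-shifted sampling pattern, the statistical feature, and the threshold-based recognition are illustrative assumptions; they only show how the units hand data to one another, not the actual recognition model of the embodiments.

```python
import numpy as np

class GenerationUnit:
    """Generates a sampled image by taking one pixel position per divided region."""
    def __init__(self, block=4):
        self.block = block
    def generate(self, frame, phase):
        py, px = divmod(phase, self.block)      # pixel position within each region
        return frame[py::self.block, px::self.block]

class CalculationUnit:
    """Calculates a feature amount of the sampled image (toy feature: mean and std)."""
    def calculate(self, sampled):
        return np.array([sampled.mean(), sampled.std()])

class StorageUnit:
    """Accumulates the calculated feature amounts."""
    def __init__(self):
        self.features = []
    def store(self, feature):
        self.features.append(feature)

class RecognitionUnit:
    """Performs recognition on (part of) the accumulated features (toy threshold rule)."""
    def recognize(self, features):
        integrated = np.mean(features, axis=0)  # integrate accumulated features
        return {"bright_scene": bool(integrated[0] > 128)}

class OutputControlUnit:
    """Decides which accumulated features the recognition result is based on."""
    def __init__(self, last_n=1):
        self.last_n = last_n
    def output(self, storage, recognizer):
        return recognizer.recognize(storage.features[-self.last_n:])

if __name__ == "__main__":
    gen, calc, store = GenerationUnit(), CalculationUnit(), StorageUnit()
    recog, ctrl = RecognitionUnit(), OutputControlUnit(last_n=2)
    for phase in range(4):                       # successive frames use different phases
        frame = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
        store.store(calc.calculate(gen.generate(frame, phase)))
    print(ctrl.output(store, recog))
```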
 A block diagram showing a basic configuration example of an information processing device applicable to each embodiment.
 A diagram schematically showing an example of recognition processing by a DNN.
 A diagram schematically showing an example of recognition processing by a DNN.
 A diagram schematically showing a first example of identification processing by a DNN when time-series information is used.
 A diagram schematically showing a first example of identification processing by a DNN when time-series information is used.
 A diagram schematically showing a second example of identification processing by a DNN when time-series information is used.
 A diagram schematically showing a second example of identification processing by a DNN when time-series information is used.
 A block diagram schematically showing a hardware configuration example of an imaging device as an information processing device applicable to each embodiment.
 A diagram showing an example in which the imaging unit is formed by a laminated CIS having a two-layer structure.
 A diagram showing an example in which the imaging unit 1200 is formed by a laminated CIS having a three-layer structure.
 A block diagram showing a configuration of an example of an imaging unit applicable to each embodiment.
 A diagram for explaining the resolution of an image used for recognition processing.
 A diagram for explaining the resolution of an image used for recognition processing.
 A block diagram showing a configuration of an example of the information processing device according to the first embodiment of the present disclosure.
 A schematic diagram for explaining recognition processing by the recognizer according to the first embodiment.
 A schematic diagram for explaining sampling processing according to the first embodiment.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A diagram for explaining the recognition processing by the recognizer according to the first embodiment in more detail.
 A schematic diagram for explaining subsampling processing in the recognition processing according to the first embodiment.
 A schematic diagram for explaining subsampling processing in the recognition processing according to the first embodiment.
 A schematic diagram for explaining the basic architecture of recognition processing according to the existing technology.
 A schematic diagram for explaining the basic architecture of recognition processing according to each embodiment.
 A time chart of an example showing a first example of readout and recognition processing in the basic architecture of the recognition processing according to each embodiment.
 A time chart of an example showing a second example of readout and recognition processing in the basic architecture of the recognition processing according to each embodiment.
 A time chart of an example comparing intra-frame processing with processing by the existing technology.
 A diagram schematically showing recognition results obtained by intra-frame processing.
 A diagram schematically showing a configuration of an example of the recognizer according to the first embodiment.
 A time chart of an example showing an example of timing for determining how to output the recognition result according to the first embodiment.
 A functional block diagram of an example for explaining more detailed functions of the preprocessing unit according to the first embodiment.
 A functional block diagram of an example for explaining more detailed functions of the recognition unit according to the first embodiment.
 A flowchart of an example showing the recognition processing according to the first embodiment.
 A flowchart of an example showing the recognition processing according to the first modification of the first embodiment.
 A schematic diagram for explaining the recognition processing according to the first modification of the first embodiment.
 A time chart of an example for explaining the effect of the recognition processing according to the first modification of the first embodiment.
 A flowchart of an example showing the recognition processing according to the second modification of the first embodiment.
 A schematic diagram for explaining the recognition processing according to the second modification of the first embodiment.
 A block diagram showing a configuration of an example of the information processing device according to the second embodiment.
 A block diagram showing a configuration of an example of the information processing device according to a modification of the second embodiment.
 A diagram showing usage examples of the information processing devices according to the first embodiment and its modifications, and the second embodiment and its modification.
 A block diagram showing an example of a schematic configuration of a vehicle control system.
 An explanatory diagram showing an example of installation positions of the vehicle exterior information detection unit and the imaging unit.
 Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In the following embodiments, the same parts are designated by the same reference numerals, and duplicate description will be omitted.
 Hereinafter, the embodiments of the present disclosure will be described in the following order.
1. Techniques applicable to each embodiment
 1-0. Outline of recognition processing applicable to each embodiment
 1-1. Hardware configuration applicable to each embodiment
  1-1-1. Configuration example of the imaging unit applicable to each embodiment
  1-1-2. Resolution of the captured image
 1-2. Outline of recognition processing that is the premise of each embodiment
  1-2-1. Configuration related to the prerequisite technology of each embodiment
   1-2-1-1. Outline of the configuration applicable to the prerequisite technology of each embodiment
   1-2-1-2. Example of recognition processing related to the prerequisite technology of each embodiment
   1-2-1-3. Subsampling processing related to the prerequisite technology of each embodiment
 1-3. Basic architecture of recognition processing according to each embodiment
  1-3-1. More specific configuration
   1-3-1-1. First example
   1-3-1-2. Second example
2. First embodiment
 2-1. Outline of the first embodiment
 2-2. More specific configuration example according to the first embodiment
 2-3. More specific processing according to the first embodiment
 2-4. First modification of the first embodiment
 2-5. Second modification of the first embodiment
 2-6. Other modifications of the first embodiment
3. Second embodiment
 3-1. Modification of the second embodiment
4. Third embodiment
 4-1. Application examples of the technology of the present disclosure
 4-2. Application example to a moving body
[1. Techniques applicable to each embodiment]
 First, in order to facilitate understanding, the techniques applicable to each embodiment will be schematically described.
(1-0. Outline of recognition processing applicable to each embodiment)
 FIG. 1 is a block diagram showing a basic configuration example of an information processing device applicable to each embodiment. In FIG. 1, the information processing device 1a includes a sensor unit 10a and a recognition processing unit 20a. Although not shown, the sensor unit 10a includes an imaging means (camera) and an imaging control unit that controls the imaging means.

 The sensor unit 10a performs imaging under the control of the imaging control unit, and supplies image data of the captured image acquired by the imaging to the recognition processing unit 20a. The recognition processing unit 20a performs recognition processing on the image data using a DNN (Deep Neural Network). More specifically, the recognition processing unit 20a includes a recognition model trained in advance by machine learning using predetermined teacher data, and applies recognition processing using the DNN to the image data supplied from the sensor unit 10a based on the recognition model. The recognition processing unit 20a outputs the recognition result of the recognition processing to, for example, the outside of the information processing device 1a.
 FIGS. 2A and 2B are diagrams schematically showing an example of recognition processing by a DNN. In this example, as shown in FIG. 2A, one image is input to the DNN. The DNN performs recognition processing on the input image and outputs the recognition result.

 The processing of FIG. 2A will be described in more detail with reference to FIG. 2B. As shown in FIG. 2B, the DNN executes feature extraction processing and recognition processing. In the DNN, a feature amount is extracted from the input image by the feature extraction processing. This feature extraction processing is performed using, for example, a CNN (Convolutional Neural Network), which is a type of DNN. Then, in the DNN, recognition processing is executed on the extracted feature amount to obtain the recognition result.
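As a self-contained illustration of this two-stage structure (feature extraction followed by recognition), the sketch below uses a single hand-written convolution as a stand-in for the CNN feature extractor and a linear classifier as the recognition stage. The kernel, pooling, and weights are illustrative assumptions rather than a trained model.

```python
import numpy as np

def conv2d(image, kernel):
    # Minimal valid-mode 2-D convolution used as the feature extraction step.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def extract_features(image):
    # Feature extraction (CNN-like): one edge filter + ReLU + global pooling.
    edge_kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    fmap = np.maximum(conv2d(image, edge_kernel), 0.0)   # ReLU
    return np.array([fmap.mean(), fmap.max()])           # pooled feature amount

def recognize(feature, weights, bias):
    # Recognition step: a linear classifier on the extracted feature amount.
    score = float(feature @ weights + bias)
    return 1 / (1 + np.exp(-score))                       # probability-like output

if __name__ == "__main__":
    image = np.random.rand(32, 32)
    feature = extract_features(image)
    print(recognize(feature, weights=np.array([0.5, 0.1]), bias=-0.2))
```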
 In the DNN, recognition processing can also be executed using time-series information. FIGS. 3A and 3B are diagrams schematically showing an example of identification processing by the DNN when time-series information is used. In the examples of FIGS. 3A and 3B, identification processing by the DNN is performed using a fixed number of pieces of past information on the time series. In the example of FIG. 3A, an image [T] at time T, an image [T-1] at time T-1 before time T, and an image [T-2] at time T-2 before time T-1 are input to the DNN. The DNN executes identification processing on each of the input images [T], [T-1], and [T-2], and obtains the recognition result [T] at time T.

 FIG. 3B is a diagram for explaining the processing of FIG. 3A in more detail. As shown in FIG. 3B, in the DNN, the feature extraction processing described above with reference to FIG. 2B is executed one-to-one for each of the input images [T], [T-1], and [T-2], and the feature amounts corresponding to the images [T], [T-1], and [T-2] are extracted. In the DNN, the feature amounts obtained based on these images [T], [T-1], and [T-2] are integrated, identification processing is executed on the integrated feature amount, and the recognition result [T] at time T is obtained. The feature amounts obtained based on the images [T], [T-1], and [T-2] can be regarded as intermediate data for obtaining the integrated feature amount used in the recognition processing.
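A minimal sketch of this multi-image scheme is shown below: a feature amount is extracted from each of the images [T-2], [T-1], and [T] as intermediate data, the features are integrated, and identification is performed on the integrated feature amount. The toy statistics-based features and the linear decision rule are assumptions made purely for illustration.

```python
import numpy as np

def extract_feature(image):
    # Per-image feature extraction (intermediate data); toy statistics here.
    return np.array([image.mean(), image.std()])

def integrate(features):
    # Integrate the features of images [T-2], [T-1], and [T] into one feature amount.
    return np.concatenate(features)

def identify(integrated, weights, bias):
    # Identification on the integrated feature amount.
    return float(integrated @ weights + bias) > 0

if __name__ == "__main__":
    frames = [np.random.rand(64, 64) for _ in range(3)]   # [T-2], [T-1], [T]
    features = [extract_feature(f) for f in frames]
    result_T = identify(integrate(features),
                        weights=np.ones(6) * 0.1, bias=-0.3)
    print(result_T)
```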
 FIGS. 4A and 4B are diagrams schematically showing another example of identification processing by the DNN when time-series information is used. In the example of FIG. 4A, the image [T] at time T is input to the DNN whose internal state has been updated to the state at time T-1, and the recognition result [T] at time T is obtained.

 FIG. 4B is a diagram for explaining the processing of FIG. 4A in more detail. As shown in FIG. 4B, in the DNN, the feature extraction processing described above with reference to FIG. 2B is executed on the input image [T] at time T, and the feature amount corresponding to the image [T] is extracted. In the DNN, the internal state has been updated by the images before time T, and the feature amount related to the updated internal state is stored. The feature amount related to the stored internal information and the feature amount of the image [T] are integrated, and identification processing is executed on the integrated feature amount. In this case, each of the feature amount related to the stored internal information and the feature amount of the image [T] can be regarded as intermediate data for obtaining the integrated feature amount used in the recognition processing.

 The identification processing shown in FIGS. 4A and 4B is executed using, for example, a DNN whose internal state has been updated using the immediately preceding recognition result, and is therefore recursive processing. A DNN that performs recursive processing in this way is called an RNN (Recurrent Neural Network). Identification processing by an RNN is generally used for moving image recognition and the like, and identification accuracy can be improved by, for example, sequentially updating the internal state of the DNN with frame images updated in time series.
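The recursive variant can be sketched as follows: an internal state is kept, integrated with the feature extracted from each newly input frame, and updated before producing the recognition result for that frame. The state dimensions, random weights, and statistics-based feature are illustrative assumptions, not a trained RNN.

```python
import numpy as np

class RecurrentRecognizer:
    """Keeps an internal state that is updated each frame and integrated
    with the feature of the newly input image (RNN-style sketch)."""
    def __init__(self, feat_dim=2, state_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(size=(state_dim, feat_dim)) * 0.3
        self.w_rec = rng.normal(size=(state_dim, state_dim)) * 0.3
        self.w_out = rng.normal(size=state_dim) * 0.3
        self.state = np.zeros(state_dim)          # internal state

    def step(self, image):
        feature = np.array([image.mean(), image.std()])      # feature of image [T]
        # Integrate the stored internal state with the new feature, then update the state.
        self.state = np.tanh(self.w_in @ feature + self.w_rec @ self.state)
        score = float(self.w_out @ self.state)                # recognition result [T]
        return 1 / (1 + np.exp(-score))

if __name__ == "__main__":
    recognizer = RecurrentRecognizer()
    for t in range(5):                                        # frames updated in time series
        frame = np.random.rand(64, 64)
        print(t, recognizer.step(frame))
```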
(1-1. Hardware configuration applicable to each embodiment)
 FIG. 5 is a block diagram schematically showing a hardware configuration example of an information processing device applicable to each embodiment. In FIG. 5, the information processing device 1 includes an imaging unit 1200, a memory 1202, a DSP (Digital Signal Processor) 1203, an interface (I/F) 1204, a CPU (Central Processing Unit) 1205, a ROM (Read Only Memory) 1206, and a RAM (Random Access Memory) 1207, which are communicably connected to one another via a bus 1210. The information processing device 1 can further include an input device that accepts user operations, a display device for displaying information to the user, and a storage device that stores data in a nonvolatile manner.

 The CPU 1205 operates using the RAM 1207 as a work memory according to a program stored in advance in the ROM 1206, and controls the overall operation of the information processing device 1. The interface 1204 communicates with the outside of the information processing device 1 by wired or wireless communication. For example, when the information processing device 1 is used for in-vehicle applications, the information processing device 1 can communicate with the braking control system or the like of the vehicle on which the information processing device 1 is mounted, via the interface 1204.
 The imaging unit 1200 captures a moving image at a predetermined frame period and outputs pixel data for composing a frame image. More specifically, the imaging unit 1200 includes a plurality of photoelectric conversion elements, each of which converts received light into a pixel signal, which is an electrical signal, by photoelectric conversion, and a drive circuit that drives each photoelectric conversion element. In the imaging unit 1200, the plurality of photoelectric conversion elements are arranged in a matrix to form a pixel array.

 For example, the sensor unit 10a in FIG. 1 includes the imaging unit 1200, and outputs the pixel data output from the imaging unit 1200 within one frame period as image data for one frame.

 Here, each of the photoelectric conversion elements corresponds to a pixel in the image data, and in the pixel array unit, a number of photoelectric conversion elements corresponding to, for example, 1920 pixels x 1080 pixels as the number of pixels in rows x columns are arranged in a matrix. An image of one frame is formed by the pixel signals from the number of photoelectric conversion elements corresponding to 1920 pixels x 1080 pixels.
 The optical unit 1201 includes a lens, an autofocus mechanism, and the like, and irradiates the pixel array unit of the imaging unit 1200 with the light incident on the lens. The imaging unit 1200 generates a pixel signal for each photoelectric conversion element according to the light irradiated onto the pixel array unit via the optical unit 1201. The imaging unit 1200 converts the pixel signal, which is an analog signal, into pixel data, which is a digital signal, and outputs the pixel data. The pixel data output from the imaging unit 1200 is stored in the memory 1202. The memory 1202 is, for example, a frame memory, and can store at least one frame of pixel data.

 The DSP 1203 performs predetermined image processing on the pixel data stored in the memory 1202. The DSP 1203 also includes a recognition model trained in advance, and performs the recognition processing using the DNN described above on the image data stored in the memory 1202 based on the recognition model. The recognition result, which is the result of the recognition processing by the DSP 1203, is temporarily stored in, for example, a memory provided in the DSP 1203 or in the RAM 1207, and is output to the outside from the interface 1204. Not limited to this, when the information processing device 1 includes a storage device, the recognition result may be stored in the storage device.

 Not limited to this, the function of the DSP 1203 may be realized by the CPU 1205. A GPU (Graphics Processing Unit) may also be used instead of the DSP 1203.

 A CMOS image sensor (CIS) in which the units included in the imaging unit 1200 are integrally formed using CMOS (Complementary Metal Oxide Semiconductor) can be applied as the imaging unit 1200. The imaging unit 1200 can be formed on one substrate. Not limited to this, the imaging unit 1200 may be a laminated CIS in which a plurality of semiconductor chips are laminated and integrally formed. The imaging unit 1200 is not limited to this example, and may be another type of optical sensor, such as an infrared light sensor that performs imaging with infrared light.
 As an example, the imaging unit 1200 can be formed as a laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers. FIG. 6A is a diagram showing an example in which the imaging unit 1200 is formed by a laminated CIS having a two-layer structure. In the structure of FIG. 6A, a pixel unit 2020a is formed on the semiconductor chip of the first layer, and a memory + logic unit 2020b is formed on the semiconductor chip of the second layer. The pixel unit 2020a includes at least the pixel array unit of the imaging unit 1200. The memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit. The memory + logic unit 2020b can further include the memory 1202.

 As shown on the right side of FIG. 6A, the imaging unit 1200 is configured as one solid-state imaging element by bonding the semiconductor chip of the first layer and the semiconductor chip of the second layer while bringing them into electrical contact with each other.

 As another example, the imaging unit 1200 can be formed with a three-layer structure in which semiconductor chips are laminated in three layers. FIG. 6B is a diagram showing an example in which the imaging unit 1200 is formed by a laminated CIS having a three-layer structure. In the structure of FIG. 6B, a pixel unit 2020a is formed on the semiconductor chip of the first layer, a memory unit 2020c is formed on the semiconductor chip of the second layer, and a logic unit 2020d is formed on the semiconductor chip of the third layer. In this case, the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit. The memory unit 2020c can include a frame memory and the memory 1202.

 As shown on the right side of FIG. 6B, the imaging unit 1200 is configured as one solid-state imaging element by bonding the semiconductor chip of the first layer, the semiconductor chip of the second layer, and the semiconductor chip of the third layer while bringing them into electrical contact with one another.

 In the configurations of FIGS. 6A and 6B, the memory + logic unit 2020b can also include configurations corresponding to the DSP 1203, the interface 1204, the CPU 1205, the ROM 1206, and the RAM 1207 shown in FIG. 5.
(1-1-1. Configuration example of the imaging unit applicable to each embodiment)
 FIG. 7 is a block diagram showing a configuration of an example of the imaging unit 1200 applicable to each embodiment. In FIG. 7, the imaging unit 1200 includes a pixel array unit 1001, a vertical scanning unit 1002, an AD (Analog to Digital) conversion unit 1003, a pixel signal line 1006, a vertical signal line VSL, a control unit 1100, and a signal processing unit 1101. In FIG. 7, the control unit 1100 and the signal processing unit 1101 can also be realized by, for example, the CPU 1205 and the DSP 1203 shown in FIG. 5.

 The pixel array unit 1001 includes a plurality of pixel circuits 1000, each including a photoelectric conversion element, for example a photodiode, that performs photoelectric conversion on the received light, and a circuit that reads out the electric charge from the photoelectric conversion element. In the pixel array unit 1001, the plurality of pixel circuits 1000 are arranged in a matrix in the horizontal direction (row direction) and the vertical direction (column direction). In the pixel array unit 1001, the arrangement of the pixel circuits 1000 in the row direction is called a line. For example, when an image of one frame is formed by 1920 pixels x 1080 lines, the pixel array unit 1001 includes at least 1080 lines, each including at least 1920 pixel circuits 1000. An image (image data) of one frame is formed by the pixel signals read from the pixel circuits 1000 included in the frame.

 Further, in the pixel array unit 1001, a pixel signal line 1006 is connected to each row of the pixel circuits 1000, and a vertical signal line VSL is connected to each column. The end of the pixel signal line 1006 that is not connected to the pixel array unit 1001 is connected to the vertical scanning unit 1002. The vertical scanning unit 1002 transmits control signals, such as a drive pulse used when reading a pixel signal from a pixel, to the pixel array unit 1001 via the pixel signal lines 1006 in accordance with the control of the control unit 1100 described later. The end of the vertical signal line VSL that is not connected to the pixel array unit 1001 is connected to the AD conversion unit 1003. The pixel signal read from a pixel is transmitted to the AD conversion unit 1003 via the vertical signal line VSL.
 画素回路1000からの画素信号の読み出し制御について、概略的に説明する。画素回路1000からの画素信号の読み出しは、露出により光電変換素子に蓄積された電荷を浮遊拡散層(FD;Floating Diffusion)に転送し、浮遊拡散層において転送された電荷を電圧に変換することで行う。浮遊拡散層において電荷が変換された電圧は、画素信号としてアンプを介して垂直信号線VSLに出力される。 The reading control of the pixel signal from the pixel circuit 1000 will be schematically described. The pixel signal is read out from the pixel circuit 1000 by transferring the charge accumulated in the photoelectric conversion element due to exposure to the floating diffusion layer (FD) and converting the electric charge transferred in the floating diffusion layer into a voltage. conduct. The voltage at which the charge is converted in the floating diffusion layer is output as a pixel signal to the vertical signal line VSL via an amplifier.
More specifically, in the pixel circuit 1000, during exposure, the connection between the photoelectric conversion element and the floating diffusion layer is turned off (opened), and the photoelectric conversion element accumulates the charge generated by photoelectric conversion according to the incident light. After the exposure ends, the floating diffusion layer and the vertical signal line VSL are connected according to a selection signal supplied via the pixel signal line 1006. Further, the floating diffusion layer is briefly connected to a supply line of the power supply voltage VDD or of a black level voltage according to a reset pulse supplied via the pixel signal line 1006, resetting the floating diffusion layer. A reset-level voltage of the floating diffusion layer (referred to as voltage A) is output to the vertical signal line VSL. After that, a transfer pulse supplied via the pixel signal line 1006 turns on (closes) the connection between the photoelectric conversion element and the floating diffusion layer, and the charge accumulated in the photoelectric conversion element is transferred to the floating diffusion layer. A voltage corresponding to the amount of charge in the floating diffusion layer (referred to as voltage B) is output to the vertical signal line VSL.
The AD conversion unit 1003 includes an AD converter 1007 provided for each vertical signal line VSL, a reference signal generation unit 1004, and a horizontal scanning unit 1005. The AD converter 1007 is a column AD converter that performs AD conversion processing for each column of the pixel array unit 1001. The AD converter 1007 performs AD conversion processing on the pixel signal supplied from the pixel circuit 1000 via the vertical signal line VSL, and generates two digital values (values corresponding to voltage A and voltage B, respectively) for correlated double sampling (CDS) processing, which reduces noise.
The AD converter 1007 supplies the two generated digital values to the signal processing unit 1101. The signal processing unit 1101 performs CDS processing based on the two digital values supplied from the AD converter 1007, and generates pixel data, which is a pixel signal in digital form.
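As a rough illustration of the CDS step just described, the pixel value can be obtained as the difference between the two digitized levels (one corresponding to the reset voltage A, the other to the signal voltage B). The following is a minimal sketch in Python, not the device's actual implementation; the function name and the sign convention are assumptions for illustration.

```python
def cds_pixel_value(digital_a: int, digital_b: int) -> int:
    """Illustrative CDS: take the difference of the two digitized levels.

    digital_a: digital value corresponding to the reset level (voltage A)
    digital_b: digital value corresponding to the signal level (voltage B)
    The sign convention (A - B or B - A) depends on the sensor design; A - B is
    assumed here, and the result is clamped to be non-negative.
    """
    return max(0, digital_a - digital_b)
```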
The reference signal generation unit 1004 generates, as a reference signal, a ramp signal used by each AD converter 1007 to convert the pixel signal into the two digital values, based on a control signal input from the control unit 1100. The ramp signal is a signal whose level (voltage value) decreases with a constant slope over time, or a signal whose level decreases stepwise. The reference signal generation unit 1004 supplies the generated ramp signal to each AD converter 1007. The reference signal generation unit 1004 is configured by using, for example, a DAC (Digital to Analog Converter).
When the reference signal generation unit 1004 supplies a ramp signal whose voltage drops stepwise according to a predetermined slope, a counter starts counting according to a clock signal. A comparator compares the voltage of the pixel signal supplied from the vertical signal line VSL with the voltage of the ramp signal, and stops the counter at the timing when the ramp signal voltage crosses the pixel signal voltage. The AD converter 1007 converts the analog pixel signal into a digital value by outputting a value corresponding to the count value at the time the counting was stopped.
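The single-slope conversion described above can be pictured as counting clock cycles until the falling ramp crosses the pixel signal voltage. The sketch below is a behavioral illustration under assumed parameters (ramp start level, step per clock, maximum count), not the actual column ADC circuit.

```python
def single_slope_adc(pixel_voltage: float, ramp_start: float = 1.0,
                     ramp_step: float = 0.001, max_count: int = 1024) -> int:
    """Count clock cycles until the falling ramp voltage crosses pixel_voltage."""
    ramp = ramp_start
    for count in range(max_count):
        if ramp <= pixel_voltage:   # the ramp has crossed the pixel signal level
            return count
        ramp -= ramp_step           # the ramp falls by a fixed step every clock
    return max_count                # saturated: the ramp never crossed the signal
```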
The AD converter 1007 supplies the two generated digital values to the signal processing unit 1101. The signal processing unit 1101 performs CDS processing based on the two digital values supplied from the AD converter 1007, and generates a pixel signal (pixel data) in digital form. The pixel data generated by the signal processing unit 1101 is stored in a frame memory (not shown), and when one frame's worth of pixel data has been stored in the frame memory, it is output from the imaging unit 1200 as one frame of image data.
Under the control of the control unit 1100, the horizontal scanning unit 1005 performs selective scanning that selects the AD converters 1007 in a predetermined order, thereby sequentially outputting the digital values temporarily held by each AD converter 1007 to the signal processing unit 1101. The horizontal scanning unit 1005 is configured by using, for example, a shift register or an address decoder.
The control unit 1100 performs drive control of the vertical scanning unit 1002, the AD conversion unit 1003, the reference signal generation unit 1004, the horizontal scanning unit 1005, and the like, according to an imaging control signal supplied from the sensor control unit 11. The control unit 1100 generates various drive signals that serve as references for the operations of the vertical scanning unit 1002, the AD conversion unit 1003, the reference signal generation unit 1004, and the horizontal scanning unit 1005. The control unit 1100 generates, based on, for example, a vertical synchronization signal or an external trigger signal included in the imaging control signal and a horizontal synchronization signal, a control signal to be supplied by the vertical scanning unit 1002 to each pixel circuit 1000 via the pixel signal line 1006. The control unit 1100 supplies the generated control signal to the vertical scanning unit 1002.
Further, the control unit 1100 passes, for example, information indicating an analog gain included in the imaging control signal supplied from the CPU 1205 to the AD conversion unit 1003. According to this information indicating the analog gain, the AD conversion unit 1003 controls the gain of the pixel signals input via the vertical signal lines VSL to the AD converters 1007 included in the AD conversion unit 1003.
Based on the control signal supplied from the control unit 1100, the vertical scanning unit 1002 supplies various signals, including drive pulses, to the pixel signal line 1006 of the selected pixel row of the pixel array unit 1001, line by line, and causes each pixel circuit 1000 to output a pixel signal to the vertical signal line VSL. The vertical scanning unit 1002 is configured by using, for example, a shift register or an address decoder. Further, the vertical scanning unit 1002 controls the exposure in each pixel circuit 1000 according to information indicating the exposure supplied from the control unit 1100.
The imaging unit 1200 configured in this way is a column AD type CMOS (Complementary Metal Oxide Semiconductor) image sensor in which an AD converter 1007 is arranged for each column.
(1-1-2. Resolution of captured image)
Next, the resolution of the image used for the recognition process will be described with reference to FIGS. 8A and 8B. FIGS. 8A and 8B are diagrams schematically showing examples of captured images 30a and 30b obtained when the same imaging range is captured by a low-resolution imaging device and a high-resolution imaging device, respectively. The imaging range shown in FIGS. 8A and 8B includes a "person" in the central portion, at a position somewhat distant from the imaging apparatus. Consider the case of recognizing this "person" as an object by the recognition process.
In the low-resolution example of FIG. 8A, it is difficult to recognize the "person" included in the captured image 30a, and the recognition performance for the "person" by the recognition process is extremely low. On the other hand, in the high-resolution example of FIG. 8B, the "person" included in the captured image 30b is easily recognized, and the recognized "person" is obtained as the recognition result 40; compared with the low-resolution example of FIG. 8A, the recognition performance is high.
On the other hand, the recognition process for a high-resolution image requires a larger amount of calculation than the recognition process for a low-resolution image, and the processing takes time. This makes it difficult to improve the simultaneity between the recognition result and the captured image. In contrast, the recognition process for a low-resolution image requires only a small amount of calculation, so it can be performed in a short time, and the simultaneity with the captured image can be increased relatively easily.
As an example, consider a case where recognition processing is performed based on an image captured by an in-vehicle imaging device. In this case, a distant object (for example, an oncoming vehicle traveling in the opposite lane in the direction opposite to the traveling direction of the own vehicle) must be recognized with high simultaneity, so performing the recognition process on a low-resolution image can be considered. However, as described with reference to FIG. 8A, it is difficult to recognize a distant object when a low-resolution captured image is used. Conversely, when a high-resolution captured image is used, recognizing a distant object becomes relatively easy, but it is difficult to improve the simultaneity with the captured image, and there is a possibility that an emergency situation cannot be handled in time.
In each embodiment of the present disclosure, in order to make it possible to recognize a distant object easily and at high speed, recognition processing is performed on a sampled image composed of pixels obtained by thinning out a high-resolution captured image by subsampling according to a predetermined rule. For the captured image acquired in the next frame, pixels different from those of the subsampling applied to the immediately preceding captured image are sampled, and recognition processing is performed on the sampled image composed of those sampled pixels.
This operation, in which recognition processing is performed on a sampled image obtained by sampling, from the second captured image acquired next in time series after the first captured image, pixels different from those of the first captured image, is repeated frame by frame. This makes it possible to acquire recognition results at high speed while using a high-resolution captured image. Further, by sequentially integrating the feature amounts extracted during the recognition process with the feature amounts extracted in the recognition process for the next sampled image, a more accurate recognition result can be obtained.
(1-2. Outline of recognition processing that is a premise of each embodiment)
Next, the recognition processing technology (hereinafter referred to as the prerequisite technology) that is the premise of each embodiment of the present disclosure will be schematically described.
(1-2-1. Configuration related to the prerequisite technology of each embodiment)
(1-2-1-1. Outline of configuration applicable to the prerequisite technology of each embodiment)
FIG. 9 is a block diagram showing a configuration of an example of an information processing device according to the prerequisite technology of each embodiment of the present disclosure. In FIG. 9, the information processing device 1b includes a sensor unit 10b and a recognition processing unit 20b. Although not shown, the sensor unit 10b includes an imaging means (camera) and an imaging control unit that controls the imaging means, similarly to the sensor unit 10a described with reference to FIG. 1. This imaging means is assumed to perform imaging at a high resolution (for example, 1920 pixels × 1080 pixels). The sensor unit 10b supplies the image data of the captured image captured by the imaging means to the recognition processing unit 20b.
The recognition processing unit 20b includes a preprocessing unit 210 and a recognition unit 220. The image data supplied from the sensor unit 10b to the recognition processing unit 20b is input to the preprocessing unit 210. The preprocessing unit 210 performs subsampling on the input image data by thinning out pixels according to a predetermined rule. The sampled image obtained by subsampling the image data is input to the recognition unit 220.
The recognition unit 220 performs recognition processing on the image data using a DNN, in the same manner as the recognition processing unit 20a in FIG. 1. More specifically, the recognition processing unit 20a includes a recognition model trained in advance by machine learning using predetermined teacher data, and performs recognition processing using a DNN based on this recognition model on the image data supplied from the sensor unit 10a. Here, sampled images subsampled in the same manner as in the preprocessing unit 210 are used as the teacher data.
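Since the teacher data are stated to be subsampled in the same manner as in the preprocessing unit 210, training data could, for instance, be prepared as sketched below. The function name, the every-other-pixel pattern, and the four phase offsets are assumptions consistent with the divided-region example described later, not the embodiment's actual training pipeline.

```python
import numpy as np

def make_teacher_samples(image: np.ndarray, label):
    """Generate subsampled (image, label) pairs, one per sampling phase (assumed 2x2 offsets)."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [(image[dy::2, dx::2], label) for (dy, dx) in offsets]
```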
The recognition unit 220 outputs the recognition result of the recognition process to, for example, the outside of the information processing device 1b.
(1-2-1-2. Example of recognition processing related to the prerequisite technology of each embodiment)
FIG. 10 is a schematic diagram for explaining the recognition process by the recognizer according to the prerequisite technology of each embodiment. The recognizer shown in FIG. 10 corresponds to, for example, the recognition processing unit 20b. The image data 32 schematically shows one frame of image data based on the captured image captured by the sensor unit 10b. The image data 32 includes a plurality of pixels 300 arranged in a matrix. The image data 32 is input to the preprocessing unit 210 in the recognition processing unit 20b. The preprocessing unit 210 subsamples the image data 32 by thinning out pixels according to a predetermined rule (step S10).
The sampled image composed of the subsampled sampling pixels is input to the recognition unit 220. The recognition unit 220 extracts the feature amount of the input sampled image using a DNN (step S11). Here, the recognition unit 220 extracts the feature amount using a CNN, a type of DNN.
The recognition unit 220 stores the feature amount extracted in step S11 in a storage unit (for example, the RAM 1207, not shown). At this time, if the feature amount extracted in the immediately preceding frame, for example, is already stored in the storage unit, the recognition unit 220 recursively uses the feature amount stored in the memory and integrates it with the newly extracted feature amount (step S12). The recognition unit 220 stores, accumulates, and integrates in the storage unit the feature amounts extracted up to the immediately preceding frame. That is, the process in step S12 corresponds to processing using an RNN, a type of DNN.
The recognition unit 220 executes recognition processing based on the feature amounts accumulated and integrated in step S12 (step S13).
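The flow of steps S10 to S13 for a single frame can be summarized as in the sketch below. The feature extractor, integration operator, and recognition head are placeholders standing in for the CNN, the RNN-like integration, and the recognizer; the names and the every-other-pixel subsampling offsets are assumptions for illustration, not the embodiment's implementation.

```python
import numpy as np

# Placeholder stand-ins; the embodiment would use trained DNN components instead.
def extract_features(img):  return img.astype(np.float32).mean(axis=(0, 1), keepdims=True)
def integrate(acc, feat):   return 0.5 * acc + 0.5 * feat
def recognize(feat):        return {"score": float(feat.mean())}

def process_frame(frame, accumulated, row_offset, col_offset):
    """Steps S10-S13 for one frame: subsample, extract features, integrate, recognize."""
    sampled = frame[row_offset::2, col_offset::2]           # step S10: subsampling
    features = extract_features(sampled)                     # step S11: feature extraction
    accumulated = features if accumulated is None \
        else integrate(accumulated, features)                # step S12: accumulation/integration
    return recognize(accumulated), accumulated               # step S13: recognition result
```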
Here, the subsampling process by the preprocessing unit 210 in step S10 will be described in more detail. FIG. 11 is a schematic diagram for explaining the sampling process according to the prerequisite technology of each embodiment. In FIG. 11, section (a) schematically shows an example of the image data 32. As described above, the image data 32 includes a plurality of pixels 300 arranged in a matrix. The preprocessing unit 210 divides the image data 32 into divided regions 35, each including two or more pixels 300. In the example of FIG. 11, each divided region 35 is a region having a size of 4 pixels × 4 pixels and includes 16 pixels 300.
The preprocessing unit 210 sets, for each divided region 35, pixel positions for selecting sampling pixels by subsampling from the pixels 300 included in that divided region 35. Further, the preprocessing unit 210 sets different pixel positions for each frame as the pixel positions for selecting the sampling pixels.
Section (b) of FIG. 11 shows an example of the pixel positions set for a divided region 35 in a certain frame. In this example, in the divided region 35, the pixel positions are set so that every other pixel 300 is selected in both the row and column directions, and the pixels 300sa1, 300sa2, 300sa3, and 300sa4 at the set pixel positions are selected as the sampling pixels. In this way, the preprocessing unit 210 performs subsampling in units of the divided region 35.
The preprocessing unit 210 generates an image composed of the pixels 300sa1 to 300sa4 selected as sampling pixels in a certain frame, as a sampled image composed of sampling pixels. Section (c) of FIG. 11 shows an example of the sampled image 36 generated from the pixels 300sa1 to 300sa4 selected as sampling pixels in section (b) of FIG. 11. The preprocessing unit 210 inputs this sampled image 36 to the recognition unit 220. The recognition unit 220 executes recognition processing on this sampled image 36.
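With the 4 × 4 divided regions and every-other-pixel selection of FIG. 11, the sampled image can be obtained simply by taking the pixels whose row and column offsets match the selected positions. A minimal NumPy sketch under that assumption; the frame size in the usage line is the 1920 × 1080 example mentioned earlier.

```python
import numpy as np

def subsample_every_other(image: np.ndarray, row_offset: int, col_offset: int) -> np.ndarray:
    """Select every other pixel in both directions, starting at (row_offset, col_offset).

    With offsets in {0, 1}, this realizes the four sampling phases of the divided regions
    in which sampling pixels are chosen one pixel apart.
    """
    return image[row_offset::2, col_offset::2]

frame = np.zeros((1080, 1920), dtype=np.uint16)       # illustrative 1920 x 1080 frame
sampled_phase1 = subsample_every_other(frame, 0, 0)   # 540 x 960 sampled image
```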
The recognition process by the recognizer according to the prerequisite technology of each embodiment will be described more specifically with reference to FIGS. 12A to 12E. As described above, the preprocessing unit 210 sets different pixel positions for each frame as the pixel positions for selecting sampling pixels. The recognition unit 220 performs, for each frame, recognition processing based on the sampled image composed of the pixels 300 at the set pixel positions. FIGS. 12A to 12E show the recognition processes for the image data 32a to 32d and 32a' of frames #1 to #5, respectively, which are sequentially captured in time series by the sensor unit 10b.
Note that in each of FIGS. 12A to 12E, the images based on the image data 32a to 32d and 32a' each contain objects 41 and 42, which are persons. The object 41 is located at a relatively short distance (referred to as a medium distance) from the sensor unit 10b. On the other hand, the object 42 is located at a distance farther than the medium distance (referred to as a long distance) from the sensor unit 10b, and its size in the image is smaller than that of the object 41.
In section (a) of FIG. 12A, the preprocessing unit 210 performs subsampling on each divided region 35 of the image data 32a of frame #1 with, for example, the pixel position at the upper left corner as the base point. More specifically, in each divided region 35 of the image data 32a, the preprocessing unit 210 performs subsampling that selects, as the sampling pixels 300sa1 to 300sa4, every other pixel 300 in the row and column directions starting from the pixel position at the upper left corner (step S10a).
As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ1 of the first phase from the subsampled pixels 300sa1 to 300sa4. The generated sampled image 36φ1 is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50a of the input sampled image 36φ1 using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50a extracted in step S11 in the storage unit (step S12). When feature amounts are already accumulated in the storage unit, the recognition unit 220 can accumulate the feature amount 50a in the storage unit and integrate it with the already accumulated feature amounts. Section (b) of FIG. 12A shows how the first feature amount 50a is stored in the empty storage unit as the process of step S12.
The recognition unit 220 executes recognition processing based on the feature amount 50a accumulated in the storage unit (step S13). In the example of FIG. 12A, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60. On the other hand, the object 42 located at the long distance is not recognized.
In section (a) of FIG. 12B, the preprocessing unit 210 performs, on each divided region 35 of the image data 32b of frame #2, subsampling in which the pixel positions shifted horizontally by one pixel from the pixel positions set for each divided region 35 of the image data 32a of frame #1 shown in FIG. 12A are set as the pixel positions of the sampling pixels (step S10b). That is, each sampling pixel selected in step S10b is the pixel 300 at the pixel position adjacent, on the right in the figure, to the pixel position of the corresponding sampling pixel selected in step S10a of FIG. 12A.
As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ2 of the second phase from the sampling pixels subsampled in step S10b. The generated sampled image 36φ2 is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50b of the input sampled image 36φ2 using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50b extracted in step S11 in the storage unit (step S12). In this example, as shown as step S12 in section (b), the feature amount 50a extracted from the sampled image 36φ1 of the first phase is already accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50b in the storage unit and integrates the feature amount 50b with the accumulated feature amount 50a.
The recognition unit 220 executes recognition processing based on the feature amount obtained by integrating the feature amount 50a and the feature amount 50b (step S13). In the example of FIG. 12B, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60, but the object 42 located at the long distance is not recognized at this point.
In section (a) of FIG. 12C, the preprocessing unit 210 performs, on each divided region 35 of the image data 32c of frame #3, subsampling in which the pixel positions shifted by one pixel in the column direction from the pixel positions set for each divided region 35 of the image data 32a of frame #1 shown in FIG. 12A are set as the pixel positions of the sampling pixels (step S10c). That is, each sampling pixel selected in step S10c is the pixel 300 at the pixel position adjacent, below in the figure, to the pixel position of the corresponding sampling pixel selected in step S10a of FIG. 12A.
As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ3 of the third phase from the sampling pixels subsampled in step S10c. The generated sampled image 36φ3 is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50c of the input sampled image 36φ3 using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50c extracted in step S11 in the storage unit (step S12). In this example, as shown as step S12 in section (b), the feature amounts 50a and 50b extracted from the sampled images 36φ1 and 36φ2 of the first and second phases, respectively, are already accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50c in the storage unit and integrates the feature amount 50c with the accumulated feature amounts 50a and 50b.
The recognition unit 220 executes recognition processing based on the feature amount obtained by integrating the feature amounts 50a and 50b with the feature amount 50c (step S13). In the example of FIG. 12C, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60, but the object 42 located at the long distance is not recognized at this point.
In section (a) of FIG. 12D, the preprocessing unit 210 performs, on each divided region 35 of the image data 32d of frame #4, subsampling in which the pixel positions shifted horizontally by one pixel from the pixel positions set for each divided region 35 of the image data 32c of frame #3 shown in FIG. 12C are set as the pixel positions of the sampling pixels (step S10d). That is, each sampling pixel selected in step S10d is the pixel 300 at the pixel position adjacent, on the right in the figure, to the pixel position of the corresponding sampling pixel selected in step S10c of FIG. 12C.
As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ4 of the fourth phase from the sampling pixels subsampled in step S10d. The generated sampled image 36φ4 is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50d of the input sampled image 36φ4 using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50d extracted in step S11 in the storage unit (step S12). In this example, as shown as step S12 in section (b), the feature amounts 50a to 50c extracted from the sampled images 36φ1 to 36φ3 of the first to third phases, respectively, are already accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50d in the storage unit and integrates the feature amount 50d with the accumulated feature amounts 50a to 50c.
The recognition unit 220 executes recognition processing based on the feature amount obtained by integrating the feature amounts 50a to 50c with the feature amount 50d (step S13). In the example of FIG. 12D, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60, and the object 42 located at the long distance is also recognized and obtained as the recognition result 61.
Through the processes of FIGS. 12A to 12D, all of the pixel positions of the 16 pixels 300 included in each divided region 35 have been selected as pixel positions of sampling pixels. In other words, the preprocessing unit 210 selects the pixel positions of all the pixels 300 included in one frame as pixel positions of sampling pixels. It can also be said that the preprocessing unit 210 selects the pixel positions of the 16 pixels 300 included in each divided region 35 while shifting the phase by one pixel at a time.
The period from the time when pixel positions of sampling pixels are first selected for each divided region 35 (or for one frame) until the pixel positions of all the pixels 300 included in that divided region 35 (or that frame) have been selected as pixel positions of sampling pixels is defined as one cycle. That is, the preprocessing unit 210 cycles through the pixel positions of each divided region 35 with a constant period, and sets all the pixel positions in the divided region 35 as pixel positions for acquiring sampling pixels.
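The cyclic shifting of the sampling position shown in FIGS. 12A to 12D (base point, then right by one pixel, then down, then down-right) can be written as a repeating sequence of offsets within the divided region. A small sketch under that assumption:

```python
# Sampling offsets (row, col) inside each divided region, one per frame, following FIGS. 12A-12D:
# phase 1: base point, phase 2: shifted right, phase 3: shifted down, phase 4: shifted down-right.
PHASE_OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def phase_for_frame(frame_index: int) -> tuple:
    """Return the sampling offset for a given frame; the pattern repeats every 4 frames (one cycle)."""
    return PHASE_OFFSETS[frame_index % len(PHASE_OFFSETS)]
```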
When the subsampling and recognition processing for one cycle is completed, the subsampling and recognition processing for the next cycle is started.
That is, in section (a) of FIG. 12E, the preprocessing unit 210 performs, on each divided region 35 of the image data 32a' of frame #1', subsampling with the pixel position at the upper left corner as the base point, in the same manner as in the example of FIG. 12A (step S10a'). As shown in section (b), the preprocessing unit 210 generates a sampled image 36φ1' of the first phase from the sampling pixels subsampled in step S10a'. The generated sampled image 36φ1' is input to the recognition unit 220.
The recognition unit 220 extracts the feature amount 50a' of the input sampled image 36φ1' using a DNN (step S11). The recognition unit 220 stores and accumulates the feature amount 50a' extracted in step S11 in the storage unit (step S12). In this example, as shown as step S12 in section (b), the feature amounts 50a to 50d extracted from the sampled images 36φ1 to 36φ4 of the first to fourth phases in the immediately preceding cycle are already accumulated in the storage unit. Therefore, the recognition unit 220 accumulates the feature amount 50a' in the storage unit and integrates the feature amount 50a' with the accumulated feature amounts 50a to 50d.
Alternatively, the recognition unit 220 may reset the storage unit at every cycle of sampling-pixel position selection. The storage unit can be reset, for example, by deleting from it the one cycle's worth of feature amounts 50a to 50d accumulated therein.
Further, the recognition unit 220 can also keep a constant amount of feature amounts accumulated in the storage unit at all times. For example, the recognition unit 220 accumulates one cycle's worth of feature amounts, that is, feature amounts for four frames, in the storage unit. In this case, when the new feature amount 50a' is extracted, the recognition unit 220 deletes, for example, the oldest feature amount 50d among the feature amounts 50a to 50d accumulated in the storage unit, and stores and accumulates the new feature amount 50a' in the storage unit. The recognition unit 220 executes recognition processing based on the integrated feature amount obtained from the feature amounts 50a to 50c remaining after the feature amount 50d is deleted and the new feature amount 50a'.
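Keeping exactly one cycle (four frames) of feature amounts in the storage unit and discarding an old one when a new one arrives behaves like a fixed-length queue. A sketch under that assumption; `integrate` is a placeholder for the actual integration operation, and dropping the entry held longest is an assumption of this sketch rather than the embodiment's exact policy.

```python
from collections import deque

feature_buffer = deque(maxlen=4)   # holds at most one cycle (four frames) of feature amounts

def integrate(a, b):               # placeholder for the actual feature integration
    return 0.5 * (a + b)

def store_and_integrate(new_features):
    """Append the newest feature amount; once full, the entry held longest is dropped automatically."""
    feature_buffer.append(new_features)
    integrated = feature_buffer[0]
    for f in list(feature_buffer)[1:]:
        integrated = integrate(integrated, f)
    return integrated
```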
The recognition unit 220 executes recognition processing based on the feature amount obtained by integrating the feature amounts 50a to 50d already accumulated in the storage unit with the newly extracted feature amount 50a' (step S13). In the example of FIG. 12E, as shown as step S13 in section (b), the object 41 located at the medium distance is recognized and obtained as the recognition result 60, and the object 42 located at the long distance is also recognized and obtained as the recognition result 61.
Here, the sampled image 36 is a thinned image obtained by thinning out pixels from the original image data 32. In the example of FIG. 11, the sampled image 36 is image data obtained by reducing the image data 32 to 1/2 in each of the row and column directions, that is, a reduced image whose number of pixels is 1/4 that of the original image data 32. Therefore, the recognition unit 220 can execute the recognition process on the sampled image 36 faster than a recognition process using all the pixels 300 included in the original image data 32.
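For instance, with the 1920 pixel × 1080 line resolution mentioned above, one frame contains 1920 × 1080 = 2,073,600 pixels, whereas the corresponding sampled image contains 960 × 540 = 518,400 pixels, exactly one quarter of the original.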
Further, the pixel positions of the pixels 300 set as sampling pixels for generating the sampled image 36 are selected while being shifted by one pixel for each frame within the divided region 35. Therefore, sampled images 36 whose phases are shifted by one pixel from frame to frame can be obtained. At this time, the pixel positions of all the pixels 300 included in the divided region 35 are eventually selected as pixel positions of the pixels 300 set as sampling pixels.
The pixel positions of the pixels 300 forming the sampled images 36 are selected in this way, and the feature amounts calculated from the respective sampled images 36 are accumulated and integrated. As a result, the pixels 300 at all pixel positions included in the image data 32 can be involved in the recognition process, and, for example, a distant object can also be recognized easily.
In the above description, the pixel positions for selecting the sampling pixels are set by the preprocessing unit 210 according to a predetermined rule, but this is not limited to this example. For example, the preprocessing unit 210 may set the pixel positions for selecting sampling pixels in response to an instruction from outside the recognition processing unit 20b, or from outside the information processing device 1b that includes the recognition processing unit 20b.
(1-2-1-3. Subsampling processing related to the prerequisite technology of each embodiment)
Next, the subsampling process in the prerequisite technology of each embodiment will be described more specifically. 13A and 13B are schematic views for explaining the subsampling process in the recognition process according to the prerequisite technology of each embodiment. Here, for the sake of explanation, as shown in the section (b) of FIG. 13A, the divided region 35 is defined as a region of 2 pixels × 2 pixels. In each division region 35, the upper left pixel position is the origin coordinate [0,0], and the upper right, lower left, and lower right pixel positions are the coordinates [1,0] [0,1] and [1,1], respectively. And. Further, sampling of the pixel 300 is performed in each division region 35 with the coordinates [1,1], [1,0], [0,1], [0,1] starting from the lower right pixel position [1,1]. 0] shall be performed in this order.
In section (a) of FIG. 13A, the passage of time is represented from the bottom to the top of the figure. In the example of FIG. 13A, corresponding to FIGS. 12A to 12E described above, the image data 32a is the image [T] at the newest time T, and thereafter the image data 32b, 32c, and 32d are, in this order, the images [T-1], [T-2], and [T-3] based on image data older by one frame each, at times T-1, T-2, and T-3.
At time T-3, the preprocessing unit 210 selects, for the image data 32a, the pixels 300 at the coordinates [1,1] of each divided region 35 as sampling pixels (step S10a), and the recognition unit 220 extracts the feature amount of the sampled image 36φ1 composed of the selected sampling pixels (step S11). The recognition unit 220 integrates the feature amount 50a extracted from the sampled image 36φ1 with, for example, the feature amounts extracted in a predetermined period before that (step S12), and performs recognition processing based on the integrated feature amount (step S13).
Here, for example, the subsampling process in each divided region 35 of the image data 32a described above (step S10a) yields a sampled image 36φ1 in which the image data 32a is uniformly thinned out. Using the feature amount 50a extracted from this sampled image 36φ1 in step S11, recognition processing can be executed for the whole of the image data 32a. In other words, the recognition process on the sampled image composed of the sampling pixels selected from the image data 32 by subsampling can complete the recognition process for the image data 32.
This series of processes, in which a sampled image is generated from the image data 32, a feature amount is extracted from the generated sampled image, and recognition processing is performed based on the extracted feature amount, is called one unit of processing. In the example of FIG. 13A, for example, the subsampling process of step S10a, the feature amount extraction process of step S11 for the sampled image 36φ1 generated by that subsampling process, the feature amount integration process of step S12, and the recognition process of step S13 are included in one unit of processing. The recognition unit 220 can execute recognition processing on the thinned-out image data 32 for each such unit of processing (step S13).
Thereafter, in the same manner, the recognition processing unit 20b executes the above-described one unit of processing for each of the image data 32b, 32c, and 32d, which are sequentially updated at the frame period, and executes the recognition processing. At this time, the feature amount integration process of step S12 and the recognition process of step S13 can be common to the processing of each unit.
By performing one unit of processing for each of the image data 32a to 32d as described above, the selection of sampling pixels goes around all the pixel positions included in each divided region 35 once. FIG. 13B shows the next one unit of processing after this round of sampling-pixel selection for each pixel position included in each divided region 35. That is, when one unit of processing has been completed for each of the image data 32a, 32b, 32c, and 32d, one unit of processing is executed for the image data 32a' of the next frame input to the recognition processing unit 20b.
In this example, the feature amount 50d extracted based on the oldest image data 32d is discarded, and the feature amount 50a' is extracted from the new image data 32a'. That is, the preprocessing unit 210 selects the pixels 300 at the coordinates [1,1] of each divided region 35 of the image data 32a' as sampling pixels, and generates a sampled image 36φ1. The recognition unit 220 extracts the feature amount 50a' from this sampled image 36φ1 selected from the image data 32a'. The recognition unit 220 integrates this feature amount 50a' with the feature amounts 50a, 50b, and 50c extracted up to immediately before, and performs recognition processing based on the integrated feature amount. In this case, the recognition unit 220 only needs to perform the feature amount extraction process for the newly acquired image data 32a'.
As described above, the recognition process according to the prerequisite technology of each embodiment is performed by executing one unit of processing in the same processing system in the recognition processing unit 20b. More specifically, as one unit of processing, the recognition processing unit 20b repeats, for each frame, the processing system consisting of the subsampling process and the feature amount extraction process for the image data 32, and performs recognition processing by integrating the feature amounts extracted through this repetition.
Further, the recognition processing unit 20b performs the subsampling process so as to cover the pixel positions of all the pixels 300 included in the image data 32, while periodically shifting the pixel positions at which sampling pixels are selected. Furthermore, the recognition processing unit 20b performs recognition processing by integrating the feature amounts, as intermediate data, extracted in step S11 from the sampled images composed of the sampling pixels selected from the image data 32 of each frame.
Since the recognition process according to the prerequisite technology of each embodiment configured in this way is a processing system that can be completed within one unit of processing, a recognition result can be obtained more quickly. Further, since sampling pixels are selected from the whole of the image data 32 in one unit, a wide-range recognition result can be confirmed with one unit of processing. Furthermore, since intermediate data (feature amounts) based on a plurality of image data 32 are integrated, it is possible to obtain the more detailed recognition results that are acquired by spanning a plurality of units.
That is, by using the information processing device 1b according to the prerequisite technology of each embodiment, it is possible to both improve the simultaneity of the recognition results and obtain recognition results that make use of the resolution of the captured image, thereby improving the characteristics of the recognition process using captured images.
(1-3. Basic architecture of recognition processing according to each embodiment)
Next, the basic architecture of the recognition process according to each embodiment of the present disclosure will be described. FIG. 14A is a schematic diagram for explaining the basic architecture of the recognition process according to the existing technology. As shown in FIG. 14A, a recognizer of the existing technology executes recognition processing on one piece of input information (for example, an image) and basically outputs one recognition result for that input information.
FIG. 14B is a schematic diagram for explaining the basic architecture of the recognition process according to each embodiment. The recognizer according to each embodiment corresponds to, for example, the recognition unit 220 of FIG. 9, and, as shown in FIG. 14B, executes recognition processing on one piece of input information (for example, an image) with time-axis expansion and can output a plurality of recognition results according to that recognition processing. Here, the recognition process with time-axis expansion is, as described with reference to FIGS. 10, 11, and 12A to 12E, a process in which subsampling by pixel thinning is performed for each divided region 35 and recognition processing is executed for each sampled image composed of the subsampled sampling pixels.
In the example of FIG. 14B, the recognizer according to each embodiment can output, for one piece of input information, two recognition results through the recognition processing with time-axis expansion: a highly responsive prompt result and a highly accurate integrated result. Of these, the prompt result is, for example, the recognition result of the recognition process performed on the sampled image acquired by the first subsampling in each divided region 35. The integrated result is, for example, the recognition result of the recognition process performed based on the feature amount obtained by integrating the feature amounts extracted from the sampled images acquired by the respective subsamplings in each divided region 35.
The calculation amount of the recognition process executed by the recognizer according to each embodiment shown in FIG. 14B is substantially the same as the calculation amount of the recognition process executed by the recognizer based on the existing technology shown in FIG. 14A. Therefore, with the recognizer according to each embodiment, both recognition results, the more responsive prompt result and the more accurate integrated result, can be obtained with approximately the same amount of calculation as the recognizer based on the existing technology.
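One way to picture the two outputs is shown below: a prompt result produced right after the first sampled image of a cycle, and an integrated result produced from the features of all phases. The helper functions passed in are placeholders, and the offsets assume the every-other-pixel sampling used in the earlier figures; this is a conceptual sketch, not the embodiment's implementation.

```python
import numpy as np

def recognize_with_two_outputs(frames, extract, integrate, recognize):
    """Return (prompt_result, integrated_result) over one cycle of frames (time-axis expansion)."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]        # assumed sampling phases, one per frame
    state, prompt_result = None, None
    for (dy, dx), frame in zip(offsets, frames):
        feat = extract(frame[dy::2, dx::2])            # subsample, then extract features
        state = feat if state is None else integrate(state, feat)
        if prompt_result is None:
            prompt_result = recognize(state)           # responsive result from the first phase only
    return prompt_result, recognize(state)             # plus the high-accuracy integrated result

# Placeholder usage with stand-in functions (not the embodiment's DNN components):
frames = [np.full((8, 8), i, dtype=np.float32) for i in range(4)]
prompt, integrated = recognize_with_two_outputs(
    frames, extract=lambda s: float(s.mean()), integrate=lambda a, b: 0.5 * (a + b),
    recognize=lambda f: {"score": float(f)})
```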
(1-3-1. More specific configuration)
Next, a more specific configuration of the basic architecture of the recognition process according to each embodiment will be described.
(1-3-1-1. First example)
FIG. 15 is a time chart showing a first example of readout and recognition processing in the basic architecture of the recognition process according to each embodiment. In FIG. 15 and FIG. 16 described later, sampling pixels are selected every other pixel in the divided region 35 having a size of 4 pixels × 4 pixels, as described in section (b) of FIG. 11. In this case, all pixel positions in each divided region 35 are selected by four subsamplings, and the image data 32 of one frame is divided into four sampled images 36φ1 to 36φ4 of the first to fourth phases.
 In this first example, the sampled images 36φ1 to 36φ4 of the first to fourth phases obtained by subsampling are extracted from the image data 32 of a plurality of frames that are consecutive in time series, one phase per frame. That is, in this first example, the sampled images 36φ1 to 36φ4 of the first to fourth phases are extracted across the image data 32 of a plurality of consecutive frames. The recognition process according to this first example is performed across a plurality of frames and is hereinafter referred to as inter-frame processing where appropriate.
 In FIG. 15, the imaging cycle is the frame cycle, for example 50 [ms] (20 [fps]). Here, readout from the pixel circuits 1000 arranged in a matrix in the pixel array unit 1001 is performed line-sequentially by a rolling shutter method. In FIG. 15, time elapses toward the right, and the line position runs from top to bottom.
 For example, in the imaging process of frame #1, each line is exposed for a predetermined time, and after the exposure ends, the pixel signals are transferred from the pixel circuits 1000 to the AD conversion unit 1003 via the vertical signal lines VSL. In the AD conversion unit 1003, each AD converter 1007 converts the transferred analog pixel signals into pixel data, which are digital signals. When the pixel signals of all lines have been converted into pixel data, the image data 32a consisting of the pixel data of frame #1 is input to the preprocessing unit 210.
 The preprocessing unit 210 applies the subsampling process described above (shown as "SS" in the figure) to the input image data 32a with the first phase φ1. By this subsampling of the first phase φ1, the preprocessing unit 210 acquires the pixels 300 at the pixel positions of the sampling pixels selected for each divided region 35 and generates the sampled image 36φ1 (step S10a).
 The preprocessing unit 210 passes the sampled image 36φ1 to the recognition unit 220. The sampled image 36φ1 passed from the preprocessing unit 210 to the recognition unit 220 at this point is an image whose number of pixels has been reduced relative to the image data 32a by the thinning of the subsampling process. The recognition unit 220 executes recognition processing on this sampled image 36φ1. Here, the recognition processing is shown as including a feature amount extraction process (step S11), a feature amount integration process (step S12), and a recognition process (step S13). The recognition result φ1 based on the sampled image 36φ1 is output to the outside of the recognition processing unit 20b.
 The processes of steps S11 to S13 are performed within one frame period. The sampled image 36φ1 to be processed is an image whose number of pixels has been reduced relative to the image data 32a by the thinning of the subsampling process. Therefore, the amount of processing executed for the image data 32a is smaller than the amount of processing that would be executed for one frame of image data 32 without thinning. In the example of FIG. 15, the processing of steps S11 to S13 for the sampled image 36φ1 based on the image data 32a is completed in approximately 1/4 of one frame period.
 In parallel with the above processing for frame #1, processing for the next frame #2 is executed. The image data 32b consisting of the pixel data of frame #2 is input to the preprocessing unit 210. The preprocessing unit 210 applies the subsampling process to the input image data 32b with a second phase φ2 different from that used for the image data 32a, and generates the sampled image 36φ2.
 The preprocessing unit 210 passes the sampled image 36φ2, whose number of pixels has been reduced relative to the image data 32b by subsampling, to the recognition unit 220. The recognition unit 220 executes the recognition processing on this sampled image 36φ2 within one frame period. In this case as well, as described above, the recognition processing is completed in approximately 1/4 of one frame period.
 At this time, the recognition unit 220 integrates, in the feature amount integration process of step S12, the feature amount 50b extracted from the sampled image 36φ2 with the feature amount 50a extracted by the feature amount extraction process for the image data 32a. The recognition unit 220 executes recognition processing using the integrated feature amount. The recognition result φ2 of this recognition processing is output to the outside of the recognition processing unit 20b.
 Thereafter, in the same manner, the preprocessing unit 210 executes the subsampling process with the third phase φ3 on the image data 32c of the next frame #3 in parallel with the processing for the image data 32b of the immediately preceding frame #2, and the recognition unit 220 extracts the feature amount 50c from the sampled image 36φ3 generated by that subsampling process. The recognition unit 220 further integrates the feature amount obtained by integrating the feature amounts 50a and 50b extracted from the image data 32a and 32b with the extracted feature amount 50c, and executes recognition processing based on the integrated feature amount. The recognition unit 220 outputs the recognition result φ3 obtained by this recognition processing to the outside. In this case as well, as described above, the recognition processing is completed in approximately 1/4 of one frame period.
 For the image data 32d of the next frame #4 as well, the recognition processing unit 20b similarly performs, in parallel with the processing for the image data 32c of the immediately preceding frame #3, the subsampling process with the fourth phase φ4 and the feature amount extraction process to obtain the feature amount 50d. In the recognition processing unit 20b, the recognition unit 220 further integrates the feature amount obtained by integrating the feature amounts 50a to 50c extracted from the image data 32a to 32c with the extracted feature amount 50d, and executes recognition processing based on the integrated feature amount. The recognition unit 220 outputs the recognition result φ4 obtained by this recognition processing to the outside. In this case as well, as described above, the recognition processing is completed in approximately 1/4 of one frame period.
 In FIG. 15, the thickness of each vertical arrow, that is, the arrows from the image data 32a to 32d and from steps S10a to S10d to the respective recognition processes, and the arrows indicating the output of the recognition results φ1 to φ4 from the respective recognition processes, schematically represents the amount of information.
 More specifically, in the example of FIG. 15, the sampled images 36φ1 to 36φ4, which are subsampled by the processing of steps S10a to S10d in the preprocessing unit 210 and passed to the recognition unit 220, have a smaller amount of data than the image data 32a to 32d input to the preprocessing unit 210 for the processing of steps S10a to S10d.
 On the other hand, the amount of information of the recognition results φ1 to φ4 obtained by the recognition processing based on the image data 32a to 32d increases each time the recognition processing is repeated, and the obtained recognition result becomes more detailed with each recognition process. This is because each recognition process uses a feature amount obtained by integrating the feature amounts acquired up to the immediately preceding process, while shifting the phase of the sampled image each time, with the feature amount newly acquired by further shifting the phase relative to the immediately preceding sampled image.
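A minimal sketch of this inter-frame flow, under the assumption that the frames are NumPy-like arrays and that extract_features, integrate, and recognize are hypothetical callables standing in for steps S11 to S13, might look as follows; only the control flow (one subsampling phase per frame, with features accumulated across frames) reflects the description above:

```python
def run_inter_frame(frames, extract_features, integrate, recognize):
    """One subsampling phase per frame; features accumulate across frames."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]  # phases phi1..phi4
    accumulated = None
    results = []
    for i, frame in enumerate(frames):
        r, c = offsets[i % 4]
        sampled = frame[r::2, c::2]              # subsampling (step S10)
        feat = extract_features(sampled)         # step S11
        accumulated = feat if accumulated is None else integrate(accumulated, feat)  # step S12
        results.append(recognize(accumulated))   # step S13: the first result is the breaking news result
    return results
```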
(1-3-1-2. Second example)
FIG. 16 is a time chart showing a second example of readout and recognition processing in the basic architecture of the recognition process according to each embodiment. In this second example, the sampled images 36φ1 to 36φ4 of the first to fourth phases obtained by subsampling are all extracted from the image data 32 of a single frame. That is, in this second example, the recognition processing based on the sampled images 36φ1 to 36φ4 of the first to fourth phases is completed within one frame, and is hereinafter referred to as intra-frame processing where appropriate.
 Since the meaning of each part in FIG. 16 is the same as in FIG. 15 described above, a detailed description is omitted here.
 For example, in the imaging process of frame #1, each line is exposed for a predetermined time, and after the exposure ends, the pixel signals are transferred from the pixel circuits 1000 to the AD conversion unit 1003 via the vertical signal lines VSL. In the AD conversion unit 1003, each AD converter 1007 converts the transferred analog pixel signals into pixel data, which are digital signals. When the pixel signals of all lines have been converted into pixel data, the image data 32a consisting of the pixel data of frame #1 is input to the preprocessing unit 210.
 The preprocessing unit 210 applies the subsampling of the first phase φ1 described above to, for example, the image data 32a of the first frame in FIG. 16, acquires the pixels 300 at the pixel positions of the sampling pixels selected for each divided region 35, and generates the sampled image 36φ1 of the first phase φ1 (step S10a).
 When the subsampling of the first phase φ1 for the image data 32a is completed, the preprocessing unit 210 executes the subsampling of the second phase φ2 for the same image data 32a. From the sampling pixels acquired by this subsampling of the second phase φ2, the preprocessing unit 210 generates the sampled image 36φ2 of the second phase φ2 (step S10b). Thereafter, the preprocessing unit 210 executes subsamplings of further different phases on the image data 32a (the subsampling of the third phase φ3 and the subsampling of the fourth phase φ4), and generates the sampled image 36φ3 of the third phase φ3 and the sampled image 36φ4 of the fourth phase φ4 (steps S10c and S10d).
 In this way, the preprocessing unit 210 executes the subsamplings of the first to fourth phases φ1 to φ4 on the image data 32a of one frame, all within one frame period.
 The recognition unit 220 executes the feature amount extraction process on the sampled image 36φ1 of the first phase φ1 generated by the preprocessing unit 210 based on the image data 32a (step S11a), and extracts a feature amount. When feature amounts that can be integrated have been accumulated, the recognition unit 220 can integrate the feature amount extracted in step S11a with the accumulated feature amounts (step S12a). The recognition unit 220 executes recognition processing based on, for example, the feature amount integrated in step S12a (step S13a), and outputs the recognition result φ1 of the first phase.
 The recognition unit 220 executes the feature amount extraction process on the sampled image 36φ2 of the second phase φ2 generated by the preprocessing unit 210 based on the image data 32a (step S11b), and extracts a feature amount. When feature amounts that can be integrated have been accumulated, the recognition unit 220 can integrate the feature amount extracted in step S11b with the accumulated feature amounts (step S12b). In this example, for instance, the feature amount extracted in step S11b can be integrated with the feature amount extracted in step S11a described above. The recognition unit 220 performs recognition processing on the integrated feature amount (step S13b) and outputs the recognition result φ2 of the second phase φ2.
 Thereafter, in the same manner, the recognition unit 220 executes the feature amount extraction process on the sampled images 36φ3 and 36φ4 of the third and fourth phases φ3 and φ4 generated by the preprocessing unit 210 based on the image data 32a (steps S11c and S11d), and extracts feature amounts. The recognition unit 220 sequentially integrates the feature amounts extracted in steps S11c and S11d with the feature amounts integrated up to the immediately preceding integration process (steps S12c and S12d). The recognition unit 220 executes recognition processing based on, for example, the feature amounts integrated in the phases φ3 and φ4, and outputs the recognition results φ3 and φ4 of the phases φ3 and φ4, respectively.
 In the example of FIG. 16, the feature amount extraction processes (steps S11a to S11d), the integration processes (steps S12a to S12d), and the recognition processes (steps S13a to S13d) for the phases φ1 to φ4 described above are all executed within one frame period. That is, the recognition unit 220 performs recognition processing on each of the sampled images 36φ1 to 36φ4 obtained by thinning out the pixels of the image data 32a of one frame by the subsampling process. Therefore, the amount of calculation of each recognition process in the recognition unit 220 is small, and each recognition process can be executed in a short time.
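The intra-frame variant can be sketched in the same hedged way; here all four phases are taken from a single frame, so a breaking news result is available after the first phase and the integration result after the fourth. The helper names are again placeholders, not part of the embodiment:

```python
def run_intra_frame(frame, extract_features, integrate, recognize):
    """All four phases from one frame: the first result is the breaking news result,
    the fourth (all features integrated) is the integration result."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    accumulated = None
    results = []
    for r, c in offsets:
        sampled = frame[r::2, c::2]              # steps S10a..S10d
        feat = extract_features(sampled)         # steps S11a..S11d
        accumulated = feat if accumulated is None else integrate(accumulated, feat)  # steps S12a..S12d
        results.append(recognize(accumulated))   # steps S13a..S13d
    # results[0]: breaking news result, results[1:3]: intermediate results, results[3]: integration result
    return results
```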
 FIG. 17 is a schematic diagram for explaining the effect of the processing according to the second example described above (intra-frame processing). FIG. 17A is a time chart comparing the processing according to the second example with processing according to the existing technology, with time elapsing toward the right. In FIG. 17A, section (a) shows an example of readout and recognition processing according to the existing technology, and section (b) shows an example of readout and recognition processing according to the second example described above.
 In sections (a) and (b), the imaging process is executed during the period from time t0 to time t1. The imaging process includes exposure for a predetermined time in the pixel array unit 1001 and the transfer of each piece of pixel data based on the charge generated by the photoelectric conversion elements in response to the exposure. Each piece of pixel data transferred from the pixel array unit 1001 by the imaging process is stored in the frame memory as, for example, one frame of image data.
 In sections (a) and (b), readout of the image data stored in the frame memory starts, for example, at time t1. In the processing of the existing technology in section (a), recognition processing on one frame of image data starts after the readout of that frame of image data has finished (time t4). Here, for the sake of explanation, it is assumed that this recognition processing ends at time t6, one frame period after time t4.
 In the processing according to the second example in section (b), readout of the image data from the frame memory starts after time t1, as in the example of section (a). In the second example, the readout of the sampled image 36φ1 by the subsampling of the first phase φ1 is executed during the period from time t1 to time t2, which is, for example, 1/4 of one frame period. Similarly, the recognition processing on the sampled image 36φ1 is executed during the period from time t2 to time t3, which is also, for example, 1/4 of one frame period, and the recognition result φ1 is output.
 In the processing according to the second example, the readouts of the sampled images 36φ2 to 36φ4 by the subsamplings of the second to fourth phases φ2 to φ4 are thereafter executed in the same manner, each taking, for example, 1/4 of one frame period, starting from times t2, t3, and so on, and finishing, for example, at time t4.
 The recognition processing on the sampled image 36φ2 starts at time t2, ends, for example, at time t3 after 1/4 of one frame period, and the recognition result φ2 is output. The recognition processing on each of the other sampled images 36φ3 and 36φ4 is likewise executed following the recognition processing on the immediately preceding sampled image, each ending, for example, in 1/4 of one frame period, and the recognition results φ3 and φ4 are output, respectively. In the example of FIG. 17A, the recognition processing on the sampled image 36φ4 obtained by the last subsampling of the one frame of image data 32 ends at time t5.
 FIG. 17B is a diagram schematically showing the recognition results according to the second example. In FIG. 17B, the upper, middle, and lower rows show examples of the recognition results φ1, φ2, and φ4 obtained by the recognition processing for the first phase φ1, the second phase φ2, and the fourth phase φ4, respectively.
 The upper, middle, and lower rows of FIG. 17B each show a case in which the recognition target is a person and one frame contains images of three people at different distances from the sensor unit 10b (information processing device 1b). In each of these rows, the frame 95 contains three objects 96L, 96M, and 96S, which are images of people of different sizes. Of these, the object 96L is the largest, and of the three people included in the frame 95, the person corresponding to the object 96L is closest to the sensor unit 10b. The smallest of the objects 96L, 96M, and 96S, the object 96S, represents the person farthest from the sensor unit 10b among the three people included in the frame 95.
 In FIG. 17B, the recognition result φ1 is an example in which recognition processing is executed on the sampled image 36φ1 described above and the largest object 96L is recognized. The recognition result φ2 is an example in which the feature amount extracted from the sampled image 36φ2 is further integrated with the feature amount used for the recognition result φ1 and the next largest object 96M is recognized. The recognition result φ4 shows a state in which the feature amounts extracted from the sampled images up to and including the sampled image 36φ4 are integrated and the smallest object 96S is recognized in addition to the objects 96L and 96M.
 In this way, by extracting the feature amounts of the sampled images 36φ1, 36φ2, and so on from one frame of image data 32 and accumulating and integrating the extracted feature amounts, people located progressively farther away can be recognized. At the same time, as shown by the recognition result φ1, the largest object 96L is already recognized by the recognition processing based on the sampled image 36φ1 obtained by the first subsampling.
 Thus, in the second example, a rough recognition result φ1 can be obtained based on the sampled image 36φ1 obtained by the first subsampling of the frame. In FIG. 17A, the recognition result φ1 can be output at time t3; as indicated by the arrow B in the figure, this achieves lower latency than time t6, at which the recognition result is output by the existing technology.
 In this second example, the recognition result φ1 based on the sampled image 36φ1 obtained by the first subsampling of the frame is the breaking news result. This breaking news result is also applicable to the first example described above.
 Further, in the second example, the recognition processing for the last subsampling of the frame is performed based on the feature amount obtained by integrating the feature amounts extracted from all the sampled images 36φ1 to 36φ4 of that frame, so a more accurate recognition result φ4 can be obtained. This recognition result φ4 can achieve, for example, accuracy equivalent to that of the recognition processing of the existing technology. Moreover, the recognition processing for this last subsampling ends at time t5, which is, for example, 1/4 of a frame period after time t4, at which the readout processing of the existing technology ends. Thus, in the second example, accuracy equivalent to that of the existing technology can be obtained in a shorter time than the recognition processing of the existing technology, as indicated by the arrow A in the figure, and lower latency can be achieved.
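 As a rough numerical check under the timings assumed above (a frame period of 50 ms, with each subsampling readout and each per-phase recognition taking about 1/4 of a frame period), the existing technology outputs its result roughly 50 ms (full readout) + 50 ms (recognition) = 100 ms after readout starts, whereas the breaking news result φ1 becomes available after roughly 12.5 ms + 12.5 ms = 25 ms and the integration result φ4 after roughly 50 ms + 12.5 ms = 62.5 ms. These figures are only illustrative of the latency reductions indicated by the arrows B and A.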
 In this second example, the recognition result φ4, based on the feature amount obtained by integrating the feature amounts extracted from the sampled image 36φ4 of the last subsampling of the frame and from the sampled images 36φ1 to 36φ3 acquired before it, is the integration result. This integration result is also applicable to the first example described above.
 In the following, unless otherwise stated, the second of the first and second examples described above is applied to the readout by subsampling from the image data 32 and to the recognition processing.
[2. First Embodiment]
(2-1. Outline of the first embodiment)
Next, the first embodiment of the present disclosure will be described. In the first embodiment of the present disclosure, either any one of the recognition results φx obtained by performing recognition processing based on each of the plurality of sampled images 36φx generated by subsampling from one frame of image data 32, or a combination of a plurality of those recognition results φx, can be output adaptively. In this way, a recognition result suited to, for example, the environment or the situation can be obtained.
 In the following, it is assumed that the preprocessing unit 210 sets the divided region 35 to 4 pixels × 4 pixels and performs subsampling by thinning out every other pixel, as described with reference to FIG. 11, expanding one frame of image data 32 along the time axis to generate the four phase-shifted sampled images 36φ1, 36φ2, 36φ3, and 36φ4.
 FIG. 18 is a diagram schematically showing the configuration of an example of the recognizer according to the first embodiment. The left end of FIG. 18 shows one frame of image data 32 divided into four according to the pixels 300φ1, 300φ2, 300φ3, and 300φ4 of the four phases, the first phase φ1 to the fourth phase φ4. The sampled images 36φ1 to 36φ4 of the respective phases are generated by the subsampling processes of the first to fourth phases φ1 to φ4 (steps S10a to S10d).
 Here, it is assumed that the sampled images 36φ1 to 36φ4 of the respective phases are generated in the order of the first phase φ1, the second phase φ2, the third phase φ3, and the fourth phase φ4.
 Note that the method of dividing the image data 32 is not limited to the four-way division (= 2 × 2) based on the divided region 35 of 4 pixels × 4 pixels described above. For example, the size of the divided region 35 may be 8 pixels × 8 pixels (in this case, the division is into 16 parts, 4 × 4), or the divided region 35 may have yet another size. Furthermore, the divided region 35 does not have to be square, nor is it limited to a rectangle.
 Alternatively, arbitrary pixel positions may be selected from the entire image data 32 or from a predetermined divided region 35, and the pixels 300 at the selected pixel positions may be used as the sampling pixels. Here, the arbitrarily selected pixel positions include, for example, a plurality of discrete and aperiodic pixel positions. For example, the preprocessing unit 210 can select these pixel positions using pseudo-random numbers. The selected pixel positions are preferably different for each frame, but some pixel positions may overlap between frames.
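A minimal sketch of such a pseudo-random selection, assuming NumPy and a per-frame seed (both of which are assumptions for illustration, not part of the embodiment), might look like this:

```python
import numpy as np

def random_sampling_positions(height, width, num_samples, frame_index):
    """Pick discrete, aperiodic pixel positions; changing the seed per frame
    makes the selected positions differ from frame to frame."""
    rng = np.random.default_rng(seed=frame_index)
    flat = rng.choice(height * width, size=num_samples, replace=False)
    rows, cols = np.divmod(flat, width)  # flat index = row * width + col
    return rows, cols

# Example: select 1/4 of the pixel positions of a 640x480 frame.
rows, cols = random_sampling_positions(height=480, width=640,
                                        num_samples=480 * 640 // 4, frame_index=0)
```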
 Feature amount extraction processing is performed on each of the sampled images 36φ1 to 36φ4 of the respective phases (steps S11a to S11d). The feature amount of the sampled image 36φ1 extracted first in step S11a is integrated in step S12a with any feature amounts already accumulated. In the example of FIG. 18, recognition processing is performed directly on the feature amount extracted from the sampled image 36φ1 to obtain the recognition result φ1 (step S13a).
 The recognition result φ1 of the recognition processing in step S13a is called the breaking news result because it is the first recognition result obtained among the recognition results φ1 to φ4 based on the sampled images 36φ1 to 36φ4 generated from one frame of image data 32.
 Next, the feature amount of the sampled image 36φ2 extracted in step S11b is integrated in step S12b with the feature amount extracted from the sampled image 36φ1 in step S11a. Recognition processing is performed on the feature amount integrated in step S12b, and the recognition result φ2 is obtained (step S13b). At the same time, the integrated feature amount is integrated with the feature amount of the sampled image 36φ3 extracted in step S11c (step S12c). That is, in step S12c, the feature amounts extracted from the sampled images 36φ1, 36φ2, and 36φ3 are integrated.
 Recognition processing is performed on the feature amount integrated in step S12c, and the recognition result φ3 is obtained (step S13c). At the same time, the integrated feature amount is integrated with the feature amount of the sampled image 36φ4 extracted in step S11d (step S12d). That is, in step S12d, the feature amounts extracted from the sampled images 36φ1, 36φ2, 36φ3, and 36φ4 are integrated. Recognition processing is performed on this integrated feature amount in step S13d, and the recognition result φ4 is obtained.
 The recognition result φ4 of the recognition processing in step S13d is called the integration result because it is obtained based on the integrated feature amount in which the feature amounts extracted from all of the sampled images 36φ1 to 36φ4 are integrated. The integration result corresponds to the recognition result that would be obtained if the pixels 300 at all pixel positions of one frame of image data 32 were used as sampling pixels.
 The recognition results φ2 and φ3 of the recognition processing in steps S13b and S13c are recognition results obtained on the way to the integration result and are called intermediate results.
 As described above, the recognizer according to the first embodiment can adaptively output any one of the recognition results φ1 to φ4, or a combination of a plurality of the recognition results φ1 to φ4, according to prior information detected in advance, environmental information, and the like.
 FIG. 19 is a time chart showing examples of timings, according to the first embodiment, at which it is determined how the recognition result is to be output, that is, which of the recognition results φ1 to φ4, or which combination of a plurality of the recognition results φ1 to φ4, is to be output, and on what conditions the recognition result to be output is determined. In FIG. 19, time elapses toward the right.
 In the example of FIG. 19, three determination timings are shown, in chronological order: timing P, timing Q, and timing R.
 At timing P, prior information is detected before a series of recognition processes is started. The prior information is, for example, information detected in advance for executing the recognition processing, and based on the prior information it is determined, for example, which of the recognition results φ1 to φ4 is to be output. At timing Q, the recognition result to be output is set based on the recognition results in a predetermined period 100. This predetermined period 100 can be, for example, a frame-based period of one frame or longer. At timing R, the recognition result to be output is set based on a recognition result within a predetermined period 101 inside a frame (for example, the breaking news result or an intermediate result described above).
 The first embodiment corresponds to the processing at timing P described above, and the recognition result to be output is determined based on prior information detected before the recognition processing is started.
(2-2. More specific configuration example according to the first embodiment)
Next, a more specific configuration example according to the first embodiment will be described. FIG. 20A is a functional block diagram for explaining in more detail the functions of the preprocessing unit 210 according to the first embodiment. In FIG. 20A, the preprocessing unit 210 includes a utilization area acquisition unit 211, a recognition result output setting unit 212, a recognition result output calculation unit 213, and a storage unit 214. The storage unit 214 includes a memory and a memory control unit for controlling reading from and writing to the memory.
 The utilization area acquisition unit 211, the recognition result output setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 (memory control unit) are realized, for example, by an information processing program running on the CPU 1205. This information processing program can be stored in advance in the ROM 1206. Alternatively, the information processing program can be supplied from the outside via the interface 1204 and written into the ROM 1206.
 Furthermore, the utilization area acquisition unit 211, the recognition result output setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 (memory control unit) may be realized by the CPU 1205 and the DSP 1203 each operating according to the information processing program. Furthermore, some or all of the utilization area acquisition unit 211, the recognition result output setting unit 212, the recognition result output calculation unit 213, and the storage unit 214 (memory control unit) may be configured by hardware circuits that operate in cooperation with each other.
 In the preprocessing unit 210, the utilization area acquisition unit 211 includes a readout unit that reads the image data 32 from the sensor unit 10b. The utilization area acquisition unit 211 applies the subsampling process to the image data 32 read from the sensor unit 10b by the readout unit according to a predetermined pattern (for example, the divided region 35 of 4 pixels × 4 pixels), extracts sampling pixels, and generates a sampled image 36φx of phase φx from the extracted sampling pixels. That is, the utilization area acquisition unit 211 realizes the function of a generation unit that generates the sampled image.
 The utilization area acquisition unit 211 passes the generated sampled image 36φx to the recognition unit 220. The utilization area acquisition unit 211 can also perform readout control on the sensor unit 10b, for example specifying the lines to be read.
 FIG. 20B is a functional block diagram for explaining in more detail the functions of the recognition unit 220 according to the first embodiment. In FIG. 20B, the recognition unit 220 includes a feature amount calculation unit 221, a feature amount accumulation control unit 222, a feature amount accumulation unit 223, and a recognition process execution unit 224.
 The feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 are realized, for example, by an information processing program running on the CPU 1205. This information processing program can be stored in advance in the ROM 1206. Alternatively, the information processing program can be supplied from the outside via the interface 1204 and written into the ROM 1206.
 Furthermore, the feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 may be realized by the CPU 1205 and the DSP 1203 each operating according to the information processing program. Furthermore, some or all of the feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 may be configured by hardware circuits that operate in cooperation with each other.
 In the recognition unit 220, the feature amount calculation unit 221, the feature amount accumulation control unit 222, the feature amount accumulation unit 223, and the recognition process execution unit 224 constitute a recognizer that executes recognition processing based on image data. The recognition unit 220 can construct the recognizer and change its configuration according to recognizer information passed from the parameter storage unit 230.
 In the recognition unit 220, the sampled image 36φx passed from the utilization area acquisition unit 211 is input to the feature amount calculation unit 221. The feature amount calculation unit 221 includes one or more feature calculation units for calculating feature amounts, and calculates a feature amount based on the passed sampled image 36φx. That is, the feature amount calculation unit 221 functions as a calculation unit that calculates the feature amount of the sampled image 36φx composed of sampling pixels. In addition, the feature amount calculation unit 221 may, for example, acquire information for setting the exposure and the analog gain from the sensor unit 10b and further use this acquired information to calculate the feature amount. The feature amount calculation unit 221 passes the calculated feature amount to the feature amount accumulation control unit 222.
 The feature amount accumulation control unit 222 accumulates the feature amount passed from the feature amount calculation unit 221 in the feature amount accumulation unit 223. At this time, the feature amount accumulation control unit 222 can integrate the past feature amounts already accumulated in the feature amount accumulation unit 223 with the feature amount passed from the feature amount calculation unit 221 to generate an integrated feature amount. When the feature amount accumulation unit 223 has been initialized, for example, and no feature amount exists, the feature amount accumulation control unit 222 accumulates the feature amount passed from the feature amount calculation unit 221 in the feature amount accumulation unit 223 as the first feature amount.
 The feature amount accumulation control unit 222 can also delete feature amounts that are no longer needed from the feature amounts accumulated in the feature amount accumulation unit 223. Feature amounts that are no longer needed are, for example, feature amounts relating to a previous frame, or already accumulated feature amounts calculated from a frame image of a scene different from the frame image for which a new feature amount has been calculated. In addition, the feature amount accumulation control unit 222 can specify the feature amounts to be deleted in response to an external instruction. Furthermore, the feature amount accumulation control unit 222 can delete all the feature amounts accumulated in the feature amount accumulation unit 223 and initialize it as necessary.
 The feature amount accumulation control unit 222 passes to the recognition process execution unit 224 either the feature amount passed from the feature amount calculation unit 221, or a feature amount obtained by integrating the feature amounts accumulated in the feature amount accumulation unit 223 with the feature amount passed from the feature amount calculation unit 221.
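The accumulate-and-integrate behaviour of the feature amount accumulation control unit 222 described above can be sketched as a simplified model; the class name is hypothetical, and element-wise averaging is used purely as a stand-in for whatever integration the recognizer actually performs:

```python
class FeatureAccumulator:
    """Holds accumulated feature amounts and integrates newly calculated ones."""

    def __init__(self):
        self._accumulated = None

    def integrate(self, feature):
        # The first feature amount after initialization is stored as-is.
        if self._accumulated is None:
            self._accumulated = feature
        else:
            # Illustrative integration: element-wise average of old and new feature amounts.
            self._accumulated = 0.5 * (self._accumulated + feature)
        return self._accumulated

    def reset(self):
        # Delete all accumulated feature amounts, e.g. at a scene change.
        self._accumulated = None
```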
 The recognition process execution unit 224 executes recognition processing such as object detection, person detection, and face detection based on the feature amount passed from the feature amount accumulation control unit 222. For example, when that feature amount is the feature amount passed from the feature amount calculation unit 221 to the feature amount accumulation control unit 222 as-is, that is, a feature amount not integrated with other feature amounts, the recognition process execution unit 224 outputs the breaking news result as the recognition result of the processing.
 Also, for example, when the feature amount is one in which all the feature amounts based on all the sampled images 36φx generated from one frame of image data 32 are integrated, the recognition process execution unit 224 outputs the integration result as the recognition result of the processing. Furthermore, the recognition process execution unit 224 can also output an intermediate result, which is a recognition result between the breaking news result and the integration result.
 Returning to FIG. 20A, the storage unit 214 accumulates the recognition results output by the recognition unit 220. The recognition results accumulated in the storage unit 214 are passed to the recognition result output calculation unit 213. Alternatively, the storage unit 214 can pass the recognition result output from the recognition process execution unit 224 directly to the recognition result output calculation unit 213. Based on the recognition results passed from the storage unit 214, the recognition result output calculation unit 213 determines one or more of the recognition results φ1 to φ4 to be output from the recognition unit 220. The recognition result output calculation unit 213 passes the determined recognition result to the recognition result output setting unit 212.
 The recognition result output setting unit 212 sets the recognition result to be output by the recognition unit 220 based on the recognition result passed from the recognition result output calculation unit 213 or on prior information supplied, for example, from outside the recognition processing unit 20b. That is, the recognition result output setting unit 212 executes one of the processes at the timings P, Q, and R described with reference to FIG. 19 based on the recognition result or the prior information. In this way, the recognition result output setting unit 212 and the recognition result output calculation unit 213 function as an output control unit that controls the output of the recognition result by the recognition unit 220.
(2-3. More specific processing according to the first embodiment)
Next, the process according to the first embodiment will be described more specifically. FIG. 21 is an example flowchart showing the recognition process according to the first embodiment. In the following description, the information processing device 1b according to the first embodiment will be described as being used for in-vehicle use.
 In step S100, the recognition processing unit 20b detects prior information via the recognition result output setting unit 212. As the prior information, for example, vehicle body information of the vehicle on which the information processing device 1b including the recognition processing unit 20b is mounted, position information indicating the current position of that vehicle (information processing device 1b), and time information indicating the current date and time can be applied.
 More specifically, the traveling speed of the vehicle can be applied as the vehicle body information. The position information can be acquired by providing the vehicle or the information processing device 1b itself with self-position acquisition means such as GNSS (Global Navigation Satellite System) or SLAM (Simultaneous Localization and Mapping). By referring to map information based on the acquired position information, the country or region can be identified. Here, in the case of Japan, the region may be a wide area such as a prefecture or a specific area within an urban district (a shopping street, a school zone, and so on). The time information can be acquired, for example, from a timer or calendar mounted on the vehicle or on the information processing device 1b itself, and makes it possible to know whether it is day or night and what season it is.
 In the next step S101, the recognition processing unit 20b determines, via the recognition result output setting unit 212 and based on the prior information detected in step S100, the recognition result of which timing (for example, one of the recognition results φ1 to φ4) is to be output. For example, when the prior information is traveling speed information, it is conceivable to output the breaking news result (recognition result φ1) if the traveling speed is equal to or higher than a predetermined value, and to output the integration result (recognition result φ4) otherwise. Also, for example, when the prior information is position information, it is conceivable to output the breaking news result (recognition result φ1) if the current position is in a school zone, in order to prioritize the recognition of nearby objects, and to output the integration result (recognition result φ4) on an expressway, in order to prioritize the recognition of distant traffic conditions and oncoming vehicles.
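A hedged sketch of the decision in step S101 based on such prior information might look as follows; the speed threshold and the zone labels are invented for illustration and are not taken from the embodiment:

```python
def decide_output_phase(speed_kmh=None, zone=None):
    """Return which recognition result to request: 1 = breaking news result ... 4 = integration result."""
    if speed_kmh is not None and speed_kmh >= 60:  # hypothetical speed threshold
        return 1   # prioritise responsiveness: breaking news result (phi1)
    if zone == "school_zone":
        return 1   # prioritise nearby objects: breaking news result (phi1)
    if zone == "highway":
        return 4   # prioritise distant objects: integration result (phi4)
    return 4       # default: integration result (phi4)
```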
 In the next step S102, the recognition processing unit 20b uses the utilization area acquisition unit 211 to acquire a sampled image 36φx obtained by subsampling the image data 32 according to, for example, a preset pattern (for example, the divided region 35). In the next step S103, the recognition result output setting unit 212 specifies to the recognition unit 220 the recognition result of which timing among the recognition results φ1 to φ4 is to be output, in accordance with its determination in step S101.
 In the next step S104, the recognition processing unit 20b causes the recognition unit 220 to execute recognition processing on the sampled image 36φx passed from the utilization area acquisition unit 211. In the next step S105, the recognition processing unit 20b determines, via the recognition result output setting unit 212, whether the recognition result φx obtained by the recognition processing of the recognition unit 220 is the recognition result specified to the recognition unit 220 in step S103. For example, the recognition result output setting unit 212 can make this determination by acquiring the recognition result φx output from the recognition unit 220 via the storage unit 214 and the recognition result output calculation unit 213. When the recognition result output setting unit 212 determines that the recognition result φx obtained by the recognition processing of the recognition unit 220 is not the recognition result specified to the recognition unit 220 in step S103 (step S105, "No"), the processing returns to step S102.
 一方、認識結果出力設定部212は、認識部220により認識処理された認識結果φxがステップS103で認識部220に指定した認識結果であると判定した場合(ステップS105、「Yes」)、処理をステップS106に移行させる。ステップS106で、認識結果出力設定部212は、認識部220に対して、ステップS104で認識処理を行った認識結果φxを出力するように指示する。この指示に応じて、認識部220から認識結果φxが出力される。 On the other hand, when the recognition result output setting unit 212 determines that the recognition result φx recognized by the recognition unit 220 is the recognition result specified in the recognition unit 220 in step S103 (step S105, “Yes”), the processing is performed. The process proceeds to step S106. In step S106, the recognition result output setting unit 212 instructs the recognition unit 220 to output the recognition result φx obtained by the recognition process in step S104. In response to this instruction, the recognition result φx is output from the recognition unit 220.
 このように、第1の実施形態に係る認識処理部20bは、事前情報を検出し、検出した事前情報に基づき、どのタイミングの認識結果φxを出力するかを決定している。そのため、第1の実施形態に係る認識処理部20bを適用することで、状況に合わせた認識結果を得ることが可能となる。また、これにより、計算量のコストや通信コストを抑制することが可能となる。 In this way, the recognition processing unit 20b according to the first embodiment detects the prior information and determines at what timing the recognition result φx is output based on the detected prior information. Therefore, by applying the recognition processing unit 20b according to the first embodiment, it is possible to obtain a recognition result according to the situation. Further, this makes it possible to suppress the cost of calculation amount and the communication cost.
(2-4. First modification of the first embodiment)
Next, a first modification of the first embodiment will be described. The first modification of the first embodiment corresponds to the processing at the timing R described with reference to FIG. 19, and determines the recognition result φx to be output by the recognition unit 220 based on the recognition result obtained during the predetermined period 101 within a frame.
FIG. 22 is a flowchart showing an example of the recognition processing according to the first modification of the first embodiment. Here, it is assumed that, prior to the execution of the flowchart of FIG. 22, the recognition target used to determine the recognition result φx output by the recognition unit 220 has been set. When the information processing device 1b according to the first modification of the first embodiment is for in-vehicle use, a person (pedestrian), for example, can be considered as the recognition target. The recognition target is not limited to this, and oncoming vehicles and road signs may also be used as recognition targets.
The processing of steps S200 to S203 forms a loop. In step S200, the recognition processing unit 20b uses the utilization area acquisition unit 211 to acquire the sampled image 36φx obtained by subsampling the image data 32 of the frame (t) according to, for example, a preset pattern (for example, the divided regions 35).
 次のステップS201で、認識処理部20bは、認識結果出力設定部212により、認識結果φ1~φ4のうちどのタイミングの認識結果を出力するかを、認識部220に指定する。例えば、認識結果出力設定部212は、ステップS200で取得したサンプリング画像36φxに基づく認識結果を出力するように、認識部220に指定する。 In the next step S201, the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results φ1 to φ4 is to be output by the recognition result output setting unit 212. For example, the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36φx acquired in step S200.
 次のステップS202で、認識処理部20bは、認識部220により、ステップS200で取得したサンプリング画像36φxに対する認識処理を実行する。より詳細には、認識部220は、ステップS200で取得したサンプリング画像36φxの特徴量を抽出し、抽出した特徴量に基づき認識処理を実行する。 In the next step S202, the recognition processing unit 20b executes the recognition processing for the sampled image 36φx acquired in step S200 by the recognition unit 220. More specifically, the recognition unit 220 extracts the feature amount of the sampled image 36φx acquired in step S200, and executes the recognition process based on the extracted feature amount.
In the next step S203, the recognition processing unit 20b uses the recognition result output setting unit 212 to determine whether or not it has been decided, based on the result of the recognition processing in step S202, which of the recognition results φ1 to φ4 is to be output in the frame (t + 1) following the frame (t). For example, when a recognition target designated in advance has been detected based on the result of the recognition processing in step S202, the recognition result output setting unit 212 determines, according to that recognition target, that it has been decided which of the recognition results φ1 to φ4 is to be output.
When the recognition result output setting unit 212 determines in step S203 that it has not been decided which recognition result is to be output (step S203, "No"), the processing returns to step S200, and the sampled image 36φ(x + 1) of the phase following the sampled image 36φx acquired in step S200 is acquired.
On the other hand, when the recognition result output setting unit 212 determines in step S203 that it has been decided which recognition result is to be output (step S203, "Yes"), the processing proceeds to step S204.
In step S204, the recognition processing unit 20b causes the utilization area acquisition unit 211 to acquire, in accordance with the decision in step S203, the sampled image 36φx corresponding to the recognition result φx whose output has been decided. Here, the sampled image 36φx to be acquired may be the sampled image 36φx of the image data 32 at the time (t), or may be the sampled image 36φx of the image data 32 at the next time (t + 1).
 次のステップS205で、認識処理部20bは、認識結果出力設定部212により、認識結果φ1~φ4のうちどのタイミングの認識結果を出力するかを、認識部220に指定する。例えば、認識結果出力設定部212は、ステップS203で決定されたサンプリング画像36φxに基づく認識結果を出力するように、認識部220に指定する。 In the next step S205, the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results φ1 to φ4 is to be output by the recognition result output setting unit 212. For example, the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36φx determined in step S203.
 次のステップS206で、認識処理部20bは、認識部220により、ステップS204で取得したサンプリング画像36φxに対する認識処理を実行する。 In the next step S206, the recognition processing unit 20b executes the recognition processing for the sampled image 36φx acquired in step S204 by the recognition unit 220.
In the next step S207, the recognition processing unit 20b uses the recognition result output setting unit 212 to determine whether or not the recognition processing based on the recognition result whose output was designated in step S205, for example the sampled image 36φx decided in step S203, has been performed. When it is determined that this recognition processing has not been performed (step S207, "No"), the processing returns to step S204. On the other hand, when the recognition result output setting unit 212 determines that this recognition processing has been performed (step S207, "Yes"), the processing proceeds to step S208.
 ステップS208で、認識処理部20bは、認識結果出力設定部212により、認識部220に対して認識結果φxを出力するよう指示する。認識部220は、この指示に応じて、認識結果φxを出力する。 In step S208, the recognition processing unit 20b instructs the recognition unit 220 to output the recognition result φx by the recognition result output setting unit 212. The recognition unit 220 outputs the recognition result φx in response to this instruction.
The above will be described using a more specific example. Here, an obstacle on the road (such as a pedestrian) is applied as the recognition target, and it is assumed that, in the preliminary result (recognition result φ1), the recognition target can be recognized within the range of the braking target distance of the vehicle (own vehicle) on which the information processing device 1b is mounted.
 図23は、上述した図18に対応する図であって、第1の実施形態の第1の変形例に係る認識処理を説明するための模式図である。時間(t)において、画像データ32に対する最初のサブサンプリングによるサンプリング画像36φ1に基づく認識処理(ステップS13a)により、認識対象が認識されたものとする(ステップS202)。この場合、認識対象は、自車の制動対象距離の範囲内に位置するため、情報処理装置1bは、自車に対して制動を促す通信を行う。 FIG. 23 is a diagram corresponding to FIG. 18 described above, and is a schematic diagram for explaining the recognition process according to the first modification of the first embodiment. At time (t), it is assumed that the recognition target is recognized by the recognition process (step S13a) based on the sampled image 36φ1 by the first subsampling of the image data 32 (step S202). In this case, since the recognition target is located within the range of the braking target distance of the own vehicle, the information processing device 1b performs communication for urging the own vehicle to brake.
At time (t), there is no need to detect objects farther away than the recognition target. Therefore, for example, the recognition result output setting unit 212 decides that at least one of the recognition results φ1 to φ4 based on the image data 32 of the next time (t + 1) (the image data 32 of the frame following the image data 32 at time (t)) is to be the recognition result to be output. As a result, the recognition unit 220 does not execute the recognition processing (steps S13b to S13d) based on the remaining sampled images 36φ2 to 36φ4 of the image data 32 at time (t), and executes the recognition processing for the next time (t + 1) from step S204.
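The early-termination behavior of FIG. 23 can be sketched as follows. This is a hedged illustration, not the actual implementation of the recognition unit 220: the callable recognize(), the (label, distance) detection format, and the function name process_frame are assumptions used only to show how the remaining phases of a frame could be skipped once an urgent target is found.

def process_frame(sampled_images, recognize, braking_distance_m):
    """Process the phase-shifted sampled images 36phi1..36phi4 of one frame in order.

    `recognize(image)` is a hypothetical callable returning a list of detections,
    each a (label, distance_m) tuple. As soon as a designated target (here a
    pedestrian) is found within the braking target distance, the remaining
    phases are skipped so that processing of the next frame can start earlier.
    """
    detections = []
    for phase, image in enumerate(sampled_images, start=1):
        detections = recognize(image)              # corresponds to step S202
        urgent = [d for d in detections
                  if d[0] == "pedestrian" and d[1] <= braking_distance_m]
        if urgent:
            # Early output (e.g. the preliminary result phi1); steps S13b..S13d skipped.
            return phase, urgent
    # No urgent target: the last phase corresponds to the integrated result phi4.
    return len(sampled_images), detections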
FIG. 24 is a time chart of an example for explaining the effect of the processing of FIG. 23. FIGS. 16 and 17A described above show that, when the sampled images 36φ1 to 36φ4 of the four phases φ1 to φ4 are acquired from the image data 32, the recognition processing for each of the sampled images 36φ1 to 36φ4 is completed within one frame period.
In practice, however, one frame period includes the exposure period and the period for transferring the pixel data, so that, for example, the recognition processing for the first sampled image 36φ1 actually starts with a delay relative to the frame start timing, time t0. Furthermore, each recognition process is not necessarily completed within one quarter of a frame period.
Taking these points into account, when the recognition processing for each of the sampled images 36φ1 to 36φ4 is executed in the first frame period starting at time t0, the time t11 at which the recognition processing for the last recognition result φ4 is completed may not come before the time t1 at which the first frame period ends. That is, in this case, the recognition processing of the first frame period runs over the frame boundary into the following second frame period. The next recognition processing therefore starts at time t2, at which the second frame period ends, which may cause a problem in responsiveness.
In contrast, as described with reference to FIG. 23, by executing only the recognition processing for, for example, the first sampled image 36φ1 in the first frame period and not performing the recognition processing for the subsequent sampled images φ2 to φ4, the recognition processing for the next image data 32 can be started at time t1, at which the second frame period starts. Responsiveness is therefore better than in the example described above.
Note that in recognition processing it is often unnecessary to consider the visibility of the image; in this example, the second frame period can therefore be started at time t10, at which the recognition processing for the sampled image 36φ1 ends, which further improves responsiveness.
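The latency argument of FIG. 24 can be checked with simple arithmetic. The numbers below (frame period, readout delay, per-phase recognition time) are illustrative assumptions only; the disclosure does not specify concrete values.

# Illustrative latency budget for FIG. 24 (all numbers are assumptions).
frame_period_ms = 33.3          # e.g. 30 fps
readout_delay_ms = 8.0          # exposure + pixel data transfer before phi1 can start
recognition_ms_per_phase = 9.0  # per sampled image 36phi1..36phi4

all_phases = readout_delay_ms + 4 * recognition_ms_per_phase    # 44.0 ms
first_phase_only = readout_delay_ms + recognition_ms_per_phase  # 17.0 ms

print(all_phases > frame_period_ms)        # True: phi4 spills into the next frame
print(first_phase_only < frame_period_ms)  # True: the next frame can start at time t10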
(2-5. Second modification of the first embodiment)
Next, a second modification of the first embodiment will be described. The second modification of the first embodiment corresponds to the processing at the timing Q described with reference to FIG. 19, and sets the recognition result φx to be output based on the recognition results obtained during the predetermined period 100 spanning multiple frames.
FIG. 25 is a flowchart showing an example of the recognition processing according to the second modification of the first embodiment. Here, it is assumed that, prior to the execution of the flowchart of FIG. 25, the recognition target used to determine the recognition result φx output by the recognition unit 220 has been set. When the information processing device 1b according to the second modification of the first embodiment is for in-vehicle use, a person (pedestrian), for example, can be considered as the recognition target. The recognition target is not limited to this, and oncoming vehicles and road signs may also be used as recognition targets.
The processing of steps S300 to S304 forms a loop. In step S300, the recognition processing unit 20b uses the utilization area acquisition unit 211 to acquire the sampled image 36φx obtained by subsampling one frame of the image data 32 according to, for example, a preset pattern (for example, the divided regions 35).
 次のステップS301で、認識処理部20bは、認識結果出力設定部212により、認識結果φ1~φ4のうちどのタイミングの認識結果を出力するかを、認識部220に指定する。例えば、認識結果出力設定部212は、ステップS300で取得したサンプリング画像36φxに基づく認識結果を出力するように、認識部220に指定する。 In the next step S301, the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results φ1 to φ4 is to be output by the recognition result output setting unit 212. For example, the recognition result output setting unit 212 specifies to the recognition unit 220 to output the recognition result based on the sampled image 36φx acquired in step S300.
 次のステップS302で、認識処理部20bは、認識部220により、ステップS300で取得したサンプリング画像36φxに対する認識処理を実行する。より詳細には、認識部220は、ステップS300で取得したサンプリング画像36φxの特徴量を抽出し、抽出した特徴量に基づき認識処理を実行する。次のステップS303で、蓄積部214は、ステップS302で実行された認識処理による認識結果φxを蓄積する。 In the next step S302, the recognition processing unit 20b executes the recognition processing for the sampled image 36φx acquired in step S300 by the recognition unit 220. More specifically, the recognition unit 220 extracts the feature amount of the sampled image 36φx acquired in step S300, and executes the recognition process based on the extracted feature amount. In the next step S303, the storage unit 214 accumulates the recognition result φx by the recognition process executed in step S302.
 次のステップS304で、認識処理部20bは、認識結果出力設定部212により、ステップS300~ステップS303の処理を、所定期間(例えば数フレーム期間)実行したか否かを判定する。認識結果出力設定部212は、処理を所定期間実行していないと判定した場合(ステップS304、「No」)、処理をステップS300に戻す。一方、認識結果出力設定部212は、処理を所定期間実行したと判定した場合(ステップS304、「Yes」)、処理をステップS305に移行させる。 In the next step S304, the recognition processing unit 20b determines whether or not the processing of steps S300 to S303 has been executed for a predetermined period (for example, several frame period) by the recognition result output setting unit 212. When the recognition result output setting unit 212 determines that the process has not been executed for a predetermined period (step S304, "No"), the process returns to step S300. On the other hand, when the recognition result output setting unit 212 determines that the process has been executed for a predetermined period (step S304, "Yes"), the process shifts to step S305.
In step S305, the recognition processing unit 20b uses the recognition result output setting unit 212 to decide, based on the recognition results φx accumulated in the storage unit 214, which of the recognition results φ1 to φ4 is to be output in the subsequent frames. Here, the recognition result output setting unit 212 can determine one or more of the recognition results φ1 to φ4 based on one frame of the image data 32 as the recognition results φx to be output.
 次のステップS306~ステップS309は、ループ処理となっている。ステップS306で、認識処理部20bは、利用領域取得部211において、ステップS305の決定に従い、出力が決定された認識結果φxに応じたサンプリング画像36φxを取得する。 The next steps S306 to S309 are loop processing. In step S306, the recognition processing unit 20b acquires the sampled image 36φx according to the recognition result φx whose output is determined according to the determination in step S305 in the utilization area acquisition unit 211.
 次のステップS307で、認識処理部20bは、認識結果出力設定部212により、認識結果φ1~φ4のうちどのタイミングの認識結果を出力するかを、認識部220に指定する。例えば、認識結果出力設定部212は、ステップS303で決定されたサンプリング画像36φxに基づく認識結果を出力するように、認識部220に指定する。ステップS305で複数の認識結果が出力する認識結果φxとして決定されている場合、ステップS306~ステップS309のループにおいて、それらを順次に選択して決定する。 In the next step S307, the recognition processing unit 20b specifies to the recognition unit 220 which timing of the recognition results φ1 to φ4 is to be output by the recognition result output setting unit 212. For example, the recognition result output setting unit 212 designates the recognition unit 220 to output the recognition result based on the sampled image 36φx determined in step S303. When a plurality of recognition results are determined as the recognition results φx to be output in step S305, they are sequentially selected and determined in the loop of steps S306 to S309.
 次のステップS308で、認識処理部20bは、認識部220により、ステップS306で取得したサンプリング画像36φxに対する認識処理を実行する。 In the next step S308, the recognition processing unit 20b executes the recognition processing for the sampled image 36φx acquired in step S306 by the recognition unit 220.
 次のステップS309で、認識処理部20bは、認識結果出力設定部212により、ステップS305で出力が指定された認識結果に基づく認識処理が全て行われたか否かを判定する。出力が決定された認識結果のうち、認識処理が行われていないものがあると判定した場合(ステップS309、「No」)、認識結果出力設定部212は、処理をステップS306に戻す。一方、認識結果出力設定部212は、出力が決定された認識結果の全てについて認識処理が行われたと判定した場合(ステップS309、「Yes」)、処理をステップS310に移行させる。 In the next step S309, the recognition processing unit 20b determines whether or not all the recognition processing based on the recognition result whose output is specified in step S305 has been performed by the recognition result output setting unit 212. When it is determined that some of the recognition results whose output has been determined have not been recognized (step S309, "No"), the recognition result output setting unit 212 returns the process to step S306. On the other hand, when the recognition result output setting unit 212 determines that the recognition process has been performed for all the recognition results whose output has been determined (step S309, “Yes”), the process shifts to step S310.
In step S310, the recognition processing unit 20b uses the recognition result output setting unit 212 to instruct the recognition unit 220 to output each recognition result φx whose output has been decided. In response to this instruction, the recognition unit 220 outputs each recognition result φx. Note that the processing of step S310 may be executed between the processing of step S308 and the processing of step S309.
The above will be described using a more specific example. As described above, an obstacle on the road (such as a pedestrian) is applied as the recognition target, and it is assumed that, in the preliminary result (recognition result φ1), the recognition target can be recognized within the range of the braking target distance of the vehicle (own vehicle) on which the information processing device 1b is mounted.
FIG. 26 is a diagram corresponding to FIG. 18 described above, and is a schematic diagram for explaining the recognition processing according to the second modification of the first embodiment. First, as shown in the upper part of FIG. 26, the recognition result output setting unit 212 instructs the recognition unit 220 to output, for example, the recognition result φ4, that is, the integrated result, and the recognition results φ4 over the predetermined period 100 are accumulated in the storage unit 214. It is assumed here that the recognition result output setting unit 212 has determined, based on the recognition results φ4 accumulated in the storage unit 214, that the area in which the own vehicle is currently located is an area with many pedestrians.
In this case, since many pedestrians can also be expected to run out into the road, the recognition result output setting unit 212 decides to output the recognition result φ1 (preliminary result) and the recognition result φ4 (integrated result), as shown in the lower part of FIG. 26 (step S305). The recognition unit 220 outputs the recognition results φ1 and φ4 in accordance with this decision. These recognition results φ1 and φ4 are sent, for example, to the braking system of the own vehicle.
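The decision in step S305 can be sketched as a simple rule over the accumulated results. The threshold PEDESTRIAN_AREA_THRESHOLD and the label-list representation are assumptions introduced only for illustration; the disclosure leaves the concrete criterion open.

PEDESTRIAN_AREA_THRESHOLD = 5  # hypothetical count per accumulation period

def decide_outputs(accumulated_labels):
    """Step S305 sketch: accumulated_labels holds, for each frame of the period
    100, the list of object labels recognized in the integrated result phi4.
    Returns which recognition results (1 = phi1, 4 = phi4) to output afterwards."""
    pedestrians = sum(labels.count("pedestrian") for labels in accumulated_labels)
    if pedestrians >= PEDESTRIAN_AREA_THRESHOLD:
        # Many pedestrians: keep the fast preliminary result phi1 for sudden
        # run-outs, in addition to the integrated result phi4.
        return [1, 4]
    return [4]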
As described above, in the second modification of the first embodiment, the recognition result to be output is determined based on the recognition results obtained over a predetermined period, so that a recognition result appropriate to the situation can be output.
(2-6. Another modification of the first embodiment)
Next, another modification of the first embodiment will be described. In the above description, the technique according to the present disclosure is applied to recognition processing for detecting objects, but this is not limited to this example. For example, the technique according to the present disclosure can also be applied to semantic segmentation and other similar tasks.
Also, in the above description, the technique according to the present disclosure is applied to recognition processing using a DNN, but this is not limited to this example. The technique is applicable to other technologies as well, as long as the architecture uses image information expanded along the time axis.
[3. Second Embodiment]
Next, a second embodiment of the present disclosure will be described. The second embodiment of the present disclosure is an example in which a sensor unit 10b including the pixel array unit 1001, the recognition unit 220, and a configuration corresponding to the preprocessing unit 210 are integrally incorporated into a layered CIS.
 図28は、第2の実施形態に係る情報処理装置の一例の構成を示すブロック図である。図28において、情報処理装置1cは、センサ部10cと、認識部220と、を含む。また、センサ部10cは、画素アレイ部1001と、読出制御部240と、を含む。読出制御部240は、例えば、第1の実施形態で説明した前処理部210に対応する機能と、撮像部1200における制御部1100の機能と、を含む。 FIG. 28 is a block diagram showing a configuration of an example of the information processing device according to the second embodiment. In FIG. 28, the information processing device 1c includes a sensor unit 10c and a recognition unit 220. Further, the sensor unit 10c includes a pixel array unit 1001 and a read control unit 240. The read control unit 240 includes, for example, a function corresponding to the preprocessing unit 210 described in the first embodiment and a function of the control unit 1100 in the imaging unit 1200.
 なお、図28において、図5を用いて説明した構成のうち、垂直走査部1002、AD変換部1003および信号処理部1101は、画素アレイ部1001に含まれるものとして説明を行う。 Note that, in FIG. 28, among the configurations described with reference to FIG. 5, the vertical scanning unit 1002, the AD conversion unit 1003, and the signal processing unit 1101 will be described as being included in the pixel array unit 1001.
The read control unit 240 supplies the pixel array unit 1001 with a control signal designating the pixel circuits 1000 from which pixel signals are to be read. For example, the read control unit 240 can selectively read, from the pixel array unit 1001, the lines that include sampling pixels. The read control unit 240 is not limited to this, and can also selectively designate, in units of pixel circuits 1000, the pixel circuits 1000 corresponding to the sampling pixels in the pixel array unit 1001. At this time, the read control unit 240 can designate, to the pixel array unit 1001, the pixel circuits 1000 corresponding to the pixel positions of the sampling pixels obtained by the subsampling performed while shifting the phase, as described in the first embodiment.
 画素アレイ部1001は、指定された画素回路1000から読み出した画素信号をデジタル方式の画素データに変換し、この画素データを読出制御部240に渡す。読出制御部240は、画素アレイ部1001から渡された、1フレーム分の画素データを、画像データとして認識部220に渡す。この画像データは、位相ずらしサブサンプリングによるサンプリング画像である。認識部220は、渡された画像データに対して認識処理を実行する。 The pixel array unit 1001 converts the pixel signal read from the designated pixel circuit 1000 into digital pixel data, and passes this pixel data to the read control unit 240. The read control unit 240 passes the pixel data for one frame passed from the pixel array unit 1001 to the recognition unit 220 as image data. This image data is a sampled image by phase shift subsampling. The recognition unit 220 executes a recognition process on the passed image data.
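One possible way for the read control unit 240 to designate the sampling pixels of each phase is sketched below. The 2 × 2 size of the divided regions and the function names are assumptions for illustration; any division pattern described for the divided regions 35 could be substituted.

def sampling_positions(width, height, block=2, phase=0):
    """Yield (x, y) positions of the sampling pixels for one phase.

    The frame is divided into block x block regions (the divided regions 35);
    each phase selects one pixel position inside every region, and the phase
    index is shifted from frame to frame. The 2x2 block size is an assumption.
    """
    dx, dy = phase % block, (phase // block) % block
    for y in range(dy, height, block):
        for x in range(dx, width, block):
            yield x, y

def lines_to_read(height, block=2, phase=0):
    """Rows that contain sampling pixels for this phase (line-thinned readout)."""
    dy = (phase // block) % block
    return list(range(dy, height, block))

Reading only the rows returned by lines_to_read() corresponds to the line-thinned readout described below, which is what reduces the readout amount and the required bus width.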
 第2の実施形態では、情報処理装置1cを、図6Aを用いて説明した、半導体チップを2層に積層した2層構造の積層型CISにより構成することができる。図6Aを参照し、第1層の半導体チップに画素部2020aを形成し、第2層の半導体チップにメモリ+ロジック部2020bを形成している。画素部2020aは、少なくとも情報処理装置1cにおけるセンサ部10cを含む。メモリ+ロジック部2020bは、例えば、画素アレイ部1001を駆動するための駆動回路を含むと共に、読出制御部240と、認識部220と、を含む。メモリ+ロジック部2020bに、フレームメモリをさらに含ませることができる。 In the second embodiment, the information processing apparatus 1c can be configured by the laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers, which is described with reference to FIG. 6A. With reference to FIG. 6A, the pixel portion 2020a is formed on the semiconductor chip of the first layer, and the memory + logic portion 2020b is formed on the semiconductor chip of the second layer. The pixel unit 2020a includes at least the sensor unit 10c in the information processing device 1c. The memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit 1001, a read control unit 240, and a recognition unit 220. The memory + logic unit 2020b can further include a frame memory.
 別の例として、情報処理装置1cを、図6Bを用いて説明した、半導体チップを3層に積層した3層構造の積層型CISにより構成することができる。この場合、第1層の半導体チップに上述の画素部2020aを形成し、第2層の半導体チップに例えばフレームメモリを含むメモリ部2020cを形成し、第3層の半導体チップに上述のメモリ+ロジック部2020bに対応するロジック部2020dを形成している。この場合、ロジック部2020dは、例えば画素アレイ部を駆動するための駆動回路と、読出制御部240と、認識部220と、を含む。また、メモリ部2020cは、フレームメモリやメモリ1202を含むことができる。 As another example, the information processing apparatus 1c can be configured by the laminated CIS having a three-layer structure in which semiconductor chips are laminated in three layers, which is described with reference to FIG. 6B. In this case, the pixel portion 2020a described above is formed on the semiconductor chip of the first layer, the memory portion 2020c including, for example, a frame memory is formed on the semiconductor chip of the second layer, and the memory + logic described above is formed on the semiconductor chip of the third layer. The logic unit 2020d corresponding to the unit 2020b is formed. In this case, the logic unit 2020d includes, for example, a drive circuit for driving the pixel array unit, a read control unit 240, and a recognition unit 220. Further, the memory unit 2020c can include a frame memory and a memory 1202.
As described above, in the second embodiment, the sensor unit 10c performs the subsampling processing. It is therefore unnecessary to read from all of the pixel circuits 1000 included in the pixel array unit 1001, and the delay of the recognition processing can be shortened further than in the first embodiment described above. In addition, since the pixel circuits 1000 of the lines that include sampling pixels are selectively read from among all the pixel circuits 1000, the amount of pixel signals read from the pixel array unit 1001 can be reduced, and the bus width can be reduced.
Also, in the second embodiment, the pixel array unit 1001 performs line-thinned readout in which only the lines that include sampling pixels are selectively read. The distortion of the captured image caused by the rolling shutter can therefore be reduced, and the power consumption of the pixel array unit 1001 during imaging can be reduced. Furthermore, for the lines thinned out by the subsampling, imaging can be performed with imaging conditions, such as exposure, changed with respect to the lines that are read out by the subsampling.
(3-1. Modification of the second embodiment)
Next, a modification of the second embodiment will be described. The modification of the second embodiment is an example in which the sensor unit 10c and the recognition unit 220 in the information processing device 1c according to the second embodiment described above are separated.
 図29は、第2の実施形態の変形例に係る情報処理装置の一例の構成を示すブロック図である。図29において、情報処理装置1dは、センサ部10dと、認識処理部20dと、を含む、センサ部10dは、画素アレイ部1001と、読出制御部240と、を含む。また、認識処理部20dは、認識部220を含む。 FIG. 29 is a block diagram showing a configuration of an example of an information processing device according to a modified example of the second embodiment. In FIG. 29, the information processing device 1d includes a sensor unit 10d and a recognition processing unit 20d, and the sensor unit 10d includes a pixel array unit 1001 and a read control unit 240. Further, the recognition processing unit 20d includes a recognition unit 220.
 ここで、センサ部10dは、例えば、図6Aを用いて説明した、半導体チップを2層に積層した2層構造の積層型CISにより形成する。図6Aを参照し、第1層の半導体チップに画素部2020aを形成し、第2層の半導体チップにメモリ+ロジック部2020bを形成している。画素部2020aは、少なくともセンサ部10dにおける画素アレイ部1001を含む。メモリ+ロジック部2020bは、例えば、画素アレイ部1001を駆動するための駆動回路と、読出制御部240とを含む。メモリ+ロジック部2020bに、フレームメモリをさらに含ませることができる。 Here, the sensor unit 10d is formed by, for example, the laminated CIS having a two-layer structure in which semiconductor chips are laminated in two layers, which is described with reference to FIG. 6A. With reference to FIG. 6A, the pixel portion 2020a is formed on the semiconductor chip of the first layer, and the memory + logic portion 2020b is formed on the semiconductor chip of the second layer. The pixel unit 2020a includes at least the pixel array unit 1001 in the sensor unit 10d. The memory + logic unit 2020b includes, for example, a drive circuit for driving the pixel array unit 1001 and a read control unit 240. The memory + logic unit 2020b can further include a frame memory.
 センサ部10dは、サンプリング画像の画像データを読出制御部240から出力し、センサ部10dとは異なるハードウェアに含まれる認識処理部20dに供給する。認識処理部20dは、センサ部10dから供給された画像データを認識部220に入力する。認識部220は、入力された画像データに基づき認識処理を実行し、認識結果を外部に出力する。 The sensor unit 10d outputs the image data of the sampled image from the read control unit 240 and supplies it to the recognition processing unit 20d included in the hardware different from the sensor unit 10d. The recognition processing unit 20d inputs the image data supplied from the sensor unit 10d to the recognition unit 220. The recognition unit 220 executes the recognition process based on the input image data, and outputs the recognition result to the outside.
As another example, the sensor unit 10d can be formed by the layered CIS having a three-layer structure in which semiconductor chips are stacked in three layers, described with reference to FIG. 6B. In this case, the above-described pixel unit 2020a is formed on the first-layer semiconductor chip, a memory unit 2020c including, for example, a frame memory is formed on the second-layer semiconductor chip, and a logic unit corresponding to the above-described memory + logic unit 2020b is formed on the third-layer semiconductor chip. In this case, the logic unit includes, for example, a drive circuit for driving the pixel array unit 1001 and the read control unit 240. The memory unit 2020c can also include a frame memory and the memory 1202.
 このように、認識処理部20d(認識部220)をセンサ部10dとは別のハードウェアにより構成することで、認識部220の構成、例えば認識モデルなどの変更が容易とすることができる。 In this way, by configuring the recognition processing unit 20d (recognition unit 220) with hardware different from the sensor unit 10d, it is possible to easily change the configuration of the recognition unit 220, for example, the recognition model.
Also, since the recognition processing is performed based on the sampled image subsampled in the sensor unit 10d, the load of the recognition processing can be reduced compared with the case where the recognition processing is performed using the image data 32 of the captured image as it is. Therefore, a CPU, DSP, or GPU with lower processing capability can be used in the recognition processing unit 20d, for example, and the cost of the information processing device 1d can be reduced.
[4. Third Embodiment]
(4-1. Application examples of the technology of the present disclosure)
Next, as a third embodiment, application examples of the information processing devices 1b, 1c, and 1d according to the first embodiment and its modifications and the second embodiment and its modification of the present disclosure will be described. FIG. 30 is a diagram showing examples of use of the information processing devices 1b, 1c, and 1d according to the first embodiment and its modifications and the second embodiment and its modification. In the following description, the information processing devices 1b, 1c, and 1d are represented by the information processing device 1b when they do not need to be distinguished.
The information processing device 1b described above can be used, for example, in the following various cases in which light such as visible light, infrared light, ultraviolet light, or X-rays is sensed and recognition processing is performed based on the sensing result.
- Devices that capture images for viewing, such as digital cameras and portable devices with camera functions.
- Devices for traffic use, such as in-vehicle sensors that image the front, rear, surroundings, and interior of a vehicle for safe driving such as automatic stopping and for recognition of the driver's state, surveillance cameras that monitor traveling vehicles and roads, and distance measuring sensors that measure the distance between vehicles and the like.
- Devices for home appliances such as TVs, refrigerators, and air conditioners, which capture a user's gesture and operate the appliance according to that gesture.
- Devices for medical and healthcare use, such as endoscopes and devices that perform angiography by receiving infrared light.
- Devices for security use, such as surveillance cameras for crime prevention and cameras for personal authentication.
- Devices for beauty care use, such as skin measuring instruments that image the skin and microscopes that image the scalp.
- Devices for sports use, such as action cameras and wearable cameras for sports applications.
- Devices for agricultural use, such as cameras for monitoring the state of fields and crops.
(4-2. Application example to a mobile body)
The technology according to the present disclosure (the present technology) can be applied to various products. For example, the technology according to the present disclosure may be realized as a device mounted on any type of mobile body such as an automobile, an electric vehicle, a hybrid electric vehicle, a motorcycle, a bicycle, a personal mobility device, an airplane, a drone, a ship, or a robot.
 図31は、本開示に係る技術が適用され得る移動体制御システムの一例である車両制御システムの概略的な構成例を示すブロック図である。 FIG. 31 is a block diagram showing a schematic configuration example of a vehicle control system, which is an example of a mobile control system to which the technique according to the present disclosure can be applied.
 車両制御システム12000は、通信ネットワーク12001を介して接続された複数の電子制御ユニットを備える。図31に示した例では、車両制御システム12000は、駆動系制御ユニット12010、ボディ系制御ユニット12020、車外情報検出ユニット12030、車内情報検出ユニット12040、及び統合制御ユニット12050を備える。また、統合制御ユニット12050の機能構成として、マイクロコンピュータ12051、音声画像出力部12052、及び車載ネットワークI/F(interface)12053が図示されている。 The vehicle control system 12000 includes a plurality of electronic control units connected via the communication network 12001. In the example shown in FIG. 31, the vehicle control system 12000 includes a drive system control unit 12010, a body system control unit 12020, an outside information detection unit 12030, an in-vehicle information detection unit 12040, and an integrated control unit 12050. Further, as a functional configuration of the integrated control unit 12050, a microcomputer 12051, an audio image output unit 12052, and an in-vehicle network I / F (interface) 12053 are shown.
The drive system control unit 12010 controls the operation of devices related to the drive system of the vehicle according to various programs. For example, the drive system control unit 12010 functions as a control device for a driving force generating device for generating the driving force of the vehicle, such as an internal combustion engine or a driving motor, a driving force transmission mechanism for transmitting the driving force to the wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like.
 ボディ系制御ユニット12020は、各種プログラムにしたがって車体に装備された各種装置の動作を制御する。例えば、ボディ系制御ユニット12020は、キーレスエントリシステム、スマートキーシステム、パワーウィンドウ装置、あるいは、ヘッドランプ、バックランプ、ブレーキランプ、ウィンカー又はフォグランプ等の各種ランプの制御装置として機能する。この場合、ボディ系制御ユニット12020には、鍵を代替する携帯機から発信される電波又は各種スイッチの信号が入力され得る。ボディ系制御ユニット12020は、これらの電波又は信号の入力を受け付け、車両のドアロック装置、パワーウィンドウ装置、ランプ等を制御する。 The body system control unit 12020 controls the operation of various devices mounted on the vehicle body according to various programs. For example, the body system control unit 12020 functions as a keyless entry system, a smart key system, a power window device, or a control device for various lamps such as a head lamp, a back lamp, a brake lamp, a winker, or a fog lamp. In this case, the body system control unit 12020 may be input with radio waves transmitted from a portable device that substitutes for the key or signals of various switches. The body system control unit 12020 receives inputs of these radio waves or signals and controls a vehicle door lock device, a power window device, a lamp, and the like.
 車外情報検出ユニット12030は、車両制御システム12000を搭載した車両の外部の情報を検出する。例えば、車外情報検出ユニット12030には、撮像部12031が接続される。車外情報検出ユニット12030は、撮像部12031に車外の画像を撮像させるとともに、撮像された画像を受信する。車外情報検出ユニット12030は、受信した画像に基づいて、人、車、障害物、標識又は路面上の文字等の物体検出処理又は距離検出処理を行ってもよい。 The vehicle outside information detection unit 12030 detects information outside the vehicle equipped with the vehicle control system 12000. For example, the imaging unit 12031 is connected to the vehicle exterior information detection unit 12030. The vehicle outside information detection unit 12030 causes the image pickup unit 12031 to capture an image of the outside of the vehicle and receives the captured image. The vehicle exterior information detection unit 12030 may perform object detection processing or distance detection processing such as a person, a vehicle, an obstacle, a sign, or a character on the road surface based on the received image.
 撮像部12031は、光を受光し、その光の受光量に応じた電気信号を出力する光センサである。撮像部12031は、電気信号を画像として出力することもできるし、測距の情報として出力することもできる。また、撮像部12031が受光する光は、可視光であっても良いし、赤外線等の非可視光であっても良い。 The imaging unit 12031 is an optical sensor that receives light and outputs an electric signal according to the amount of the light received. The image pickup unit 12031 can output an electric signal as an image or can output it as distance measurement information. Further, the light received by the imaging unit 12031 may be visible light or invisible light such as infrared light.
 車内情報検出ユニット12040は、車内の情報を検出する。車内情報検出ユニット12040には、例えば、運転者の状態を検出する運転者状態検出部12041が接続される。運転者状態検出部12041は、例えば運転者を撮像するカメラを含み、車内情報検出ユニット12040は、運転者状態検出部12041から入力される検出情報に基づいて、運転者の疲労度合い又は集中度合いを算出してもよいし、運転者が居眠りをしていないかを判別してもよい。 The in-vehicle information detection unit 12040 detects the in-vehicle information. For example, a driver state detection unit 12041 that detects the driver's state is connected to the in-vehicle information detection unit 12040. The driver state detection unit 12041 includes, for example, a camera that images the driver, and the in-vehicle information detection unit 12040 determines the degree of fatigue or concentration of the driver based on the detection information input from the driver state detection unit 12041. It may be calculated, or it may be determined whether the driver is dozing.
The microcomputer 12051 can calculate control target values for the driving force generating device, the steering mechanism, or the braking device based on the information inside and outside the vehicle acquired by the vehicle exterior information detection unit 12030 or the vehicle interior information detection unit 12040, and can output control commands to the drive system control unit 12010. For example, the microcomputer 12051 can perform cooperative control aimed at realizing the functions of an ADAS (Advanced Driver Assistance System), including vehicle collision avoidance or impact mitigation, follow-up traveling based on inter-vehicle distance, vehicle speed maintenance traveling, vehicle collision warning, vehicle lane departure warning, and the like.
The microcomputer 12051 can also perform cooperative control aimed at automated driving or the like, in which the vehicle travels autonomously without depending on the driver's operation, by controlling the driving force generating device, the steering mechanism, the braking device, and the like based on the information around the vehicle acquired by the vehicle exterior information detection unit 12030 or the vehicle interior information detection unit 12040.
The microcomputer 12051 can also output control commands to the body system control unit 12020 based on the information outside the vehicle acquired by the vehicle exterior information detection unit 12030. For example, the microcomputer 12051 can perform cooperative control aimed at preventing glare, such as controlling the headlamps according to the position of a preceding vehicle or an oncoming vehicle detected by the vehicle exterior information detection unit 12030 and switching from high beam to low beam.
 音声画像出力部12052は、車両の搭乗者又は車外に対して、視覚的又は聴覚的に情報を通知することが可能な出力装置へ音声及び画像のうちの少なくとも一方の出力信号を送信する。図31の例では、出力装置として、オーディオスピーカ12061、表示部12062及びインストルメントパネル12063が例示されている。表示部12062は、例えば、オンボードディスプレイ及びヘッドアップディスプレイの少なくとも一つを含んでいてもよい。 The audio image output unit 12052 transmits the output signal of at least one of the audio and the image to the output device capable of visually or audibly notifying the passenger or the outside of the vehicle of the information. In the example of FIG. 31, an audio speaker 12061, a display unit 12062, and an instrument panel 12063 are exemplified as output devices. The display unit 12062 may include, for example, at least one of an onboard display and a heads-up display.
 図32は、撮像部12031の設置位置の例を示す図である。 FIG. 32 is a diagram showing an example of the installation position of the imaging unit 12031.
 図32では、車両12100は、撮像部12031として、撮像部12101,12102,12103,12104,12105を有する。 In FIG. 32, the vehicle 12100 has imaging units 12101, 12102, 12103, 12104, 12105 as imaging units 12031.
 撮像部12101,12102,12103,12104,12105は、例えば、車両12100のフロントノーズ、サイドミラー、リアバンパ、バックドア及び車室内のフロントガラスの上部等の位置に設けられる。フロントノーズに備えられる撮像部12101及び車室内のフロントガラスの上部に備えられる撮像部12105は、主として車両12100の前方の画像を取得する。サイドミラーに備えられる撮像部12102,12103は、主として車両12100の側方の画像を取得する。リアバンパ又はバックドアに備えられる撮像部12104は、主として車両12100の後方の画像を取得する。撮像部12101及び12105で取得される前方の画像は、主として先行車両又は、歩行者、障害物、信号機、交通標識又は車線等の検出に用いられる。 The imaging units 12101, 12102, 12103, 12104, 12105 are provided at positions such as the front nose, side mirrors, rear bumpers, back doors, and the upper part of the windshield in the vehicle interior of the vehicle 12100, for example. The image pickup unit 12101 provided on the front nose and the image pickup section 12105 provided on the upper part of the windshield in the vehicle interior mainly acquire an image in front of the vehicle 12100. The imaging units 12102 and 12103 provided in the side mirrors mainly acquire images of the side of the vehicle 12100. The imaging unit 12104 provided on the rear bumper or the back door mainly acquires an image of the rear of the vehicle 12100. The images in front acquired by the imaging units 12101 and 12105 are mainly used for detecting a preceding vehicle or a pedestrian, an obstacle, a traffic light, a traffic sign, a lane, or the like.
Note that FIG. 32 shows an example of the imaging ranges of the imaging units 12101 to 12104. The imaging range 12111 indicates the imaging range of the imaging unit 12101 provided on the front nose, the imaging ranges 12112 and 12113 indicate the imaging ranges of the imaging units 12102 and 12103 provided on the side mirrors, respectively, and the imaging range 12114 indicates the imaging range of the imaging unit 12104 provided on the rear bumper or the back door. For example, by superimposing the image data captured by the imaging units 12101 to 12104, a bird's-eye view image of the vehicle 12100 viewed from above can be obtained.
 撮像部12101ないし12104の少なくとも1つは、距離情報を取得する機能を有していてもよい。例えば、撮像部12101ないし12104の少なくとも1つは、複数の撮像素子からなるステレオカメラであってもよいし、位相差検出用の画素を有する撮像素子であってもよい。 At least one of the imaging units 12101 to 12104 may have a function of acquiring distance information. For example, at least one of the image pickup units 12101 to 12104 may be a stereo camera composed of a plurality of image pickup elements, or an image pickup element having pixels for phase difference detection.
For example, based on the distance information obtained from the imaging units 12101 to 12104, the microcomputer 12051 obtains the distance to each three-dimensional object within the imaging ranges 12111 to 12114 and the temporal change of this distance (the relative velocity with respect to the vehicle 12100), and can thereby extract, as a preceding vehicle, in particular the closest three-dimensional object on the traveling path of the vehicle 12100 that is traveling at a predetermined speed (for example, 0 km/h or more) in substantially the same direction as the vehicle 12100. Furthermore, the microcomputer 12051 can set in advance the inter-vehicle distance to be maintained to the preceding vehicle, and can perform automatic brake control (including follow-up stop control), automatic acceleration control (including follow-up start control), and the like. In this way, it is possible to perform cooperative control aimed at automated driving or the like, in which the vehicle travels autonomously without depending on the driver's operation.
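As a rough illustration of the preceding-vehicle extraction described above, the relative velocity can be estimated from successive distance samples and combined with a direction test. The functions below and the 10-degree heading tolerance are assumptions introduced only for illustration and are not taken from this disclosure.

def relative_speed_mps(distance_now_m, distance_prev_m, dt_s):
    """Relative velocity estimated from two distance samples (negative = closing in)."""
    return (distance_now_m - distance_prev_m) / dt_s

def is_preceding_vehicle(on_ego_path, heading_diff_deg, ego_speed_mps, rel_speed_mps,
                         max_heading_diff_deg=10.0):
    """Criterion sketched in the text: an object on the ego vehicle's path, moving in
    roughly the same direction, whose absolute speed is 0 km/h or more."""
    absolute_speed_mps = ego_speed_mps + rel_speed_mps
    return (on_ego_path
            and abs(heading_diff_deg) <= max_heading_diff_deg
            and absolute_speed_mps >= 0.0)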
For example, based on the distance information obtained from the imaging units 12101 to 12104, the microcomputer 12051 can classify three-dimensional object data on three-dimensional objects into two-wheeled vehicles, ordinary vehicles, large vehicles, pedestrians, utility poles, and other three-dimensional objects, extract them, and use them for automatic avoidance of obstacles. For example, the microcomputer 12051 distinguishes obstacles around the vehicle 12100 into obstacles that are visible to the driver of the vehicle 12100 and obstacles that are difficult for the driver to see. The microcomputer 12051 then determines a collision risk indicating the degree of risk of collision with each obstacle, and when the collision risk is equal to or higher than a set value and there is a possibility of collision, it can provide driving assistance for collision avoidance by outputting a warning to the driver via the audio speaker 12061 or the display unit 12062, or by performing forced deceleration or avoidance steering via the drive system control unit 12010.
At least one of the imaging units 12101 to 12104 may be an infrared camera that detects infrared rays. For example, the microcomputer 12051 can recognize a pedestrian by determining whether or not a pedestrian is present in the images captured by the imaging units 12101 to 12104. Such pedestrian recognition is performed by, for example, a procedure of extracting feature points from the images captured by the imaging units 12101 to 12104 as infrared cameras, and a procedure of performing pattern matching processing on a series of feature points indicating the outline of an object to determine whether or not it is a pedestrian. When the microcomputer 12051 determines that a pedestrian is present in the images captured by the imaging units 12101 to 12104 and recognizes the pedestrian, the audio image output unit 12052 controls the display unit 12062 so that a rectangular contour line for emphasis is superimposed on the recognized pedestrian. The audio image output unit 12052 may also control the display unit 12062 so that an icon or the like indicating the pedestrian is displayed at a desired position.
 以上、本開示に係る技術が適用され得る車両制御システムの一例について説明した。本開示に係る技術は、以上説明した構成のうち、撮像部12031および車外情報検出ユニット12030に適用され得る。具体的には、例えば、情報処理装置1bのセンサ部10bを撮像部12031に適用し、認識処理部20bを車外情報検出ユニット12030に適用する。認識処理部20bから出力された認識結果は、例えば通信ネットワーク12001を介して統合制御ユニット12050に渡される。 The above is an example of a vehicle control system to which the technology according to the present disclosure can be applied. The technique according to the present disclosure can be applied to the imaging unit 12031 and the vehicle exterior information detection unit 12030 among the configurations described above. Specifically, for example, the sensor unit 10b of the information processing device 1b is applied to the image pickup unit 12031, and the recognition processing unit 20b is applied to the vehicle exterior information detection unit 12030. The recognition result output from the recognition processing unit 20b is passed to the integrated control unit 12050 via, for example, the communication network 12001.
 By applying the technology according to the present disclosure to the imaging unit 12031 and the vehicle exterior information detection unit 12030 in this way, the subsampling pattern can be switched according to a predetermined condition, and the recognizer and parameters used for the recognition processing can be changed according to the switched pattern. As a result, a preliminary result, that is, a recognition result that prioritizes promptness, can be obtained with higher accuracy, enabling more reliable driving assistance.
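 The pattern-switching idea can be illustrated with a minimal sketch. The switching condition (vehicle speed), the pattern names, the sampling strides, and the recognizer registry below are assumptions introduced for illustration; the disclosure does not specify these concrete values.

```python
from typing import Callable
import numpy as np

# Hypothetical registry mapping a subsampling pattern to the recognizer
# (with its parameters) prepared for that pattern.
RECOGNIZERS: dict[str, Callable[[np.ndarray], list[str]]] = {
    "coarse_grid": lambda img: ["vehicle"],      # placeholder recognizers
    "dense_grid": lambda img: ["pedestrian"],
}

def choose_pattern(vehicle_speed_kmh: float) -> str:
    """Example condition: at high speed, favor a coarse pattern that can be
    processed quickly; at low speed, favor a dense pattern for accuracy."""
    return "coarse_grid" if vehicle_speed_kmh > 60.0 else "dense_grid"

def subsample(frame: np.ndarray, pattern: str) -> np.ndarray:
    """Pick pixels on a regular stride according to the selected pattern."""
    step = 8 if pattern == "coarse_grid" else 4
    return frame[::step, ::step]

def recognize(frame: np.ndarray, vehicle_speed_kmh: float) -> list[str]:
    pattern = choose_pattern(vehicle_speed_kmh)
    sampled = subsample(frame, pattern)
    return RECOGNIZERS[pattern](sampled)
```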
 The effects described in this specification are merely examples and are not limiting; other effects may also be obtained.
 The present technology can also have the following configurations.
(1)
 An information processing device comprising:
 a generation unit that generates a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
 a calculation unit that calculates a feature amount of the sampled image;
 a storage unit that accumulates the calculated feature amount;
 a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result; and
 an output control unit that controls the recognition unit to output the recognition result based on a predetermined feature amount among the feature amounts accumulated in the storage unit.
(2)
 The information processing device according to (1), wherein
 the predetermined feature amount is the feature amount calculated using one of the sampled images.
(3)
 The information processing device according to (1), wherein
 the predetermined feature amount is the feature amount calculated using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image.
(4)
 The information processing device according to (1), wherein
 the predetermined feature amount is the feature amount calculated using the sampling pixels at all pixel positions of one frame of the imaging information.
(5)
 The information processing device according to (1), wherein
 the output control unit determines which of the following the recognition unit outputs the recognition result based on:
 a first feature amount calculated, as the predetermined feature amount, using one of the sampled images;
 a second feature amount calculated, as the predetermined feature amount, using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image; and
 a third feature amount calculated, as the predetermined feature amount, using the sampling pixels at all pixel positions of one frame of the imaging information.
(6)
 The information processing device according to (5), wherein
 the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to prior information set in advance for the recognition processing.
(7)
 The information processing device according to (5), wherein
 the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to an intermediate result of the recognition processing.
(8)
 The information processing device according to (5), further comprising an accumulation unit that accumulates the recognition results, wherein
 the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to the recognition results accumulated in the accumulation unit.
(9)
 The information processing device according to (1), wherein
 the output control unit determines, according to the recognition result based on the feature amount calculated using one of the sampled images, whether or not to execute at least one of the calculation processing by the calculation unit, the accumulation processing by the storage unit, and the recognition processing by the recognition unit for one or more frames of the imaging information acquired later in time series than the frame from which the sampled image was acquired.
(10)
 The information processing device according to (1), wherein
 the output control unit determines, according to the recognition result based on the feature amount calculated using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image, whether or not to execute at least one of the calculation processing by the calculation unit, the accumulation processing by the storage unit, and the recognition processing by the recognition unit for one or more frames of the imaging information acquired later in time series than the frame from which the sampled images were acquired.
(11)
 The information processing device according to any one of (1) to (10), wherein
 the recognition unit performs the recognition processing based on an integrated feature amount obtained by integrating a plurality of the feature amounts accumulated in the storage unit.
(12)
 The information processing device according to (11), wherein
 the recognition unit integrates the feature amount calculated by the calculation unit in response to acquisition of the imaging information with at least a part of the feature amounts accumulated in the storage unit up to immediately before that acquisition, and performs the recognition processing based on the integrated feature amount.
(13)
 The information processing device according to any one of (1) to (12), wherein
 the recognition unit performs the recognition processing on the feature amount of the sampled image based on teacher data for each pixel corresponding to the pixel position in each of the divided regions.
(14)
 The information processing device according to any one of (1) to (13), wherein
 the recognition unit executes machine learning processing by an RNN (Recurrent Neural Network) using the sampling pixels set in first imaging information of the imaging information and the sampling pixels set in second imaging information acquired next to the first imaging information in time series, and performs the recognition processing based on a result of the machine learning processing.
(15)
 An information processing method executed by a processor, the method comprising:
 a generation step of generating a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
 a calculation step of calculating a feature amount of the sampled image;
 an accumulation step of accumulating the calculated feature amount;
 a recognition step of performing recognition processing based on at least a part of the feature amounts accumulated by the accumulation step and outputting a recognition result; and
 an output control step of controlling the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
(16)
 An information processing program for causing a computer to execute:
 a generation step of generating a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
 a calculation step of calculating a feature amount of the sampled image;
 an accumulation step of accumulating the calculated feature amount;
 a recognition step of performing recognition processing based on at least a part of the feature amounts accumulated by the accumulation step and outputting a recognition result; and
 an output control step of controlling the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
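 As an editorial illustration of the flow described in configurations (1), (11), and (15) above, the following is a minimal sketch of the generation, calculation, accumulation, recognition, and output-control stages. It assumes NumPy arrays for the imaging information, a trivial placeholder feature (where an actual implementation would use a learned network), and class and method names invented here; it is not the disclosed implementation.

```python
import numpy as np

class SamplingRecognitionPipeline:
    """Toy end-to-end flow: subsample -> feature -> accumulate -> recognize."""

    def __init__(self, region_size: int = 4):
        self.region_size = region_size           # side length of each divided region
        self.phase = 0                           # pixel position within a region for this pass
        self.accumulated: list[np.ndarray] = []  # storage unit (feature accumulation)

    def generate_sampled_image(self, frame: np.ndarray) -> np.ndarray:
        """Generation unit: pick one pixel per divided region according to the
        pixel position (phase) set for the current pass."""
        r = self.region_size
        dy, dx = divmod(self.phase, r)
        sampled = frame[dy::r, dx::r]
        self.phase = (self.phase + 1) % (r * r)  # the next pass uses the next position
        return sampled

    def calculate_feature(self, sampled: np.ndarray) -> np.ndarray:
        """Calculation unit: placeholder feature (a learned extractor in practice)."""
        return sampled.astype(np.float32) / 255.0

    def accumulate(self, feature: np.ndarray) -> None:
        """Storage unit: accumulate the calculated feature amounts."""
        self.accumulated.append(feature)

    def recognize(self, num_features: int) -> str:
        """Recognition unit: run recognition on an integrated feature amount
        built from the selected (most recent) accumulated features."""
        integrated = np.mean(self.accumulated[-num_features:], axis=0)
        return "object" if integrated.mean() > 0.5 else "background"

    def process(self, frame: np.ndarray, use_all: bool = False) -> str:
        """Output control: decide whether the result is based on a single
        sampled image (fast, preliminary) or on all accumulated features."""
        feature = self.calculate_feature(self.generate_sampled_image(frame))
        self.accumulate(feature)
        return self.recognize(len(self.accumulated) if use_all else 1)
```

 In this sketch, a preliminary result would come from `process(frame)` on the first pass, while a higher-accuracy result would use `use_all=True` after several passes have covered every pixel position in the divided regions.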
1a, 1b, 1c, 1d  Information processing device
10a, 10b, 10c, 10d  Sensor unit
20a, 20b, 20d  Recognition processing unit
30a, 30b  Captured image
32, 32a, 32a', 32b, 32c, 32d  Image data
35, 35'  Divided region
36, 36φ1, 36φ1', 36φ2, 36φ3, 36φ4, 36φ01, 36φx  Sampled image
50a, 50a', 50b, 50c, 50d  Feature amount
210  Preprocessing unit
211  Used area acquisition unit
212  Recognition result output setting unit
213  Recognition result output calculation unit
214  Accumulation unit
220  Recognition unit
221  Feature amount calculation unit
222  Feature amount accumulation control unit
223  Feature amount accumulation unit
224  Recognition processing execution unit
240  Readout control unit
300, 300φ1, 300φ2, 300φ3, 300φ4  Pixel

Claims (16)

  1.  An information processing device comprising:
     a generation unit that generates a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
     a calculation unit that calculates a feature amount of the sampled image;
     a storage unit that accumulates the calculated feature amount;
     a recognition unit that performs recognition processing based on at least a part of the feature amounts accumulated in the storage unit and outputs a recognition result; and
     an output control unit that controls the recognition unit to output the recognition result based on a predetermined feature amount among the feature amounts accumulated in the storage unit.
  2.  The information processing device according to claim 1, wherein
     the predetermined feature amount is the feature amount calculated using one of the sampled images.
  3.  The information processing device according to claim 1, wherein
     the predetermined feature amount is the feature amount calculated using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image.
  4.  The information processing device according to claim 1, wherein
     the predetermined feature amount is the feature amount calculated using the sampling pixels at all pixel positions of one frame of the imaging information.
  5.  The information processing device according to claim 1, wherein
     the output control unit determines which of the following the recognition unit outputs the recognition result based on:
     a first feature amount calculated, as the predetermined feature amount, using one of the sampled images;
     a second feature amount calculated, as the predetermined feature amount, using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image; and
     a third feature amount calculated, as the predetermined feature amount, using the sampling pixels at all pixel positions of one frame of the imaging information.
  6.  The information processing device according to claim 5, wherein
     the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to prior information set in advance for the recognition processing.
  7.  The information processing device according to claim 5, wherein
     the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to an intermediate result of the recognition processing.
  8.  The information processing device according to claim 5, further comprising an accumulation unit that accumulates the recognition results, wherein
     the output control unit determines which of the first feature amount, the second feature amount, and the third feature amount the recognition result to be output is based on, according to the recognition results accumulated in the accumulation unit.
  9.  The information processing device according to claim 1, wherein
     the output control unit determines, according to the recognition result based on the feature amount calculated using one of the sampled images, whether or not to execute at least one of the calculation processing by the calculation unit, the accumulation processing by the storage unit, and the recognition processing by the recognition unit for one or more frames of the imaging information acquired later in time series than the frame from which the sampled image was acquired.
  10.  The information processing device according to claim 1, wherein
     the output control unit determines, according to the recognition result based on the feature amount calculated using a plurality of the sampled images that include a part of the sampling pixels at all pixel positions of one frame of the captured image, whether or not to execute at least one of the calculation processing by the calculation unit, the accumulation processing by the storage unit, and the recognition processing by the recognition unit for one or more frames of the imaging information acquired later in time series than the frame from which the sampled images were acquired.
  11.  The information processing device according to claim 1, wherein
     the recognition unit performs the recognition processing based on an integrated feature amount obtained by integrating a plurality of the feature amounts accumulated in the storage unit.
  12.  The information processing device according to claim 11, wherein
     the recognition unit integrates the feature amount calculated by the calculation unit in response to acquisition of the imaging information with at least a part of the feature amounts accumulated in the storage unit up to immediately before that acquisition, and performs the recognition processing based on the integrated feature amount.
  13.  The information processing device according to claim 1, wherein
     the recognition unit performs the recognition processing on the feature amount of the sampled image based on teacher data for each pixel corresponding to the pixel position in each of the divided regions.
  14.  The information processing device according to claim 1, wherein
     the recognition unit executes machine learning processing by an RNN (Recurrent Neural Network) using the sampling pixels set in first imaging information of the imaging information and the sampling pixels set in second imaging information acquired next to the first imaging information in time series, and performs the recognition processing based on a result of the machine learning processing.
  15.  An information processing method executed by a processor, the method comprising:
     a generation step of generating a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
     a calculation step of calculating a feature amount of the sampled image;
     an accumulation step of accumulating the calculated feature amount;
     a recognition step of performing recognition processing based on at least a part of the feature amounts accumulated by the accumulation step and outputting a recognition result; and
     an output control step of controlling the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
  16.  An information processing program for causing a computer to execute:
     a generation step of generating a sampled image composed of sampling pixels acquired according to pixel positions set for each divided region obtained by dividing imaging information composed of pixels in a predetermined pattern;
     a calculation step of calculating a feature amount of the sampled image;
     an accumulation step of accumulating the calculated feature amount;
     a recognition step of performing recognition processing based on at least a part of the feature amounts accumulated by the accumulation step and outputting a recognition result; and
     an output control step of controlling the recognition step to output the recognition result based on a predetermined feature amount among the feature amounts accumulated by the accumulation step.
PCT/JP2021/011644 2020-03-30 2021-03-22 Information processing device, information processing method, and information processing program WO2021200329A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-060853 2020-03-30
JP2020060853 2020-03-30

Publications (1)

Publication Number Publication Date
WO2021200329A1 true WO2021200329A1 (en) 2021-10-07

Family

ID=77928397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/011644 WO2021200329A1 (en) 2020-03-30 2021-03-22 Information processing device, information processing method, and information processing program

Country Status (1)

Country Link
WO (1) WO2021200329A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009005025A1 (en) * 2007-07-03 2009-01-08 Konica Minolta Holdings, Inc. Moving object detection device
JP2012053606A (en) * 2010-08-31 2012-03-15 Sony Corp Information processor, method and program
WO2019135270A1 (en) * 2018-01-04 2019-07-11 株式会社ソシオネクスト Motion video analysis device, motion video analysis system, motion video analysis method, and program


Similar Documents

Publication Publication Date Title
JP7105754B2 (en) IMAGING DEVICE AND METHOD OF CONTROLLING IMAGING DEVICE
JP7424051B2 (en) Solid-state imaging device, imaging device, imaging method, and imaging program
US20210218923A1 (en) Solid-state imaging device and electronic device
WO2022019026A1 (en) Information processing device, information processing system, information processing method, and information processing program
WO2018139187A1 (en) Solid-state image capturing device, method for driving same, and electronic device
WO2021200330A1 (en) Information processing device, information processing method, and information processing program
WO2021200329A1 (en) Information processing device, information processing method, and information processing program
WO2021200199A1 (en) Information processing device, information processing method, and information processing program
WO2017212722A1 (en) Control apparatus and control method
WO2022019025A1 (en) Information processing device, information processing system, information processing method, and information processing program
WO2024135307A1 (en) Solid-state imaging device
US20240078803A1 (en) Information processing apparatus, information processing method, computer program, and sensor apparatus
WO2020090272A1 (en) Electronic circuit, solid-state imaging element, and method for manufacturing electronic circuit
JP2024090345A (en) Photodetection device and method for controlling photodetection device
KR20240035570A (en) Solid-state imaging devices and methods of operating solid-state imaging devices
JP2022123986A (en) Solid-state imaging element, imaging method, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21779380

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21779380

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP